I am thrilled to serve as a PI for a two-year NSF-funded postdoctoral research fellowship with Drs. Georgia Zellou, Zhou Yu, and Katharine Graf Estes to explore human-voice AI interaction.
We explore ways in which adults and children adapt their speech when talking to voice-activated digital assistants (e.g., Amazon’s Alexa), compared to adult human interlocutors.
This line of work provides a way to test competing theoretical predictions about the extent to which speech-register adjustments are driven by functional motives (e.g., intelligibility) versus social factors (e.g., gender).
For instance, this research asks whether the functional motivations that shape how speakers repair comprehension errors with human interlocutors also apply in device-directed speech; by manipulating the phonological nature of the errors, we can carefully control the intelligibility-related pressures in communication.
At the same time, this project explores how social factors, such as interlocutor type, speaker age, or device gender, may shape speech adaptation strategies. The project additionally involves important methodological innovations in programming and running experiments directly through a digital device platform.
Overall, this project aims to fill a gap in our knowledge of the acoustic-phonetic adjustments humans make when talking to voice-AI devices, and can ultimately reveal the underlying mechanisms of speech production across different speakers (e.g., by age, gender, or device experience), contributing to basic science research.
We are excited that several papers have been accepted for the Interspeech 2019 meeting in Graz, Austria!
Papers on human-voice AI interaction
Cohn, M., & Zellou, G. (2019). Expressiveness influences human vocal alignment toward voice-AI. (In press). 2019 Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
Snyder, C., Cohn, M., & Zellou, G. (2019). Individual variation in cognitive processing style predicts differences in phonetic imitation of device and human voices. (In press). 2019 Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
Ferenc Segedin, B., Cohn, M., & Zellou, G. (2019). Perceptual adaptation to device and human voices: learning and generalization of a phonetic shift across real and voice-AI talkers. (In press). 2019 Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
Paper on musical training & speech perception
Cohn, M., Zellou, G., & Barreda, S. (2019). The role of musical experience in the perceptual weighting of acoustic cues for the obstruent coda voicing contrast in American English. (In press). 2019 Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
Undergraduate researcher, Melina Sarian, did a fantastic job presenting her research project at the ‘Most Innovative Research’ Panel. Her work extends our project exploring device expressiveness to human voices.
Sarian, M., Cohn, M., & Zellou, G. Human vocal alignment to voice-AI is mediated by acoustic expressiveness. [Talk]. UC Davis Symposium on Language Research. Davis, CA.
Dynamics of Voice-AI Interaction Panel
Bruno Ferenc Segedin and I also presented two talks in our ‘Dynamics of Voice-AI Interaction’ panel:
Cohn, M., Ferenc Segedin, B., & Zellou, G. Differences in cross-generational prosodic alignment toward device and human voices [Talk]. UC Davis Symposium on Language Research. Davis, CA.
Ferenc Segedin, B., Cohn, M., & Zellou, G. Perceptual adaptation to Amazon’s Alexa and human voices: asymmetries in learning and generalization of a novel accent across real and AI talkers. [Talk]. UC Davis Symposium on Language Research. Davis, CA.
We are thrilled that two papers from our lab were accepted for the 2019 International Congress of Phonetic Sciences meeting in Melbourne, Australia!
Cohn, M., Ferenc Segedin, B., & Zellou, G. Imitating Siri: Socially-mediated vocal alignment to device and human voices. (In press). 2019 International Congress of Phonetic Sciences (ICPhS).
Brotherton, C., Cohn, M., Zellou, G., & Barreda, S. Sub-regional variation in positioning and degree of nasalization of /æ/ allophones in California. (In press). 2019 International Congress of Phonetic Sciences (ICPhS).
See below for the recording of the 5 Minute Linguist (5ML) competition, emceed by John McWhorter, at the Linguistic Society of America Annual Meeting in New York City. The aim of the competition is to communicate a research project to a general audience in just 5 minutes (and with no notes!).
We are thrilled that two of the finalist talks were from our lab!
Talk 1 (0:00 – 8:22) Michelle Cohn (University of California, Davis): Phonologically motivated phonetic repair strategies in Siri- and human-directed speech
Talk 2 (9:45 – 15:43) Bruno Ferenc Segedin (University of California, Davis) & Georgia Zellou (University of California, Davis): Lexical frequency mediates compensation for coarticulation: Are the seeds of sound change word-specific?
Congratulations to the other presenters, as well!
Andrew Cheng (University of California, Berkeley): Style-shifting, Bilingualism, and the Koreatown Accent
Kristin Denlinger (University of Texas, Austin) & Michael Everdell (University of Texas, Austin): A Mereological Approach to Reduplicated Resultatives in O’dam
Jessi Grieser (University of Tennessee): Talking Place, Speaking Race: Topic-based style shifting in African American Language as an expression of place identity
Kate Mesh (University of Haifa): Gaze decouples from pointing as a result of grammaticalization: Evidence from Israeli Sign Language
Jennifer Schechter (University at Buffalo): What Donald Trump’s ‘thoughts’ reveal: An acoustic analysis of 45’s coffee vowel
Ai Taniguchi (Carleton University): Why we say stuff
We were thrilled to learn that two talks from our lab have been selected as finalists to compete in the 5 Minute Linguist (5ML) competition at the Linguistic Society of America (LSA) meeting in 2019.
Phonologically motivated phonetic repair strategies in Siri- and human-directed speech. Presenter: Michelle Cohn
Lexical frequency mediates compensation for coarticulation: Are the seeds of sound change word-specific? Presenters: Bruno Ferenc Segedin & Georgia Zellou.
“The Five-Minute Linguist is a high-profile event which features eight LSA members giving lively and engaging presentations about their research in a manner accessible to the general public….These five-minute presentations will be judged by a panel of journalists as well as the audience itself, and a winner will be chosen at the end of the event. The goal of this event is to encourage LSA members to practice presenting their work to a broad audience and to showcase outstanding examples of members who can explain their research in an accessible way.”[www.linguisticsociety.org/content/five-minute-linguist-2019]
This past June I had the opportunity to attend the inaugural Como Summer School on Music, Language, and Cognition (MLCS) held in Como, Italy. As someone working at the intersection of language and music perception, a growing field of research, this was an excellent opportunity to work closely with 32 other graduate and postdoctoral researchers from a range of international institutions, as well as to hear 11 presentations by experts in these fields, including: Ani Patel (Tufts University: neuroscience of language/music), Ian Cross (Cambridge: musicology, musicality), Jessica Grahn (Western University: neuroscience of rhythm, therapeutic effects of music on Parkinson’s patients), Tecumseh Fitch (University of Vienna: biological foundations of music & language), Tom Fritz (Max Planck Institute: cross-cultural comparisons of music, biological effects of music on cognition and reward systems), and Alice Mado Proverbio (University of Milano-Bicocca: neuroscience of language/music).
In addition to faculty presentations and engaging group discussions, all students gave a short presentation of their own research. I was amazed at the range of subfields in which graduate students and postdocs were exploring language, music, or a combination of both.
For example, Sinead Rocha, a doctoral student at UCL, found that infants’ production of spontaneous rhythmic patterns may be related to their parents’ height and walking rate (based on the infants’ time in utero and being carried).
Looking at the intersection of language and music, Giuliana Genovese, a graduate student at the University of Milano-Bicocca, found that infants learned new phonemic contrasts better through song than through speech.
Overall, the experience was quite enriching — particularly in the close collaboration with graduate students and faculty from a diverse range of disciplines — and I look forward to seeing them again at future programs/conferences.