Dynamics of Human-Voice-AI Interaction

I am currently working on projects related to how individuals engage with voice-activated artificially intelligent (voice-AI) devices, such as Amazon’s Alexa or Apple’s Siri. Working with Dr. Georgia Zellou (PI: UC Davis Phonetics Lab, Dept. of Linguistics), we are conducting a series of psycholinguistic experiments to test whether individuals engage with these devices in ways similar to how they engage with other humans.

We recently had a proceedings paper accepted to the 2019 International Congress of Phonetic Sciences (ICPhS)! Cohn, M., Ferenc Segedin, B., & Zellou, G. (in press). Imitating Siri: Socially-mediated vocal alignment to device and human voices. 2019 International Congress of Phonetic Sciences (ICPhS).

I was a finalist in the 5 Minute Linguist competition at the Linguistic Society of America (LSA) annual meeting for our project “Phonologically motivated phonetic repair strategies in Siri- and human-directed speech”. Click here for information about the competition & see below for the video!


Adapting TTS Voices

I am also currently collaborating with Dr. Zhou Yu (PI: UC Davis Language and Multimodal Interaction Lab, Dept. of Computer Science) on projects related to the Amazon Alexa Prize chatbot, Gunrock.

At the celebration for “Gunrock” winning the 2018 Amazon Alexa Prize!

Dissertation research

My dissertation research explored the interaction between musical training and language using methods in phonetics, psycholinguistics, and neurolinguistics. I worked with Dr. Georgia Zellou, Dr. Santiago Barreda, and Dr. Antoine Shahin on these projects in the UC Davis Phonetics Lab.

You can read my full dissertation, “Investigating the Effect of Musical Training on Speech-in-Speech Perception: The Role of f0, Timing, and Spectral Cues” at:


Speech perception in adverse listening conditions: Musical training might influence how listeners perceive speech. We ask whether the “musician’s advantage” for speech perception increases in challenging listening conditions. For example, does musicianship improve your ability to understand your friend in a crowded restaurant?

Like the reed of a musical instrument, our vocal cords vibrate to create a pitch as we speak (also known as the fundamental frequency, or f0). Since musicians have explicit training in identifying and manipulating musical pitch, this skill might transfer to perceiving pitch in human voices. Listeners use pitch in speech to help identify different speakers, with higher pitches associated with female talkers and lower pitches associated with male talkers. My research tests (1) whether musicians are better at teasing apart two competing voices on the basis of this vocal pitch encoding; and (2) whether age and musicianship interact to confer a benefit for speech perception across the lifespan.
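To make “pitch” concrete: f0 can be estimated from a waveform by finding the lag at which the signal best correlates with itself. The sketch below is purely illustrative (it is not code from the research above), using a synthetic 200 Hz vowel-like tone:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency (f0) via autocorrelation."""
    signal = signal - np.mean(signal)
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo = int(sr / fmax)  # shortest plausible pitch period, in samples
    hi = int(sr / fmin)  # longest plausible pitch period, in samples
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Synthesize a 100 ms vowel-like tone: 200 Hz fundamental plus harmonics.
sr = 16000
t = np.arange(int(0.1 * sr)) / sr
tone = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
print(estimate_f0(tone, sr))  # recovers the 200 Hz fundamental
```

Real speech, of course, has a time-varying f0 and noise, so production pitch trackers use more robust variants of this idea.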

Perception of double vowels on the basis of fundamental frequency (f0) differences: A comparison of musicians and nonmusicians: This study investigated the effect of f0 separation on the identification of two concurrently presented steady-state artificial vowel sounds, with musicians hypothesized to show greater accuracy than nonmusicians at smaller f0 differences based on their extensive training with pitch. Accuracy in identifying both vowels was coded as binomial data (both vowels identified = 1, otherwise = 0) and modeled using logistic regression. Results suggest that double-vowel intelligibility is significantly higher in the musician group (p<0.014) and significantly improves with increasing pitch separation (p<0.001), decreasing listener age (p<0.017), and increasing Euclidean distance in F1/F2 frequency between the vowels (p<0.001). These results suggest that musicians’ purported “advantage” in perceptually challenging situations may be rooted in their differential encoding of f0 cues to aid in speech stream segregation. I presented this work in a talk at the Northwest Phon{etics; ology} Conference (NoWPhon) in Vancouver, BC and at the Music, Language, and Cognition International Summer School in Como, Italy.
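As a rough illustration of the statistical approach (the data below are simulated for the example, not from the study), trial-level accuracy can be modeled as a logistic function of f0 separation, with a positive slope meaning identification improves as the voices pull apart in pitch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trials: f0 separation in semitones; 1 = both vowels identified.
n = 500
f0_sep = rng.uniform(0, 4, n)              # 0-4 semitone separations
logit_p = -1.0 + 1.2 * f0_sep              # "true" model used to simulate
y = rng.random(n) < 1 / (1 + np.exp(-logit_p))

# Fit logistic regression by gradient ascent on the log-likelihood.
X = np.column_stack([np.ones(n), f0_sep])  # intercept + f0 separation
beta = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (y - p) / n

print(beta)  # fitted slope beta[1] is positive: accuracy rises with f0 separation
```

In practice an off-the-shelf fitter (e.g., R’s `glm` or Python’s statsmodels) would be used, with musicianship and age entered as additional predictors.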

Musicians’ vs. nonmusicians’ use of f0 cues for speech perception with a single-talker interferer: In a behavioral experiment, listeners (musicians and nonmusicians) ranging in age from 18 to 72 identified a target sentence while ignoring an interferer sentence that differed in pitch. I found that accuracy for identifying the target sentence was highest for younger musicians (compared to nonmusicians); there was no difference between musicians and nonmusicians over age 40, even when controlling for hearing ability. Importantly, this work provides evidence that musical experience can affect speech perception: musicians show an advantage for perceiving speech in noisy situations (such as crowded restaurants, airports, etc.), but this advantage is lost as we grow older. I gave a talk on this work in January at the Linguistic Society of America’s Annual Meeting in Salt Lake City, UT.

Click here for the 2018 LSA Conference Proceedings paper: Investigating a possible “musician advantage” for speech-in-speech perception: The role of f0 separation.

Other Ph.D. research

Meta-analysis of language & music auditory processing: To test whether trained musicians (relative to nonmusicians) show different patterns of cerebral lateralization for speech and music perception, I conducted an Activation Likelihood Estimation (ALE) analysis (Turkeltaub et al., 2002). This method uses coordinates reported for particular contrasts (e.g., speech sounds vs. noise, sinusoidal tones, etc.) in published fMRI/PET studies to explore shared areas and networks of activation. While additional research is necessary, these preliminary findings suggest that musical training may drive a more bilateral pattern of activation for both speech and music perception, while nonmusicians show a more canonical left lateralization for language and right lateralization for music. I presented this research at the Society for the Neurobiology of Language Conference (October 2015, Chicago) and at the UC Davis Symposium on Language Research (April 2015, Davis). This project also won the UC Davis Department of Linguistics Lapointe Award in 2015.
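Conceptually, ALE smooths each reported activation focus with a Gaussian “modeled activation” (MA) map and combines maps across studies as a probabilistic union. The toy 1-D sketch below illustrates the idea only; real ALE analyses (e.g., in GingerALE or NiMARE) operate on 3-D brain volumes with sample-size-dependent kernel widths and permutation-based thresholding:

```python
import numpy as np

# Toy 1-D illustration of the ALE idea: each reported focus (coordinate)
# becomes a Gaussian modeled-activation (MA) map, and maps from different
# studies are combined voxel-wise as a probabilistic union.
grid = np.arange(0, 100.0)  # 1-D stand-in for a brain volume
sigma = 5.0                 # kernel width (mm, hypothetical)

def ma_map(focus):
    """Gaussian modeled-activation map for one reported focus."""
    return np.exp(-((grid - focus) ** 2) / (2 * sigma ** 2))

# Foci reported by three hypothetical studies near the same region.
studies = [ma_map(48), ma_map(50), ma_map(53)]

# ALE value: probability that at least one study activates each voxel.
ale = 1 - np.prod([1 - ma for ma in studies], axis=0)
peak = grid[np.argmax(ale)]
print(peak)  # the ALE map peaks within the cluster of reported foci
```

Where studies report foci close together, the union stays high, yielding the convergent clusters that the method is designed to detect.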

Society for the Neurobiology of Language (SNL) Annual Meeting 2015

Neural language processing in musicians vs. nonmusicians: An investigation of the ‘visual word form area’ (with David Corina & Laurel Lawyer): In 2014 we developed a pilot study examining the effects of language and musical experience on the visual processing pathways in the putative ‘visual word form area’ (VWFA) — a region in the left fusiform gyrus that preferentially responds to visually presented language. Our preliminary results suggest that extensive musical training may have an effect on patterns of lateralization for word reading. I’ve presented this work at the Society for the Neurobiology of Language Conference (August 2014, Amsterdam), UC Davis Symposium on Language Research (April 2014, Davis), and the Interdisciplinary Graduate and Professional Symposium (April 2014, Davis), where it won the Chancellor’s Grand Prize for Best Oral Presentation ($5,000) and Dean’s Prize for Best Oral Presentation in Social Sciences ($1,000).

Chancellor’s Grand Prize for Best Oral Presentation, UC Davis Interdisciplinary Graduate & Professional Symposium (IGPS) 2014
Society for the Neurobiology of Language (SNL) Annual Meeting 2014

Distinguishing gesture processing from sign language processing: The contributions of the superior temporal lobe (with David Corina, Laurel Lawyer, and Shane Blau): This fMRI study compared linguistically relevant signs in American Sign Language (ASL) with non-linguistic self-grooming gestures. Our analysis focused on the posterior superior temporal gyrus (pSTG), as there is evidence suggesting that this area is particularly sensitive to linguistic auditory input. Our question was whether linguistically relevant input from a visual domain, as in the case of ASL, would also activate this region. We’ve presented this work at the International Society for Gesture Studies Conference (July 2014, San Diego) and the Cognitive Neuroscience Society Annual Meeting (April 2014, Boston).

Cognitive Neuroscience Society (CNS) Annual Meeting 2014