My research program aims to uncover the cognitive sources of variation in how people produce, perceive, and learn phonetic details in speech.
1. Speech adaptation / Communication barriers
How well we understand each other is shaped by our experience and expectations. How do people adapt their speech across presumed and actual communication barriers (e.g., face masks, ASR systems)? How do people adapt their listening?
Adapting speech for voice assistants
- Cohn, M., Ferenc Segedin, B., & Zellou, G. (2022). The acoustic-phonetic properties of Siri- and human-DS: Differences by error type and rate. Journal of Phonetics. [Article]
- Cohn, M., & Zellou, G. (2021). Prosodic differences in human- and Alexa-directed speech, but similar error correction strategies. Frontiers in Communication. [Article]
- Cohn, M., Liang, K., Sarian, M., Zellou, G., & Yu, Z. (2021). Speech rate adjustments in conversations with an Amazon Alexa socialbot. Frontiers in Communication. [Article]
Perceiving text-to-speech (TTS) voices
- Cohn, M., & Zellou, G. (2020). Perception of concatenative vs. neural text-to-speech (TTS): Differences in intelligibility in noise and language attitudes. Interspeech [pdf] [Virtual Talk]
- Aoki, N., Cohn, M., & Zellou, G. (2022). The clear speech intelligibility benefit for text-to-speech voices: Effects of speaking style and visual guise. Journal of the Acoustical Society of America (JASA) Express Letters. [Article]
Perceiving face-masked speech
- Cohn, M., Pycha, A., & Zellou, G. (2021). Intelligibility of face-masked speech depends on speaking style: Comparing casual, smiled, and clear speech. Cognition. [Article]
- Pycha, A., Cohn, M., & Zellou, G. (2022). Face-masked speech intelligibility: The influence of speaking style, visual information, and background noise. Frontiers in Communication. [Article]
2. Social connection / Sociophonetics
Millions of people now engage with voice technologies (e.g., Alexa, Siri, Google Assistant) that have names, apparent genders/ages/emotions, and advanced text-to-speech (TTS) voices. How do the sociophonetic attributes of TTS voices shape our responses? Do social ‘rules’ from human-human interaction apply to voice technology? What downstream effects does this have on language production, perception, and learning?
Vocal alignment based on gender, age, role
- Cohn, M., Ferenc Segedin, B., & Zellou, G. (2019). Imitating Siri: Socially-mediated vocal alignment to device and human voices. ICPhS [pdf]
- Cohn, M., Jonell, P., Kim, T., Beskow, J., & Zellou, G. (2020). Embodiment and gender interact in alignment to TTS voices. Cognitive Science Society [pdf] [Virtual talk]
- Zellou, G., Cohn, M., & Ferenc Segedin, B. (2021). Age- and gender-related differences in speech alignment toward humans and voice-AI. Frontiers in Communication. [Article]
- Zellou, G., Cohn, M., & Kline, T. (2021). The Influence of Conversational Role on Phonetic Alignment toward Voice-AI and Human Interlocutors. Language, Cognition and Neuroscience [Article]
- Zellou, G., & Cohn, M. (2020). Top-down effects of apparent humanness on vocal alignment toward human and device interlocutors. Cognitive Science Society [pdf]
Responses to emotion from voice technology
- Cohn, M., Predeck, K., Sarian, M., & Zellou, G. (2021). Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers. Speech Communication. [Article]
- Cohn, M., Raveh, E., Predeck, K., Gessinger, I., Möbius, B., & Zellou, G. (2020). Differences in Gradient Emotion Perception: Human vs. Alexa Voices. Interspeech [pdf] [Virtual talk]
- Cohn, M., & Zellou, G. (2019). Expressiveness influences human vocal alignment toward voice-AI. Interspeech [pdf]
- Cohn, M., Chen, C., & Yu, Z. (2019). A Large-Scale User Study of an Alexa Prize Chatbot: Effect of TTS Dynamism on Perceived Quality of Social Dialog. SIGDial [pdf]
- Gessinger, I., Cohn, M., Möbius, B., & Zellou, G. (2022). Cross-Cultural Comparison of Gradient Emotion Perception: Human vs. Alexa TTS Voices. Interspeech [pdf]
- Zhu, Q., Chau, A., Cohn, M., Liang, K.-H., Wang, H.-C., Zellou, G., & Yu, Z. (2022). Effects of Emotional Expressiveness on Voice Chatbot Interactions. 4th Conference on Conversational User Interfaces (CUI). [pdf]
Perception of phonetic detail
- Ferenc Segedin, B., Cohn, M., & Zellou, G. (2019). Perceptual adaptation to device and human voices: Learning and generalization of a phonetic shift across real and voice-AI talkers. Interspeech [pdf]
- Zellou, G., Cohn, M., & Block, A. (2021). Partial compensation for coarticulatory vowel nasalization across concatenative and neural text-to-speech. Journal of the Acoustical Society of America. [Article]
- Block, A., Cohn, M., & Zellou, G. (2021). Variation in Perceptual Sensitivity and Compensation for Coarticulation Across Adult and Child Naturally-produced and TTS Voices. Interspeech. [Article]
3. Individual differences
How do cognitive factors (e.g., musical training, autistic-like traits) shape the ways individuals produce and perceive speech?
- Cohn, M., Barreda, S., & Zellou, G. (accepted). Differences in a musician’s advantage for speech-in-speech perception based on age and task. Journal of Speech, Language, and Hearing Research.
- Cohn, M., Sarian, M., Predeck, K., & Zellou, G. (2020). Individual variation in language attitudes toward voice-AI: The role of listeners’ autistic-like traits. Interspeech [pdf] [Virtual talk]
- Snyder, C., Cohn, M., & Zellou, G. (2019). Individual variation in cognitive processing style predicts differences in phonetic imitation of device and human voices. Interspeech [pdf]
- Cohn, M., Zellou, G., & Barreda, S. (2019). The role of musical experience in the perceptual weighting of acoustic cues for the obstruent coda voicing contrast in American English. Interspeech [pdf]
- Cohn, M. (2018). Investigating a possible “musician advantage” for speech-in-speech perception: The role of f0 separation. Linguistic Society of America [pdf]