Dynamics of Human-Voice-AI Interaction
I am currently working on projects related to how individuals engage with voice-activated artificially intelligent (voice-AI) devices, such as Amazon’s Alexa or Apple’s Siri.
NSF Postdoc Fellowship
I am thrilled to serve as a PI on a two-year NSF-funded postdoctoral research fellowship with Drs. Georgia Zellou, Zhou Yu, and Katharine Graf Estes, exploring human-voice-AI interaction.
We explore how adults and children adapt their speech when talking to voice-activated digital assistants (e.g., Amazon’s Alexa), compared to when talking to adult human interlocutors.
This line of work provides a way to test differing theoretical predictions about the extent to which speech-register adjustments are driven by functional motives (e.g., intelligibility) and by social factors (e.g., gender).
For instance, this research explores whether the same functional motivations that guide how speakers correct comprehension errors made by human interlocutors also apply in device-directed speech (DS). By manipulating the phonological nature of the errors, we can carefully control the level of intelligibility-related pressure in communication.
At the same time, this project explores how social factors, such as interlocutor type, speaker age, or device gender, may shape speech adaptation strategies. The project also involves important methodological innovations in programming and running experiments directly through a digital device platform.
Overall, this project aims to fill a gap in our knowledge of the acoustic-phonetic adjustments humans make when talking to voice-AI devices. Ultimately, it can reveal the mechanisms underlying speech production by different speakers (e.g., based on age, gender, or device experience), contributing to basic science research.
Voice-AI research in the UC Davis Phonetics Lab
Together with Dr. Georgia Zellou, Ph.D. student Bruno Ferenc Segedin, and undergraduates Cathryn Snyder and Melina Sarian in the UC Davis Phonetics Lab, I have worked on a series of studies investigating the phonetic adjustments speakers make when talking to voice-AI systems. We recently published papers showing that speakers display socially mediated patterns of vocal alignment toward Siri and Alexa (e.g., by gender; Cohn, Ferenc Segedin, & Zellou, 2019) and even emotional alignment toward a voice-AI system (e.g., Cohn & Zellou, 2019). Additionally, we observed differences, based on a speaker’s cognitive processing style as measured by the Autism Quotient, in how they spoke to Siri and to human voices (Snyder, Cohn, & Zellou, 2019).
We have also explored whether there are distinct Siri- and human adult-directed speech registers. You can see me present one of these projects (“Phonologically motivated phonetic repair strategies in Siri- and human-directed speech”) in the 5 Minute Linguist competition at the 2019 Linguistic Society of America (LSA) annual meeting. In the video of the session, John McWhorter’s introduction begins at 2:30 and my talk runs from 3:23 to 7:31.
In May 2019, undergraduate Melina Sarian presented our project comparing vocal alignment toward neutral/expressive human and Alexa voices at the ‘Most Innovative Research Panel’ at the UC Davis Symposium on Language Research.
Along with Ph.D. student Bruno Ferenc Segedin, we have also explored whether listeners show differences in phonetic adaptation to human and device voices. We recently published a paper on this project (Ferenc Segedin, Cohn, & Zellou, 2019).
Bruno Ferenc Segedin presenting our research exploring phonetic adaptation to human vs. Amazon Alexa voices at the 2019 UC Davis Symposium on Language Research (left) and Interspeech 2019 (right)
I have also co-mentored graduate and undergraduate projects exploring human-voice-AI interaction, including a couple presented at the 2019 UC Davis Undergraduate Research Conference!
Human-socialbot interaction: Gunrock
I am also collaborating with Dr. Zhou Yu on projects related to Gunrock, the Amazon Alexa Prize chatbot. We were thrilled when Gunrock won the 2019 Amazon Alexa Prize! We recently published two papers: a large-scale user study on the cognitive-emotional expressiveness of text-to-speech (TTS) (Cohn, Chen, & Yu, 2019) and a demo paper on the Gunrock system (Yu et al., 2019).
For a system demonstration, see: https://gunrock-ucdavis.weebly.com/2018-system-demonstration.html