Dynamics of Human-Voice-AI Interaction
I am deeply curious about the linguistic, psychological, and social aspects of how individuals engage with voice-activated artificially intelligent (voice-AI) devices, such as Amazon’s Alexa or Apple’s Siri.
My central research question is whether speech communication with voice-AI is similar to or different from speech communication with humans.
Q1. How do we talk to voice-AI?
Do we have a special voice-AI speech style?
I’m working with an interdisciplinary team of researchers, Dr. Georgia Zellou (Linguistics), Dr. Zhou Yu (Computer Science), and Dr. Katharine Graf Estes (Psychology), to test whether adults and kids have a special speech ‘register’ when talking to voice-AI, relative to a real human.
Do we vocally align to humans and voice-AI?
… based on the social characteristics of the voices?
I’ve found some evidence that speakers display socially mediated patterns of vocal alignment toward Siri and Alexa.
We also saw differences based on a speaker’s cognitive processing style, measured by the Autism Quotient, in how they spoke to Siri and human voices (Snyder, Cohn, & Zellou, 2019).
… based on human-likeness of the system?
While at the KTH Royal Institute of Technology (Stockholm, Sweden) in September 2019, I did an experiment with Dr. Jonas Beskow and Ph.D. student Patrik Jonell, testing the role of embodiment and gender in alignment toward three platforms: Amazon Echo, Nao, and Furhat.
- See also a related study in which we manipulated participants’ top-down expectations about whether the voice was a voice-AI system or a real human (Zellou & Cohn, 2020)
… based on emotion?
I have also tested whether individuals vocally align to emotion in human and voice-AI productions.
- Cohn & Zellou (2019): We found that people vocally align to Alexa’s emotionally expressive productions
- Zellou & Cohn (to appear): For voice-AI, people align more to utterances produced with emotionally expressive elements (e.g., interjections) than to those without
A talk about this project was selected to be in the ‘Most Innovative Research Panel’ at the 2019 UC Davis Symposium on Language Research.
Q2. How do we perceive speech by voice-AI?
… how well do we hear TTS productions in noise?
In a recent project, I’ve tested whether the type of text-to-speech (TTS) method shapes listeners’ ability to hear productions in noise, comparing more natural ‘casual’ speech (neural TTS) and more careful ‘clear’ speech (concatenative TTS).
- Paper accepted to Interspeech 2020 (Cohn & Zellou, to appear), see our Virtual Talk
Q3. Do we learn language from voice-AI?
… do we learn phonetic patterns from voice-AI?
Along with Ph.D. student Bruno Ferenc Segedin, I’m exploring whether listeners show differences in phonetic adaptation to human and Amazon Alexa TTS voices.
- We recently published a paper on the project (Ferenc Segedin, Cohn, & Zellou 2019).
Human-socialbot interaction: Gunrock
I also collaborate with Dr. Zhou Yu on projects related to the Amazon Alexa Prize chatbot, Gunrock. We were thrilled when Gunrock won the 2019 Amazon Alexa Prize!
- We recently published two papers: a paper on a large-scale user study of TTS cognitive-emotional expressiveness (Cohn, Chen, & Yu, 2019) and a demo paper on the Gunrock system (Yu et al., 2019).
For a system demonstration, see: https://gunrock-ucdavis.weebly.com/2018-system-demonstration.html
Mentoring voice-AI projects
I have also co-mentored graduate and undergraduate projects exploring human-voice-AI interaction. Here are a couple of examples from the 2019 UC Davis Undergraduate Research Conference!