Dynamics of Human-Voice-AI Interaction

For my postdoc, I am currently working on projects related to how individuals engage with voice-activated artificially intelligent (voice-AI) devices, such as Amazon’s Alexa or Apple’s Siri.

NSF Postdoc Fellowship

I am thrilled to be a PI for a two-year NSF-funded postdoctoral research fellowship with Dr. Georgia Zellou (Linguistics), Dr. Zhou Yu (Computer Science), and Dr. Katharine Graf Estes (Psychology) to explore human-voice-AI interaction.

My co-PIs come from three departments and labs across campus.

Together, we are an interdisciplinary team exploring the ways adults and children adapt their speech when talking to voice-activated digital assistants (e.g., Amazon’s Alexa), compared to adult human interlocutors.

This line of work provides a way to test differing theoretical predictions about the extent to which speech-register adjustments are driven by functional motives (e.g., intelligibility) and social factors (e.g., gender).

For instance, this research explores whether the same functional motivations that apply when correcting comprehension errors for human interlocutors also apply in device-directed speech (DS). By manipulating the phonological nature of the errors, we can carefully control the level of intelligibility-related pressure in communication.

At the same time, this project explores how social factors may impact speech adaptation strategies, such as by interlocutor type, speaker age, or device gender. This project additionally involves important methodological innovations in programming and running experiments directly through a digital device platform.

Overall, this project aims to fill a gap in our knowledge of the acoustic-phonetic adjustments humans make when talking to voice-AI devices. It can ultimately reveal the mechanisms underlying speech production by different speakers (e.g., based on age, gender, device experience), contributing to basic science research.

Voice-AI research in the UC Davis Phonetics Lab

Together with Dr. Georgia Zellou, grad students (Bruno Ferenc Segedin, Kris Predeck, Tyler Kline, and Jazmina Chavez), and our amazing team of undergraduate research assistants (RAs) in the UC Davis Phonetics Lab, I’ve worked on a series of studies investigating different aspects of human-voice-AI interaction.

We recently published papers showing that speakers display socially mediated patterns of vocal alignment toward Siri and Alexa (e.g., gender in Cohn, Ferenc Segedin, & Zellou, 2019) and even emotional alignment toward a voice-AI system (e.g., Cohn & Zellou, 2019).

We also saw differences based on a speaker’s cognitive processing style, measured by the Autism Quotient, in how they spoke to Siri and human voices (Snyder, Cohn, & Zellou, 2019).

We’ve also tested whether there are distinct Siri- and human adult-directed speech registers. You can see me present on one of the projects (“Phonologically motivated phonetic repair strategies in Siri- and human-directed speech”) in the 5 Minute Linguist competition at the 2019 Linguistic Society of America (LSA) annual meeting. See above for the video (2:30 John McWhorter introduction, Talk 3:23-7:31).

Emotion in a voice-AI system?

In a series of projects with Dr. Georgia Zellou, we have tested whether individuals vocally align to emotion in human and voice-AI productions.

  • In May 2019, undergraduate Melina Sarian presented our project comparing vocal alignment toward neutral/expressive human and Alexa voices at the ‘Most Innovative Research Panel’ at the UC Davis Symposium on Language Research!
  • See also our Interspeech 2019 paper: (Cohn & Zellou, 2019)

Learning from voice-AI systems?

Along with Ph.D. student Bruno Ferenc Segedin, we have explored whether listeners show differences in phonetic adaptation to human and Amazon Alexa TTS voices. We recently published a paper on the project (Ferenc Segedin, Cohn, & Zellou, 2019).

Human-socialbot interaction: Gunrock

I also collaborate with Dr. Zhou Yu on projects related to the Amazon Alexa Prize chatbot, Gunrock. We were thrilled when Gunrock won the 2019 Amazon Alexa Prize! We recently published two papers: a large-scale user study of TTS cognitive-emotional expressiveness (Cohn, Chen, & Yu, 2019) and a demo paper on the Gunrock system (Yu et al., 2019).

For a system demonstration, see:

Furhat collaboration: UC Davis & KTH

While at the KTH Royal Institute of Technology (Stockholm, Sweden) in September 2019, I met up with Dr. Jonas Beskow (pictured in the center), co-founder of Furhat Robotics, and Ph.D. student Patrik Jonell (pictured on the right). Together with Georgia Zellou, we’re conducting a study to test the role of embodiment and gender in human-voice-AI interaction across three platforms: Amazon Echo, Nao, and Furhat.

Mentoring voice-AI projects

I have additionally co-mentored graduate and undergraduate projects exploring human-voice-AI interaction. Here are a couple of examples from the 2019 UC Davis Undergraduate Research Conference!