The perception of intonational and emotional speech prosody produced with and without a face mask: an exploratory individual differences study

Sinagra and Wiener, Cognitive Research: Principles and Implications (2022) 7:89
https://doi.org/10.1186/s41235-022-00439-w

ORIGINAL ARTICLE, Open Access

Chloe Sinagra and Seth Wiener*

Abstract
Face masks affect the transmission of speech and obscure facial cues. Here, we examine how this reduction in acoustic and facial information affects a listener’s understanding of speech prosody. English sentence pairs that differed in their intonational (statement/question) and emotional (happy/sad) prosody were created. These pairs were recorded by a masked and unmasked speaker and manipulated to contain audio or not. This resulted in a continuum from typical unmasked speech with audio (easiest) to masked speech without audio (hardest). English listeners (N = 129) were tested on their discrimination of these statement/question and happy/sad pairs. We also collected six individual difference measures previously reported to affect various linguistic processes: Autism Spectrum Quotient, musical background, phonological short-term memory (digit span, 2-back), and congruence task (flanker, Simon) behavior. The results indicated that masked statement/question and happy/sad prosodies were harder to discriminate than unmasked prosodies. Masks can therefore make it more difficult to understand a speaker’s intended intonation or emotion. Importantly, listeners differed considerably in their ability to understand prosody. When wearing a mask, speakers should try to speak more clearly and more loudly, if possible, and make intentions and emotions explicit to the listener.

Keywords: Face masks, Speech perception, Prosody, Intonation, Emotion, Individual differences, Autism, Memory

Significance statement
For surgeons and painters, communication in face masks is common.
For others, COVID-19 marked the beginning of talking (speech production) and listening (speech perception) while wearing a mask. Masks can affect the transmission of the speech signal and obscure facial cues. This change in listening conditions has affected people differently. What are some of the factors that cause this individual variability in listeners? This study explored that question in terms of speech prosody. The utterance “it’s raining” can be a statement (flat intonation) or a question (rising intonation). Prosody is often accompanied by facial cues, such as head tilts and eyebrow raises. Masks can muffle speech cues and hide facial cues, which can make prosody difficult to understand. Our study found that masks make it harder to understand a speaker’s statement/question intonational prosody and happy/sad emotional prosody. Among the individual differences we tested, we found that Autism Spectrum Quotient predicted some performance on the prosody discrimination task. The findings have potential educational and clinical implications. When speaking with a mask, speakers should increase pitch and volume, if possible. Because facial cues may be obscured, speakers should also be more explicit about their intended emotions/questions (e.g., “I’m happy it’s raining.” “I have a question: is it raining?”).

*Correspondence: Language Acquisition, Processing, and Pedagogy Lab, Department of Modern Languages, Carnegie Mellon University, Pittsburgh, PA, USA

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Introduction
To fight the spread of the COVID-19 virus, facial mask mandates were put in place by governments throughout the world. For many people, this was the first time both the speaker and listener wore masks during communication. Masks have acoustic and visual consequences. Acoustically, the materials made to reduce the transmission of pathogens also reduce sound transmission (Magee et al., 2020). As a result, masks can reduce a speaker’s fundamental frequency (F0: what listeners perceive as pitch) and amplitude (what listeners perceive as volume or loudness). For many listeners, this reduction in acoustic information makes understanding speech more difficult (e.g., Brown et al., 2021; Fiorella et al., 2021; Mheidly et al., 2020). Visually, a mask obscures the mouth and hides facial cues. Visual information like mouth movements can help a listener better understand acoustic information (e.g., Best, 1995; Fowler, 1986; Saunders et al., 2021). For example, the relatively similar sounding English speech sounds /s/ and /ʃ/ differ in their lip rounding, which listeners can use to better understand whether the speaker needs to sip the bottle or ship the bottle. For listeners with hearing problems, those communicating in noisy environments, and those listening to nonnative speech, visual cues can be very helpful (Fiorella et al., 2021; House et al., 2001; Sueyoshi & Hardison, 2005; Winn et al., 2013).
In the present study, we extend recent research into masks and speech perception by examining how masks affect the perception of speech prosody. Prosody is a broad term that includes pitch, stress, rhythm, and intonation (e.g., Cutler, 2012; Cutler et al., 1997). It is often described as not what a speaker says, but how it is said. For example, a student telling a friend, “Class is cancelled” could convey happiness because it is a boring class or sadness because it is the student’s favorite class. Acoustic cues like F0 and amplitude (among others) change given the prosody of the speech. Here, we examine intonational statement/question prosodies and emotional happy/sad prosodies produced with and without masks. Statements are usually characterized by their relatively falling volume and pitch, whereas questions are usually characterized by their relatively rising volume and pitch (Gussenhoven & Chen, 2000; Pell, 2001; Srinivasan & Massaro, 2003). Happy speech is typically characterized by its relatively high volume and high pitch; in contrast, sad speech is typically characterized by its relatively low volume and low pitch (Bänziger & Scherer, 2005; Scherer, 2003; Sobi (...truncated)