Rate dependent speech processing can be speech specific: Evidence from the perceptual disappearance of words under changes in context speech rate


Atten Percept Psychophys (2016) 78:334–345
DOI 10.3758/s13414-015-0981-7

Mark A. Pitt (1) & Christine Szostak (1) & Laura C. Dilley (2)

(1) Department of Psychology, The Ohio State University, 1835 Neil Avenue, Columbus, OH 43220, USA
(2) Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, East Lansing, MI 48824-1220, USA
* Mark A. Pitt

Published online: 22 September 2015
© The Psychonomic Society, Inc. 2015

Abstract The perception of reduced syllables, including function words, produced in casual speech can be made to disappear by slowing the rate at which surrounding words are spoken (Dilley & Pitt, Psychological Science, 21(11), 1664–1670. doi: 10.1177/0956797610384743, 2010). The current study explored the domain generality of this speech-rate effect, asking whether it is induced by temporal information found only in speech. Stimuli were short word sequences (e.g., minor or child) appended to precursors that were clear speech, degraded speech (low-pass filtered or sinewave), or tone sequences, presented at a spoken rate and a slowed rate. Across three experiments, only precursors heard as intelligible speech generated a speech-rate effect (fewer reports of function words with a slowed context), suggesting that rate-dependent speech processing can be domain specific.

Keywords Speech rate · Spoken word recognition · Domain generality · Phonetic perception

The perception of speech requires sensitivity to precise timing over multiple time scales. To identify and discriminate phonemes, listeners must be sensitive to differences in voice onset time (VOT) (Miller 1981; Port 1979), segment duration, and relative cue timing (e.g., trading relations; Best et al. 1981). To perceive lexical stress and syllabify words, listeners must be sensitive to durational differences across syllables (Reinisch et al. 2011a, b; Turk and Sawusch 1997; Turk and Shattuck-Hufnagel 2000). Although fluid communication requires the ability to perceive speech at different rates, little work has directly explored how speech rate contributes to the perception of spoken words.

Dilley and Pitt (2010) argued that speech rate can be a valuable cue in spoken word recognition because it can partially compensate for the absence of other cues when the talker’s speech is highly reduced, such as when speaking in a casual style. Function words (e.g., of, or, in) are particularly vulnerable to distortion because they are heavily coarticulated with surrounding words and can be very short in duration (50 ms). In particular, when the phonemes of a function word match those in the rhyme of the preceding word (e.g., minor or), the two words can blend together when heavily coarticulated, creating what can be considered an elongated production of the first word (e.g., minorrr). When looked at spectrographically, the words are spectrally indistinct, with no changes in frequency or amplitude that would normally signal a word boundary. For this reason, Dilley and Pitt (2010) argued that timing information from the surrounding context is a crucial cue that listeners use to perceive short function words. That is, context speech rate assists in determining whether the talker said "minor" or "minor or".

In an experimental set-up designed to elicit casual-style speech, Dilley and Pitt (2010) had talkers produce sentences containing two-word sequences that are prone to such blending (e.g., Anyone must be a minor or child to enter).
They then took these productions and varied the speech rate of a small (critical) region of the sentence containing the function word (e.g., -nor or ch-) and of the remainder of the sentence (preceding and following words). In the two conditions of primary interest for the current study, the critical (proximal) region was held constant and the rate of the distal context (precursor) was varied, being presented at the rate spoken by the talker or time expanded by a factor of 1.9. Listeners heard the sentences over headphones and were instructed to type exactly what was heard using the computer keyboard. In the spoken-rate condition, listeners reported the function word 79 % of the time. Function word reports dropped to 33 % when the context was slowed. A similarly low report of function words (35 %) was found when the speaking rate of the critical region was sped up (using speech compression), while presenting the precursor at the spoken rate. Finally, their Experiment 2 showed that changes in speech rate can induce listeners to report function words that the talker never said. For example, when presented with a casual production of the sentence Anyone must be a minor child, listeners reported a function word (e.g., or, and) between minor and child 24 % of the time when the speech rate of the precursor, but not the critical region (e.g., -nor ch-), was sped up by a factor of 0.6. The fast rate of the distal context caused listeners to infer that the talker spoke an extra syllable in the critical region.

Subsequent studies have probed various questions about this lexical rate effect (LRE). Heffner et al. (2013) examined the interaction among the distal and proximal cues by varying the strength of the acoustic cues (intensity, F0, duration) specifying the function word and the strength of the rate cue in the precursor.
The results showed that word duration interacted with speech rate in influencing function word reports, whereas intensity and F0 tended to combine with speech rate more additively. Notably, the results showed that the more immediate proximal acoustic cues do not simply override the more distal cue of speech rate. Both contribute to how the critical region is perceived. Importantly, this paper additionally showed that the strength of the LRE varies continuously (and linearly) as a function of distal speech rate.

Other studies have examined the generalizability of the LRE. Dilley et al. (2013) demonstrated the LRE in a different morphosyntactic environment by replicating Dilley and Pitt (2010) in Russian. Significantly, their data suggested that the LRE is not specific to function words but instead applies more generally to reduced syllables. In addition, they reported preliminary evidence that language experience is positively related to the strength of the LRE. Morrill et al. (2014) showed that the LRE can be obtained by varying not just the rate of speech but also the rhythm of the distal context (binary vs. ternary patterns). Listeners were more likely to report a function word when the rhythmic organization of the critical region matched that of the context. As in Heffner et al. (2013), they also found that effects of rate and rhythm were additive, with function word reports being highest when rate and rhythm reinforced the same interpretation of the critical region. Most re (...truncated)