Rate dependent speech processing can be speech specific: Evidence from the perceptual disappearance of words under changes in context speech rate
Atten Percept Psychophys (2016) 78:334–345
DOI 10.3758/s13414-015-0981-7
Mark A. Pitt 1 & Christine Szostak 1 & Laura C. Dilley 2
Published online: 22 September 2015
© The Psychonomic Society, Inc. 2015
Abstract The perception of reduced syllables, including
function words, produced in casual speech can be made to
disappear by slowing the rate at which surrounding words
are spoken (Dilley & Pitt, Psychological Science, 21(11),
1664–1670. doi: 10.1177/0956797610384743, 2010). The
current study explored the domain generality of this speech-rate effect, asking whether it is induced by temporal information found only in speech. Stimuli were short word sequences
(e.g., minor or child) appended to precursors that were clear
speech, degraded speech (low-pass filtered or sinewave), or
tone sequences, presented at a spoken rate and a slowed rate.
Across three experiments, only precursors heard as intelligible
speech generated a speech-rate effect (fewer reports of function words with a slowed context), suggesting that rate-dependent speech processing can be domain specific.
Keywords Speech rate · Spoken word recognition ·
Domain generality · Phonetic perception
* Mark A. Pitt
1 Department of Psychology, The Ohio State University,
1835 Neil Avenue, Columbus, OH 43220, USA
2 Department of Communicative Sciences and Disorders,
Michigan State University, 1026 Red Cedar Road, East
Lansing 48824-1220, MI, USA
The perception of speech requires sensitivity to precise
timing over multiple time scales. To identify and discriminate
phonemes, listeners must be sensitive to differences in voice
onset time (VOT) (Miller 1981; Port 1979), segment duration,
and relative cue timing (e.g., trading relations; Best et al. 1981).
To perceive lexical stress and syllabify words, listeners must be
sensitive to durational differences across syllables (Reinisch et al.
2011a, b; Turk and Sawusch 1997; Turk and Shattuck-Hufnagel
2000). Although fluid communication requires
the ability to perceive speech at different rates, little
work has directly explored how speech rate contributes
to the perception of spoken words.
Dilley and Pitt (2010) argued that speech rate can be a
valuable cue in spoken word recognition because it can partially compensate for the absence of other cues when the
talker’s speech is highly reduced, such as when speaking in
a casual style. Function words (e.g., of, or, in) are particularly
vulnerable to distortion because they are heavily coarticulated
with surrounding words and can be very short in duration (50
ms). In particular, when the phonemes of a function word
match those in the rhyme of the preceding word (e.g., minor
or), the two words can blend together when heavily
coarticulated, creating what can be considered an elongated
production of the first word (e.g., minorrr). When looked at
spectrographically, the words are spectrally indistinct, with no
changes in frequency or amplitude that would normally signal
a word boundary. For this reason, Dilley and Pitt (2010) argued that timing information from the surrounding context is a
crucial cue that listeners use to perceive short function words.
That is, context speech rate assists in determining whether the
talker said minor or minor or.
In an experimental set-up designed to elicit casual-style
speech, Dilley and Pitt (2010) had talkers produce sentences
containing two-word sequences that are prone to such blending (e.g., Anyone must be a minor or child to enter). They then
took these productions and varied the speech rate of a small
(critical) region of the sentence containing the function word
(e.g., -nor or ch-) and the remainder of the sentence (preceding
and following words). In the two conditions of primary interest for the current study, the critical (proximal) region was
held constant and the rate of the distal context (precursor)
was varied, being presented at the rate spoken by the talker
or time expanded by a factor of 1.9. Listeners heard the
sentences over headphones and were instructed to type exactly
what was heard using the computer keyboard.
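The distal-rate manipulation described above can be illustrated with a small duration calculation. This is only a sketch of the arithmetic: the function name, the three-way split of the sentence, and the durations are hypothetical and are not taken from Dilley and Pitt (2010), whose stimuli were time-scaled audio recordings rather than numbers.

```python
def scale_distal_context(preceding_ms, critical_ms, following_ms, factor):
    """Scale only the distal context (preceding and following words) by
    `factor`; the critical (proximal) region keeps its original duration."""
    return (preceding_ms * factor, critical_ms, following_ms * factor)


# Hypothetical segment durations in milliseconds (not values from the study).
spoken = scale_distal_context(1000, 400, 300, factor=1.0)  # spoken-rate condition
slowed = scale_distal_context(1000, 400, 300, factor=1.9)  # slowed-context condition

# Share of the whole sentence taken up by the unchanged critical region.
spoken_share = spoken[1] / sum(spoken)
slowed_share = slowed[1] / sum(slowed)
```

Under these assumed durations the critical region is physically identical in both conditions, but it occupies a smaller share of the sentence when the context is slowed, which is the sense in which it becomes short relative to its surroundings.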
In the spoken-rate condition, listeners reported the function
word 79 % of the time. Function word reports dropped to
33 % when the context was slowed. A similarly low report
of function words (35 %) was found when the speaking rate of
the critical region was sped up (using speech compression),
while presenting the precursor at the spoken rate. Finally, their
Experiment 2 showed that changes in speech rate can induce listeners to
report function words that the talker never said. For example,
when presented with a casual production of the sentence
Anyone must be a minor child, listeners reported a function
word (e.g., or, and) between minor and child 24 % of the time
when the speech rate of the precursor, but not the critical
region (e.g., -nor ch-), was sped up (time compressed by a factor of 0.6). The fast
rate of the distal context caused listeners to infer that the talker
spoke an extra syllable in the critical region.
Subsequent studies have probed various questions about
this lexical rate effect (LRE). Heffner et al. (2013) examined
the interaction between the distal and proximal cues by varying
the strength of the acoustic cues (intensity, F0, duration) specifying the function word and the strength of the rate cue in the
precursor. The results showed that word duration interacted
with speech rate in influencing function word reports, whereas
intensity and F0 tended to combine with speech rate more
additively. Notably, the results showed that the more immediate proximal acoustic cues do not simply override the more
distal cue of speech rate. Both contribute to how the critical
region is perceived. Importantly, this paper additionally
showed that the strength of the LRE varies continuously
(and linearly) as a function of distal speech rate.
Other studies have examined the generalizability of the
LRE. Dilley et al. (2013) demonstrated the LRE in a different
morphosyntactic environment by replicating Dilley and Pitt
(2010) in Russian. Significantly, their data suggested that the
LRE is not specific to function words but instead applies more
generally to reduced syllables. In addition, they reported
preliminary evidence that language experience is positively
related to the strength of the LRE. Morrill et al. (2014)
showed that the LRE can be obtained by varying not just
the rate of speech but also the rhythm of the distal context
(binary vs. ternary patterns). Listeners were more likely to
report a function word when the rhythmic organization of
the critical region matched that of the context. As in
Heffner et al. (2013), they also found that effects of rate
and rhythm were additive, with function word reports being highest when rate and rhythm reinforced the same
interpretation of the critical region.
Most re (...truncated)