Decoding time for the identification of musical key
Atten Percept Psychophys (2015) 77:28–35
DOI 10.3758/s13414-014-0806-0
Morwaread M. Farbood & Jess Rowland & Gary Marcus &
Oded Ghitza & David Poeppel
Published online: 10 December 2014
© The Psychonomic Society, Inc. 2014
Abstract This study examines the decoding times at which
the brain processes structural information in music and compares them to timescales implicated in recent work on speech.
In an experiment combining the speech paradigm of Ghitza and
Greenberg (Phonetica, 66(1-2), 113–126, 2009)
with the approach of Farbood et al. (Journal of Experimental
Psychology: Human Perception and Performance, 39(4),
911–918, 2013) for musical key-finding, listeners were asked
to judge the key of short melodic sequences that were presented at a highly compressed rate with varying durations of
silence inserted in a periodic manner in the audio signal. The
distorted audio signals thus comprised signal-silence alternations.
The resulting error-rate curves show peak performance
centered around an event rate of 5–7 Hz (143–200 ms
interonset interval; 300–420 beats/min), where event rate is
defined as the average rate of pitch change. The data support
the hypothesis that the perceptual analysis of music entails the
processes of parsing the signal into chunks of the appropriate
temporal granularity and decoding the signal for recognition.
The music-speech comparison points to similarities in how
auditory processing builds on the specific temporal structure
of the input, and how that structure interacts with the internal
temporal dynamics of the neural mechanisms underpinning
perception.
Keywords Key finding · Tonal induction · Neuronal
oscillations · Music structure · Brain rhythms · Speech
rate
Electronic supplementary material The online version of this article
(doi:10.3758/s13414-014-0806-0) contains supplementary material,
which is available to authorized users.
M. M. Farbood (*)
Department of Music and Performing Arts Professions, New York
University, 35 W. 4th St., Suite 1077, New York, NY 10012, USA
e-mail:
J. Rowland · G. Marcus
Department of Psychology, New York University, New York, NY,
USA
O. Ghitza
Department of Biomedical Engineering and Hearing Research
Center, Boston University, Boston, MA, USA
D. Poeppel
Department of Psychology and Center for Neural Science, New York
University, New York, NY, USA
D. Poeppel
Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
Traditionally, most approaches to the perceptual analysis of
speech have focused on the rich frequency structure of the
signal within a short time window. Speech perception has
been appropriately characterized as a demanding spectral
analysis challenge, and considerable progress has been made
investigating the mechanisms underlying short-term frequency analysis (Gold, Morgan, & Ellis, 2011; Stevens, 1998,
2005). Prior work has examined how the temporal structure
of speech signals underpins perception in concert with the
spectral information (see Rosen, 1992 for review; Drullman,
Festen, & Plomp, 1994; Houtgast & Steeneken, 1985;
Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). One
of the emerging generalizations from this line of research is
that there appears to be a fortuitous alignment between robust
temporal properties of speech, e.g., the envelope fluctuations
characteristic of the flow of syllabic information, and the brain
rhythms argued to play a role in perception and cognition
(Ghitza, 2011; Giraud & Poeppel, 2012; Poeppel, 2003).
Although the precise mechanisms remain under vigorous
debate, there is consensus that both structure in time and
processing rate itself merit deeper investigation.
In the theoretical and experimental study of music, there is
a long and productive tradition of studying temporal structure
and tempo (see London, 2012 for review). However, those
approaches have not intersected in principled ways with related speech perception research. Here we capitalize on recent
progress in both domains, combining novel approaches to
temporal constraints on speech decoding (Ghitza, 2011,
2012; Ghitza & Greenberg, 2009) with results on music
perception, and in particular the analysis of key (Farbood,
Marcus, & Poeppel, 2013).
The current study builds on an experimental design by
Ghitza and Greenberg (2009) that explored the possible role
of brain rhythms in speech perception. They inserted periodically spaced silences into semantically unpredictable
sentences that were compressed by a factor of three, and
measured the error rate in word identification. Without
inserted silent gaps, the error rate for word identification in
compressed speech was greater than 50%. However, when silence
intervals of varying durations (up to 160 ms) were added in
between 40-ms segments of audio signal, performance improved, resulting in a U-shaped error-rate curve with a preferred packaging rate of around 6–17 Hz (59–167 ms IOI).
Packaging rate is a term Ghitza (2011) uses to describe the
periodic silence-plus-audio-segment rate of compressed stimuli distorted by silence insertions. For example, stimuli with
audio segments of 40 ms and silence intervals of 80 ms would
have a packaging period of 120 ms, i.e., a packaging rate of 8.33 Hz. Ghitza and
Greenberg (2009) interpreted the decrease in error rate
resulting from the insertions of silence as the result of adding
necessary decoding time. Based on these results, they suggested an oscillatory mechanism on a specific timescale for
auditory processing and developed a phenomenological model to account for these counterintuitive data (Ghitza, 2011).
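The packaging-rate arithmetic and the silence-insertion manipulation described above can be illustrated with a minimal Python sketch. It is not the stimulus-generation code of Ghitza and Greenberg (2009): the function names, the 8-kHz sample rate, the placeholder signal, and the choice to append a silent gap after the final (possibly shorter) segment are all assumptions made for illustration.

```python
def packaging_rate_hz(segment_ms, silence_ms):
    """Rate, in Hz, of the periodic audio-segment-plus-silence package."""
    period_ms = segment_ms + silence_ms
    return 1000.0 / period_ms


def rate_to_ioi_ms(rate_hz):
    """Convert a rate in Hz to the corresponding interval in milliseconds."""
    return 1000.0 / rate_hz


def insert_silences(samples, sample_rate_hz, segment_ms, silence_ms):
    """Cut samples into fixed-length segments and follow each with silence.

    Assumption: a gap is also appended after the last segment, which may be
    shorter than segment_ms if the signal length is not a multiple of it.
    """
    seg_len = int(round(sample_rate_hz * segment_ms / 1000.0))
    gap_len = int(round(sample_rate_hz * silence_ms / 1000.0))
    if seg_len <= 0:
        raise ValueError("segment_ms is too short for this sample rate")
    out = []
    for start in range(0, len(samples), seg_len):
        out.extend(samples[start:start + seg_len])  # audio segment
        out.extend([0.0] * gap_len)                 # inserted silent gap
    return out


# Example from the text: 40-ms segments with 80-ms silences.
rate = packaging_rate_hz(40, 80)  # 1000 / 120, about 8.33 Hz

# Hypothetical placeholder signal: 100 ms of constant samples at 8 kHz.
signal = [1.0] * 800
distorted = insert_silences(signal, 8000, 40, 80)
```

Under these assumptions, `rate_to_ioi_ms(6)` and `rate_to_ioi_ms(17)` round to 167 ms and 59 ms, matching the 59–167 ms range quoted for the preferred packaging rate of 6–17 Hz.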
The association between temporal properties of speech (e.g.,
mean syllable duration, phoneme duration, etc.) and neuronal
oscillations was made explicit by Poeppel (2003), and has
subsequently been investigated empirically and computationally in a number of psychophysical and neurophysiological studies (for review, see Giraud & Poeppel, 2012). An important
computational angle was introduced by Ghitza (2011, 2013) in
the context of formulating a model designed to address how
speech signals are parsed into coarser, typically syllable-long
speech fragments, and then decoded. It has now been demonstrated convincingly (Ghitza, 2012) that lower-frequency theta
oscillations are implicated in connected speech parsing; current
research is addressing the role of higher-frequency beta and
gamma oscillations for decoding. Musical stimuli such as those
in the current study have not been used in this theoretical
context, but such materials can help shed light on the mechanistic role that neuronal oscillations might play in perception.
In a study exploring the psychophysics of structural key-finding by Farbood et al. (2013), the influence of rate variation
in music was examined by asking musically trained listeners
to judge whether melodic sequences presented at different
tempi ended on a resolved or unresolved pitch. The tempi of
the sequences we (...truncated)