Decoding time for the identification of musical key

Attention, Perception, & Psychophysics, Dec 2014

This study examines the decoding times at which the brain processes structural information in music and compares them to timescales implicated in recent work on speech. In a paradigm combining the speech experiments of Ghitza and Greenberg (Phonetica, 66(1-2), 113–126, 2009) with the musical key-finding approach of Farbood et al. (Journal of Experimental Psychology: Human Perception and Performance, 39(4), 911–918, 2013), listeners were asked to judge the key of short melodic sequences that were presented at a highly compressed rate, with silences of varying duration inserted periodically into the audio signal. For these distorted signals, made up of alternating stretches of signal and silence, the error-rate curves show peak performance centered around an event rate of 5–7 Hz (143–200 ms interonset interval; 300–420 beats/min), where event rate is defined as the average rate of pitch change. The data support the hypothesis that the perceptual analysis of music entails the processes of parsing the signal into chunks of the appropriate temporal granularity and decoding the signal for recognition. The music-speech comparison points to similarities in how auditory processing builds on the specific temporal structure of the input, and how that structure interacts with the internal temporal dynamics of the neural mechanisms underpinning perception.
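
The three figures quoted for the peak-performance range express one quantity in different units: an event rate in Hz, its reciprocal as an interonset interval (IOI) in milliseconds, and the same rate counted per minute. A minimal Python sketch of that arithmetic follows; the helper name is hypothetical and chosen only for illustration.

```python
def event_rate_to_ioi_and_bpm(rate_hz: float) -> tuple[float, float]:
    """Convert an event rate in Hz to (interonset interval in ms, events per minute)."""
    ioi_ms = 1000.0 / rate_hz  # one event every 1/rate seconds, expressed in ms
    per_min = rate_hz * 60.0   # events per second times 60 s per minute
    return ioi_ms, per_min


for rate in (5.0, 7.0):
    ioi, bpm = event_rate_to_ioi_and_bpm(rate)
    print(f"{rate:.0f} Hz -> {ioi:.0f} ms IOI, {bpm:.0f} beats/min")
# 5 Hz -> 200 ms IOI, 300 beats/min
# 7 Hz -> 143 ms IOI, 420 beats/min
```

The 143-ms bound is 1000/7 ≈ 142.9 ms, rounded to the nearest millisecond.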

Full-text PDF: https://link.springer.com/content/pdf/10.3758%2Fs13414-014-0806-0.pdf

Atten Percept Psychophys (2015) 77:28–35
DOI 10.3758/s13414-014-0806-0

Morwaread M. Farbood, Jess Rowland, Gary Marcus, Oded Ghitza, David Poeppel

Published online: 10 December 2014
© The Psychonomic Society, Inc. 2014

Electronic supplementary material: The online version of this article (doi:10.3758/s13414-014-0806-0) contains supplementary material, which is available to authorized users.

M. M. Farbood (corresponding author): Department of Music and Performing Arts Professions, New York University, 35 W. 4th St., Suite 1077, New York, NY 10012, USA; e-mail:
J. Rowland, G. Marcus: Department of Psychology, New York University, New York, NY, USA
O. Ghitza: Department of Biomedical Engineering and Hearing Research Center, Boston University, Boston, MA, USA
D. Poeppel: Department of Psychology and Center for Neural Science, New York University, New York, NY, USA
D. Poeppel: Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany

Keywords: Key finding · Tonal induction · Neuronal oscillations · Music structure · Brain rhythms · Speech rate

Traditionally, most approaches to the perceptual analysis of speech have focused on the rich frequency structure of the signal within a short time window. Speech perception has been, appropriately, characterized as a demanding spectral analysis challenge, and considerable progress has been made in investigating the mechanisms underlying short-term frequency analysis (Gold, Morgan, & Ellis, 2011; Stevens, 1998, 2005). Prior work has examined how the temporal structure of speech signals underpins perception in concert with the spectral information (see Rosen, 1992 for review; Drullman, Festen, & Plomp, 1994; Houtgast & Steeneken, 1985; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995). One of the emerging generalizations from this line of research is that there appears to be a fortuitous alignment between robust temporal properties of speech, e.g., the envelope fluctuations characteristic of the flow of syllabic information, and the brain rhythms argued to play a role in perception and cognition (Ghitza, 2011; Giraud & Poeppel, 2012; Poeppel, 2003). Although the precise mechanisms remain under vigorous debate, there is consensus that both structure in time and processing rate itself merit deeper investigation.

In the theoretical and experimental study of music, there is a long and productive tradition of studying temporal structure and tempo (see London, 2012 for review). However, those approaches have not intersected in principled ways with related speech perception research. Here we capitalize on recent progress in both domains, combining novel approaches to temporal constraints on speech decoding (Ghitza, 2011, 2012; Ghitza & Greenberg, 2009) with results on music perception, and in particular the analysis of key (Farbood, Marcus, & Poeppel, 2013).

The current study builds on an experimental design by Ghitza and Greenberg (2009) that explored the possible role of brain rhythms in speech perception. They inserted periodically spaced silences into semantically unpredictable sentences that were compressed by a factor of three, and measured the error rate in word identification. Without inserted silent gaps, the error rate for word identification in compressed speech was > 50 %. However, when silence intervals of varying durations (up to 160 ms) were added between 40-ms segments of audio signal, performance improved, resulting in a U-shaped error-rate curve with a preferred packaging rate of around 6–17 Hz (59–167 ms IOI). Packaging rate is the term Ghitza (2011) uses for the rate at which one audio segment plus one silence interval repeats in compressed stimuli distorted by silence insertions. For example, stimuli with audio segments of 40 ms and silence intervals of 80 ms have a packaging period of 120 ms, i.e., a packaging rate of 8.33 Hz. Ghitza and Greenberg (2009) interpreted the decrease in error rate resulting from the insertion of silence as the result of adding necessary decoding time. Based on these results, they suggested an oscillatory mechanism on a specific timescale for auditory processing and developed a phenomenological model to account for these counterintuitive data (Ghitza, 2011).
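
The silence-insertion manipulation and the packaging-rate arithmetic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the stimulus-generation code of Ghitza and Greenberg (2009) or of the present study: the function names are hypothetical, the signal is a plain list of samples, the threefold time compression is assumed to have been applied beforehand (it is not shown), and silences are placed only between consecutive segments.

```python
def packaging_rate_hz(segment_ms: float, silence_ms: float) -> float:
    """Rate (Hz) at which one audio segment plus one silence interval repeats."""
    period_ms = segment_ms + silence_ms  # packaging period in ms
    return 1000.0 / period_ms


def insert_periodic_silence(samples, sample_rate, segment_ms, silence_ms):
    """Hypothetical helper: split an (already time-compressed) signal into
    segment_ms chunks and place silence_ms of zero-valued samples between
    consecutive chunks."""
    seg_len = round(sample_rate * segment_ms / 1000.0)      # samples per audio segment
    gap = [0.0] * round(sample_rate * silence_ms / 1000.0)  # one silence interval
    chunks = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    out = []
    for k, chunk in enumerate(chunks):
        if k > 0:
            out.extend(gap)  # silence goes between segments, not before the first
        out.extend(chunk)
    return out


# Worked example from the text: 40-ms segments with 80-ms silences.
print(f"{packaging_rate_hz(40, 80):.2f} Hz")  # 8.33 Hz
# Range spanned by 0-160 ms silences around 40-ms segments:
print(packaging_rate_hz(40, 0), packaging_rate_hz(40, 160))  # 25.0 5.0

# Toy signal (assumed): 120 ms of ones at 1 kHz, i.e., 1 sample per ms.
toy = [1.0] * 120
distorted = insert_periodic_silence(toy, 1000, 40, 80)
print(len(distorted))  # 3 segments of 40 + 2 gaps of 80 = 280
```

Keeping the packaging-rate formula separate from the signal manipulation makes it easy to check that a chosen segment/silence pair lands in the rate range of interest before any audio is generated.
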
The association between temporal properties of speech (e.g., mean syllable duration, phoneme duration) and neuronal oscillations was made explicit by Poeppel (2003) and has subsequently been investigated empirically and computationally in a number of psychophysical and neurophysiological studies (for review, see Giraud & Poeppel, 2012). An important computational angle was introduced by Ghitza (2011, 2013) in the context of formulating a model designed to address how speech signals are parsed into coarser, typically syllable-long speech fragments, and then decoded. It has now been demonstrated convincingly (Ghitza, 2012) that lower-frequency theta oscillations are implicated in connected speech parsing; current research is addressing the role of higher-frequency beta and gamma oscillations in decoding. Musical stimuli such as those in the current study have not been used in this theoretical context, but such materials can help shed light on the mechanistic role that neuronal oscillations might play in perception.

In a study exploring the psychophysics of structural key-finding by Farbood et al. (2013), the influence of rate variation in music was examined by asking musically trained listeners to judge whether melodic sequences presented at different tempi ended on a resolved or unresolved pitch. The tempi of the sequences we (...truncated)


Article home page: http://link.springer.com/article/10.3758/s13414-014-0806-0

Morwaread M. Farbood, Jess Rowland, Gary Marcus, Oded Ghitza, David Poeppel. Decoding time for the identification of musical key, Attention, Perception, & Psychophysics, 2015, pp. 28-35, Volume 77, Issue 1, DOI: 10.3758/s13414-014-0806-0