A fast and accurate zebra finch syllable detector
RESEARCH ARTICLE
Ben Pearre (1)*, L. Nathan Perkins (1), Jeffrey E. Markowitz (2), Timothy J. Gardner (1)
1 Department of Biology, Boston University, Boston, Massachusetts, United States of America
2 Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America
OPEN ACCESS
Citation: Pearre B, Perkins LN, Markowitz JE, Gardner TJ (2017) A fast and accurate zebra finch syllable detector. PLoS ONE 12(7): e0181992. https://doi.org/10.1371/journal.pone.0181992
Editor: Brenton G. Cooper, Texas Christian University, UNITED STATES
Abstract
The song of the adult male zebra finch is strikingly stereotyped. Efforts to understand motor output, pattern generation, and learning have taken advantage of this consistency by investigating the bird’s ability to modify specific parts of song in response to external cues, and by examining timing relationships between neural activity and vocal output. Such experiments require that precise moments during song be identified in real time as the bird sings. Various syllable-detection methods exist, but many require special hardware, software, and know-how, and details on their implementation and performance are scarce. We present an accurate, versatile, and fast syllable detector that can control hardware at precisely timed moments during zebra finch song. Many moments during song can be isolated and detected with false-negative and false-positive rates well under 1% and 0.005%, respectively. The detector can run on a stock Mac Mini with a triggering delay of less than a millisecond and a jitter of σ ≈ 2 milliseconds.
Received: September 15, 2016
Accepted: March 31, 2017
Published: July 28, 2017
Copyright: © 2017 Pearre et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: Song data used for training and testing are available at 10.17605/OSF.IO/BX76R. The four software packages are available under Open Source licenses from DOIs listed in Appendix A of the manuscript, and also here: 10.5281/zenodo.437555, 10.5281/zenodo.437557, 10.5281/zenodo.437559, and 10.5281/zenodo.437558.
Funding: This work was funded by NIH grants 5R01NS089679-02 and 5U01NS090454-02. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
The adult zebra finch (Taeniopygia guttata) sings a song made up of 2–6 syllables, with longer songs lasting on the order of a second. The song may be repeated hundreds of times per day and is almost identical each time. Several brain areas reflect this consistency in highly stereotyped neural firing patterns, which makes the zebra finch one of the most popular models for the study of the neural basis of learning, audition, and control.
If precise moments in song can be detected reliably and quickly enough to trigger other apparatus during singing, this behavioural consistency enables a variety of experiments. A common target of song-triggered experiments is the anterior forebrain pathway (AFP), a homologue of the mammalian basal ganglia consisting of a few distinct brain areas concerned with the learning and production of song. For example, stimulating the lateral magnocellular nucleus of the anterior nidopallium (LMAN), the output nucleus of the AFP, at precisely timed moments during song showed that this area controls specific variables in song output [1]. Song-synchronised stimulation of LMAN and the high vocal centre (HVC) in one hemisphere or the other showed that control of song rapidly switches between hemispheres [2]. Feedback experiments have shown that Field L and the caudolateral mesopallium may hold a representation of song against which auditory signals are compared [3]. Disrupting with white noise those renditions of a syllable that were slightly above (or below) the syllable’s average pitch showed that the apparently random natural variability in songbird motor output is used to drive change in the song [4], and that the AFP produces a corrective signal to bias song away from those disruptions [5]. The song change is confined to within roughly 10 milliseconds (ms) of the stimulus, and the shape of the learned response can be predicted by a simple mechanism [6]. The AFP transfers the error signal to the robust nucleus of the arcopallium (RA) using NMDA-receptor–mediated glutamatergic transmission [7]. The course of song recovery after such a pitch-shift paradigm showed that the caudal medial nidopallium is implicated in memorising or recalling a recent song target, but in neither auditory processing nor directed motor learning [8].
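The pitch-contingent feedback paradigm described above can be sketched as follows: estimate the pitch of each rendition of the target syllable, and deliver white noise only if that pitch lies above (or below) the average of recent renditions. This is a minimal illustration under stated assumptions, not the procedure used in [4] or [5]. The autocorrelation pitch estimator, the 300–2000 Hz search range, the "mean of recent renditions" threshold, and the helper names `estimate_pitch` and `should_disrupt` are all hypothetical choices.

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=300.0, fmax=2000.0):
    """Estimate the fundamental frequency (Hz) of one audio frame by autocorrelation.

    Hypothetical helper: the frame should contain at least fs / fmin samples.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Full autocorrelation; index len(frame) - 1 corresponds to lag 0.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)  # shortest period considered
    lag_max = int(fs / fmin)  # longest period considered
    # The strongest autocorrelation peak in the allowed range gives the period.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

def should_disrupt(pitch_hz, history_hz, direction="above"):
    """Hypothetical pitch-contingent rule: disrupt renditions above (or below)
    the mean pitch of recent renditions of the same syllable."""
    mean_pitch = float(np.mean(history_hz))
    if direction == "above":
        return bool(pitch_hz > mean_pitch)
    return bool(pitch_hz < mean_pitch)
```

In a real experiment, the decision must be made within the syllable, before the noise burst is played, which is why detection latency and jitter matter for this kind of paradigm.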
Despite the power and versatility of vocal feedback experiments, there is no standard syllable detector. Desiderata for such a detector include:
Accuracy: How often does the system produce false positives or false negatives?
Latency: The average delay between the target syllable being sung and its detection.
Jitter: How much the latency varies from one rendition of the song to the next. We measure jitter as the standard deviation of latency.
Versatility: Is detection possible at “difficult” syllables?
Ease of use: How automated is the process of programming a detector?
Cost: What are the hardware and software requirements?
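The accuracy, latency, and jitter criteria above can be computed from a labelled test recording. The sketch below assumes that ground-truth target times and detector trigger times are available in seconds, and matches each target to the earliest unused detection inside a tolerance window. The function name `score_detector` and the 50-ms default window are hypothetical, not values taken from this article.

```python
import numpy as np

def score_detector(target_times, detection_times, max_latency=0.05):
    """Match detections to targets and summarise accuracy, latency and jitter.

    A detection counts as a hit for a target if it falls within
    [target, target + max_latency] seconds (hypothetical tolerance).
    Unmatched targets are false negatives; unmatched detections are false positives.
    """
    targets = np.sort(np.asarray(target_times, dtype=float))
    detections = np.sort(np.asarray(detection_times, dtype=float))
    used = np.zeros(len(detections), dtype=bool)
    latencies = []
    misses = 0
    for t in targets:
        # Earliest detection not yet assigned that lies inside the tolerance window.
        candidates = np.where(~used & (detections >= t) & (detections <= t + max_latency))[0]
        if candidates.size == 0:
            misses += 1
            continue
        i = candidates[0]
        used[i] = True
        latencies.append(detections[i] - t)
    latencies = np.asarray(latencies)
    return {
        "false_negative_rate": misses / len(targets) if len(targets) else 0.0,
        "false_positives": int(np.count_nonzero(~used)),
        "latency_mean": float(latencies.mean()) if latencies.size else float("nan"),
        # Jitter as the (population) standard deviation of matched latencies.
        "jitter_std": float(latencies.std()) if latencies.size else float("nan"),
    }
```

For example, targets at 1, 2, 3, and 4 s with triggers at 1.010, 2.012, 3.008, and 5.5 s give one miss (a false-negative rate of 0.25), one false positive, a mean latency of 10 ms, and a jitter of about 1.6 ms. How false positives are normalised into a rate (per detection, per second of song, or per non-target frame) is a separate choice that this sketch leaves open.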
A variety of syllable-triggering systems have been used, but few have been documented or characterised in detail. In 1999, detection was achieved by a group of IIR filters with hand-tuned logical operators [9]. The system had a latency of 50 or 100 ms; accuracy and jitter were not reported. As access to computational resources has improved, approaches have changed: in 2009, hand-tuned filters were implemented on a Tucker-Davis Technologies digital signal processor, bringing latency down to around 4 ms [5]. Like other filter-bank techniques, however, this is not strictly a syllable detector but rather a pitch and timbre detector (it cannot identify a frequency sweep or distinguish a short chirp from a long one), and thus requires careful selection of target syllables. Furthermore, the method is neither inexpensive nor, based on our experience with a similar technique, accurate. Also in 2009, a neural network was applied to a spectral image of song [3]; the authors reported a jitter of 4.3 ms, but further implementation and performance details are not available. In 2011, stable portions of syllables were matched to spectral templates in 8-ms segments [7]. This detector achieved a jitter of 4.5 ms,
and false-negative and false-positive rates of (...truncated)