An Automated Procedure for Evaluating Song Imitation
Citation: Mandelblat-Cerf Y, Fee MS (
Yael Mandelblat-Cerf 0
Michale S. Fee 0
Johan J. Bolhuis, Utrecht University, Netherlands
0 McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
Songbirds have emerged as an excellent model system to understand the neural basis of vocal and motor learning. Like humans, songbirds learn to imitate the vocalizations of their parents or other conspecific "tutors." Young songbirds learn by comparing their own vocalizations to the memory of their tutor song, slowly improving until over the course of several weeks they can achieve an excellent imitation of the tutor. Because of the slow progression of vocal learning, and the large amounts of singing generated, automated algorithms for quantifying vocal imitation have become increasingly important for studying the mechanisms underlying this process. However, methodologies for quantifying song imitation are complicated by the highly variable songs of either juvenile birds or those that learn poorly because of experimental manipulations. Here we present a method for the evaluation of song imitation that incorporates two innovations: first, an automated procedure for selecting pupil song segments, and, second, a new algorithm, implemented in Matlab, for computing both song acoustic and sequence similarity. We tested our procedure using zebra finch song and determined a set of acoustic features for which the algorithm optimally differentiates between similar and non-similar songs.
Funding: Funding for this work was provided by the National Institutes of Health (R01 MH067105). The funders had no role in study design, data collection and
analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Songbirds learn to sing by imitating the vocalizations of their
parents or other conspecific birds to which they are exposed at a
young age [1,2,3]. Song production and learning are under the
control of complex social and behavioral factors [4,5] and are
mediated by cortical and basal ganglia circuits with a striking
homology to similar circuits underlying motor learning in the
mammalian brain [6,7]. Thus, songbirds have emerged as a
tractable model system to study the neural mechanisms underlying
the generation and learning of complex behaviors acquired
through practice, such as speech and musical performance [8].
The most commonly used songbird for laboratory studies of
vocal learning is the zebra finch, which produces bouts of singing
lasting from 1 to 5 seconds. The song of adult zebra finches consists
of a sequence of 3–7 distinct song syllables called a motif. The
order of the syllables within the motif, as well as the acoustic
structure within each syllable, is typically produced in a fairly
stereotyped fashion across song renditions.
Like all songbirds, zebra finches learn to sing in a series of
stages, beginning with an exposure to a tutor song while still in the
nest. During this stage, the young bird forms a memory of the
tutor song, called a song template [3]. At around 30 days post
hatch (dph), zebra finches begin to babble, producing highly
variable vocalizations called subsong. Over the course of 4–6
weeks of practice, during the plastic song stage, the song of a
young zebra finch gradually becomes more structured and more
similar to the tutor song [9]. Vocal variability gradually decreases
[10] until, at sexual maturity (80–90 dph), the song achieves the
highly stereotyped structure of adult song.
The mechanisms underlying vocal learning are not yet fully
understood. Vocal learning and maintenance in songbirds are
dramatically disrupted by deafening or other hearing impairments
[11,12,13,14], leading to the view that vocal learning requires the
integration of auditory feedback with vocal/motor commands
[15]. According to one model of vocal learning, a comparison of
the bird's own song with the song template provides an error
signal that can be used to reinforce song variations that were a
better match to the template [7,16,17,18]. Another model suggests
that auditory feedback may be used during babbling to learn the
relation between motor commands and vocal output. Such an
inverse model could then be used to reconstruct the sequence of
motor commands needed to produce a good match to the song
template [19,20]. To test models such as these, it is necessary to
study the effects of different behavioral, neuronal or other
manipulations on song learning or song production
[21,22,23,24,25].
Early efforts at quantifying song imitation were made using
visual inspection of song spectrograms [4,24]. However, the
difficulty of assessing song similarity visually, as well as the need for
a uniform metric across research labs, spurred the development of
computerized methods of song comparison. In one approach [26],
the song spectrum is represented at each moment by a small
number of spectral features, and the similarity of two sounds is
measured as the Euclidean distance in this low-dimensional space.
Song imitation is assessed by, first, manually selecting a segment of
pupil song and a segment of tutor song. Then, using the
feature-based distance metric, regions of high similarity between the
segments of pupil and tutor songs are identified, and the results are
aggregated into a global measure of acoustic similarity and
sequence similarity. Typically, the song segments chosen for such a
comparison are song motifs of both the pupil and tutor birds. This
approach to the analysis of song similarity is the basis of a
widely used software package (Sound Analysis Pro, SAP).
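
The feature-based comparison described above can be illustrated with a minimal sketch. It is not the SAP implementation, nor the algorithm developed in this paper; it only shows the general idea of measuring Euclidean distance between time frames in a low-dimensional feature space and aggregating matches into one global score. The function names, the two-feature toy vectors, the threshold value, and the "fraction of tutor frames matched" score are all assumptions made for illustration, and the features are assumed to be already scaled to comparable units.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(pupil, tutor):
    """Pairwise distances: rows are pupil frames, columns are tutor frames."""
    return [[euclidean(p, t) for t in tutor] for p in pupil]

def fraction_matched(pupil, tutor, threshold):
    """Fraction of tutor frames lying within `threshold` of any pupil frame.

    Hypothetical global score used only for this sketch.
    """
    d = distance_matrix(pupil, tutor)
    matched = sum(
        1 for j in range(len(tutor))
        if any(row[j] <= threshold for row in d)
    )
    return matched / len(tutor)

# Toy data: each tuple is one time frame described by two made-up features.
tutor = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
pupil = [(0.1, 0.0), (1.0, 0.9), (5.0, 5.0)]

score = fraction_matched(pupil, tutor, threshold=0.2)
print(score)  # two of the three tutor frames have a close pupil frame
```

In this toy case the first two tutor frames each have a pupil frame within the threshold and the third does not, so the score is 2/3. A real comparison would also need to account for the order in which matched regions occur, which this sketch ignores.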
In the process of using SAP to analyze the extent to which
young birds had imitated their tutors, we discovered several
challenges. Young birds, as well as those that had undergone
experimental manipulations, produced songs that were less
stereotyped than normal adult songs, and contained vocal
elements that could not be easily identified as components of a
motif. As a result, it was unclear exactly which parts of a song bout
to include in the analysis, raising concerns about possible
inconsistencies and experimenter bias in the selection process.
Here we have developed a well-specified automated procedure for
selecting segments of pupil song, thus reducing the potential for
experimenter bias.
Existing algorithms for evaluating the acoustic and sequence
similarity of pupil and tutor song depend on the segmentation of
song into syllables and silent gaps. The variability of juvenile songs
makes such segmentation highly unreliable, and motivated us to
develop a new algorithm for evaluating song similarity that treats
pupil song as a continuous stream of sound, without segmenting it
into syllables and gaps. We have tested this algorithm with
different sets of acoustic (...truncated)