Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception
Kröger et al. EPJ Nonlinear Biomedical Physics 2014, 2:2
http://www.epjnonlinearbiomedphys.com/content/2/1/2
RESEARCH
Open Access
Bernd J Kröger (1,2,*), Jim Kannampuzha (1) and Emily Kaufmann (3)
* Correspondence:
(1) Neurophonetics Group, Department of Phoniatrics, Pedaudiology, and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
(2) Cognitive Computation and Applications Laboratory, School of Computer Science and Technology, Tianjin University, Tianjin, China
Full list of author information is available at the end of the article
Abstract
Background: Quantitative neural models of speech acquisition and speech
processing are rare.
Methods: In this paper, we describe a neural model for simulating speech acquisition,
speech production, and speech perception. The model is based on two important
neural features: associative learning and self-organization. The model describes an
SOM-based approach to speech acquisition, i.e. how speech knowledge and speaking
skills are learned and stored in the context of self-organizing maps (SOMs).
Results: The model shows that phonetic features, such as high-low and front-back in
the case of vowels, place and manner of articulation in the case of consonants, and
stressed vs. unstressed in the case of syllables, result from the ordering of syllabic states at the level
of a supramodal phonetic self-organizing map. After learning, the production
and perception of speech items result from the co-activation of neural states
within different cognitive and sensorimotor neural maps.
Conclusion: This quantitative model gives an intuitive understanding of basic
neurobiological principles from the viewpoint of speech acquisition and speech
processing.
Keywords: Speech production; Speech perception; Speech acquisition; Babbling;
Imitation; Associative learning; Self-organization; Neural maps; Self-organizing maps;
Sensorimotor learning
Background
While a great deal of research has investigated the brain locations of the parts or modules that make up the speech production and
speech perception system (e.g. [1-3]), little is known about the neural functioning of
these modules during speech acquisition, speech production, and speech perception. To fill this gap, quantitative functional neural models have been developed (e.g. [4-8]).
One model, the neuroanatomically grounded Hebbian-learning model [8], establishes
highly specialized functional units called “Hebbian neuronal circuits” (HNCs, see also
[9]). This model appears to be especially neurobiologically realistic since it learns to associate sensory and motor speech items in a similar way to the early phases of speech
acquisition in children. In order to maintain a balance between neurobiological realism
and computational tractability, this approach does not model single neurons and spike
chains, but rather uses “cells” or “nodes” as basic neuron-like elements, each of which represents
a local set of neurons. It is thus a lumped or mean-field type model in
which the primary objects of modeling are the average activity rate of the neuron set
(or cell) and the excitatory and inhibitory connectivity of the cells.
© 2014 Kröger et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Self-organizing map (SOM) approaches (Kohonen) also belong to the group of lumped-element,
rate-based approaches, but it should be noted that the degree of abstraction is much higher in SOM models than in models such as the neuroanatomically
grounded Hebbian-learning model of [8]. On the other hand, SOM approaches, like
more neuroanatomically grounded approaches, are capable of representing the
basic principles of neural systems, i.e. self-organization, associative learning, Hebbian
learning, adaptation, and neural plasticity.
Quantitative neural models of speech processing (i.e. speech production and speech
perception) and speech acquisition which include the generation and/or perceptual
processing of articulatory and acoustic speech signals are rare. One of the most cited
approaches in this direction is the DIVA model [7,10,11]. The DIVA approach mainly
concentrates on modeling the relationship between sensory feedback and speech articulation. It has been applied successfully, e.g. to illustrate motor adaptation
in speech production [7,10,12]. The approach introduced in the present paper concentrates instead on how speech knowledge, including knowledge concerning
speech motor skills, is learned and how this knowledge is stored. The DIVA approach, in contrast, makes no assumptions concerning knowledge or skill storage.
Thus, the goal of the present paper is to introduce a comprehensive model of speech
acquisition, speech production, and speech perception, based on SOM theory, which
includes knowledge and skill storage.
Methods
Structure of the model
Biologically based neural models that describe complex behavior or complex human
functions, such as speaking, separate functional structure and knowledge [13]. The
functional structure of such a system is basically composed of neural maps and neural
mappings [7].
A neural map is an assembly of model neurons which represents a specific neural
state, i.e. a phonemic, phonetic, motor plan, or sensory state in the case of our modeling approach. These maps are located in specific cortical regions. A neural map, e.g.
neural map A, comprises Ni model neurons ni (i = 1, …, Ni). Each of these model neurons may be activated to a certain degree ai(t) at each time instant t. The whole activation pattern ai(t) (i = 1, …, Ni) of a neural map represents a specific neural state, e.g. a
motor plan, a sensory state, or a phonemic state at a certain time instant. The strength
of activation of each model neuron varies between zero (0, no activation) and one
(1, full activation).
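The definition above can be made concrete with a minimal sketch in Python. It is an illustration only, not the implementation used in the model: the class name `NeuralMap`, the method `set_state`, and the choice to clip out-of-range values to the interval [0, 1] are assumptions introduced here. The sketch stores one activation value per model neuron, so that the full list corresponds to the activation pattern ai(t) of the map at one time instant.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NeuralMap:
    """Hypothetical container for one neural map (illustration only)."""

    size: int  # number of model neurons in the map (Ni in the text)
    activation: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Without an explicit pattern, every model neuron starts at 0 (no activation).
        if not self.activation:
            self.activation = [0.0] * self.size
        self.set_state(self.activation)

    def set_state(self, pattern: List[float]) -> None:
        """Store the activation pattern for one time instant.

        Clipping to [0, 1] is an assumption of this sketch; the text only
        states that activation varies between zero and one.
        """
        if len(pattern) != self.size:
            raise ValueError("need exactly one activation value per model neuron")
        self.activation = [min(1.0, max(0.0, float(a))) for a in pattern]


# A 5 x 5 map (25 model neurons) with one fully and one partially activated neuron.
map_a = NeuralMap(size=25)
state = [0.0] * 25
state[12] = 1.0   # full activation
state[13] = 0.4   # partial activation
map_a.set_state(state)
```

A flat list is used here for simplicity; the two-dimensional arrangement of the neurons within a map only becomes relevant once neighborhood relations between neurons are considered.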
All model neurons ni (i = 1, …, Ni) of the neural map A and all neurons nj (j = 1, …, Nj)
of the neural map B can be connected with each other (Figure 1). The entirety of the
Ni × Nj neural links or neural connections between neural maps A and B is called a
neural mapping. The strength (or connectivity) of each neural link is called synaptic
Figure 1. Neural maps A and B and the neural mapping between the two maps. Map A comprises 5 × 5 model
neurons (Ni = 25); map B comprises the same number of model neurons (Nj = 25); neural ma (...truncated)