Associative learning and self-organization as basic principles for simulating speech acquisition, speech production, and speech perception

https://epjnbp.epj.org/articles/epjnbp/pdf/2014/01/40366_2013_Article_8.pdf

Kröger et al. EPJ Nonlinear Biomedical Physics 2014, 2:2
http://www.epjnonlinearbiomedphys.com/content/2/1/2

RESEARCH Open Access

Bernd J Kröger1,2*, Jim Kannampuzha1 and Emily Kaufmann3

* Correspondence: 1 Neurophonetics Group, Department of Phoniatrics, Pedaudiology, and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
2 Cognitive Computation and Applications Laboratory, School of Computer Science and Technology, Tianjin University, Tianjin, China
Full list of author information is available at the end of the article

Abstract

Background: Quantitative neural models of speech acquisition and speech processing are rare.
Methods: In this paper, we describe a neural model for simulating speech acquisition, speech production, and speech perception. The model is based on two important neural features: associative learning and self-organization. The model describes an SOM-based approach to speech acquisition, i.e. how speech knowledge and speaking skills are learned and stored in the context of self-organizing maps (SOMs).
Results: The model shows that phonetic features, such as high-low and front-back in the case of vowels, place and manner of articulation in the case of consonants, and stressed vs. unstressed in the case of syllables, result from the ordering of syllabic states at the level of a supramodal phonetic self-organizing map. After learning, the production and perception of speech items result from the co-activation of neural states within different cognitive and sensorimotor neural maps.
Conclusion: This quantitative model gives an intuitive understanding of basic neurobiological principles from the viewpoint of speech acquisition and speech processing.
Keywords: Speech production; Speech perception; Speech acquisition; Babbling; Imitation; Associative learning; Self-organization; Neural maps; Self-organizing maps; Sensorimotor learning

© 2014 Kröger et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

While a great deal of research has been carried out to investigate the brain locations of the different parts or modules that make up the speech production and speech perception system (e.g. [1-3]), little is known about the neural functioning of these modules during speech acquisition, speech production, and speech perception. In order to fill this gap, quantitative functional neural models have been developed (e.g. [4-8]). One model, the neuroanatomically grounded Hebbian-learning model [8], establishes highly specialized functional units called “Hebbian neuronal circuits” (HNCs, see also [9]). This model appears to be especially neurobiologically realistic since it learns to associate sensory and motor speech items in a similar way to the early phases of speech acquisition in children. In order to maintain balance between neurobiological realism and computational tractability, this approach does not model single neurons and spike chains, but rather uses “cells” or “nodes” as basic neuron-like elements which represent a local set of neurons. It thus realizes a lumped-type or mean-field type model in which the primary objects of modeling are the average activity rate of the neuron set (or cell) and the excitatory and inhibitory connectivity of the cells.
Self-organizing map approaches (SOM, Kohonen) also belong to the group of lumped-element, rate-based approaches, but the degree of abstraction is much higher in SOM models than in models such as the neuroanatomically grounded Hebbian-learning model of [8]. On the other hand, SOM approaches, like more neuroanatomically grounded approaches, are capable of representing the basic principles of neural systems, i.e. self-organization, associative learning, Hebbian learning, adaptation, and neural plasticity.

Quantitative neural models of speech processing (i.e. speech production and speech perception) and speech acquisition which include the generation and/or perceptual processing of articulatory and acoustic speech signals are rare. One of the most cited approaches in this direction is the DIVA model [7,10,11]. The DIVA approach mainly concentrates on modeling the relationship between sensory feedback and speech articulation. That model has been successfully applied, e.g., to exemplifying motor adaptation in speech production [7,10,12]. The approach introduced in the present paper concentrates on the questions of how speech knowledge, including knowledge concerning speech motor skills, is learned and how this knowledge is stored. In contrast, the DIVA approach makes no assumptions concerning knowledge or skill storage. Thus, the goal of the present paper is to introduce a comprehensive model of speech acquisition, speech production, and speech perception, based on SOM theory, which includes knowledge and skill storage.

Methods

Structure of the model

Biologically-based neural models that describe complex behavior or complex human functions, such as speaking, separate functional structure and knowledge [13]. The functional structure of such a system is basically composed of neural maps and neural mappings [7]. A neural map is an assembly of model neurons which represents a specific neural state, i.e.
a phonemic, phonetic, motor plan, or sensory state in the case of our modeling approach. These maps are located in specific cortical regions. A neural map, e.g. neural map A, comprises Ni model neurons ni (i = 1, …, Ni). Each of these model neurons may be activated to a certain degree ai(t) at each time instant t. The whole activation pattern ai(t) (i = 1, …, Ni) of a neural map represents a specific neural state, e.g. a motor plan, a sensory state, or a phonemic state at a certain time instant. The strength of activation of each model neuron varies between zero (0, no activation) and one (1, full activation). All model neurons ni (i = 1, …, Ni) of the neural map A and all neurons nj (j = 1, …, Nj) of the neural map B can be connected with each other (Figure 1). The entirety of the Ni x Nj neural links or neural connections between neural maps A and B is called a neural mapping. The strength (or connectivity) of each neural link is called synaptic

Figure 1 Neural map A, B and neural mapping between the two maps. Map A comprises 5 x 5 model neurons (Ni = 25); map B comprises the same number of model neurons (Nj = 25); neural ma (...truncated)
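As an illustration of the map and mapping definitions above, the following minimal Python sketch represents two 5 x 5 maps as flat lists of activation values between 0 and 1, and the mapping between them as an Ni x Nj table of link strengths. The function names (`make_map`, `make_mapping`, `propagate`), the random placeholder link strengths, and the clamped weighted-sum rule used to pass activation from map A to map B are assumptions made for this example only; they are not taken from the model described in this paper.

```python
import random


def make_map(rows, cols):
    """Return a flat list of activation values, one per model neuron, all 0.0."""
    return [0.0] * (rows * cols)


def make_mapping(n_i, n_j, seed=0):
    """Return an n_i x n_j table of link strengths.

    The random values are placeholders (an assumption of this sketch),
    not learned link strengths.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_j)] for _ in range(n_i)]


def clamp(x):
    """Keep an activation value inside the range [0, 1]."""
    return max(0.0, min(1.0, x))


def propagate(a, weights):
    """Assumed rule: each b_j is the weighted sum over all a_i, clamped to [0, 1]."""
    n_j = len(weights[0])
    return [
        clamp(sum(a[i] * weights[i][j] for i in range(len(a))))
        for j in range(n_j)
    ]


# Map A: 5 x 5 model neurons (Ni = 25), with one neuron fully activated.
map_a = make_map(5, 5)
map_a[12] = 1.0

# Mapping A -> B: one link for every pair (n_i, n_j); map B also has 25 neurons.
weights = make_mapping(len(map_a), 25)
map_b = propagate(map_a, weights)

n_links = len(weights) * len(weights[0])
print(n_links)  # prints 625
```

With 25 model neurons in each map, the mapping holds 25 x 25 = 625 links, which is the number the sketch prints.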