PatPho: A phonological pattern generator for neural networks
BRIAN MACWHINNEY
Carnegie Mellon University, Pittsburgh, Pennsylvania
PING LI
University of Richmond, Richmond, Virginia
Much of the power of neural network modeling for language use and acquisition derives from a reliance on statistical regularities implicit in the phonological properties of words. Researchers have devised several methods for representing the phonology of words, but these methods are often either unable to represent realistically sized lexicons or inadequate in the ways they represent individual words. In this paper, we present a new phonological pattern generator (PatPho) that allows connectionist modelers to derive accurate phonological representations of the English lexicon. PatPho not only generates phonological patterns that can scale up to realistically sized lexicons, but also accurately and parsimoniously captures the similarity structures of the phonology of monosyllabic and multisyllabic words.
Rumelhart and McClelland's (1986) connectionist model
of the acquisition of the English past tense had a profound
positive impact on the fields of artificial neural networks,
language acquisition, and cognitive psychology. However,
that model was also heavily criticized for the way it
represented phonological patterns of the verbal input. The
fundamental structure of the past tense learning model was a
nonstandard phonological structure called the
Wickelfeature. Critics (Lachter & Bever, 1988; Pinker & Prince,
1988) argued that these distributed feature structures were
unable to faithfully represent the phonological structures
of words and the differences between words. As a result of
these problems, connectionist researchers subsequently
abandoned the use of Wickelfeatures as a way to
represent phonological input, using instead a variety of
alternative systems for phonological representations. These
methods fall roughly into three categories.
The first class of methods (e.g., Plunkett & Marchman,
1991, 1993) treats the word as a simple string of phonemes.
For example, Plunkett and Marchman used six binary units
to code each of the three positions in a set of English
consonant-vowel-consonant (CVC), VCC, and CCV
wordlike strings. Features included voicing, sonority, and place
and manner of articulation. Because each of the three
segments used six features, 18 units were needed to code a
three-phoneme word. A representation of this type
provides only an approximation to the phonology of words,
owing to its use of arbitrarily determined binary values for
phonological features. In addition, the representation
accommodates only a limited number of monosyllables.
Because of these problems, it is not a good choice for
simulations that attempt to model the learning of a realistic
lexicon.
This research was supported by Grants BCS-9975249 and
BCS998009 from the National Science Foundation. We thank Xiaoming
Zhao and Lihua Chen, who assisted in the development of the source
code, and Igor Farkas for helping with the binary coding and
conducting the PCA analyses. Please address correspondence to P. Li,
Department of Psychology, University of Richmond, Richmond, VA 23173
(e-mail: ).
Miikkulainen (1997) used a variant of this scheme with
five units on a continuous scale to represent the
phonological features of each English phoneme. In his scheme, a word
is a simple concatenation of its component phonemes. This
extended representation scheme can accommodate words
beyond monosyllables. It also provides a more accurate
representation of the phonological features, because of its
use of continuous units instead of binary units. However,
it has problems capturing the similarity between words of
different phonemic lengths. For example, spot and pot in
this coding will end up sharing very little similarity,
because phonemic concatenation leads to dislocated
positioning of similar phonemes: For spot, Units 1-5 represent
/s/ and Units 6-10 represent /p/, whereas for pot, Units
1-5 represent /p/ and Units 6-10 represent the vowel, and so on.
Thus, the same phoneme activates completely different
units in the representation (see Plaut, McClelland,
Seidenberg, & Patterson, 1996, for a discussion of a similar
problem in orthographic representations).
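The misalignment described above can be illustrated with a minimal Python sketch. The feature values are arbitrary placeholders, not the codes Miikkulainen (1997) actually used, and the names FEATURES, concat_code, block_of, and aligned_matches are hypothetical helpers introduced only for this illustration. The letter "o" stands in for the vowel of spot and pot.

```python
# Placeholder 5-unit feature vectors. The numbers are invented for
# illustration and are NOT the values used by Miikkulainen (1997).
FEATURES = {
    "s": [0.9, 0.1, 0.5, 0.0, 0.3],
    "p": [0.1, 0.9, 0.0, 0.0, 0.2],
    "o": [0.5, 0.5, 1.0, 0.8, 0.6],
    "t": [0.1, 0.5, 0.0, 0.0, 0.2],
}
UNITS_PER_PHONEME = 5


def concat_code(phonemes):
    """Code a word by concatenating its phoneme vectors left to right."""
    vector = []
    for phoneme in phonemes:
        vector.extend(FEATURES[phoneme])
    return vector


def block_of(vector, position):
    """Return the units that code the phoneme at a 0-based position."""
    start = position * UNITS_PER_PHONEME
    return vector[start:start + UNITS_PER_PHONEME]


def aligned_matches(word_a, word_b):
    """Count positions where both words place the same phoneme."""
    return sum(1 for a, b in zip(word_a, word_b) if a == b)


spot = ["s", "p", "o", "t"]
pot = ["p", "o", "t"]
spot_vec = concat_code(spot)
pot_vec = concat_code(pot)

print(len(spot_vec), len(pot_vec))                    # 20 15
# /p/ has the same code in both words, but it sits in different units:
print(block_of(spot_vec, 1) == block_of(pot_vec, 0))  # True
# No phoneme is shared at the same position, so no units line up:
print(aligned_matches(spot, pot))                     # 0
```

Although the two words share three of their phonemes, none of the shared phonemes is coded by the same units, so any unit-by-unit comparison of the two vectors reflects only accidental feature overlap.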
A second method for representing phonological
patterns encodes no more than a single segment at a time. For
example, the NetTalk system (Sejnowski & Rosenberg,
1988) uses a read-head approach to processing, which
accepts single English orthographic letters one by one and
then outputs the corresponding English sounds. To do this,
the system maintains a local memory of the context. This
form of representation is unable to capture larger
phonological patterns and cannot deal with word-based
irregularities or nonlocal phonological patterns.
A third method for representing phonological patterns
relies on the slot-based representation introduced by
MacWhinney and Leinbach (1991) and applied in a
variety of later models (Joanisse & Seidenberg, 1999; Plaut
et al., 1996; Plunkett & Juola, 1999). MacWhinney and
Leinbach showed how the switch from Wickelfeatures to
slot-based representations solved many of the problems
with Rumelhart and McClelland's (1986) model of past
tense learning. By using slot-based representations, the
phonology of a word is encoded in terms of a template
with a fixed set of slots, rather than as a string with either
a fixed or a variable length or as a series of isolated
segments. This method has its basis in autosegmental
phonological theory, according to which phonemes are bundles
of features in metric syllabic grids (Goldsmith, 1976;
Levelt, 1989). Each segment in a word is assigned to a
different slot, depending on which syllable it belongs to and
whether it appears in the syllable's onset, nucleus, or coda.
For a monosyllabic word, it is relatively simple to assign
phonemes to their appropriate positions. For example,
Joanisse and Seidenberg used the CCVVCCC template to
represent English monosyllables, in which a consonant
initial would occur in the first C position and consonant
clusters would occupy the first two CC positions; single
vowels occur at the first V position, but diphthongs
occupy both VVs, and so on. Plunkett and Juola used a
CCCVVCCC template, which could additionally
accommodate consonant clusters such as /str/ at the word-initial
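position. In this type of coding, spot could occur
as spCoVtCC in the template, whereas pot could occur
as CpCoVtCC (lowercase letters mark filled slots; C and V
mark unfilled ones). The segments p, o, and t thus occupy
the same slots in both words, which preserves their
phonological similarities.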
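The slot alignment in this example can be checked with a short Python sketch. It only reads the two coded strings given in the text against the CCCVVCCC template; it does not implement the rules that any of the cited models use to assign phonemes to slots. The names TEMPLATE, parse_slots, and shared_slots are hypothetical, and treating an uppercase C or V as an unfilled slot is an assumption about the notation.

```python
TEMPLATE = "CCCVVCCC"  # template attributed in the text to Plunkett and Juola


def parse_slots(coded, template=TEMPLATE):
    """Turn a coded string such as 'spCoVtCC' into a list of slots.

    A symbol equal to the template symbol (uppercase C or V) is read as
    an unfilled slot (None); any other symbol is the phoneme in that slot.
    """
    if len(coded) != len(template):
        raise ValueError("coded word must have one symbol per slot")
    return [None if symbol == slot else symbol
            for symbol, slot in zip(coded, template)]


def shared_slots(slots_a, slots_b):
    """List (slot index, phoneme) pairs filled identically in both words."""
    return [(i, a) for i, (a, b) in enumerate(zip(slots_a, slots_b))
            if a is not None and a == b]


spot = parse_slots("spCoVtCC")
pot = parse_slots("CpCoVtCC")

print(spot)                     # ['s', 'p', None, 'o', None, 't', None, None]
print(pot)                      # [None, 'p', None, 'o', None, 't', None, None]
print(shared_slots(spot, pot))  # [(1, 'p'), (3, 'o'), (5, 't')]
```

Because both words have the same fixed length, every phoneme of pot falls in the slot that holds the same phoneme in spot, and the two codes differ only in the first slot.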
The representations used by Joanisse and Seidenberg
(1999) and by Plunkett and Juola (1999) are restricted to
monosyllables. MacWhinney and Leinbach (1991) also
used slots to represent multisyllabic English verbs. For
example, a full trisyllabic template in MacWhinney and
Leinbach's representation had a
CCCVVCCCVVCCCVVCCC form. Recently, Bullinaria (1997) presented a
model that combined the slot-based representation with
aspects of the single-segment processing used in NetTalk.
However, it appears (...truncated)