An online database of phonological representations for Mandarin Chinese

Behavior Research Methods, May 2009

A Web-based database has been developed to provide psycholinguists with a large-scale phonological representation system for all Mandarin Chinese monosyllables. The system is built on the slot-based phonological pattern generator (PatPho) and takes into account the language-specific features of Chinese phonology. Users can retrieve the relevant phonological representations through an interactive query system on the Web. Query results can be saved in a number of formats, such as Excel spreadsheets, for further analysis. The representation system can serve a variety of purposes, in particular connectionist language modeling and, more generally, the study of Chinese phonology.

Ping Li, Pennsylvania State University, University Park, Pennsylvania

Researchers in connectionist modeling of language have for some time been concerned with how to represent the phonology of the linguistic input to a model. How to faithfully represent the phonological patterns of words, and the differences between words in a language, has been discussed since the pioneering work of Rumelhart and McClelland (1986) on the acquisition of the English past tense. Recent work in this field favors an approach in which a word's pronunciation is coded in a slot-based representation that takes into account the articulatory features of the phonemes in the word (Joanisse & Seidenberg, 1999; MacWhinney & Leinbach, 1991; Plunkett & Juola, 1999). In particular, the phonology of a word is encoded in a template with a fixed set of slots; each phoneme of the word is assigned to a slot according to the syllable it belongs to and its position within that syllable (onset, nucleus, or coda). Most recently, on the basis of this idea of syllabic templates, Li and MacWhinney (2002) introduced a phonological pattern generator (PatPho) for connectionist modeling.
PatPho is able to represent English words of variable length (up to three syllables) in a syllabic template of CCCVVCCCVVCCCVVCCC, with Cs representing consonants, Vs representing vowels, and each CCCVVCCC representing one syllable. This system accurately captures the phonological features of English words and has been successfully applied in our connectionist models of child language development (Li, Farkas, & MacWhinney, 2004; Li, Zhao, & MacWhinney, 2007; Zhao & Li, in press).

The phonological representation of words is also an important issue in the connectionist study of other languages. For example, Chinese has an ideographic writing system, and it has always been difficult for connectionist models to correctly represent the phonology of Chinese characters. To solve this problem, different researchers have developed different representational systems (e.g., Hsiao & Shillcock, 2004; Xing, Shu, & Li, 2004). Although these systems have greatly improved our understanding of language acquisition and language processing in Chinese, they have some problems, notably in their generalizability to computational models other than their own.

In Hsiao and Shillcock's (2004) work, the pronunciation of Chinese monosyllabic characters was represented by a 27-dimension binary vector. In their coding, the first 14 dimensions of the vector represent the phonetic features of an initial consonant, the next 8 dimensions represent those of a nucleus vowel, the next 3 dimensions represent the final consonant, and the final 2 dimensions represent four tones in Mandarin Chinese. A significant advantage of their system is the parsimony of the binary codes (0 or 1), which keeps their computational model tractable. The parsimony, however, introduces certain problems that may limit the accuracy of their representations.
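
The 27-dimension layout just described can be sketched as a set of contiguous segments. The following Python sketch is illustrative only: the segment names, the `assemble` helper, and the choice to fill an unspecified segment with zeros are assumptions made here, and the actual feature codes used by Hsiao and Shillcock (2004) are not reproduced.

```python
from typing import Dict, List, Tuple

# Segment widths as stated in the text for the 27-dimension binary vector.
# The segment names are labels chosen for this sketch (hypothetical).
SEGMENTS: List[Tuple[str, int]] = [
    ("initial_consonant", 14),
    ("nucleus_vowel", 8),
    ("final_consonant", 3),
    ("tone", 2),
]


def segment_slices(segments: List[Tuple[str, int]]) -> Dict[str, slice]:
    """Map each segment name to the slice it occupies in the full vector."""
    slices: Dict[str, slice] = {}
    start = 0
    for name, width in segments:
        slices[name] = slice(start, start + width)
        start += width
    return slices


def assemble(parts: Dict[str, List[int]]) -> List[int]:
    """Concatenate per-segment binary codes into one vector.

    Assumption of this sketch: a segment missing from `parts` is
    filled with zeros. Widths and 0/1 values are validated.
    """
    vector: List[int] = []
    for name, width in SEGMENTS:
        bits = parts.get(name, [0] * width)
        if len(bits) != width:
            raise ValueError(f"{name}: expected {width} units, got {len(bits)}")
        if any(b not in (0, 1) for b in bits):
            raise ValueError(f"{name}: values must be 0 or 1")
        vector.extend(bits)
    return vector


SLICES = segment_slices(SEGMENTS)
TOTAL_DIMS = sum(width for _, width in SEGMENTS)

if __name__ == "__main__":
    # Placeholder tone code; not a real code from the cited work.
    v = assemble({"tone": [1, 0]})
    print(TOTAL_DIMS, v[SLICES["tone"]])
```

Laying the segments out this way makes the structural constraint visible: the vector has exactly one nucleus-vowel segment and a two-unit tone segment.
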
For example, only a single nucleus vowel can be represented in their system, which is inconsistent with Chinese phonology, where two or even three vowels can be clustered together (i.e., diphthongs or triphthongs). Hsiao and Shillcock's representations therefore cannot capture the vowel structure of Chinese. Another problem concerns the tones of Mandarin Chinese. Because there are five tones (including a neutral tone) in Mandarin Chinese, the two-node binary representation in Hsiao and Shillcock's system, which yields at most four distinct patterns, cannot represent all five tones.

Xing et al.'s (2004) phonological representation of Chinese characters was based on PatPho. It splits Chinese monosyllables into three parts (initial, final, and tone) and uses six slots to represent the tone and the phonemes that can occur in different positions of the syllable. Each slot consists of five units, and each unit can be assigned a real value between 0.0 and 1.0 to represent a specific articulatory feature of the phoneme. In total, a 30-dimensional real-valued feature vector represents the pronunciation of a Chinese character. This system, as compared with Hsiao and Shillcock's (2004), can successfully code diphthongs and triphthongs and is able to capture the phonetic features of Chinese syllables. One minor problem with Xing et al.'s (2004) system is that five units are used to represent a phoneme or a tone. However, as we will discuss below, three nodes are sufficient to represent the features of a phoneme, and a single unit with varying real values is able to represent all five tones. As such, Xing et al.'s system has some redundancy, and there is room to reduce its computational complexity. This representation also relies heavily on the Pinyin system (the standard romanization system for Mandarin Chinese; Institute of Linguistics of the Chinese Academy of Social Sciences, 2002).
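
The unit counts discussed above can be checked with a little arithmetic. The Python sketch below flattens six slots of five real-valued units into one 30-dimensional vector and computes how many binary units are needed to tell a given number of categories apart. The function names and the placeholder values are hypothetical; the actual feature values assigned by Xing et al. (2004) are not given in this text.

```python
from typing import List, Sequence

SLOTS = 6           # slots per syllable in Xing et al.'s scheme, per the text
UNITS_PER_SLOT = 5  # real-valued units per slot, per the text


def flatten_slots(slots: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate six five-unit slots into one 30-dimensional vector.

    Every value must lie in [0.0, 1.0], as the text describes.
    """
    if len(slots) != SLOTS:
        raise ValueError(f"expected {SLOTS} slots, got {len(slots)}")
    vector: List[float] = []
    for index, slot in enumerate(slots):
        if len(slot) != UNITS_PER_SLOT:
            raise ValueError(f"slot {index}: expected {UNITS_PER_SLOT} units")
        for value in slot:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"slot {index}: {value} is outside [0.0, 1.0]")
        vector.extend(float(value) for value in slot)
    return vector


def binary_units_needed(n_categories: int) -> int:
    """Smallest k such that 2**k >= n_categories (k units of 0/1)."""
    if n_categories < 1:
        raise ValueError("n_categories must be at least 1")
    # Integer arithmetic avoids floating-point rounding in log2.
    return (n_categories - 1).bit_length()


if __name__ == "__main__":
    # Placeholder values only; not real articulatory features.
    placeholder = [[0.5] * UNITS_PER_SLOT for _ in range(SLOTS)]
    print(len(flatten_slots(placeholder)))
    print(binary_units_needed(5))
```

With two binary units only 2^2 = 4 patterns are available, one short of the five Mandarin tones; a purely binary tone code would need three units.
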
The Pinyin system is simple and easy to learn, but its simplicity also means that many different phonemes have to be represented by the same letter. For example, the Pinyin letter "i" can represent three similar but distinct phonemes, depending on its position in a syllable. A similar situation holds for "a", "o", "e", and so on, since phonemic differences are not clearly represented in the system.

Although connectionist modeling of Chinese has become an increasingly important topic in psycholinguistic research, there has not yet been a convenient tool with which investigators can accurately generate large-scale phonological representations of Chinese characters. The issue is even more serious for researchers who are not familiar with the Chinese language but nevertheless want to do comparative studies, as well as for investigators whose native language is Chinese but who are not trained in the Pinyin system. It would be convenient for these investigators to obtain simple, easily accessible, vector-based representations of Chinese pronunciations. Our online phonological database of Chinese characters is designed to help researchers do just that. Here, we introduce a phonological re (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.3758%2FBRM.41.2.575.pdf
Article home page: http://link.springer.com/article/10.3758/BRM.41.2.575

Xiaowei Zhao, Ping Li. An online database of phonological representations for Mandarin Chinese, Behavior Research Methods, 2009, pp. 575-583, Volume 41, Issue 2, DOI: 10.3758/BRM.41.2.575