Categorical Vowel Perception Enhances the Effectiveness and Generalization of Auditory Feedback in Human-Machine-Interfaces

Stepp CE (2013) Categorical Vowel Perception Enhances the Effectiveness and Generalization of Auditory Feedback in Human-Machine-Interfaces. PLoS ONE 8(3): e59860. doi:10.1371/journal.pone.0059860

Eric Larson, Howard P. Terry, Margaux M. Canevari, Cara E. Stepp

1 Institute for Learning and Brain Sciences, University of Washington, Seattle, Washington, United States of America, 2 Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, United States of America, 3 Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America

Editor: Bernd Sokolowski, University of South Florida, United States of America

Human-machine interface (HMI) designs offer the possibility of improving quality of life for patient populations as well as augmenting normal user function. Despite its pragmatic benefits, auditory feedback for HMI control remains underutilized, in part due to observed limitations in effectiveness. The goal of this study was to determine the extent to which categorical speech perception could be used to improve an auditory HMI. Using surface electromyography, 24 healthy speakers of American English participated in 4 sessions to learn to control an HMI using auditory feedback (provided via vowel synthesis). Participants trained on 3 targets in sessions 1-3 and were tested on 3 novel targets in session 4. An "established categories with text cues" group of eight participants was trained and tested on auditory targets corresponding to standard American English vowels using auditory and text target cues. An "established categories without text cues" group of eight participants was trained and tested on the same targets using only auditory cuing of target vowel identity.
A "new categories" group of eight participants was trained and tested on targets that corresponded to vowel-like sounds not part of American English. Analyses of user performance revealed significant effects of session and group (established categories groups and the new categories group), and a trend for an interaction between session and group. Results suggest that auditory feedback can be effectively used for HMI operation when paired with established categorical (native vowel) targets with an unambiguous cue.

Funding: This study was funded by National Institutes of Health (NIH) National Institute on Deafness and Other Communication Disorders (NIDCD) training grants T32DC000018 and 1F32DC012456 (EDL), and an Undergraduate Teaching and Scholarship Grant from Boston University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Human-machine interfaces (HMIs) are designed to translate volitionally produced physiological signals into commands or control signals to augment or restore normal user function. For example, a common goal of HMI designs is to improve communication or mobility for patients with spinal cord injury, stroke, or amyotrophic lateral sclerosis (ALS), in whom typical motor function has been greatly reduced or eliminated. HMI designs typically rely on biosignals such as electroencephalography (EEG) or surface electromyography (sEMG), translating scalp potentials or muscle activity, respectively, into control signals. In many HMI designs, users are required to imagine specific motor movements [1] or fixate on a target in a visual scene to evoke P300 [2] or steady-state visual responses [3]. Although effective, these types of HMI designs rely on visual feedback or sustained visual attention for operation, thereby limiting normal user visual function.
In an attempt to overcome this, multiple auditory-based HMI designs have exploited the cortical potentials evoked by presented auditory stimuli with increasing success [4-7]. Unfortunately, recent studies on HMI designs utilizing the auditory modality have found that auditory feedback is inferior to visual feedback, both in terms of resulting participant performance [8] and required participant training time [9]. Nonetheless, it has been shown that audio-visual training of spectrally complex auditory categories (via implicit association with visual stimulus categories) could be used to obtain accurate participant categorization of novel tokens from the trained auditory categorical distributions [10], suggesting that audio-visual feedback can be used to train auditory categorization. Utilizing auditory categorical perception is complicated by the fact that participants can experience greater difficulty forming multi-dimensional auditory categorical judgments than unidimensional judgments [11]. However, auditory categorization of vowel sounds requires multi-dimensional categorization in the formant one-formant two (F1-F2) plane, and listeners are able to quickly categorize these sounds for speech perception. Critically, listeners tend to perform effective discrimination only among those vowel categories that are perceptually relevant during early language acquisition [12]. Moreover, these learned vowel categories exhibit a so-called perceptual magnet effect, whereby similar repeated stimuli become more easily categorized yet less readily discriminable [13]. For example, listeners perceive novel vowel categories (those not utilized in their native language) in terms of the perceptual categories of their native language [14], even when explicitly trained to learn the new perceptual categories [15], and listeners only distinguish between sounds belonging to one particular category when explicitly trained to do so [16].
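
As a rough illustration of what categorization in the F1-F2 plane involves, the sketch below assigns a formant pair to the nearest of three vowel prototypes. It is a minimal sketch under stated assumptions, not the method used in this study: the prototype values are approximate adult-male averages of the kind reported in classic formant measurements, the labels and the `categorize` helper are hypothetical, and the log-frequency distance is a simplification of how listeners actually weight formants.

```python
import math

# Illustrative (F1, F2) prototypes in Hz. These are assumed, approximate
# adult-male averages for three corner vowels, NOT the targets of the study.
PROTOTYPES = {
    "i": (270.0, 2290.0),  # as in "heed"
    "a": (730.0, 1090.0),  # as in "hod"
    "u": (300.0, 870.0),   # as in "who'd"
}


def categorize(f1, f2, prototypes=PROTOTYPES):
    """Return the label of the prototype nearest to (f1, f2).

    Distance is Euclidean on log-frequency ratios, so a given proportional
    change in F1 or F2 counts equally (a hypothetical choice).
    """
    def distance(proto):
        return math.hypot(math.log(f1 / proto[0]), math.log(f2 / proto[1]))

    return min(prototypes, key=lambda label: distance(prototypes[label]))


if __name__ == "__main__":
    # Tokens that differ acoustically can still receive the same label,
    # which is the sense in which the judgment is categorical.
    for f1, f2 in [(280, 2200), (300, 2350), (700, 1100), (320, 900)]:
        print(f"F1={f1} Hz, F2={f2} Hz -> /{categorize(f1, f2)}/")
```

A nearest-prototype rule is used here only because it is the simplest two-dimensional classifier; it does not capture the reduced within-category discriminability that the perceptual magnet effect describes.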
It is not surprising that a recent study utilizing motor imagery and implanted cortical electrodes [17] showed high performance by mapping two-dimensional control to two-dimensional vowel (formant) space. Here we aim to determine the specific effects of listeners' underlying native two-dimensional vowel categorization abilities on HMI control in order to inform future development of effective auditory HMIs. To obtain a control signal with a high signal-to-noise ratio (SNR) for testing the effectiveness of our auditory vowel-production feedback, we use sEMG, which provides signals several orders of magnitude larger in amplitude than EEG. The sEMG-based system utilized here provides real-time feedback to participants while requiring only a USB-based soundcard and sEMG amplifier connected to a standard laptop running custom C++ software (see Methods), but other control signals (e.g., EEG) could be substituted in principle. After training participants to produce specific vowel sounds based on continuous auditory and visual feedback, we found that participants readily transferred their ability to control the HMI using auditory feedback alone. Criti (...truncated)