Classifying coherent versus nonsense speech perception from EEG using linguistic speech features

www.nature.com/scientificreports OPEN

Corentin Puffay 1,2*, Jonas Vanthornhout 1, Marlies Gillis 1, Pieter De Clercq 1, Bernd Accou 1,2, Hugo Van hamme 2 & Tom Francart 1*

When a person listens to natural speech, the relation between features of the speech signal and the corresponding evoked electroencephalogram (EEG) is indicative of neural processing of the speech signal. Using linguistic representations of speech, we investigate the differences in neural processing between speech in a native language and speech in a foreign language that is not understood. We conducted experiments using three stimuli: a comprehensible language, an incomprehensible language, and randomly shuffled words from a comprehensible language, while recording the EEG signal of native Dutch-speaking participants. We modeled the neural tracking of linguistic features of the speech signals using a deep learning model in a match-mismatch task that relates EEG signals to speech, while accounting for lexical segmentation features reflecting acoustic processing. The deep learning model effectively classifies coherent versus nonsense languages. We also observed significant differences in tracking patterns between comprehensible and incomprehensible speech stimuli within the same language. This demonstrates the potential of deep learning frameworks in measuring speech understanding objectively.

Keywords: EEG decoding, Deep learning, CNN, Linguistics

Electroencephalography (EEG) is a non-invasive method that can be used to study brain responses to sounds. Traditionally, unnatural periodic stimuli (e.g., click trains, modulated tones, repeated phonemes) are presented to listeners, and the recorded EEG signal is averaged to obtain the resulting brain response and to enhance its stimulus-related component [3,31,33].
These stimuli do not reflect everyday natural speech, as they are repetitive and not continuous, and are thus processed differently by the brain [24]. Although these measures provide valuable insights about the auditory system, they do not provide insights about speech intelligibility.

To investigate how the brain processes realistic speech, it is common to model the transfer function between the presented speech and the resulting brain response [11,18]. Such models capture the time-locking of the brain response to certain features of speech, often referred to as neural tracking. Three main model types are used to measure the neural tracking of speech: (1) a linear regression model that reconstructs speech from EEG (backward modeling); (2) a linear regression model that predicts EEG from speech (forward modeling); and (3) classification tasks that identify, among multiple candidate segments, the speech segment synchronized with a given EEG segment [13,15,35]. For forward and backward models, the correlation between the ground truth and the predicted or reconstructed signal provides the measure of neural tracking, while for the classification task, classification accuracy is used.

Estimations of neural tracking with such models can be used to measure speech intelligibility. Ref. 40 showed a strong correlation between the neural tracking estimation obtained with linear models and behavioural measurements of speech intelligibility.

To investigate how the brain processes speech, research has focused on different features of speech signals, which are known to be processed at different stages along the auditory pathway. Three main classes have hence been investigated:

• Acoustics (e.g., spectrogram, speech envelope [18], f0 [34,39])
• Lexical segmentation features (e.g., phone onsets, word onsets [17,30])
• Linguistics (e.g., phoneme surprisal, word frequency [7,8,20,28,36,42])

1 Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium.
2 Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium.
Scientific Reports | (2024) 14:18922 | https://doi.org/10.1038/s41598-024-69568-0

As opposed to neural tracking studies using broad features that carry mostly acoustic information, we here select linguistic features to narrow down our focus to speech understanding. Linguistic features of speech reflect information carried by a word or a phoneme, and their resulting brain response can be interpreted as a marker of speech understanding [7,20]. Considering the correlation between feature classes [12], many studies accounted for the acoustic and lexical segmentation components of linguistic features [7,20], while others did not [8,42], potentially measuring the neural tracking of non-linguistic information.

Although the dynamics of the brain responses are known to be non-linear, most of the studies investigating neural tracking relied on linear models, which is a crude simplification. Later research attempted to introduce non-linearity using deep neural networks. Such architectures relied on simple fully connected layers [14], recurrent layers [2,32], or, more recently, transformer-based architectures [15]. For a global overview of EEG-based deep learning studies, see ref. 35.

Most deep learning work used low-frequency acoustic features, such as the Mel spectrogram or the speech envelope [2,4], or higher-frequency features such as the fundamental frequency of the voice, f0 [34,38], to improve the decoder’s performance. Although studies using invasive recording techniques showed the encoding of multiple linguistic features [26], very few EEG-based deep learning studies involved linguistic features [15]. In a previous study [36], we used a deep learning framework and measured additional neural tracking of linguistic features over lexical segmentation features in young healthy native Dutch speakers who listened to Dutch stimuli.
This finding emphasized that a component of neural tracking corresponds to the phoneme or word rate, while another corresponds to the semantic context reflected in linguistic features. In addition, linear modeling studies [21,41] suggested a relationship between speech understanding and the added value of linguistic features. Ref. 21 used two incomprehensible language conditions (i.e., Frisian, a West Germanic language of Friesland, and random word shuffling of Dutch speech) to manipulate speech understanding. However, within our deep learning framework, no investigations have been conducted on language data incomprehensible to the test subject.

In this article, we aim to investigate the impact of language understanding on the neural tracking of linguistics using our above-mentioned deep learning framework. Therefore, we fine-tune and evaluate our previously published deep learning framework to measure the added value of linguistics over lexical segmentation features on the neural tracking of three different stimuli: (1) Dutch, (2) Frisian, and (3) scrambled Dutch words. Additionally, we evaluate our model on a language classification task to explore whether our CNN can learn language-specific brain responses.

Methods (...truncated)
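For orientation, the match-mismatch idea referred to in the introduction can be illustrated with a minimal sketch. It is not the CNN used in this study: the decision rule here is a plain Pearson correlation at a fixed, known lag, the "speech feature" is white noise, and the "EEG" is a single simulated channel. All names (`pearson`, `match_mismatch_accuracy`, `lag`, `seg_len`) and all parameter values are hypothetical choices made for illustration only.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def match_mismatch_accuracy(eeg, feature, lag, seg_len):
    """Fraction of EEG segments assigned to the time-aligned speech segment.

    For each EEG segment there are two candidates: the speech segment that
    was presented at the same time (matched) and the speech segment that
    immediately follows it (mismatched). The candidate with the higher
    correlation is selected.
    """
    correct, total, start = 0, 0, 0
    while start + 2 * seg_len <= len(feature) and start + lag + seg_len <= len(eeg):
        # The simulated response is assumed to trail the stimulus by `lag` samples.
        eeg_seg = eeg[start + lag:start + lag + seg_len]
        matched = feature[start:start + seg_len]
        mismatched = feature[start + seg_len:start + 2 * seg_len]
        if pearson(eeg_seg, matched) > pearson(eeg_seg, mismatched):
            correct += 1
        total += 1
        start += seg_len
    return correct / total


# Simulated data (assumption: white noise stands in for a real speech feature).
rng = random.Random(0)
n_samples, lag, seg_len = 64 * 201, 8, 64
feature = [rng.gauss(0, 1) for _ in range(n_samples)]
noise = [rng.gauss(0, 1) for _ in range(n_samples)]

# "Tracked" EEG: a delayed copy of the feature buried in noise.
tracked = [(feature[t - lag] if t >= lag else 0.0) + noise[t] for t in range(n_samples)]
# "Untracked" EEG: noise that is unrelated to the feature.
untracked = [rng.gauss(0, 1) for _ in range(n_samples)]

acc_tracked = match_mismatch_accuracy(tracked, feature, lag, seg_len)
acc_untracked = match_mismatch_accuracy(untracked, feature, lag, seg_len)
print(f"tracked: {acc_tracked:.2f}, untracked: {acc_untracked:.2f}")
```

In this toy setting, accuracy well above the 50% chance level indicates that the simulated response follows the feature, whereas accuracy near chance indicates no tracking. This is the sense in which classification accuracy serves as a measure of neural tracking.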