Slide Slam O15
“Um…, it’s really difficult to... um… speak fluently”: Neural tracking of real-life spoken language
Galit Agmon¹, Martin G. Bleichner², Reut Tsarfaty¹, Elana Zion Golumbic¹; ¹Bar Ilan University, Israel, ²University of Oldenburg, Germany
Neural speech-tracking experiments usually use idealized speech (e.g., audiobooks) to study how the brain encodes and represents the phoneto-acoustic and linguistic features of speech. However, this type of speech differs dramatically from the speech produced spontaneously in real life, which is the speech material that our brains deal with daily. Compared to idealized speech, real-life speech contains frequent pauses and fillers (“um”, “er”), can be highly disfluent, and varies in speech rate over time. Real-life speech is also very associative, as speakers construct their sentences ‘on the fly’, affecting syntactic coherence and leading to highly complex sentences. The current study aims to extend speech-tracking research to authentic real-life spoken language and explore how the brain encodes its unique properties. To this end, we recorded neural activity using EEG from 20 participants as they listened to a recording of a spontaneous, unscripted, personal narrative in Hebrew. Using speech-tracking analysis, we analyzed how brain responses are affected by several inherently variable features of real-life speech. Specifically, using a multiple-regressor approach, we estimated the temporal response function (TRF) to the acoustic envelope of the speech, after characterizing it along five dimensions that we hypothesized might affect neural encoding of real-life speech: lexicality, clause boundaries, clause duration, speech fluency, and syntactic complexity. The speech-tracking analysis yielded robust TRFs in fronto-central electrodes with four prominent components, which we term the TRF-P50, TRF-N100, TRF-P2, and TRF-N350 (reflecting the polarity and latency of each component). Importantly, all of these components were modulated by the speech features we examined, with the most prominent effects found for the later TRF-N350 component. The lexicality of utterances (proper words vs. fillers and mazing) mainly affected the late TRF-P2 and TRF-N350 components, which were mostly absent for non-lexical utterances. Words that constitute clause boundaries (opening vs. closing words) also showed modulation of the TRF-P2 and TRF-N350 components. Clause duration affected the latency of the TRF-N350 response and modulated the amplitude of earlier components. Syntactic complexity of a clause also affected the amplitude of the TRF-N350, with larger responses to high-complexity vs. low-complexity clauses. Speech rate, however, did not seem to have a prominent effect on the speech-tracking response. In conclusion, the current work demonstrates the importance of acknowledging the complexity of real-life speech. Incorporating the features of real-life speech into speech-tracking models will bring about a more ecological understanding of how the brain processes and encodes speech and deals with its inherent complexities and disfluencies. Our results point specifically to the late TRF components, which seem to be sensitive to the non-uniformities and linguistic complexities of spontaneous speech. This suggests that they may be the ‘continuous-speech correlates’ of the well-studied P2 and N400 ERP components in more traditional neurolinguistic research. We hope that this proof-of-concept study will provide the foundation for developing more specific models for studying neural processing of ecological real-life speech.
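As a rough illustration of the multiple-regressor TRF approach described above, the following minimal sketch splits a speech envelope into two regressors (lexical vs. non-lexical segments) and estimates one TRF per regressor by ridge regression on time-lagged copies of the stimulus. It is not the authors' pipeline: the data are synthetic, and the function names (`lagged_design`, `estimate_trf`), the ridge penalty, the number of lags, and the 70/30 lexical split are all assumptions made for the example.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Stack time-lagged copies of stimulus (n_samples, n_features) column-wise."""
    n_samples, n_features = stimulus.shape
    X = np.zeros((n_samples, n_features * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_features, (lag + 1) * n_features)
        X[lag:, cols] = stimulus[:n_samples - lag]
    return X

def estimate_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Ridge-regression TRF; returns an array of shape (n_lags, n_features, n_channels)."""
    X = lagged_design(stimulus, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, X.T @ eeg)
    return weights.reshape(n_lags, stimulus.shape[1], -1)

# Synthetic demo (all values are hypothetical, chosen only for illustration).
rng = np.random.default_rng(0)
n_samples, n_lags = 5000, 8
envelope = np.abs(rng.standard_normal(n_samples))   # stand-in for an acoustic envelope
is_lexical = rng.random(n_samples) < 0.7             # assumed per-sample annotation

# One envelope regressor per category: lexical vs. non-lexical (fillers etc.).
stimulus = np.column_stack([envelope * is_lexical, envelope * ~is_lexical])

# Simulate one EEG channel from known per-category kernels plus noise.
true_trf = rng.standard_normal((n_lags, 2))
eeg = lagged_design(stimulus, n_lags) @ true_trf.reshape(-1, 1)
eeg += 0.1 * rng.standard_normal(eeg.shape)

estimated = estimate_trf(stimulus, eeg, n_lags, ridge=1.0)
max_error = np.max(np.abs(estimated[:, :, 0] - true_trf))
print(f"largest deviation from the simulated kernels: {max_error:.4f}")
```

Comparing the two columns of `estimated` lag by lag is the toy analogue of comparing TRF components across speech categories; real EEG analyses would additionally need preprocessing, cross-validated regularization, and statistical testing, none of which is shown here.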