
Neural dynamics of high-level linguistic predictions during natural audiovisual discourse processing


Poster B111 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port
This poster is part of the Sandbox Series.

David Hernández-Gutiérrez¹, Romain Pastureau¹, Suhail Matar¹, Mikel Lizarazu¹, Nicola Molinaro¹; ¹Basque Center on Cognition, Brain and Language (BCBL)

In natural environments, language is typically multimodal, encompassing both auditory and visual cues. When listening to connected speech, listeners also process the visual input provided by the speaker's gestures and facial movements. Previous studies have demonstrated that co-speech gestures can affect the neural processing of words, whether they are presented in isolation or embedded in sentences. Furthermore, the brain leverages visual speech cues to enhance linguistic comprehension when these cues are accompanied by gestures. However, most of these studies have employed non-naturalistic linguistic stimuli and time-locked neural measures (e.g. ERPs). In this study, we aim to investigate how observing the speaker's co-speech gestures and visual speech affects the neural processing of spontaneous speech, with a particular focus on high-level linguistic representations. Recent research has indicated that during discourse comprehension, the brain continuously generates predictions based on the preceding linguistic context. We ask how predictions about the meaning and syntactic category of upcoming words are modulated by the visual cues received from the speaker, and what the cortico-anatomical correlates of these effects are. To this end, we will analyze magnetoencephalographic (MEG) activity from a sample of 30 participants presented with spontaneous speech. The stimuli consist of 80 audiovisual retellings of cartoons (1 minute each), delivered by five different speakers. We will assess the continuous neural tracking of linguistic predictions with a multivariate temporal response function (mTRF) encoding model, which predicts the recorded neural data from the features of interest, namely lexico-semantic surprisal and part-of-speech (PoS) surprisal. We use GPT-2, a deep-learning language model, to compute the lexico-semantic surprisal of each word, considering all the preceding words in the retelling. Subsequently, PoS surprisal is computed based on these values. The experimental conditions include four audiovisual (AV) conditions, four visual-only (VO) conditions, and one auditory-only (AO) condition. The AV and VO conditions involve full body-face presentation, mouth occlusion, dots depicting the speaker's movements, and random dynamic dots. Participants are instructed to answer a comprehension question following each presentation. This is the first study using this methodology to investigate multimodal spontaneous speech comprehension. Nevertheless, based on previous research, we predict that the TRFs of lexico-semantic surprisal will exhibit higher prediction accuracy in the AV conditions than in the AO condition (except for AV random dots). Within the AV presentations, we anticipate that the tracking of semantic surprisal will be more effective when gestures are accompanied by visual speech than when the speaker is seen with a mask. Regarding the neural tracking of PoS surprisal, we hypothesize that the highest accuracy will be observed in the AV presentations, depending on the syntactic category. We further anticipate that source localization will reveal activation in multimodal integration areas in AV compared to AO presentations. Moreover, activation associated with lexico-semantic surprisal is expected to be more widespread across the scalp than activation associated with PoS surprisal, which may predominantly involve temporal areas. These results will have significant implications for the neurobiology of multimodal communication and enhance our understanding of situated language comprehension.
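
As an illustration of the two predictors named above, the following minimal sketch computes word surprisal as the negative log probability of a word given its context, and derives PoS surprisal by summing the next-word probability mass over all words sharing a syntactic category. This is an assumption about how PoS surprisal could be derived from word-level values, not a description of the authors' pipeline. The probability table, the word-to-tag mapping, and the function names are hypothetical; in practice the probabilities would come from a language model such as GPT-2.

```python
import math
from collections import defaultdict


def surprisal(prob):
    """Return the surprisal, in bits, of an outcome with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def pos_surprisal(next_word_probs, word_to_pos, observed_pos):
    """Surprisal of a PoS tag, marginalizing next-word probabilities by tag.

    Assumption: the probability of a tag is the summed probability of all
    candidate next words that carry that tag.
    """
    pos_mass = defaultdict(float)
    for word, p in next_word_probs.items():
        pos_mass[word_to_pos[word]] += p
    return surprisal(pos_mass[observed_pos])


# Hypothetical next-word distribution for some preceding context.
next_word_probs = {"dog": 0.5, "cat": 0.25, "runs": 0.125, "quickly": 0.125}
# Hypothetical PoS tag for each candidate word.
word_to_pos = {"dog": "NOUN", "cat": "NOUN", "runs": "VERB", "quickly": "ADV"}

# Lexico-semantic surprisal of the observed word "cat".
word_bits = surprisal(next_word_probs["cat"])
# PoS surprisal of the observed category NOUN (probability mass 0.75).
pos_bits = pos_surprisal(next_word_probs, word_to_pos, "NOUN")

print(f"word surprisal: {word_bits:.3f} bits")
print(f"PoS surprisal:  {pos_bits:.3f} bits")
```

In this toy case the observed word is less predictable than its category, so its surprisal is higher. Word-by-word values of this kind, aligned to word onsets, are the sort of feature time series an mTRF encoding model takes as input.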

Topic Areas: Signed Language and Gesture, Meaning: Discourse and Pragmatics
