MEG Encoding using Neural Language & Speech Models and Shared Context Semantics in Listening Stories

Poster B27 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port
This poster is part of the Sandbox Series.

Subba Reddy Oota¹, Nathan Trouvain¹, Gael Jobard², Frederic Alexandre¹, Xavier Hinaut¹; ¹Inria-Research, ²University of Bordeaux

Self-supervised Transformer-based models have revolutionized both language and speech processing. Inspired by these models, recent neuroscience studies have shown that the brain responses of people comprehending language can be predicted well by text-based language models as well as by speech-based models. However, existing studies on brain encoding for natural stimuli focus on functional magnetic resonance imaging (fMRI) recordings, which provide high spatial resolution but poor temporal resolution. In this work, we investigate the information shared between Transformer-based language and speech models for brain encoding using magnetoencephalography (MEG) recordings, which provide high temporal resolution. We present a systematic study of the alignment between neural language and speech models and the brain across two language modalities (reading vs. listening) in order to estimate the temporal aspects of language and speech processing in the brain. We represent the text stimulus using pretrained Transformer-based text models such as BERT and GPT-2, and the speech stimulus using deep learning speech models such as HuBERT, Data2Vec, and Wav2Vec2.0. Our experiments on the MEG-MASC naturalistic story-listening dataset (Gwilliams et al. 2022) reveal that Transformer-based text representations significantly predict MEG activity across auditory and language regions until 550 ms (with several peaks), while speech models such as HuBERT and Data2Vec better capture the MEG response to the auditory stimulus, peaking at around 200 ms. Interestingly, predictions from these models agree with previous literature from controlled settings (i.e., pitch task, lateralization task, piano tones), showing similar behavior in naturalistic settings. This suggests that deep learning language and speech models provide relevant features that are likely to be used during human language and speech processing.
Further, the layer-wise analysis shows that the text models' brain predictivity seems to increase for early layers with short context and for late layers with long context (i.e., the peaks become larger and MEG predictivity higher), while the speech model Data2Vec better encodes the MEG signal even in frontal language regions, with peaks after 350 ms observed only in the later layers. We are also investigating the information shared between different aspects (semantic vs. non-semantic) of speech and language processing models across time, which properties vary between these models, and how different neural models capture neural activity across brain regions.
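
To make the idea of time-resolved brain encoding concrete, the sketch below fits one linear model per time lag that maps stimulus features to sensor responses, then scores each lag by the Pearson correlation between predicted and held-out responses. It is an illustration only: the abstract does not state which regression method, train/test split, or epoching scheme was used, so ridge regression, the 80/20 split, the array shapes, and every function and variable name here are assumptions. Synthetic data stands in for both the model embeddings and the MEG epochs.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / np.where(denom == 0, 1.0, denom)


def encoding_scores_by_lag(features, meg, alpha=1.0, train_frac=0.8):
    """Score how well stimulus features predict the response at each lag.

    features: (n_words, n_features) stimulus representations (hypothetical
              stand-in for one layer of a text or speech model).
    meg:      (n_words, n_lags, n_sensors) responses aligned to each word
              (hypothetical epoching; not specified in the abstract).
    Returns an (n_lags, n_sensors) array of held-out Pearson correlations.
    """
    n_words, n_lags, n_sensors = meg.shape
    split = int(n_words * train_frac)

    # Standardize features using training statistics only.
    mu = features[:split].mean(axis=0)
    sd = features[:split].std(axis=0)
    sd[sd == 0] = 1.0
    X = (features - mu) / sd

    scores = np.zeros((n_lags, n_sensors))
    for lag in range(n_lags):
        Y = meg[:, lag, :]
        y_mean = Y[:split].mean(axis=0)
        W = fit_ridge(X[:split], Y[:split] - y_mean, alpha)
        pred = X[split:] @ W + y_mean
        scores[lag] = pearson_per_column(pred, Y[split:])
    return scores


# Synthetic demonstration: only lag index 2 carries stimulus information.
rng = np.random.default_rng(0)
n_words, n_features, n_lags, n_sensors = 400, 8, 5, 3
features = rng.normal(size=(n_words, n_features))
true_weights = rng.normal(size=(n_features, n_sensors))
meg = rng.normal(size=(n_words, n_lags, n_sensors))  # noise at every lag
meg[:, 2, :] += features @ true_weights               # signal at one lag

scores = encoding_scores_by_lag(features, meg)
peak_lag = int(scores.mean(axis=1).argmax())
print("lag index with highest mean predictivity:", peak_lag)
```

Plotting the per-lag scores against latency would give a predictivity time course of the kind the abstract describes, and repeating the procedure with features from different model layers would give a layer-wise comparison.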

Topic Areas: Meaning: Lexical Semantics, Speech Perception