
Tracking each variable of speech inference in the human brain with MEG decoding

Poster B87 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port

Joséphine Raugel1, Valentin Wyart1, Jean-Rémi King2; 1Laboratory of Cognitive and Computational Neuroscience, Ecole Normale Supérieure, PSL Research University, Paris, France, 2Laboratory of Perceptive Systems, Ecole Normale Supérieure, PSL Research University, Paris, France

Language is central to human cognition: it allows individuals to share and accumulate knowledge, and it structures their social interactions. Yet the biological and computational bases of language functions remain largely unknown. To tackle this issue, we combine magnetoencephalography (MEG) recordings of healthy participants with state-of-the-art deep learning models of speech and language to understand how the human brain recognizes words from a sequence of phonemes. For this, we decode, at each time sample, phonetic and lexical features from a linear combination of MEG sensors. We then study these decoded representations within a formal inferential model originally developed in decision-making research: sequential evidence accumulation. Within this framework, each phoneme and word can be hierarchically modeled as a piece of evidence that incrementally specifies the meaning of a sentence. We decode these phonemic and semantic activations in the brain through linear mapping, building temporal generalization matrices. Large Language Models are typically optimized to predict the next token (word or phoneme) from an embedded context; regarding neural data, we expect our results to reveal a hierarchical predictive coding architecture, whereby the brain generates a hierarchy of predictions. To test whether the language representations of the brain and of AI systems both follow the predictions of this inferential framework, we correlate the decoded representations with the stimulus posterior as approximated with Large Language Models (GPT-3, EnCodec). During natural speech processing, we can decode phonemic activations arising around 200 ms after the onset of each phoneme, as well as word activations arising around 400 ms after the onset of each word. The temporal generalization (TG) matrices associated with these phonemic and word activations both have oblong shapes, though the word matrix emerges later and extends over a longer period. This reveals a later and longer retention period for word representations than for phonemic representations, consistent with hierarchical processing of language. We can also decode basal (context-free) expectancies as well as conditional expectancies of phonemes, the latter approximated with Large Language Models. Moreover, we evaluate the extent to which the decoded word and phonemic representations vary with the expectancy of these words and phonemes, whether basal or conditional. By providing an experimentally grounded formal framework for modeling language processing in the human brain, this interdisciplinary project sheds light on how the human brain combines words into meaning, and on the extent to which this process can be compared and contrasted with the latent processes of Large Language Models. By highlighting the similarities and differences between brains and modern deep neural networks, the present results promise to help bridge the disciplines of AI and neuroscience.
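The temporal generalization analysis described above can be sketched as follows: a linear classifier is trained at each time sample and tested at every other sample, yielding a (train time x test time) accuracy matrix whose off-diagonal spread reflects how long a neural code is maintained. This is a minimal illustration on synthetic data, not the authors' pipeline; the epoch counts, sensor counts, and the injected class-dependent pattern are all invented stand-ins for real MEG epochs.

```python
# Minimal sketch of temporal generalization (TG) decoding on SYNTHETIC data.
# A classifier trained at one time sample is tested at all samples; sustained
# off-diagonal accuracy mimics a maintained neural representation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_epochs, n_sensors, n_times = 200, 30, 20
y = rng.integers(0, 2, n_epochs)                 # binary feature, e.g. a phonetic class
X = rng.normal(size=(n_epochs, n_sensors, n_times))

# Inject a class-dependent sensor pattern over a sustained window (samples 8-16)
# to imitate a representation that is retained over time.
pattern = rng.normal(size=n_sensors)
X[:, :, 8:16] += np.where(y[:, None, None] == 1, 1.0, -1.0) * pattern[None, :, None]

train, test = np.arange(0, 150), np.arange(150, 200)
tg = np.zeros((n_times, n_times))
for t_train in range(n_times):
    clf = LogisticRegression(max_iter=1000).fit(X[train, :, t_train], y[train])
    for t_test in range(n_times):
        tg[t_train, t_test] = clf.score(X[test, :, t_test], y[test])

print(tg.shape)  # (train_time, test_time) accuracy matrix
```

In this toy setup, accuracy stays high across the whole 8-16 window for classifiers trained anywhere inside it (an "oblong" TG shape), while samples outside the window decode at chance.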
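The basal versus conditional expectancy distinction can be illustrated with surprisal values: basal expectancy corresponds to a context-free (unigram) probability, while conditional expectancy is a probability given the preceding context. Here a toy bigram model on a tiny hand-written word list stands in for the GPT-3 / EnCodec posteriors used in the study; the corpus and function names are illustrative only.

```python
# Toy illustration of basal vs. conditional expectancy as surprisal (bits).
# A bigram model is a crude stand-in for an LLM's next-token posterior.
import math
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))

def basal_surprisal(word):
    """Context-free surprisal: -log2 P(word)."""
    return -math.log2(unigram[word] / len(corpus))

def conditional_surprisal(prev, word):
    """Context-conditioned surprisal: -log2 P(word | prev)."""
    return -math.log2(bigram[(prev, word)] / unigram[prev])

# "cat" is far more expected after "the" than out of context,
# so its conditional surprisal is lower than its basal surprisal.
print(basal_surprisal("cat"), conditional_surprisal("the", "cat"))
```

In the study's terms, a word with low conditional surprisal is highly expected given its context, and the analysis asks how decoded neural representations scale with these two expectancy levels.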

Topic Areas: Speech-Language Treatment, Speech Perception
