Computational and quantitative methods in understanding the neurobiology of language

Speakers:
Barry Devereux, Queen’s University, Belfast and University of Cambridge
John Hale, Cornell University, New York
Odette Scharenborg, Radboud University Nijmegen
Leila Wehbe, University of California, Berkeley

Modern methods in computational and quantitative linguistics incorporate a wealth of data on language, from statistical information about the acoustic and phonological regularities of speech and syntactic structure, to distributed models of word semantics and utterance meaning. An emerging area of interest is the integration of computational linguistics, big data, computational modelling and neuroimaging methods to study the neurobiology of language. This approach is attractive because it allows theoretical claims about different properties of language function to be explicitly formulated and quantified, using statistical data about specific linguistic phenomena derived from the linguistic environment. In this symposium, the four speakers will discuss their perspectives on how interdisciplinary approaches that combine computational and data-driven methods with cognitive theory provide new opportunities for understanding language and the brain.

The spatio-temporal dynamics of language comprehension: combining computational linguistics and RSA with MEG data

Barry Devereux received a B.Sc. in Mathematics and Computer Science and a Ph.D. in Cognitive Science from University College Dublin, Ireland, before going on to do postdoctoral training in cognitive neuroscience and the neurobiology of language at the Centre for Speech, Language and the Brain, Dept. of Psychology, University of Cambridge. His work investigates spoken language comprehension and object processing from a multidisciplinary perspective, combining computational modelling of language and object processing with cognitive theory and neuroimaging. As of July 2017, he is an assistant professor in Cognitive Signal Processing at Queen’s University, Belfast.

Abstract

Spoken language comprehension involves cortical systems supporting several complex and dynamic processes, from acoustic analysis and word recognition, to building syntactic structure and representing sentence meaning. Recent advances in computational and quantitative linguistics have seen an explosion in the availability of language data and increasingly sophisticated language models relevant to these processes. In a series of MEG experiments where participants listened to natural sentences, we investigate how lexically-driven expectations and syntactic structure-building interact over time by analysing how corpus-derived statistical models of lexico-syntactic information influence the multivariate spatiotemporal dynamics of incremental language comprehension in the brain. The results of these experiments demonstrate how quantitative measures of specific linguistic properties can yield a detailed picture of processes of integration during sentence comprehension in the brain.

Word-by-word neuro-computational models of human sentence processing

John Hale serves as Associate Professor of Linguistics at Cornell University. He received his PhD from Johns Hopkins University in 2003 under the direction of Paul Smolensky. His early work on information-theoretical complexity metrics was honored with awards such as the EW Beth dissertation prize. He is the author of Automaton Theories of Human Sentence Comprehension and principal investigator in the NSF-ANR joint project “Neuro-computational models of natural language” in collaboration with Jonathan R. Brennan, Christophe Pallier and Éric de La Clergerie. For more information, see https://courses.cit.cornell.edu/jth99/.

Abstract

The “mapping problem” (Poeppel 2012) between language structures and brain mechanisms stands in the way of a truly computational neurobiology of language. This talk offers a candidate solution, rooted in time-series predictions about comprehension effort. Such predictions are derived by traversing representations such as syntactic phrase structure trees in the manner of an incremental parsing algorithm. The resulting values serve to predict, word-by-word, neural signals such as BOLD collected during naturalistic listening. Using multiple regression, one can model incremental comprehension at many different levels of structure simultaneously. The results point to a spatial division of labor, isolating specific types of comprehension work to specific anatomical regions.
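As an illustration of how a word-by-word effort predictor can be read off a phrase structure tree, here is a minimal sketch. It assumes one simple choice of metric, node counting, which the abstract does not itself name: for each word it counts the nodes a top-down traversal would open before reaching that word, and the nodes a bottom-up traversal would close immediately after it. The function name `node_counts` and the bracketed-tree input format are hypothetical choices made for this example, not the speaker's implementation.

```python
import re

def node_counts(bracketed):
    """Return (word, top_down, bottom_up) triples for a bracketed parse tree.

    top_down:  nodes opened since the previous word (part-of-speech nodes included).
    bottom_up: nodes closed directly after this word.
    Assumes a well-formed tree in which every "(" is followed by a node label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    rows = []
    opened = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            opened += 1
            i += 2  # skip the node label that follows the bracket
        elif tok == ")":
            rows[-1][2] += 1  # credit the closed node to the most recent word
            i += 1
        else:
            rows.append([tok, opened, 0])
            opened = 0
            i += 1
    return [tuple(row) for row in rows]

tree = "(S (NP (DT the) (NN dog)) (VP (VBD barked)))"
for word, top_down, bottom_up in node_counts(tree):
    print(word, top_down, bottom_up)
```

Each column of counts, aligned to word onsets, could then serve as one regressor among several in a multiple regression against a neural time series.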

Insights into the cognitive processes underlying speech processing in the presence of background noise

Odette Scharenborg is an associate professor at the Centre for Language Studies, Radboud University Nijmegen, The Netherlands, and a research fellow at the Donders Institute for Brain, Cognition and Behaviour at the same university. Her research interests focus on narrowing the gap between automatic and human spoken-word recognition. She completed a PhD on the same topic with Lou Boves and Anne Cutler in Nijmegen, the Netherlands. Odette is interested in the question of where the difference between human and machine recognition performance originates, and whether it is possible to narrow this difference. She investigates these questions using a combination of computational modelling and behavioural experimentation. In 2008, she co-organised the Interspeech 2008 Consonant Challenge, which aimed at promoting comparisons of human and machine speech recognition in noise in order to investigate where the human advantage in word recognition originates. She was one of the initiators of the EU Marie Curie Initial Training Network “Investigating Speech Processing In Realistic Environments” (INSPIRE, 2012-2015). In 2017, she will be co-organising a six-week Frederick Jelinek Memorial Summer Workshop on Speech and Language Technology on the topic of the automatic discovery of grounded linguistic units for languages without orthography. She is currently PI on a 5-year (Vidi) project funded by the Netherlands Organisation for Scientific Research on the topic of non-native spoken-word recognition in noise.

Abstract

Most people will have noticed that communication in the presence of background noise is more difficult in a non-native than in the native language – even for those who have a high proficiency in the non-native language involved. Why is that? I will present results of several behavioural experiments and computational modelling studies investigating the effect of background noise on native and non-native spoken-word recognition, in particular on the underlying processes of multiple word activation and the competition between candidate words. These results show that the effects of background noise on spoken-word recognition are remarkably similar in native and non-native listening. The presence of noise influences both the multiple activation and competition processes: it reduces the phonological match between the input and stored words and consequently increases the set of candidate words considered for recognition, resulting in delayed and prolonged phonological competition. Moreover, both native and non-native listeners flexibly adjust their reliance on word-initial and word-final information when a change in listening conditions demands it.

Modeling brain responses to natural language stimuli

Leila Wehbe studies language representations in the brain when subjects engage in naturalistic language tasks. She uses functional neuroimaging, natural language processing, and machine learning tools to build predictive models of brain activity as a function of the stimulus language features. She completed her PhD in the Mitchell Lab at Carnegie Mellon University, where she focused on modeling the different processes engaged in natural reading.

Abstract

Due to the complexity of language processing, most neurobiology-of-language studies focus on testing a specific hypothesis by using highly controlled stimuli. While controlled experiments are often seen as hallmarks of good science, the natural interdependence of language properties such as syntax and semantics makes it nearly impossible to vary only one of them in a controlled experiment. As a result, carefully handcrafted stimuli either fail to be “controls”, as they unintentionally vary many parameters simultaneously, or they can be highly artificial and run the risk of not generalizing beyond the experimental setting. For studying language, we argue that naturalistic experiments along with predictive modeling provide a promising alternative to the controlled approach. These studies sample the stimulus space broadly and then learn the relationship between stimulus features and brain activity. In this talk, I will outline some details of this approach using a specific example in which subjects read a complex natural text while their functional neuroimaging data were acquired. Different natural language processing tools were used to annotate the semantic, syntactic and narrative features of the stimulus text. Encoding models were then fit to predict brain activity as a function of the different language features. The performance of these models allows us to formulate and test hypotheses about the function of different brain regions. I will describe the spatio-temporal functional brain language maps we built using this approach. I will also present a new online engine (boldpredictions.gallantlab.org) we have built which allows researchers to compare the results of our naturalistic language experiments with more traditional controlled experiments.
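The encoding-model idea described above can be sketched on synthetic data. The following is a minimal illustration, assuming a linear model fit by ridge regression and scored by the correlation between predicted and held-out responses. The abstract does not state which estimator or score was used, so both are assumptions, as are the function names `fit_ridge` and `voxelwise_correlation` and all array sizes. The random arrays stand in for stimulus features and brain responses and are not real data.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins: rows are time points, columns are features or voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 10, 5
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + 0.5 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + 0.5 * rng.normal(size=(n_test, n_voxels))

# Fit on training data, then score predictions on held-out data.
W_hat = fit_ridge(X_train, Y_train, alpha=1.0)
scores = voxelwise_correlation(Y_test, X_test @ W_hat)
print(scores)
```

Scoring on held-out data is what lets prediction performance be read as evidence about which features a region's activity reflects, rather than as a measure of fit to the training set.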
