Slide Slam C1 Sandbox Series
Combining EEG and eye-tracking to investigate the prediction of upcoming speech in naturalistic virtual environments: a 3D visual world paradigm
Eleanor Huizeling1, Phillip Alday1, David Peeters1,2, Peter Hagoort1,3; 1Max Planck Institute for Psycholinguistics, 2Tilburg University, 3Donders Institute for Brain, Cognition and Behaviour
The human capacity to rapidly and efficiently process speech may be facilitated by expectations about upcoming speech content. Indeed, listeners’ eye gaze has been shown to move towards a referent before it is mentioned when the prior linguistic input is highly constraining. We recently replicated this canonical finding in a more naturalistic setting, in virtual reality, while a virtual speaker was present (Huizeling et al., 2021; PsyArXiv). We additionally showed that disfluencies in speech (“uh”) reduced the proportion of fixations towards the predicted object; instead, fixations towards the virtual speaker increased. However, it remained unclear whether looks towards the speaker reflected reduced confidence in the initial prediction, with the listener waiting for the sentence to be disambiguated, or merely increased attention towards incoming speech, without any change to the prediction. Another way to investigate linguistic prediction is with electroencephalography (EEG). Earlier work has consistently observed a reduced N400 amplitude when the semantic content of a word was easier to integrate with the preceding context, for example when the word was highly predictable. Whereas eye movements reveal whether a referent has been predicted before it is mentioned, N400 amplitude modulations index the ease of word processing at the moment the word is perceived. In an ongoing proof-of-principle study, we leverage these complementary advantages of EEG and eye-tracking to investigate linguistic prediction in naturalistic virtual environments. This method will specifically allow us to uncover new theoretical insights into the influence of disfluencies on the prediction of speech. Participants (n=18; target n=32) listened to sentences spoken by a virtual agent during a virtual tour of eight scenes (e.g., office, street, canteen).
The agent discussed her relation to each scene while participants’ eye movements and EEG were recorded. The spoken stimuli were 128 subject-verb-object sentences, pre-recorded by a native Dutch speaker and produced by the agent (including lip sync and gaze towards the participant). Sentences were either predictable or unpredictable based on verb constraints: the verb was related either to a single object in the scene (restrictive, predictable) or to multiple objects in the scene (unrestrictive, unpredictable). In only 50% of sentences did the noun refer to an object present in the scene, confirming the participant’s prediction; the remaining 50% mentioned objects absent from the scene, arguably disconfirming the prediction. In a critical window between verb and noun onset, we expect a greater proportion of fixations on the target object in the restrictive than in the unrestrictive condition. We additionally expect a greater (more negative) N400 response to the noun when the uttered referent is absent from the scene than when it is present, an effect expected to be larger in restrictive than in unrestrictive sentences. Successfully combining EEG and eye-tracking in virtual environments enables new research trajectories that cannot be adequately addressed by any single traditional method in isolation, such as investigating the extent to which predictions of upcoming speech are dynamically informed by disfluencies.