
Poster D57, Thursday, November 9, 6:15 – 7:30 pm, Harborview and Loch Raven Ballrooms

The influence of speaker gaze on situated comprehension: Evidence from an ERP study

Torsten Jachmann1,2, Heiner Drenhaus1,2, Maria Staudte1,2, Matthew Crocker1,2; 1Department of Language Science and Technology, Saarland University, Germany, 2Cluster of Excellence MMCI, Saarland University, Germany

We present findings from an ERP study (30 right-handed German participants, aged 19–33) investigating the influence of speaker gaze on listeners’ understanding of referential expressions in a shared visual scene. In our experiment, a stylized face performed gaze cues time-aligned to an auditory sentence. We manipulated the gaze cue that preceded the second noun in the sentence by 800 ms (Griffin & Bock, 2000) to investigate the neurophysiological responses to varying gaze congruency. Our data provide evidence that listeners use speaker gaze to make precise predictions about the unfolding sentence (N2), and that gaze also affects retrieval (N4) and integration (P6) costs, consistent with the retrieval-integration model (Brouwer et al., 2012). Each experimental item consisted of a visual scene containing three objects that differed either in size (small, medium, large) or in brightness (bright, medium, dark) (fully counterbalanced). After three seconds, a stylized face was displayed in the middle of these objects, so that the objects were situated diagonally around the face. Gaze cues were aligned to a spoken comparison of two of the objects of the form “Verglichen mit dem Auto, ist das Haus verhältnismäßig klein, denke ich” (“Compared to the car, the house is proportionally small, I think”). The gaze cue preceding the mention of the second noun (“house”) was manipulated (fully counterbalanced) to be: a. congruent (toward the named object); b. incongruent (toward the object unnamed in the sentence); c. neutral (straight toward the listener). Our analysis of the ERPs for the three experimental conditions (Congruent, Incongruent and Neutral) at the onset of the second noun revealed a globally distributed, significantly larger negativity for the Incongruent and Neutral conditions (b & c) compared to the Congruent condition (a) between 150 and 300 ms (N2).
We interpret this early effect as a mismatch between the word form expected given the context and the actual word candidates that are consistent with the speech signal listeners perceive (Hagoort & Brown, 2000). Additionally, an analysis of the time window from 300 to 450 ms (N4) revealed a centro-parietally distributed, significantly larger negativity for the Incongruent condition (b) only, compared to the other two conditions (a & c). We interpret this effect as a predictability-driven N400. In all conditions, predictions about the upcoming words can be made. In both the Congruent and Incongruent conditions (a & b), the gazed-at object may be predicted to be the upcoming word. In the Neutral condition (c), both so far unnamed objects are equally likely to be mentioned. Upon hearing the second noun, these predictions are either confirmed (a & c) or violated (b). A violated prediction hinders word retrieval, which in turn leads to a stronger modulation of the N400. Finally, analysis of the time window from 500 to 1000 ms (P6) revealed a significantly larger positivity for the Incongruent condition (b) only, compared to conditions (a & c), reflecting the additional cost of integrating the noun into the unfolding mental model (Burkhardt, 2007) in those cases where processing was misled by the preceding gaze cue.
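For readers unfamiliar with window-based ERP measures, the following is a minimal sketch of how mean amplitudes in the three reported time windows could be computed from epoched data. It is not the authors' analysis pipeline, which the abstract does not describe. The function name `mean_window_amplitudes`, the array layout (trials × channels × samples), and the half-open window boundaries are all assumptions made for illustration; only the window limits themselves come from the abstract.

```python
import numpy as np

# Time windows in ms relative to second-noun onset, as reported in the abstract.
WINDOWS_MS = {"N2": (150, 300), "N4": (300, 450), "P6": (500, 1000)}


def mean_window_amplitudes(epochs, times_ms, windows=WINDOWS_MS):
    """Return the mean amplitude per trial and channel for each time window.

    Hypothetical helper, not taken from the study.
    epochs:   array of shape (n_trials, n_channels, n_samples)
    times_ms: array of shape (n_samples,), sample times in ms
    Returns a dict mapping window name to an array (n_trials, n_channels).
    """
    epochs = np.asarray(epochs, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    if epochs.ndim != 3 or epochs.shape[2] != times_ms.shape[0]:
        raise ValueError("epochs must be (trials, channels, samples) matching times_ms")

    result = {}
    for name, (start, end) in windows.items():
        # Half-open interval [start, end): an assumption, chosen so that the
        # adjacent N2 and N4 windows do not share the sample at 300 ms.
        mask = (times_ms >= start) & (times_ms < end)
        if not mask.any():
            raise ValueError(f"no samples fall inside window {name}")
        result[name] = epochs[:, :, mask].mean(axis=2)
    return result
```

Per-condition averages of these values (Congruent, Incongruent, Neutral) would then be the input to whatever statistical comparison is chosen; that step is deliberately left out because the abstract does not specify it.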

Topic Area: Perception: Speech Perception and Audiovisual Integration
