
Poster B73, Wednesday, November 8, 3:00 – 4:15 pm, Harborview and Loch Raven Ballrooms

The visual representation of lipread words in posterior temporal cortex studied using an fMRI-rapid adaptation paradigm, functional localizers, and behavior

Lynne E. Bernstein1, Silvio P. Eberhardt1, Xiong Jiang2, Maximilian Riesenhuber2, Edward T. Auer1; 1Department of Speech, Language, and Hearing Sciences, 550 Rome Hall, George Washington University, Washington, District of Columbia 20052, USA, 2Department of Neuroscience, Georgetown University Medical Center, Research Building Room WP-12, 3970 Reservoir Rd. NW, Washington, District of Columbia 20007, USA

Visual speech stimuli are necessarily processed through visual pathways, but a fundamental question is to what extent those stimuli are represented qua speech in visual areas. We have previously shown that a more anterior region of the left posterior superior temporal sulcus/posterior middle temporal gyrus (pSTS/pMTG) responds preferentially to visual speech motion in nonsense syllables, whereas a more posterior pSTS/pMTG region responds to both speech and non-speech face motion stimuli [Bernstein et al., 2011. Hum. Brain Mapp. 32, 1660-1676]. We dubbed the speech-selective area the “temporal visual speech area” (TVSA). Here, using an fMRI rapid-adaptation (fMRI-RA) paradigm, we investigated whether the TVSA represents the visual forms of spoken words. In addition, regions of interest (ROIs), including the TVSA, the visual word form area (VWFA), and the fusiform face area (FFA), were individually localized in separate localizer scans. During fMRI-RA scanning, 19 young adults with normal hearing and good lipreading ability viewed pairs of visual spoken words that were either the same word (in different videos) or different words at near, near+, or far perceptual distances. The TVSA localizer scan was used to define bilateral TVSA and non-speech face motion area (NSFMA) ROIs. Left TVSA showed the predicted pattern of release from adaptation: far and near+ word-pairs produced significant release from adaptation of similar magnitude, suggesting that words perceptually far from the adapting stimulus, as well as words more similar to it (near+) but still discriminably different (as confirmed by behavioral discrimination results), were represented distinctly within the TVSA. Release from adaptation was similar for same and near word-pairs and significantly below that for far and near+ word-pairs.
The NSFMA showed significantly lower signal levels than the TVSA for all fMRI-RA pair types and no release from adaptation as a function of word-pair perceptual distance. Right TVSA did not show release from adaptation as a function of pair type: activation was similar in right TVSA and FFA, and their activity levels were significantly higher than in right NSFMA, suggesting that right TVSA and FFA are activated by talking faces but are not selective for the forms of spoken words. Left MT/V5 had a high signal level similar to that of left TVSA but showed no release from adaptation as a function of word-pair dissimilarity. Left FFA signal levels were similar to TVSA levels, but same, near+, and far pairs produced similar activation, suggesting that left FFA likewise does not represent the forms of visual spoken words. Right FFA and MT/V5 were also not selective for visual spoken word-pairs. Left VWFA signal levels were overall significantly lower than left TVSA levels but, interestingly, similar in response pattern to those of the left TVSA. Behavioral d-prime discrimination values for the different stimulus pairs were ordered near < near+ < far. These results support the existence of high-level visual representations of visual spoken word forms. (NIH DC012634)
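For readers unfamiliar with the d-prime sensitivity measure reported above, it can be sketched as the difference between the z-transformed hit rate and false-alarm rate in a same-different discrimination task. The snippet below is a minimal illustration only, not the authors' analysis; the rates used are hypothetical:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates for the three perceptual distances (illustration only):
# "different" responses to different pairs count as hits; to same pairs,
# as false alarms. A shared false-alarm rate is assumed for simplicity.
fa = 0.10
for label, hr in [("near", 0.55), ("near+", 0.75), ("far", 0.95)]:
    print(f"{label}: d' = {d_prime(hr, fa):.2f}")
```

Under any such mapping, larger perceptual distances yield higher hit rates and therefore larger d-prime values, consistent with the near < near+ < far ordering reported.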

Topic Area: Perception: Speech Perception and Audiovisual Integration
