Slide Slam O12
Seeing the Face of the Talker Normalizes BOLD Pattern Responses to Noisy Speech
Yue Zhang1, John Magnotti2, Anika Sonig3, Michael Beauchamp4; 1University of Pennsylvania
Viewing the talker's face improves comprehension of noisy auditory speech. To investigate the neural computations underlying this perceptual benefit, we measured BOLD fMRI pattern responses in the posterior superior temporal gyrus and sulcus (pSTG/S) of 14 healthy participants. Participants were presented with 2-second audiovisual recordings of single words (297 words from 12 different talkers) in five different formats: clear audiovisual (AcV); noisy audiovisual (AnV); clear auditory-only (Ac); noisy auditory-only (An); and visual-only (V). BOLD fMRI data were collected using a Siemens Prisma 3 tesla scanner, with words presented in silent intervals inserted between acquisitions of the multiband pulse sequence. Following presentation of each word, participants reported with a button press whether the word was intelligible ("Y") or not ("N"). Seeing the face of the talker produced a seven-fold increase in the likelihood of a "Y" rating (odds ratio = 7.0, p = 10⁻⁸; 38% intelligible for An vs. 71% for AnV). Noisy word trials were sorted post hoc into "Y" trials (An-Y, AnV-Y) and "N" trials (An-N, AnV-N), allowing a comparison between trials that were physically similar but perceptually different. Voxel time series were analyzed with a generalized linear model containing seven regressors of interest (AcV, AnV-Y, AnV-N, Ac, An-Y, An-N, V) using the AFNI program 3dDeconvolve. The mean percent signal change across conditions was calculated for each voxel and subtracted from the response to each individual condition in order to increase the dynamic range of the fMRI pattern correlation. To compute the fMRI pattern similarity between two conditions, the normalized percent signal change in each pSTG/S voxel for the first condition was correlated with that for the second condition, yielding a single correlation value for each pair of conditions in each of the 28 hemispheres.
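The normalization and pattern-correlation steps described above can be sketched as follows. This is a minimal illustration, assuming the per-condition voxel responses (percent signal change) have already been extracted into arrays; the function and variable names are hypothetical, not from the study's analysis code:

```python
import numpy as np

def pattern_similarity(beta, cond_a, cond_b):
    """Correlate normalized pSTG/S response patterns for two conditions.

    beta: dict mapping condition name -> 1-D array of percent signal
    change per voxel (hypothetical data layout).
    """
    # Per-voxel mean across all conditions, subtracted from each
    # condition's response to increase the dynamic range of the
    # pattern correlation (the normalization described in the text).
    all_conds = np.stack(list(beta.values()))   # (n_conditions, n_voxels)
    voxel_mean = all_conds.mean(axis=0)
    norm_a = beta[cond_a] - voxel_mean
    norm_b = beta[cond_b] - voxel_mean
    # Pearson correlation across voxels: one r per condition pair.
    return np.corrcoef(norm_a, norm_b)[0, 1]
```

Applied to every pair of the seven conditions in each hemisphere, this yields the single similarity value per pair, per hemisphere, described above.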
Interestingly, the response pattern evoked by intelligible noisy audiovisual speech was very similar to the response pattern evoked by clear audiovisual speech, even though they were physically very different, r(AcV, AnV-Y) = 0.65 ± 0.48 (mean ± SEM). In contrast, the response patterns evoked by intelligible and unintelligible noisy audiovisual speech were very different, even though the stimuli were physically similar, r(AcV, AnV-N) = 0.10 ± 0.13. Intelligibility had a weaker effect on the response patterns for auditory-only speech, r(Ac, An-Y) = 0.44 ± 0.09 vs. r(Ac, An-N) = 0.31 ± 0.10, as confirmed by a significant interaction between intelligibility and stimulus format in a linear mixed effects model. This demonstrates that intelligibility and the presence of visual speech are both important drivers of response patterns in pSTG/S. When noisy audiovisual words are intelligible, the pattern of brain response in pSTG/S is similar to that observed during clear audiovisual speech, suggesting the normalization of response patterns as a neural mechanism for the perceptual benefit of seeing the face of the talker.
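The intelligibility-by-format interaction reported above could be tested along the following lines, assuming the pattern correlations are tabulated in long format with one row per hemisphere, stimulus format, and intelligibility rating. The column names and the simulated values are illustrative only, not the study's data or exact model specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format table: one pattern-correlation value (r)
# per hemisphere x stimulus format x intelligibility cell.
rows = []
for hemi in range(28):                      # 28 hemispheres (14 subjects)
    for fmt in ("A", "AV"):                 # auditory-only vs audiovisual
        for intel in ("N", "Y"):            # unintelligible vs intelligible
            rows.append({"hemisphere": hemi, "format": fmt,
                         "intelligible": intel,
                         "r": rng.normal(0.4, 0.1)})   # simulated values
df = pd.DataFrame(rows)

# Random intercept per hemisphere; the format x intelligibility
# interaction term tests whether intelligibility changes pattern
# similarity more for audiovisual than for auditory-only speech.
model = smf.mixedlm("r ~ format * intelligible", df,
                    groups=df["hemisphere"]).fit()
print(model.summary())
```

With the study's actual correlation values, a significant interaction coefficient would correspond to the effect reported above.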