You are viewing the SNL 2018 Archive Website. For the latest information, see the Current Website.

Poster D18, Friday, August 17, 4:45 – 6:30 pm, Room 2000AB

Speaker-normalized vowel representations in human auditory cortex

Matthias Sjerps1,2, Neal Fox3, Keith Johnson4, Edward Chang3; 1Donders Institute for Brain Cognition and Behavior, 2Max Planck Institute Nijmegen, 3University of California San Francisco, 4UC Berkeley

Speech perception is a computationally challenging task, in part because the acoustic dimensions critical for distinguishing among speech sounds are the same as those that distinguish among different speakers. For example, while a given speaker’s /u/ will always tend to have a lower first formant (F1) than his or her /o/, a tall speaker (with a long vocal tract) will tend to have lower F1 values for all vowels than a short speaker (with a short vocal tract). Consequently, a tall man’s /o/ and a short man’s /u/ might be acoustically identical. Behavioral research has demonstrated that listeners overcome such ambiguity by relying on context: a sound that is ambiguous between the vowels /u/ and /o/ is perceived as /o/ after a sentence spoken by a tall man (low F1), but as /u/ after a sentence spoken by a short man (high F1). However, the neurophysiological mechanisms underlying this speaker-dependent “normalization” effect remain unclear. To investigate the neural origins of normalization, neural activity was recorded directly from parabelt auditory cortex via subdurally implanted high-density electrocorticography (ECoG) grids while five human participants listened to and identified vowels from a synthesized speech continuum ranging from /u/ to /o/ (an F1 continuum). Critically, these sounds were preceded by a context sentence that had been digitally manipulated to have either a high or low F1 range. Behavioral data replicated past normalization results: more vowels were identified as /o/ after a low F1 speaker than after a high F1 speaker. This demonstrates that listeners’ perceptual category boundary shifted to more closely reflect the F1 of the context speaker. Analysis of the ECoG recordings revealed direct evidence that context-dependent (i.e., normalized) vowel representations emerged rapidly within parabelt auditory cortex. Specifically, we found that distinct cortical sites responded preferentially to vowels from either the /u/ or /o/ end of the continuum.
Importantly, however, these same neural populations also responded differentially to the same acoustic token depending on whether it was preceded by a low or high F1 speaker. Analysis of the time course of normalization demonstrated that these normalized vowel representations were preceded by a brief window (~80 ms) during which acoustically veridical (context-independent) encoding of target sound acoustics dominated, suggesting that normalization first emerges in cortical processing. Finally, we found that normalized representations may partly emerge as a result of local sensitivity to the contrast between the frequency distribution of currently incoming information and that of preceding speech. These results highlight the key role auditory cortex plays in the integration of incoming sounds with their preceding acoustic context, leading to the emergence of talker-normalized encoding of speech sounds, which is critical to resolving the lack of invariance in speech perception.
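The behavioral pattern described in the abstract (the same ambiguous token heard as /o/ after a low-F1 context but as /u/ after a high-F1 context) can be illustrated with a toy contrast rule. This is only a sketch for intuition, not the authors' analysis or model: the function name `identify_vowel`, the constant `BOUNDARY_RATIO`, and all F1 values in Hz are hypothetical and were chosen for illustration.

```python
from statistics import mean

# Hypothetical assumption: the /u/-/o/ category boundary sits at a fixed
# fraction of the context speaker's mean F1. Not taken from the abstract.
BOUNDARY_RATIO = 0.8


def identify_vowel(target_f1_hz, context_f1_hz):
    """Label a target as /o/ or /u/ by comparing its F1 to a
    boundary that scales with the mean F1 of the preceding context."""
    boundary_hz = BOUNDARY_RATIO * mean(context_f1_hz)
    # /o/ has a higher F1 than /u/, so targets above the boundary are /o/.
    return "/o/" if target_f1_hz > boundary_hz else "/u/"


# Illustrative (made-up) F1 samples from two context sentences.
low_f1_context = [350.0, 400.0, 450.0]   # e.g. a speaker with a long vocal tract
high_f1_context = [450.0, 500.0, 550.0]  # e.g. a speaker with a short vocal tract
ambiguous_target_hz = 360.0              # one acoustic token, two contexts

print(identify_vowel(ambiguous_target_hz, low_f1_context))   # /o/
print(identify_vowel(ambiguous_target_hz, high_f1_context))  # /u/
```

Under these assumed values, the identical 360 Hz token falls above the boundary after the low-F1 context and below it after the high-F1 context, mirroring the direction of the boundary shift reported above.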

Topic Area: Perception: Speech Perception and Audiovisual Integration