Slide Slam F6
Human cortical encoding of vowels
Yulia Oganian1, Ilina Bhaya-Grossman1, Edward Chang1; 1University of California, San Francisco
Introduction: Understanding natural speech requires listeners to map the continuous acoustics of vowel sounds onto discrete categories, a process that includes compensating for variability within and across speakers, such as that introduced by different co-articulatory contexts or voice heights. Vowel identity is determined by the center frequencies of the first two peaks in the vowel spectrum, the first (F1) and second (F2) formants. For example, /u/ and /i/ have similarly low F1 frequencies but differ in F2. Behaviorally, the influential perceptual magnet theory posits that the cognitive representation of vowels nonlinearly warps the continuous F1-F2 acoustic space towards the prototypes of each vowel category. At the neural level, it has been suggested that human speech cortical areas on the superior temporal gyrus (STG) represent the relative spectral locations of F1 and F2. It remains unclear, however, how such a representation may support the perception of vowel categories. Here, we capitalized on the high spatial and temporal resolution of high-density intracranial recordings (electrocorticography, ECoG) to study how local neural tuning and distributed population representations in the STG represent vowels, using natural speech and artificial vowel sounds.

Experiment 1: In Experiment 1, native speakers of Spanish (n = 7) listened to Spanish sentences naturally produced by a variety of speakers while we recorded neural activity from the STG. First, we found that local neural populations (recorded at a single electrode contact) represented either one or both formants, with joint encoding of both formants at the majority of contacts. Formant tuning followed a nonlinear sigmoidal pattern, resulting in sensitivity to a subdivision of a formant's full range. Further, these representations shifted to normalize for differences between speakers with different voice heights. Decoding analyses showed that local populations cannot reliably discriminate between vowel categories.
Crucially, however, at the population level, the range of local co-encodings of F1 and F2 allowed for tuning to single vowel categories. Moreover, population-level responses to tokens from the same category clustered together, as predicted by the perceptual magnet theory.

Experiment 2: The limited vowel formant space of natural speech allowed us neither to test whether this STG code is specific to speech sounds nor to comprehensively characterize the range of joint formant encodings in STG. To address this, in Experiment 2, participants (n = 8) listened to artificial vowel sounds with formant combinations extending beyond those encountered in natural speech. We found that the neural representation of formants extended beyond the natural vowel formant space. Moreover, local neural tuning was best described by two-dimensional formant receptive fields encoding a wide range of combinations of F1 and F2 values, including but not limited to the distance between F1 and F2.

Conclusions: Our results show that vowel-discriminating neural populations in the STG are characterized by complex, nonlinear, two-dimensional formant receptive fields. In human speech, this representation gives rise to the discrimination between vowel categories and to sensitivity to their boundaries at the population level. Taken together, this work describes the neural computations in human STG that give rise to the perception of vowel categories.
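To make the kind of tuning described above concrete, the sketch below simulates a single hypothetical STG population whose response combines sigmoidal tuning along each formant dimension into a two-dimensional formant receptive field. All parameter values, the logistic form, and the weighted linear combination are illustrative assumptions for exposition, not the fitted models from the study.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: models saturating, sigmoidal formant tuning.
    return 1.0 / (1.0 + np.exp(-x))

def formant_receptive_field(f1, f2, w1, w2, c1, c2, slope=0.01):
    """Hypothetical response of one local population to a vowel with
    formant frequencies (f1, f2), in Hz.

    Each formant drives the population through a sigmoidal tuning curve
    with inflection point c1 or c2 (Hz); the weights w1 and w2 let the
    same form express pure-F1, pure-F2, or joint encoding. Parameters
    here are made up for illustration.
    """
    r1 = sigmoid(slope * (f1 - c1))  # sensitivity to part of the F1 range
    r2 = sigmoid(slope * (f2 - c2))  # sensitivity to part of the F2 range
    return w1 * r1 + w2 * r2

# A population jointly tuned to low F1 and high F2 (w1 < 0, w2 > 0)
# responds more to /i/ (roughly F1 = 280 Hz, F2 = 2250 Hz) than to
# /a/ (roughly F1 = 710 Hz, F2 = 1100 Hz); formant values approximate.
resp_i = formant_receptive_field(280, 2250, w1=-1.0, w2=1.0, c1=500, c2=1600)
resp_a = formant_receptive_field(710, 1100, w1=-1.0, w2=1.0, c1=500, c2=1600)
```

In this toy version, a single unit of this kind already prefers one region of the F1-F2 plane; the abstract's population-level claim is that many such units, with varied co-encodings of F1 and F2, jointly discriminate vowel categories.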