
Poster D87, Wednesday, August 21, 2019, 5:15 – 7:00 pm, Restaurant Hall

What can machine learning tell us about human categorical perception?

Sara Beach1,2, Dimitrios Pantazis2, Ola Ozernov-Palchik2, Sidney May2,3, Tracy Centanni2,4, John Gabrieli2; 1Harvard University, 2Massachusetts Institute of Technology, 3Boston College, 4Texas Christian University

Categorical perception, the phenomenon by which stimuli that vary continuously along any number of physical dimensions are nevertheless perceived as members of discrete classes, has organized thinking about phonemic processing for decades. In this study, we asked how faithfully a linear support vector machine (SVM) classifier trained on high-temporal-resolution human neural data would reproduce human psychophysical performance, and whether it would reveal the emergence of phonemic representations over time as a function of task demands. We recorded magnetoencephalography (MEG) from 48 adult volunteers who were exposed to 40 tokens each of 10 steps of an acoustic continuum ranging from ‘ba’ to ‘da’, presented via earphones in pseudorandom order, in each of two conditions. During the Passive listening condition, participants performed a visual target detection task to maintain arousal but were told they could ignore the sounds. During the Active listening condition, participants labeled each stimulus as either ‘ba’ or ‘da’ via a counterbalanced, delayed button press. A 10x10 perceptual dissimilarity matrix was constructed for each participant from the pairwise differences in the percentage of trials on which each stimulus was labeled ‘ba’. A 10x10 neural dissimilarity matrix was constructed for each participant at each timepoint by performing five-fold cross-validated binary classification of the MEG sensor-level data for each pair of stimuli, separately for the Passive and Active conditions. First, focusing on the time window (229-253 ms after sound onset) in which ‘ba’ and ‘da’ prototypes were robustly decoded in both conditions, we observed greater overall neural dissimilarity (i.e., better decoding) in the Active condition, but no significant difference in the correlation of individuals’ Passive vs. Active neural matrices with the perceptual matrix. On the other hand, the average Active neural matrix was significantly more correlated with perception than was the average Passive neural matrix. This suggests that although neural decoding of MEG responses to an acoustic continuum may be too noisy for individual-differences analyses, the effect on decoding of attention to a categorical judgment, previously reported in fMRI, is also evident in MEG. Second, examining the entire 1-s trial window in the Active condition, decoding of participants’ perception of ‘ba’ or ‘da’, regardless of stimulus identity, was significantly above chance for sustained periods corresponding to an early window and a late window. Temporal generalization analysis revealed that phoneme representations in the late window (~400-600 ms) were sustained and stable, perhaps owing to the task requirements of categorical judgment and delayed response. Preliminary analysis of the sensor patterns contributing to these phonemic representations suggests dynamic involvement of different cortical regions in supporting categorical decision-making about an acoustic continuum.
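
A minimal sketch of the dissimilarity-matrix pipeline described above, assuming sensor-level MEG data at a single timepoint as a NumPy array of shape (n_trials, n_sensors) with continuum-step labels 0-9. The function and variable names (neural_rdm, perceptual_rdm, pct_ba) are illustrative rather than the authors' code, and the Spearman rank correlation is one common choice for relating such matrices, not necessarily the statistic reported here.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from scipy.stats import spearmanr

def neural_rdm(X, steps, n_steps=10):
    """10x10 neural dissimilarity matrix: five-fold cross-validated
    pairwise SVM decoding accuracy for every pair of continuum steps."""
    rdm = np.zeros((n_steps, n_steps))
    for i, j in combinations(range(n_steps), 2):
        mask = np.isin(steps, [i, j])  # trials from this stimulus pair only
        acc = cross_val_score(LinearSVC(), X[mask], steps[mask], cv=5).mean()
        rdm[i, j] = rdm[j, i] = acc
    return rdm

def perceptual_rdm(pct_ba):
    """10x10 perceptual dissimilarity matrix from a length-10 vector of
    the percentage of trials labeled 'ba' at each continuum step."""
    return np.abs(pct_ba[:, None] - pct_ba[None, :])

def rdm_correlation(neural, perceptual):
    """Rank-correlate the lower triangles of the two matrices."""
    idx = np.tril_indices_from(neural, k=-1)
    return spearmanr(neural[idx], perceptual[idx]).correlation
```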
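
The temporal generalization analysis can be sketched in the same spirit: train a classifier at each timepoint and test it at every other timepoint, so that a square region of above-chance scores indicates a sustained, stable representation. The version below is a hand-rolled illustration assuming epoched data of shape (n_trials, n_sensors, n_times) and binary perceptual labels; in practice, MNE-Python's mne.decoding.GeneralizingEstimator implements the same analysis.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold

def temporal_generalization(X, y, n_splits=5):
    """Train at each timepoint, test at all timepoints; returns an
    (n_times, n_times) matrix of cross-validated accuracies."""
    n_times = X.shape[2]
    scores = np.zeros((n_times, n_times))
    skf = StratifiedKFold(n_splits=n_splits)
    for train_idx, test_idx in skf.split(X[:, :, 0], y):
        for t_train in range(n_times):
            clf = LinearSVC().fit(X[train_idx, :, t_train], y[train_idx])
            for t_test in range(n_times):
                scores[t_train, t_test] += clf.score(
                    X[test_idx, :, t_test], y[test_idx])
    return scores / n_splits  # average over folds
```

A stable representation, like the late (~400-600 ms) window reported here, appears as off-diagonal generalization: a classifier trained at one timepoint in the window still decodes at the others.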

Themes: Speech Perception, Computational Approaches
Method: Electrophysiology (MEG/EEG/ECOG)
