Poster D16, Friday, August 17, 4:45 – 6:30 pm, Room 2000AB
Lexical tone classification in frontal and posterior regions using fNIRS
Benjamin Zinszer¹, Todd Hay¹, Alex Athey¹, Bharath Chandrasekaran¹; ¹The University of Texas at Austin
Introduction: Tonal languages encode linguistic information through changes in pitch height and direction across phonological segments. In Mandarin Chinese, a single monosyllable can form minimal pairs across four distinct tones. Previous research has demonstrated that these tones are decodable from neurophysiological measures, namely EEG (Llanos, Xie, & Chandrasekaran, 2017) and fMRI (Feng et al., 2017), using various machine learning algorithms. Feng and colleagues used a support vector machine to decode fMRI responses to intonated syllables in the superior temporal gyrus and inferior parietal lobule. In this study, we apply functional near-infrared spectroscopy (fNIRS) to decode neural responses to Mandarin lexical tones. Like fMRI, fNIRS measures cortical hemodynamics across several seconds after stimulus onset, but fNIRS is portable, silent, and resilient to head motion. Method: Seven native speakers of Mandarin Chinese (2M/5F) heard 100 ms /i/ vowels intonated with tones 1 (high-flat), 2 (rising), and 4 (falling), presented via insert earphones and separated by 140-150 ms silences. Participants watched a silent, subtitled nature film throughout the experiment and were instructed to ignore the sounds. Tone stimuli were organized into Static blocks, each of which repeated the same /i/+tone stimulus. In each run, participants heard each Static block (and three additional variable-tone blocks) in random order, for a total of seven to nine runs with self-paced breaks. One participant withdrew after three runs. We measured changes in blood oxygenation using a NIRx NIRScout system with 12 sources and 14 detectors distributed bilaterally over superior temporal, inferior parietal, and frontal regions. The fNIRS data were bandpass filtered between 0.005 and 0.7 Hz and converted to oxygenated hemoglobin (HbO) concentration changes using Homer2 (Huppert et al., 2009). HbO measurements were normalized to zero mean and unit standard deviation.
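As a rough illustration of the two preprocessing steps just described (bandpass filtering at 0.005-0.7 Hz, then normalization to zero mean and unit standard deviation), the following is a minimal NumPy sketch. It is not the Homer2 pipeline: the ideal FFT mask is a stand-in for Homer2's filter, the 10 Hz sampling rate is a hypothetical value (the abstract does not report one), and the two synthetic channels are invented for the demonstration.

```python
import numpy as np

def bandpass_fft(signal, fs, low_hz=0.005, high_hz=0.7):
    """Keep only frequency components inside [low_hz, high_hz].

    signal has shape (n_samples, n_channels). This ideal FFT mask is an
    illustrative stand-in, NOT the filter that Homer2 applies.
    """
    n = signal.shape[0]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(signal, axis=0)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0.0  # zero every out-of-band frequency bin
    return np.fft.irfft(spectrum, n=n, axis=0)

def zscore(signal):
    """Normalize each channel to zero mean and unit standard deviation."""
    mean = signal.mean(axis=0, keepdims=True)
    std = signal.std(axis=0, keepdims=True)
    return (signal - mean) / std

# Synthetic demonstration data (hypothetical, not from the study).
fs = 10.0                       # assumed sampling rate in Hz
t = np.arange(6000) / fs        # 600 s of samples
slow = np.sin(2 * np.pi * 0.1 * t)   # in-band component (0.1 Hz)
fast = np.sin(2 * np.pi * 2.0 * t)   # out-of-band component (2 Hz)
raw = np.stack([slow + fast + 5.0, 2.0 * slow + fast], axis=1)

filtered = bandpass_fft(raw, fs)     # removes the offset and the 2 Hz term
normalized = zscore(filtered)        # per-channel zero mean, unit SD
```

In this toy case the filter removes both the constant offset and the 2 Hz component, leaving only the 0.1 Hz signal in each channel before normalization.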
We trained multilayer perceptron models with two hidden layers of varying sizes (2 to 128 nodes) to discriminate the HbO measurements for the Static blocks (Tone 1, Tone 2, Tone 4) for each scan within a participant (80% training, 10% validation, and 10% testing). Having inferred the basis dimensionality of the data from these single-subject models, we trained one model to generalize across all participants and estimated channel importance using integrated gradients (Sundararajan, Taly, & Yan, 2017). Results: Scan-classification accuracy exceeded 95% in every participant for hidden layers with 16 nodes. The combined model, using this hidden layer size, classified the group data with 79% accuracy. Channels above the 90th percentile in importance were located in bilateral posterior temporo-parietal and left frontal areas. Conclusion: Our findings suggest that fNIRS captures information about tone-specific processing activity, as previously demonstrated in EEG and fMRI. Consistent with an fMRI classifier (Feng et al., 2017), bilateral temporo-parietal regions provided important tone-specific information. Previous univariate fNIRS research using Mandarin tones also found effects of categorical perception in the superior and middle temporal gyri (Zinszer et al., 2015). Importantly, the present study also highlights left frontal regions, whose role in auditory and phonetic processing is debated. This work sets the stage for lexical tone research with MRI-incompatible populations, such as children and cochlear implant users.
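The channel-importance step in the Method (integrated gradients applied to a two-hidden-layer network) can be sketched as follows. This is only an illustration under stated assumptions: the weights are random rather than trained, the 20 input channels, tanh activations, all-zero baseline, and single-scan attribution are hypothetical choices that the abstract does not specify, and the integral is approximated with a midpoint Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 channels in, two 16-node hidden layers, 3 tones out.
n_channels, n_hidden, n_classes = 20, 16, 3
W1 = rng.normal(scale=0.3, size=(n_channels, n_hidden))
W2 = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
W3 = rng.normal(scale=0.3, size=(n_hidden, n_classes))
b1, b2, b3 = np.zeros(n_hidden), np.zeros(n_hidden), np.zeros(n_classes)

def logit_and_input_grad(x, target):
    """Return the target-class logit and its gradient with respect to x."""
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    logits = h2 @ W3 + b3
    # Manual backpropagation from the target logit to the input.
    g_a2 = W3[:, target] * (1.0 - h2 ** 2)
    g_a1 = (W2 @ g_a2) * (1.0 - h1 ** 2)
    return logits[target], W1 @ g_a1

def integrated_gradients(x, baseline, target, steps=1000):
    """Midpoint-rule approximation of integrated gradients along the
    straight path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(x)
    for alpha in alphas:
        _, grad = logit_and_input_grad(baseline + alpha * (x - baseline), target)
        grad_sum += grad
    return (x - baseline) * grad_sum / steps

x = rng.normal(size=n_channels)      # one synthetic z-scored scan
baseline = np.zeros(n_channels)      # assumed baseline: the channel means
target = 0                           # index of the class being explained

attributions = integrated_gradients(x, baseline, target)
importance = np.abs(attributions)
threshold = np.percentile(importance, 90)
top_channels = np.flatnonzero(importance > threshold)

# Completeness check: attributions should sum to F(x) - F(baseline).
f_x, _ = logit_and_input_grad(x, target)
f_base, _ = logit_and_input_grad(baseline, target)
completeness_gap = abs(attributions.sum() - (f_x - f_base))
```

A useful sanity check is the completeness property of integrated gradients: the attributions should sum approximately to the difference between the model output at the input and at the baseline, which `completeness_gap` measures here. In practice, importance would be aggregated over many scans rather than taken from a single input.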
Topic Area: Perception: Speech Perception and Audiovisual Integration