Decoding of speech imagery as a window to speech planning and production
Poster D44 in Poster Session D with Social Hour, Friday, October 7, 5:30 - 7:15 pm EDT, Millennium Hall
Joan Orpella1, Francesco Mantegna1, Florencia Assaneo2, David Poeppel1,3; 1New York University, 2Universidad Autónoma de México, 3Ernst Strüngmann Institute
Speech imagery (the ability to generate internally quasi-perceptual experiences of speech events) is recognized as a fundamental ability tightly linked to important cognitive functions such as inner speech, phonological working memory, and predictive processing. Speech imagery is also considered an ideal medium to test theories of overt speech. Despite its pervasive nature, the study and use of speech imagery for clinical or basic research has been tremendously challenging. The lack of directly observable behavior and the difficulty of aligning imagery events across trials and individuals have prevented a better understanding of the underlying neural dynamics and limited its use as a research tool. We aim to map out the generation of speech imagery by pairing magnetoencephalography (MEG) with a novel experimental protocol designed to overcome these difficulties. Thirty participants (22 women; mean age=26, SD=7) imagined producing isolated syllables (e.g., pa, ta, ka) immediately after these were presented on a screen, and a second time 1000 ms later, while we recorded their neural activity with MEG (157-channel whole-head axial gradiometer). This Imagery condition was contrasted with a Reading condition, in which participants read the syllables but were asked not to imagine them. We recorded electromyographic data from participants’ upper lip and jaw to monitor micromovements. We also acquired magnetic resonance imaging data (T1) from a subset of participants to source-project their speech imagery data. Participants were trained on an overt version of the task prior to the MEG session; their overt productions were recorded to estimate timings and durations. We used a decoding approach to (1) classify participants’ MEG data as Imagery or Reading, (2) classify the imagined syllables, (3) explore different levels of representation (syllable, consonant-vowel transition) during imagery, and (4) ensure that our results could not be explained by participants’ micromovements.
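The abstract does not specify the decoding pipeline. As a rough illustration of what time-resolved decoding of Imagery versus Reading from MEG data can look like, here is a minimal sketch on simulated data; all data shapes, the injected "signal", and the scikit-learn classifier are assumptions, not the authors' actual method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Simulated MEG-like data: trials x sensors x timepoints.
# 157 sensors mirrors the gradiometer system mentioned in the abstract;
# everything else is illustrative.
n_trials, n_sensors, n_times = 80, 157, 50
X = rng.standard_normal((n_trials, n_sensors, n_times))
y = np.repeat([0, 1], n_trials // 2)  # 0 = Reading, 1 = Imagery
# Inject condition-specific activity in a subset of sensors and times.
X[y == 1, :20, 20:35] += 0.5

# Decode the condition separately at each timepoint with 5-fold CV,
# yielding one accuracy score per timepoint (a decoding time course).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = np.array([
    cross_val_score(clf, X[:, :, t], y, cv=5, scoring="accuracy").mean()
    for t in range(n_times)
])
print(scores.shape)  # one accuracy value per timepoint
```

In such time-resolved schemes, above-chance accuracy at a given latency indicates that the two conditions are neurally distinguishable at that moment, which is what permits tracing a sequence of representations over time.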
Participants’ MEG data were projected to source space to investigate the temporal dynamics of speech imagery. Robust classification scores were obtained for the contrast between Imagery and Reading and between the syllables. Syllable decoding revealed a rapid sequence of representations, from visual encoding to the imagined speech event. Participants’ micromovements did not discriminate between the syllables. The neural correlates of the decoded sequence of representations map neatly onto the predictions of current models of speech production (e.g., State Feedback Control; SFC), providing some evidence for the hypothesized internal and external feedback loops for speech planning and production, respectively. Additionally, a novel decoding approach (Windowed Multinomial Classification) revealed two nested and concurrent levels of representation (syllable and consonant-vowel transition) while exposing the compressed nature of representations during planning. The results show an evolving sequence of representations for speech imagery with neural dynamics and characteristics consistent with SFC. We propose that the same sequence underlies the motor-based generation of sensory predictions that modulate speech perception, as well as the articulatory loop of phonological working memory. The results highlight the potential of speech imagery for research, based on these new experimental approaches and analytical methods, and further pave the way for successful non-invasive brain-computer interfaces.
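The Windowed Multinomial Classification method itself is novel and not described in detail here. Purely as a hedged sketch of the general idea, one can imagine a multinomial (multi-class) classifier applied to features pooled over a sliding time window, scanning for when syllable identity becomes decodable; the window length, classifier, and simulated data below are all assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Simulated trials x sensors x timepoints; three syllable classes
# (e.g., pa/ta/ka). Shapes and the injected patterns are illustrative.
n_trials, n_sensors, n_times, win = 90, 157, 60, 10
X = rng.standard_normal((n_trials, n_sensors, n_times))
y = np.repeat([0, 1, 2], n_trials // 3)
for k in range(3):  # give each class its own sensor pattern mid-trial
    X[y == k, k * 10:(k + 1) * 10, 25:45] += 0.4

def window_scores(X, y, win):
    """Mean CV accuracy of a multinomial classifier per sliding window."""
    clf = LogisticRegression(max_iter=1000)
    out = []
    for start in range(X.shape[2] - win + 1):
        # Flatten sensors x window-timepoints into one feature vector.
        feats = X[:, :, start:start + win].reshape(len(X), -1)
        out.append(cross_val_score(clf, feats, y, cv=3).mean())
    return np.array(out)

scores = window_scores(X, y, win)
print(len(scores))  # one accuracy value per window start
```

A window-based decoder of this kind trades temporal precision for sensitivity to patterns distributed over time, which is one way a compressed, temporally extended representation could be detected.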
Topic Areas: Speech Motor Control, Language Production