Slide Slam R5
Real-Time Speech Production Envelope Reconstruction with MEG
Debadatta Dash1, Paul Ferrari1,2, Jun Wang1; 1University of Texas at Austin, 2Helen DeVos Children’s Hospital, Spectrum Health
Neural speech decoding retrieves speech information directly from brain signals. This approach holds promise for providing better communication assistance to patients with locked-in syndrome (e.g., due to amyotrophic lateral sclerosis, ALS). However, speech decoding research using non-invasive neural signals has been limited to discrete classification of only a few speech units (e.g., words, syllables, or phrases). Considerable work remains to achieve the ultimate goal of decoding any internalized speech sounds. One stepping stone toward this goal would be to reconstruct the overt speech envelope in real time from neural activity. Numerous studies have shown the possibility of tracking the speech envelope during speech perception, but this has not been demonstrated for speech production. Here, we attempted to reconstruct the speech production envelope by decoding the temporal information of speech processing directly from neuromagnetic signals using magnetoencephalography (MEG). MEG has proven effective for tracking and decoding speech information in real time due to its excellent temporal resolution. We collected neuromagnetic activity from 7 subjects speaking 5 different cued phrases (~100 trials per phrase) and from 7 different subjects speaking 'yes' or 'no' randomly (~80 trials per word) without any cue. We performed single-trial regression of the spoken speech envelope from the preprocessed gradiometer signals in real time, at a 4 kHz sampling frequency, using a bidirectional long short-term memory (bLSTM) recurrent neural network. We used wideband (0.3–250 Hz) neuromagnetic activity for envelope synthesis and compared its performance to that obtained using only low-frequency oscillations (delta: 0.3–4 Hz) and delta + theta (0.3–8 Hz). For full-spectrum decoding, we successfully reconstructed the speech envelope with correlation scores of 0.82 and 0.72 for yes/no words and phrases, respectively.
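The pipeline above can be sketched in outline: band-pass the gradiometer signals into the frequency bands of interest, extract the amplitude envelope of the spoken audio as the regression target, and score the reconstruction with a Pearson correlation. This is a minimal illustrative sketch, not the authors' code: the bLSTM decoder is replaced here by a simple ridge-regression stand-in, and all array shapes and hyperparameters are assumptions.

```python
# Illustrative sketch of the decoding setup (NOT the authors' implementation).
# A ridge regression stands in for the bLSTM decoder described in the abstract.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 4000  # Hz, sampling frequency reported in the abstract
# Frequency bands compared in the abstract
BANDS = {"delta": (0.3, 4.0), "delta+theta": (0.3, 8.0), "wideband": (0.3, 250.0)}

def bandpass(x, lo, hi, fs=FS, order=4):
    """Zero-phase band-pass filter along the last (time) axis."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def speech_envelope(audio):
    """Amplitude envelope of the spoken audio via the Hilbert transform."""
    return np.abs(hilbert(audio))

def pearson_r(a, b):
    """Correlation score between reconstructed and true envelopes."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# --- toy usage with synthetic data (channel count and trial length assumed) ---
rng = np.random.default_rng(0)
meg = rng.standard_normal((204, 2 * FS))   # 204 gradiometers, one ~2 s trial
audio = rng.standard_normal(2 * FS)        # simultaneous speech recording
env = speech_envelope(audio)               # regression target

X = bandpass(meg, *BANDS["wideband"]).T    # (time, channels) features
# Ridge-regression stand-in for the bLSTM decoder (in-sample fit, illustration only)
w = np.linalg.solve(X.T @ X + 1e2 * np.eye(X.shape[1]), X.T @ env)
r = pearson_r(X @ w, env)
```

In practice the decoder would be trained on held-out trials; the correlation score `r` here is computed the same way as the 0.82/0.72 scores reported, but on synthetic data it is meaningless beyond showing the mechanics.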
In the case of spoken words (yes, no), the average correlation score was significantly higher (p < 0.05, one-tailed paired t-test) when wideband signals were used compared to delta or delta + theta only. This may indicate the importance of high-frequency brain activity in characterizing dynamic information processing during speech production. However, it is also possible that the temporal characteristics of low-frequency neural oscillations were not well represented within the short analysis windows (~0.3 s). Indeed, for decoding phrase data with longer durations (~2 s), there were no significant differences between the low-frequency and wideband pipelines. We conclude that using neural signals across all frequencies might be more efficacious for single-trial synthesis of the speech production envelope. In summary, this study demonstrates that the speech production envelope can be reconstructed from MEG signals in real time, providing a foundation for direct speech synthesis from non-invasive neural signals.