
Poster A66, Tuesday, August 20, 2019, 10:15 am – 12:00 pm, Restaurant Hall

Spectro-temporal prediction errors support perception and perceptual learning of degraded speech: evidence from MEG encoding

Matthew H Davis¹, Ediz Sohoglu¹; ¹MRC Cognition and Brain Sciences Unit, University of Cambridge

Speech perception can be improved in three ways: 1) providing higher-fidelity speech, i.e. improving signal quality; 2) providing supportive contextual cues, or prior knowledge; and 3) providing relevant prior exposure that leads to perceptual learning (Sohoglu and Davis, 2016). Predictive coding (PC) theories provide a common framework to explain the neural impact of these three changes to speech perception. According to PC accounts, neural representations of expected sounds are subtracted from bottom-up signals, such that only the unexpected parts are represented, i.e. ‘prediction error’ (Rao and Ballard, 1999). Previous multivariate fMRI data (Blank and Davis, 2016) show that when listeners’ predictions are weak or absent, neural representations are enhanced for higher-fidelity speech sounds. However, when listeners make accurate predictions (e.g. after matching text), higher-fidelity speech leads to suppressed neural representations despite better perceptual outcomes. These observations are uniquely consistent with prediction error computations, and challenge alternative accounts (sharpening or interactive activation) in which all forms of perceptual improvement should enhance neural representations. In the current work, we applied forward encoding models (Crosse et al., 2016) to MEG data and tested the time-course of cross-over interactions between signal quality and prior knowledge or perceptual learning on neural representations. We analysed data from a previous MEG study (N=21, English speakers) which measured evoked responses to degraded spoken words (Sohoglu and Davis, 2016). Listeners heard noise-vocoded speech with varying signal quality (number of spectral channels), preceded by matching or mismatching written text (prior knowledge). Consistent with previous findings (Sohoglu et al., 2014), ratings of speech clarity were enhanced by greater spectral detail and by matching text. 
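The cross-over pattern predicted by PC accounts follows directly from the subtractive prediction-error computation described above. A toy illustration (all names, array shapes, and the channel-zeroing "vocoder" here are hypothetical simplifications, not the study's stimuli or analysis) shows how the same increase in signal quality can either raise or lower the residual, depending on whether an accurate prediction is available:

```python
# Toy sketch of the prediction-error computation assumed by PC accounts:
# the represented signal is the bottom-up input minus the top-down prediction.
# All values and the channel-zeroing "vocoder" are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
true_speech = rng.random((8, 50))            # 8 spectral channels x 50 time bins

def vocode(signal, n_channels):
    """Toy 'signal quality' manipulation: keep only the first n spectral channels."""
    degraded = np.zeros_like(signal)
    degraded[:n_channels] = signal[:n_channels]
    return degraded

def prediction_error(input_spec, prediction):
    """PC: only the unpredicted part of the input is represented."""
    return input_spec - prediction

low_q, high_q = vocode(true_speech, 2), vocode(true_speech, 6)
no_prior = np.zeros_like(true_speech)        # mismatching text: no usable prediction
good_prior = true_speech                     # matching text: accurate prediction

# Without a prior, higher signal quality INCREASES prediction error (there is
# more unexplained signal); with an accurate prior, higher quality DECREASES it
# (more of the input is explained away) -- the cross-over interaction.
pe = {(q, p): np.abs(prediction_error(sig, prior)).sum()
      for q, sig in [("low", low_q), ("high", high_q)]
      for p, prior in [("none", no_prior), ("match", good_prior)]}
```

Under this toy scheme, `pe[("high", "none")] > pe[("low", "none")]` while `pe[("high", "match")] < pe[("low", "match")]`, mirroring the enhancement and suppression effects reported in the fMRI data.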
Exposure to speech following matching text has been shown to promote perceptual learning (Hervais-Adelman et al., 2008), and we similarly observed that recognition accuracy for vocoded words presented in isolation improved from before to after this exposure. We report three main MEG findings: (1) MEG responses to speech were best predicted using spectro-temporal modulation representations (outperforming envelope, spectrogram, and phonetic feature representations). (2) We observed a cross-over interaction between clarity and prior knowledge consistent with prediction error representations: when matching text preceded speech, greater spectral detail was associated with reduced forward encoding accuracy, whereas following mismatching text, greater spectral detail increased encoding accuracy. This interaction emerged in MEG responses before 200 ms, consistent with the early computation of prediction errors proposed by PC theories. (3) Analyses of model weights (temporal response functions) showed that perceptual learning reduced the sensitivity of MEG responses to spectro-temporal modulations in speech; however, this effect did not depend on the amount of spectral detail presented (i.e. we did not observe the same cross-over interaction for perceptual learning as for prior knowledge). Further analyses will compare the coding of the specific spectro-temporal modulations that are preserved (slow, broadband) or degraded (fast, narrow-band) by noise-vocoding. We predict that perceptual learning will down-weight prediction errors for spectro-temporal cues that are degraded by noise-vocoding and up-weight prediction errors for cues that are preserved. These findings contribute towards the detailed specification of a computational model of speech perception based on PC principles.
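For readers unfamiliar with forward encoding models, the following minimal sketch shows the general mTRF-style approach (in the spirit of Crosse et al., 2016): ridge regression from time-lagged stimulus features to a neural response, scored by the correlation between predicted and observed held-out responses. The data, shapes, and regularisation value are arbitrary placeholders, not the study's actual pipeline:

```python
# Minimal forward (encoding) model sketch: fit a temporal response function (TRF)
# by ridge regression on time-lagged stimulus features, then score held-out
# prediction accuracy. All data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_times, n_feats, n_lags = 1000, 4, 10

stim = rng.standard_normal((n_times, n_feats))     # e.g. spectro-temporal modulations
true_trf = rng.standard_normal((n_lags, n_feats))  # ground-truth response function

def lagged(stim, n_lags):
    """Stack time-lagged copies of the stimulus: shape (time, lags * features)."""
    n_t, n_f = stim.shape
    X = np.zeros((n_t, n_lags * n_f))
    for lag in range(n_lags):
        X[lag:, lag * n_f:(lag + 1) * n_f] = stim[:n_t - lag]
    return X

X = lagged(stim, n_lags)
meg = X @ true_trf.reshape(-1) + 0.5 * rng.standard_normal(n_times)  # simulated sensor

# Fit TRF weights with ridge regression on the first half; predict the second half.
split, lam = n_times // 2, 1.0
Xtr, Xte, ytr, yte = X[:split], X[split:], meg[:split], meg[split:]
w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
encoding_accuracy = np.corrcoef(Xte @ w, yte)[0, 1]  # higher = better-predicted response
```

In the study's terms, the fitted weights `w` correspond to the temporal response functions analysed in finding (3), and the held-out correlation corresponds to the forward encoding accuracy that showed the cross-over interaction in finding (2).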

Themes: Speech Perception, Computational Approaches
Method: Electrophysiology (MEG/EEG/ECOG)
