Slide Slam K7
How and when are acoustic-phonetic predictions formed during silent reading?
Máté Aller1, Ediz Sohoglu2, Matthew H. Davis1; 1MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK, 2School of Psychology, University of Sussex, Brighton, UK
A well-established line of research demonstrates that human perception is shaped not only by sensory information from our environment, but also by our prior knowledge and expectations. For speech, numerous studies have demonstrated that comprehension of perceptually degraded spoken words improves when they are primed with matching rather than mismatching written text (Sohoglu et al., 2014, JEP:HPP). Behavioural and neural (fMRI, MEG) evidence suggests that this prior knowledge acts as a top-down prediction during speech perception (Davis & Sohoglu, 2020). Here we re-analyse a previously published dataset (Sohoglu & Davis, 2016, PNAS) to investigate the temporal dynamics of neural representations of written text and their transformation into acoustic-phonetic predictions. Participants were presented with 468 pairs of written and subsequently spoken monosyllabic words. Each trial started with the presentation of a written word, followed by a matching or mismatching spoken word at one of three levels of sensory detail. The amount of sensory detail in speech was controlled using noise vocoding (3, 6, or 12 channels). After each word, participants rated the clarity of the spoken word on a four-point scale from unintelligible (1) to fully intelligible (4). We collected MEG recordings from 21 participants while they performed the task. In the present analysis we focused on the time window after the presentation of the written word and before the presentation of the spoken word, to investigate how acoustic-phonetic predictions are formed from written text. We used representational similarity analysis (RSA) to compare the representational structures in the MEG recordings to hypothetical representational structures suggested by various computational models. Specifically, we used the Mahalanobis distance to calculate the dissimilarity between neural activity patterns across MEG channels for each pair of written words, separately for each timepoint.
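The per-timepoint neural dissimilarity computation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy data dimensions, the random data, and the simple whitening step (a pooled channel covariance standing in for a proper noise-covariance estimate) are all assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy data: word items x MEG channels x timepoints
# (the real study used 468 words and full sensor arrays)
rng = np.random.default_rng(0)
n_words, n_channels, n_times = 20, 30, 5
meg = rng.standard_normal((n_words, n_channels, n_times))

# Inverse channel covariance for the Mahalanobis metric, estimated here
# from all samples pooled over words and timepoints (a stand-in for a
# residual-based noise covariance), with light regularisation.
flat = meg.transpose(0, 2, 1).reshape(-1, n_channels)
VI = np.linalg.inv(np.cov(flat, rowvar=False) + 1e-6 * np.eye(n_channels))

# One neural RDM per timepoint: a condensed vector of pairwise
# Mahalanobis distances between the spatial patterns of all word pairs.
neural_rdms = np.stack([
    pdist(meg[:, :, t], metric='mahalanobis', VI=VI)
    for t in range(n_times)
])
# neural_rdms has shape (n_times, n_words * (n_words - 1) / 2)
```

Each row of `neural_rdms` is then ready to be compared against a model representational structure computed over the same word pairs.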
We also computed model representational structures across word items based on orthographic, acoustic, and phonetic features. These model representational structures were then correlated, timepoint by timepoint, with the observed neural dissimilarity structures. One plausible hypothesis is that neural representations resemble orthographic representations early after the presentation of the written text, whereas acoustic and phonetic representations emerge later, prior to the presentation of the spoken word. Behavioural results demonstrated higher clarity ratings for spoken words preceded by matching compared to mismatching written text, replicating previous findings that prior knowledge contributes to speech perception. Effects of matching text were numerically equivalent to doubling the amount of sensory detail (vocoder channels). Preliminary MEG results show an above-chance correlation between the similarity of spatial patterns of neural activity and the Levenshtein edit distance between pairs of written words (i.e., orthographic representations) from 200 to 500 ms after written word onset. In-depth analyses of other types of representations (i.e., acoustic, phonetic) are ongoing.
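The orthographic model comparison reported above can be illustrated with a short sketch: a Levenshtein edit-distance RDM over word pairs, rank-correlated with a neural RDM for one timepoint. The word list and the simulated neural RDM are toy stand-ins, not the study's materials or data.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Toy word items (the real study used 468 monosyllabic words)
words = ['cat', 'cap', 'dog', 'log']

# Orthographic model RDM: edit distance for every word pair, in the same
# condensed pair ordering used for the neural RDMs.
model_rdm = np.array([levenshtein(a, b) for a, b in combinations(words, 2)])

# Hypothetical neural RDM for one timepoint (simulated here); in the real
# analysis this comes from the Mahalanobis distances over MEG patterns.
rng = np.random.default_rng(1)
neural_rdm = model_rdm + 0.5 * rng.standard_normal(model_rdm.size)

# Rank correlation between model and neural dissimilarity structures
rho, p = spearmanr(model_rdm, neural_rdm)
```

Repeating this correlation at every timepoint yields the time course of orthographic model fit, from which effects such as the reported 200-500 ms window can be read off.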