Voices belonging to different speakers are individualised based on anatomical and social factors, such as vocal tract size and accent (Scott & McGettigan, 2015). However, within-speaker voice properties can also be highly variable, depending on both what is said and how it is said (Lavan et al., 2019). Therefore, to effectively encode different vocal identities, listeners must be able to both tell speakers apart and ‘tell speakers together’ (Burton, 2013; Lavan et al., 2019). We will use representational similarity analysis to investigate whether listeners form distinct neural representations of the voice identities of different speakers. Representational similarity analysis is based on the premise that stimuli that share similar representations elicit similar neural response patterns in the relevant region (Kriegeskorte et al., 2008). Therefore, we can probe the neural bases of vocal identity by comparing the similarity of neural responses to naturalistic speech both across and within speaker identities. We will analyse neural responses from the Naturalistic Neuroimaging Database (Aliko et al., 2020), an open-access fMRI dataset of 84 participants watching feature-length movies. We have constructed hypothetical representational dissimilarity matrices (RDMs), which predict (dis)similarities in neural activity between different speech tokens from the movie “500 Days of Summer”. Each RDM predicts dissimilarities based on a different aspect of speaker identity. The first RDM includes within-speaker comparisons, and predicts that neural responses to the same speaker will become more similar as listeners gain familiarity with that speaker during the movie. The second RDM includes across-speaker comparisons, and predicts differences in neural responses based on the broad demographic category of speaker sex. 
The final RDM includes across- and within-speaker comparisons to represent similarities based on individual speaker identity: neural responses to speech tokens from the same speaker are predicted to be similar, and responses to speech tokens from different speakers are predicted to be dissimilar. We will combine searchlight and region-of-interest approaches to measure observed neural dissimilarities between speech tokens at each voxel. Searchlight analyses will be conducted within bilateral superior temporal gyrus and superior temporal sulcus, following previous work that has shown voice-sensitive neural activation within these areas (Tsantani et al., 2019). Each hypothetical RDM will then be correlated with the observed neural dissimilarity at each voxel within these regions. Correlations between hypothetical RDMs and observed neural dissimilarities between speech tokens should reveal potential sensitivities to different aspects of vocal identity in the relevant brain regions. On completion of this work, we hope to better understand the neural representations that contribute to telling voices together and telling voices apart. Further, we can assess the real-world validity of previous findings by investigating whether brain regions previously associated with voice identity processing in experimental paradigms show consistent activation in response to naturalistic stimuli.
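
As an illustration only, the three hypothetical RDMs and their comparison with an observed neural RDM could be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline itself: the token labels, the simulated voxel patterns, the 1/rank familiarity coding, the use of correlation distance for the neural RDM, and the choice of Spearman correlation are all hypothetical choices made for the example, as are the function names `identity_rdm`, `familiarity_rdm` and `compare`.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical toy labels: one entry per speech token, in order of appearance.
speakers = np.array(["A", "B", "A", "C", "B", "A"])
sex = np.array(["m", "f", "m", "f", "f", "m"])


def identity_rdm(labels):
    """Categorical RDM: 0 where two tokens share a label, 1 where they differ."""
    return (labels[:, None] != labels[None, :]).astype(float)


def familiarity_rdm(labels):
    """Within-speaker RDM: predicted dissimilarity falls with exposure.

    Assumption for this sketch: a same-speaker pair is coded as 1 / rank,
    where rank is the 1-based occurrence count of the earlier token.
    Across-speaker cells are NaN so they are left out of the comparison.
    """
    n = len(labels)
    rdm = np.full((n, n), np.nan)
    for speaker in np.unique(labels):
        idx = np.flatnonzero(labels == speaker)
        for rank, i in enumerate(idx, start=1):
            for j in idx[rank:]:  # later tokens from the same speaker
                rdm[i, j] = rdm[j, i] = 1.0 / rank
    return rdm


def compare(model_rdm, neural_rdm):
    """Spearman correlation over the upper triangle, skipping NaN model cells."""
    upper = np.triu_indices_from(model_rdm, k=1)
    model, neural = model_rdm[upper], neural_rdm[upper]
    keep = ~np.isnan(model)
    return spearmanr(model[keep], neural[keep])[0]


# Simulated response patterns (tokens x voxels) standing in for one searchlight.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((len(speakers), 50))
neural_rdm = 1.0 - np.corrcoef(patterns)  # correlation distance between tokens

models = {
    "familiarity": familiarity_rdm(speakers),
    "sex": identity_rdm(sex),
    "identity": identity_rdm(speakers),
}
scores = {name: compare(rdm, neural_rdm) for name, rdm in models.items()}
print(scores)
```

In a searchlight setting, the `compare` step would be repeated for the local pattern centred on each voxel within the regions of interest, yielding one correlation map per hypothetical RDM.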

