Poster C63, Wednesday, August 21, 2019, 10:45 am – 12:30 pm, Restaurant Hall

Learning words by encoding the sequence of sounds: A computational model of speech perception

Meropi Topalidou¹, Gregory Hickok¹; ¹Department of Cognitive Sciences, University of California, Irvine

In Topalidou et al. (2018c), we proposed a simple computational model of speech production that produces sequences. The novelty of this method is that sequences are encoded in the synaptic weights of the network, which reduces spatial and temporal complexity compared with existing models. For example, speech production models generally contain buffers or working-memory modules to encode sequences (Bohland et al., 2010; Grossberg, 1978a) or use slots to label the type of each unit (Foygel and Dell, 2000). The goal of this work is to demonstrate how the proposed sequence encoding in the weights emerges as a result of an initial learning of auditory-lexical associations. Thus, here we present a computational model of speech perception that learns the mapping between sound sequences and representations of individual words. The organization of the model is derived from psycholinguistic models that propose a higher-level lexical (abstract word) system and a lower-level phonological system. Accordingly, the proposed model contains a lexical and an auditory-phonological structure, bidirectionally connected to each other. These components map onto the cortical regions of mid-posterior superior temporal sulcus/middle temporal gyrus (pSTS/pMTG) for the lexical component and posterior superior temporal gyrus (pSTG) for the auditory-phonological one. Initially, the units at the lexical level are randomly connected in an all-to-all manner with the units at the auditory-phonological level. Furthermore, the model contains a soft winner-take-all mechanism implemented through self-excitatory and lateral-inhibitory connectivity among the units at each level. On each trial, input is sent to the "phonemes" of a word with a short delay between them. A consequence of the lateral inhibition among the auditory units is that a unit receiving an earlier input settles at higher activity than the units receiving later inputs. The random connectivity between the two levels results in activation of only a few of the lexical units by these auditory units. At the end of each trial, Hebbian learning is applied among the active units of the two levels. During a simulation, a number of "words" (sequences of phonemes) are presented to the model multiple times. Analysis of the network's behavior shows that after a simulation is completed, the lexical unit that represents a word is more strongly connected to the first "phoneme" than to the second, and so on. This results from (i) the different maximum activities of the auditory-phonological units on each trial and (ii) Hebbian learning, whereby the more active a pre- and a post-synaptic unit are, the more strongly they become connected. A limitation of the model is that multiple lexical units can learn the same sequence and, in rare cases, a single unit can learn multiple sequences. This might be remedied by adding a mechanism for pattern separation, e.g., modeling the function of the dentate gyrus in the hippocampus. To conclude, our model proposes a new method for encoding sequences in speech perception that can easily be extended to the encoding of sequences in speech production, as we previously introduced.
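To make the proposed mechanism concrete, the following minimal NumPy sketch illustrates the two ingredients described above: sequential phoneme input to an auditory-phonological layer with self-excitation and lateral inhibition (so earlier phonemes settle at higher activity than later ones), followed by a Hebbian update that imprints that activity gradient onto the weights of the most active lexical unit. All parameter values, the leaky-integrator dynamics, and the hard winner selection (a stand-in for the soft winner-take-all) are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

n_phon, n_lex = 10, 6                        # auditory-phonological / lexical units
W = rng.uniform(0.0, 0.1, (n_lex, n_phon))   # random all-to-all initial weights

# Illustrative dynamics parameters (assumptions, not taken from the abstract)
dt, tau = 1.0, 10.0            # integration step and time constant
self_exc, lat_inh = 0.9, 0.4   # self-excitation and lateral inhibition
delay = 10                     # steps between successive phoneme inputs

def run_trial(word):
    """Present the phonemes of `word` one by one; return final auditory activity."""
    a = np.zeros(n_phon)
    for t in range(delay * len(word)):
        ext = np.zeros(n_phon)
        ext[word[t // delay]] = 1.0          # phoneme currently receiving input
        # soft winner-take-all: self-excitation plus lateral inhibition
        net = self_exc * a - lat_inh * (a.sum() - a) + ext
        a += (dt / tau) * (-a + np.maximum(net, 0.0))
    return a

def hebbian_update(a, lr=0.05):
    """Strengthen the weights of the most active lexical unit onto the trial's
    auditory activity pattern (hard winner as a stand-in for the soft WTA)."""
    winner = np.argmax(W @ a)
    W[winner] += lr * a                      # simple Hebbian increment

words = {"word_A": [2, 5, 7], "word_B": [1, 4, 8]}   # toy phoneme sequences
for _ in range(20):                                  # repeated presentations
    for seq in words.values():
        hebbian_update(run_trial(seq))

for name, seq in words.items():
    winner = np.argmax(W @ run_trial(seq))
    print(name, "-> lexical unit", winner,
          "weights for phonemes", seq, "=", np.round(W[winner, seq], 3))

Printing the learned weight row for each word shows the signature described in the abstract: the winning lexical unit is most strongly connected to the first phoneme, less strongly to the second, and least to the third, so the sequence is recoverable from the weights alone, without a buffer or slot structure.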

Themes: Computational Approaches, Speech Perception
Method: Computational Modeling
