Poster B4, Thursday, August 16, 3:05 – 4:50 pm, Room 2000AB

Syllable sequencing into words: A computational model of speech production

Meropi Topalidou¹, Emre Neftci¹, Gregory Hickok¹; ¹Department of Cognitive Sciences, University of California Irvine

Speech production is a complex task that relies on the ability to sequence units at the phoneme, syllable, and word levels. In 1951, Karl Lashley outlined the problem of serial order in behavior, proposing that serial behavior is performed from an underlying parallel representation. According to him, the serial order of a sequence is encoded in the activity level of each unit associated with it. An extension of this idea is the competitive queuing (CQ) model by Grossberg (1978a, 1978b). The CQ model contains two layers, each holding a parallel representation of the phonemes. The plan layer acts as the working memory of the word: it activates all nodes of the sequence at once, with activity amplitudes that reflect their serial positions (earlier items are more active). The nodes of the choice layer receive input from the plan layer. The sequence is expressed through the interaction between the graded activation from the plan layer and the inhibitory connections within the choice layer. Building on this idea, Bohland et al. (2010) introduced the Gradient Order DIVA (GODIVA) model, which consists of a plan loop and a motor loop. The plan loop comprises a sequential structure buffer and a phonological content buffer, which interact through a cortico-basal ganglia loop. The two buffers then send input to the initiation map and the speech map, respectively. These maps form a second cortico-basal ganglia loop, called the motor loop. Recently, it has been argued that speech planning at the phonological level involves an internal feedback control mechanism that can detect and correct phonological selection errors prior to overt speech output (Hickok, 2012). While GODIVA successfully implements syllable sequence production, it lacks internal feedback control. Our goal was to implement a mechanism that achieves internal speech error detection and correction during multisyllabic word production. We used the architecture proposed for one level of the Hierarchical State Feedback Control (HSFC) model described in Hickok et al. (2011).
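To make the plan/choice interaction of competitive queuing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Grossberg's equations or the GODIVA implementation: the function name `competitive_queuing`, the rectified leaky-integrator dynamics of the choice layer, and the parameters `inhibition`, `threshold`, `dt`, and `max_steps` are all hypothetical choices. The plan layer holds a positive activation gradient (higher means earlier), the choice layer settles on a single winner through lateral inhibition, and the winner's plan node is then set to zero so the next most active item can win.

```python
import numpy as np

def competitive_queuing(items, gradient, inhibition=1.5, threshold=0.3,
                        dt=0.1, max_steps=1000):
    """Toy competitive-queuing sketch (hypothetical dynamics and parameters).

    items    : labels of the sequence elements (e.g. phonemes)
    gradient : positive plan-layer activity per item; higher = produced earlier
    """
    plan = np.array(gradient, dtype=float)   # plan layer: parallel working memory
    output = []
    while plan.max() > 0:
        choice = np.zeros_like(plan)         # choice layer starts from rest for each item
        for _ in range(max_steps):
            # Each choice node is excited by its own plan node and inhibited
            # by the summed activity of all other choice nodes.
            others = choice.sum() - choice
            drive = plan - inhibition * others
            # Leaky integration toward the rectified drive.
            choice += dt * (-choice + np.maximum(drive, 0.0))
            if choice.max() >= threshold:
                break
        winner = int(np.argmax(choice))
        output.append(items[winner])
        plan[winner] = 0.0                   # suppress the executed item in the plan layer
    return output
```

For example, `competitive_queuing(["k", "a", "t"], [0.9, 0.7, 0.5])` produces the items in the order k, a, t, and reversing the gradient reverses the output. The order is carried entirely by the relative amplitudes in the plan layer; no explicit position index is stored.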
The network comprises four structures corresponding to functional-anatomic regions: lexical (pMTG), auditory-phonological (pSTS), motor-phonological (pIFG), and auditory-motor intermediary (Spt) levels. The lexical level is bidirectionally connected to both the auditory and motor levels, which are in turn connected to each other via the Spt auditory-motor interface level. Internal error correction is hypothesized to occur via auditory-motor interaction in cases where the motor plan does not match the lexical and auditory targets (Hickok, 2012). Analysis of the network's behavior showed that motor errors can be corrected, with Spt driving the correction. The analysis also showed that the model's bidirectional connectivity allows it to predict upcoming words or phonemes during perception. As a result, words that are not fully audible or intelligible can be inferred through this prediction mechanism. Unlike most models of sequencing, our model contains no buffer or working memory to retain the sequence, and no multiple representations of phonemes across different layers. Instead, the information needed to produce the sequence is carried by the connection weights between the word (at the lexical level) and its phonemes (at the auditory and motor levels).
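The error-correction idea can be illustrated with a toy Python sketch. This is not the authors' network: the phoneme inventory `PHONEMES`, the helpers `word_weights` and `produce`, the parameters `decay`, `gain`, and `motor_noise`, the choice to store order as weights that decay with serial position, the use of the same weights for the lexical-to-auditory and lexical-to-motor projections, and the identity auditory-motor mapping standing in for Spt are all assumptions made for illustration. A motor selection error is injected at one step; the Spt-like comparison detects the mismatch with the auditory target and applies a single corrective pass to the motor level.

```python
import numpy as np

# Hypothetical phoneme inventory for the toy example.
PHONEMES = ["k", "a", "t", "p"]

def word_weights(word, decay=0.8):
    """Lexical-to-phoneme weights for one word. Serial order is stored as a
    position-dependent weight (decay ** position), an assumption of this sketch.
    Words that repeat a phoneme cannot be represented this way."""
    w = np.zeros(len(PHONEMES))
    for pos, ph in enumerate(word):
        w[PHONEMES.index(ph)] = decay ** pos
    return w

def produce(w, motor_noise=None, gain=1.0):
    """Select one phoneme per step. The lexical level drives the auditory and
    motor levels through the same weights w (assumed); an Spt-like comparison
    detects a motor choice that does not match the auditory target and applies
    one corrective pass. Returns (phonemes produced, errors detected)."""
    motor_noise = motor_noise or {}
    remaining = w > 0                      # phonemes of the word not yet produced
    output, detected, step = [], 0, 0
    while remaining.any():
        auditory = w * remaining           # auditory target driven by the lexical level
        motor = w * remaining + motor_noise.get(step, 0.0)   # possibly perturbed motor plan
        target = int(np.argmax(auditory))
        choice = int(np.argmax(motor))
        if choice != target:
            detected += 1
            # Spt stand-in (identity auditory-motor mapping assumed): compare the
            # normalized auditory target with the auditory consequence of the
            # motor choice and feed the difference back to the motor level.
            predicted = np.eye(len(PHONEMES))[choice]
            motor = motor + gain * (auditory / auditory.max() - predicted)
            choice = int(np.argmax(motor))
        output.append(PHONEMES[choice])
        remaining[choice] = False
        step += 1
    return output, detected
```

With `w = word_weights(["k", "a", "t"])` and an error favoring "p" at the second step, `produce(w, {1: np.array([0, 0, 0, 1.0])})` still outputs k, a, t while reporting one detected error; setting `gain=0.0` disables the Spt feedback and the intrusion "p" reaches the output. Note that no buffer holds the sequence here: the order is read out of the word-to-phoneme weights alone.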

Topic Area: Speech Motor Control and Sensorimotor Integration