Poster C12, Friday, August 17, 10:30 am – 12:15 pm, Room 2000AB

Orofacial somatosensory inputs improve speech sound detection in noisy environments

Rintaro Ogane1,2, Jean-Luc Schwartz1,2, Takayuki Ito1,2,3; 1GIPSA-lab, CNRS, Grenoble Campus, BP46, F-38402 Saint Martin D'Hères Cedex, France, 2Univ. Grenoble Alpes, 38400 Saint Martin D'Hères, France, 3Haskins Laboratories, New Haven, CT

Noise in speech communication reduces intelligibility and makes it more difficult for the listener to detect the talker's utterances. Seeing the talker's facial movements aids the perception of speech sounds in noisy environments (Sumby & Pollack, 1954). More specifically, psychophysical experiments have shown that visual information from facial movements facilitates the detection of speech sounds in noise (the audiovisual speech detection advantage; Grant & Seitz, 2000; Kim & Davis, 2004). Besides visual information, somatosensory information also contributes to speech perception. Somatosensory inputs have been shown to modify speech perception in quiet conditions (Ito et al., 2009; Ogane et al., 2017), and they might also help listeners detect speech sounds in noisy environments. The aim of this study is to examine whether orofacial somatosensory inputs facilitate the detection of speech sounds in noise.

We carried out a detection test with speech sounds embedded in acoustic noise and examined whether the detection threshold changed under somatosensory stimulation produced by facial skin deformation. In the auditory perception test, two sequential noise sounds were presented through headphones. A target speech sound /pa/, recorded by a native French speaker, was embedded in one of the two noise stimuli at a randomly chosen position in time (0.2 or 0.6 s after noise onset). Participants were asked to identify which noise sound contained the speech stimulus by pressing a keyboard key as quickly as possible. We tested 10 signal-to-noise ratio (SNR) levels between the target speech sound and the background noise, ranging from -8 dB to -17 dB. The percentage of correct detection responses was computed at each SNR level and used to estimate a psychometric function. The detection threshold was defined as the SNR at which the estimated psychometric function reached 75% correct detection.
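The abstract does not state which psychometric function was fitted or how. As a minimal sketch, assuming a logistic function for a two-interval forced-choice task (50% guessing rate, no lapse rate) fitted by maximum likelihood over a coarse grid, the 75%-correct threshold is simply the location parameter `alpha`: the function runs from 50% to 100% correct and passes 75% at its midpoint. The function names (`p_correct_2ifc`, `neg_log_likelihood`, `fit_threshold`), the grid ranges, and the 1 dB spacing of the SNR levels are illustrative assumptions, and the counts in the usage section are simulated from the model, not the study's data.

```python
import math

# Assumed: the 10 SNR levels are -17, -16, ..., -8 dB (1 dB apart).
SNRS_DB = list(range(-17, -7))


def p_correct_2ifc(snr_db, alpha, beta):
    """Assumed logistic psychometric function for a two-interval forced choice.

    Performance rises from the 50% guessing rate toward 100% as SNR grows;
    at snr_db == alpha it is exactly 75%, so alpha is the 75%-correct
    detection threshold in dB SNR. Larger beta means a shallower slope.
    """
    return 0.5 + 0.5 / (1.0 + math.exp(-(snr_db - alpha) / beta))


def neg_log_likelihood(alpha, beta, snrs, n_correct, n_trials):
    """Binomial negative log-likelihood of the correct-response counts."""
    nll = 0.0
    for x, k, n in zip(snrs, n_correct, n_trials):
        # Clamp p away from 0 and 1 so both logarithms stay finite.
        p = min(max(p_correct_2ifc(x, alpha, beta), 1e-12), 1.0 - 1e-12)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_threshold(snrs, n_correct, n_trials):
    """Maximum-likelihood (alpha, beta) by exhaustive grid search.

    alpha: -20.0 to -5.0 dB in 0.05 dB steps; beta: 0.2 to 5.0 dB in 0.1 dB steps.
    """
    best_nll, best = math.inf, (None, None)
    for i in range(301):
        alpha = -20.0 + 0.05 * i
        for j in range(49):
            beta = 0.2 + 0.1 * j
            nll = neg_log_likelihood(alpha, beta, snrs, n_correct, n_trials)
            if nll < best_nll:
                best_nll, best = nll, (alpha, beta)
    return best


if __name__ == "__main__":
    # Illustrative expected counts simulated from the model (NOT the study's data):
    # 20 trials per level, made-up true thresholds of -12.0 dB and -12.6 dB.
    trials = [20] * len(SNRS_DB)
    auditory = [20 * p_correct_2ifc(x, -12.0, 1.0) for x in SNRS_DB]
    somato = [20 * p_correct_2ifc(x, -12.6, 1.0) for x in SNRS_DB]
    a_thr, _ = fit_threshold(SNRS_DB, auditory, trials)
    s_thr, _ = fit_threshold(SNRS_DB, somato, trials)
    print(f"auditory {a_thr:.2f} dB, somatosensory {s_thr:.2f} dB, "
          f"shift {s_thr - a_thr:+.2f} dB")
```

The grid search keeps the sketch dependency-free and easy to inspect; a general-purpose optimizer or a dedicated psychometric-fitting package would be the more usual choice, and a lapse-rate parameter is often added so that occasional errors at high SNR do not bias the slope.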
We compared the detection threshold between two experimental conditions: a purely auditory condition and a condition with added somatosensory stimulation. In the somatosensory condition, facial skin deformation generated by a robotic device was applied in both noise intervals, with its timing matched to the onset of the target speech sound (burst onset). Each condition contained all 10 SNR levels with 20 trials per level (hence 200 responses per condition), and the 400 stimuli from both conditions were presented in a single randomized order. We found that the detection threshold was lowered when somatosensory stimulation was applied (a 0.6 dB decrease in SNR at threshold). This “audio-somatosensory detection advantage” demonstrates a role for somatosensory inputs in processing speech sounds, even in noisy environments, and is consistent with the idea that somatosensory information is part of the speech perception process.
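The trial design described above (2 conditions × 10 SNR levels × 20 repetitions = 400 trials, presented in one randomized order) can be sketched as a session schedule plus a scoring step. This is a hypothetical illustration: the abstract does not say whether the target interval and onset time were counterbalanced, so here each is drawn uniformly at random per trial, and the SNR levels are assumed to be 1 dB apart. `Trial`, `build_session`, and `tally_correct` are names introduced for illustration only.

```python
import random
from collections import Counter
from dataclasses import dataclass

CONDITIONS = ("auditory", "somatosensory")
SNR_LEVELS_DB = list(range(-17, -7))  # assumed 1 dB spacing: -17 ... -8 dB
REPS_PER_LEVEL = 20
TARGET_ONSETS_S = (0.2, 0.6)          # target onset after noise onset, in seconds


@dataclass(frozen=True)
class Trial:
    condition: str
    snr_db: int
    # 1 or 2: which noise interval contains the /pa/ target.
    target_interval: int
    # In the somatosensory condition, skin deformation is applied at this
    # offset in BOTH intervals, so it does not reveal the target interval.
    target_onset_s: float


def build_session(seed=None):
    """Hypothetical 400-trial schedule with both conditions interleaved randomly."""
    rng = random.Random(seed)
    trials = [
        # Assumed: target interval and onset drawn uniformly, not counterbalanced.
        Trial(cond, snr, rng.choice((1, 2)), rng.choice(TARGET_ONSETS_S))
        for cond in CONDITIONS
        for snr in SNR_LEVELS_DB
        for _ in range(REPS_PER_LEVEL)
    ]
    rng.shuffle(trials)
    return trials


def tally_correct(trials, responses):
    """Count correct responses per (condition, SNR); responses are interval numbers."""
    if len(trials) != len(responses):
        raise ValueError("one response per trial is required")
    counts = Counter()
    for trial, response in zip(trials, responses):
        if response == trial.target_interval:
            counts[(trial.condition, trial.snr_db)] += 1
    return counts
```

Passing a fixed `seed` makes the presentation order reproducible across sessions. Dividing each count from `tally_correct` by the 20 trials per level gives the percentage correct at each SNR, which is the input to the psychometric-function estimate for each condition.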

Topic Area: Perception: Speech Perception and Audiovisual Integration
