Investigating the role of spectral cues in eliciting the Speech-to-Song Illusion

Poster C55 in Poster Session C, Friday, October 7, 10:15 am - 12:00 pm EDT, Millennium Hall
This poster is part of the Sandbox Series.

Alejandra Santoyo¹, Antoine J. Shahin¹, Kristina C. Backer¹; ¹UC Merced

The Speech-to-Song (S2S) illusion is an auditory phenomenon in which a spoken phrase, when repeated several times in succession, begins to sound as though it is being sung (Deutsch et al., 2011). Prior research on the acoustic properties that give rise to a song-like percept has found that variations in pitch and rhythm, both fundamental to music perception, are important drivers of the S2S illusion (e.g., Groenveld et al., 2020; Falk et al., 2014). Here, we further examine the role of pitch in the S2S illusion. Specifically, we converted 12 lists, each comprising three English words, into whispered and sinewave (SW) speech. Whispered speech preserves the formant transitions and the envelope of speech but not the fundamental frequency. SW speech also lacks the fundamental frequency and degrades the formant transitions, but it keeps the original speech envelope intact. Although SW speech is difficult to understand, listeners identify the words more accurately when they are told that they are listening to speech (Vanden Bosch der Nederlanden et al., 2015). To examine this top-down effect on the S2S illusion, some participants were told that the SW block was based on speech (Known-SW speech) and others were not (Unknown-SW speech). We predicted that, because whispered and SW speech lack pitch cues, both would elicit a stronger S2S illusion than regular speech by allowing rhythmic qualities to stand out, and that this effect might be stronger when listeners did not know that the SW stimuli were based on speech. We recruited both musicians and non-musicians to examine whether musical expertise influences the strength of the S2S illusion. Thus far, twenty participants (12 musicians and 8 non-musicians) have taken part. Participants completed three blocks (SW, Whispered, and Regular speech) of 12 trials each. On each trial, participants heard one presentation of a three-word list and then rated, on a Likert scale from 1 to 9, whether the list sounded most like speech (‘1’) or most like song (‘9’).
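The abstract does not specify how the SW stimuli were generated. Sinewave speech is conventionally built by replacing the lowest few formants with pure tones whose frequencies and amplitudes follow those formants over time, which is why it carries no fundamental frequency or harmonic structure while the amplitude envelope survives. The following is a minimal sketch, assuming per-frame formant frequency and amplitude tracks have already been estimated by some formant tracker; the `sinewave_speech` function and its frame layout are hypothetical and are not the authors' pipeline.

```python
import math


def sinewave_speech(formant_tracks, frame_rate, sample_rate):
    """Hypothetical sinewave-speech synthesizer (illustrative sketch only).

    formant_tracks: list of frames; each frame is a list of
        (frequency_hz, amplitude) pairs, one pair per formant (e.g. F1-F3).
    frame_rate: analysis frames per second.
    sample_rate: output samples per second.
    Returns a list of float samples: one pure tone per formant, summed.
    """
    # Integer samples per frame; any remainder is dropped in this sketch.
    samples_per_frame = int(sample_rate // frame_rate)
    n_formants = len(formant_tracks[0])
    # Running phase per tone keeps each sinusoid continuous across frames.
    phases = [0.0] * n_formants
    out = []
    for frame in formant_tracks:
        # Frequencies and amplitudes are held constant within a frame
        # (no interpolation), so formant transitions are only coarsely kept.
        for _ in range(samples_per_frame):
            value = 0.0
            for k, (freq, amp) in enumerate(frame):
                phases[k] += 2.0 * math.pi * freq / sample_rate
                # Tone amplitude follows the formant amplitude, so the
                # overall envelope is retained; there is no F0 or harmonics.
                value += amp * math.sin(phases[k])
            out.append(value)
    return out
```

For example, ten 10 ms frames holding a 500 Hz and a 1500 Hz tone, `sinewave_speech([[(500.0, 1.0), (1500.0, 0.5)]] * 10, 100, 16000)`, yield 1600 samples. A practical synthesizer would also interpolate the tracks between frames to avoid audible clicks.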
After the initial single-presentation rating (R1), listeners heard the same list repeated 9 times and rated it again on the same 1 to 9 scale (R9). First, no significant differences between R1 and R9 were observed for any of the three speech conditions, indicating that stimulus repetition did not elicit a robust S2S illusion in the present study. Similarly, no significant differences were found between the musician and non-musician groups or between the Known-SW and Unknown-SW groups, which could be due to the currently small sample sizes. However, across all participants, ratings were significantly higher (i.e., more song-like) for the SW speech condition than for both the Whispered and Regular speech conditions, at both the initial (R1) and the final (R9) rating. Moreover, R1 and R9 ratings did not differ significantly between the Whispered and Regular speech conditions. These preliminary results suggest that degrading formant transitions (as in SW speech) gives rise to a more song-like percept from the first presentation, but that this manipulation does not strengthen the S2S illusion.
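The abstract does not state which statistical test was used. As an illustration only, one common way to quantify the illusion is each participant's change in rating from R1 to R9, tested against zero with a paired-samples t statistic. The function names below and the toy ratings in the usage note are hypothetical and are not study data.

```python
import math
import statistics


def illusion_change(r1, r9):
    """Per-participant change from R1 to R9 (positive = more song-like)."""
    if len(r1) != len(r9):
        raise ValueError("r1 and r9 must be paired per participant")
    return [after - before for before, after in zip(r1, r9)]


def paired_t(r1, r9):
    """Paired-samples t statistic for R9 vs R1; returns (t, degrees of freedom)."""
    diffs = illusion_change(r1, r9)
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1
```

With toy ratings `paired_t([3.0, 4.0, 2.0, 5.0], [4.0, 4.0, 3.0, 7.0])`, the differences are `[1, 0, 1, 2]`, giving t = √6 ≈ 2.449 with 3 degrees of freedom. Obtaining a p-value requires the t distribution (for instance via SciPy's `scipy.stats.ttest_rel`), and a design with three conditions and two group factors would more likely be analyzed with a repeated-measures ANOVA or mixed model.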

Topic Areas: Perception: Auditory, Perception: Speech Perception and Audiovisual Integration