Slide Slam O3
The amplitude modulation of sounds is crucial for categorizing speech and music
Andrew Chang1, Xiangbin Teng2, M. Florencia Assaneo3, David Poeppel1,4; 1New York University, 2Max Planck Institute for Human Development, 3Universidad Nacional Autónoma de México, 4Max Planck Institute for Empirical Aesthetics
Despite our increasingly rich understanding of how humans process speech and music, surprisingly little is known about how these signals are distinguished as different auditory categories in the first place. From an acoustic perspective, the two types of signals can be differentiated: speech and music tend to have different amplitude modulation (AM) rates. Specifically, the AM rate of speech peaks between 4 and 5 Hz, while the AM rate of music tends to be slower, peaking around 2 Hz (Ding et al., 2017). In addition, it is often argued that the AM of music tends to be more temporally regular, or isochronous, than that of speech (Kotz et al., 2018). Based on these insights, we hypothesized that the AM temporal features of an acoustic signal, especially its peak rate and regularity, are critical factors determining whether the signal is categorized as speech or music. Here we parametrically manipulated (i) the AM peak frequency (0.6–6.0 Hz) and (ii) the AM regularity to generate signals with a variety of AM envelopes. The envelopes were imposed on an identical broadband low-noise noise carrier, so that each stimulus was an amplitude-modulated noise excerpt with manipulated AM features. More than 300 participants took part in two online behavioral experiments. On each trial, they listened to one of the generated stimuli and made a binary judgment on whether it sounded more like a “speech” or a “music” recording. The preliminary results support the hypothesis: across participants, sound excerpts with a slower AM peak rate and a more temporally regular AM were more likely to be judged as music. These two factors explained roughly 36–50% of the variance in judgments, suggesting that the amplitude envelope alone carries substantial information for differentiating speech from music. Furthermore, a stronger music–slow/speech–fast AM association was correlated with higher musical sophistication of the participants.
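The stimulus-manipulation logic described above can be sketched in code. The abstract does not specify the synthesis procedure, so the details below are illustrative assumptions: a raised-cosine modulation cycle, a uniform jitter on each cycle's period to control regularity, and a Gaussian noise carrier standing in for the low-noise noise carrier actually used.

```python
import numpy as np

def am_noise(peak_hz, jitter, dur=4.0, fs=16000, seed=0):
    """Sketch of an amplitude-modulated noise stimulus.

    peak_hz : nominal AM rate in Hz (abstract range: 0.6-6.0 Hz)
    jitter  : 0.0 gives an isochronous (fully regular) envelope;
              larger values perturb each modulation cycle's duration,
              making the AM less regular (an assumed regularity knob).
    """
    rng = np.random.default_rng(seed)
    # Gaussian broadband noise as a stand-in for the low-noise noise carrier.
    carrier = rng.standard_normal(int(dur * fs))

    # Build the envelope cycle by cycle: each cycle is one raised-cosine
    # period whose duration is jittered around 1/peak_hz.
    env = []
    while len(env) < len(carrier):
        period = (1.0 / peak_hz) * (1.0 + jitter * rng.uniform(-1.0, 1.0))
        n = max(int(period * fs), 1)
        cycle = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))
        env.extend(cycle)
    env = np.asarray(env[: len(carrier)])

    return carrier * env

# Example: a regular 2 Hz stimulus (music-like AM) vs. an irregular
# 4.5 Hz stimulus (speech-like AM).
music_like = am_noise(peak_hz=2.0, jitter=0.0)
speech_like = am_noise(peak_hz=4.5, jitter=0.5)
```

Varying `peak_hz` and `jitter` on a grid would yield the kind of parametric stimulus set the experiment calls for; the exact carrier and regularity manipulation in the study may differ.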
To the best of our knowledge, this is the first study to show that AM temporal features are critical low-level factors determining whether a sound is interpreted as speech or music.