Poster C59, Wednesday, August 21, 2019, 10:45 am – 12:30 pm, Restaurant Hall

"Mean Speech Rate" Doesn't Mean Much: Analysis of Speech Rhythm Benefits from Quantising Inter-Onset Intervals

Alexis MacIntyre¹, Sophie Scott¹, Ceci Cai Qing¹; ¹University College London

Speech rhythm describes the temporal patterns and structure that emerge as speech unfolds in time, and is usually analysed in terms of recurring units, such as syllables; however, how to characterise and compare speech rhythms, both within and across languages, is a controversial topic, partly because no recurring unit is known to be isochronously timed (equal in duration). The apparent irregularity of natural speech challenges theories of neural entrainment as a mechanism facilitating speech perception, given that simple models of entrainment require at least a quasi-periodic signal. Despite this incongruence, speech rhythm is typically quantified by a mean speaking rate (syllables per second) or a mean inter-onset interval (IOI); yet closer examination of actual speech data reveals that the arithmetic mean can be a poor model. The current project addresses this limitation by investigating the viability of quantising IOIs derived from vowels and stressed vowels across two relatively unrelated languages, English and Mandarin, collected under a variety of speaking conditions chosen for their differing rhythmic qualities. Rather than treating IOIs as normally distributed, random continuous durations, this approach attempts to represent speech rhythm as a mixture of discrete, recurring values generated from short sequences of speech, thereby preserving some of the local-sequential information lost in aggregate statistics. Moreover, quantifying how well individual values are represented by multiple modal peaks, rather than by a single mean, may shed light on why listeners report a percept of temporal regularity in speech despite the current lack of evidence for its physical correlate. Finally, in addition to acoustic data, the timing of respiratory kinematics was also measured via inductance plethysmography, providing a complementary signal directly related to processes that are largely inaudible but nonetheless essential to speech production. Speech was segmented according to inhalation cycles. Model goodness of fit and comparison with traditional techniques indicate that ecologically valid speech is best described in more complex terms than simple isochrony-based explanations allow, and that, together with breathing effort, the interpretation of both vowel and stressed vowel IOIs illustrates rich rhythmic similarities and differences across English and Mandarin. For example, in the case of stressed vowel IOIs, sequence mode values for both languages hovered consistently close to 380 milliseconds (ms) across speakers (n = 8, range = 350-400 ms), describing nearly a third of the data to within a threshold of 20 ms; in contrast, the sequence arithmetic mean (450 ms) failed to capture even 10% of real values and varied comparatively more by speaker (range = 400-490 ms). Shuffling stressed vowel IOIs into pseudo-sequences resulted in significantly fewer data points falling within an arbitrary threshold of .05 of the modal peak (t(8,568) = 5.37, p < .001), suggesting that local temporal dependencies may be lost when data are pooled across longer timescales. More detailed results are discussed with a view to the role of regularity in speech timing, cross-linguistic comparisons, and possible implications for future studies of neural speech entrainment.
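
For illustration only, the following Python sketch works through the core contrast described above on synthetic data: summarising IOIs by a modal peak versus an arithmetic mean, and re-scoring coverage after shuffling IOIs into pseudo-sequences. The histogram-based mode, bin width, 20 ms tolerance, and toy data generator are assumptions standing in for the study's mixture-based quantisation and real English/Mandarin recordings, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def modal_peak(iois, bin_width_ms=10.0):
    """Crude stand-in for mixture-style quantisation: return the centre
    of the most populated histogram bin of the IOIs (in ms)."""
    iois = np.asarray(iois, dtype=float)
    edges = np.arange(iois.min(), iois.max() + bin_width_ms, bin_width_ms)
    if len(edges) < 2:                       # degenerate, near-constant sequence
        return float(np.mean(iois))
    counts, edges = np.histogram(iois, bins=edges)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def coverage(iois, centre, tol_ms=20.0):
    """Proportion of IOIs falling within +/- tol_ms of a summary value."""
    return float(np.mean(np.abs(np.asarray(iois) - centre) <= tol_ms))

def mean_sequence_coverage(seqs, tol_ms=20.0):
    """Average within-sequence coverage around each sequence's own modal peak."""
    return float(np.mean([coverage(s, modal_peak(s), tol_ms) for s in seqs]))

# Toy breath-group sequences (hypothetical, not the study's data): each
# sequence has its own local tempo near 380 ms, most IOIs cluster tightly
# around it, and roughly one interval in five is lengthened, which drags
# the pooled arithmetic mean away from the recurring value.
ioi_seqs = []
for _ in range(60):
    base = rng.normal(380.0, 30.0)                # sequence-level tempo (ms)
    n = int(rng.integers(8, 14))                  # IOIs per breath group
    seq = base + rng.normal(0.0, 8.0, size=n)     # locally regular intervals
    lengthened = rng.random(n) < 0.2              # occasional long intervals
    seq[lengthened] *= rng.uniform(1.6, 2.2, size=lengthened.sum())
    ioi_seqs.append(seq)
pooled = np.concatenate(ioi_seqs)

# Mode versus mean as a summary of the pooled IOI distribution.
peak, mean = modal_peak(pooled), float(np.mean(pooled))
print(f"modal peak {peak:.0f} ms covers {coverage(pooled, peak):.0%} of IOIs")
print(f"arith mean {mean:.0f} ms covers {coverage(pooled, mean):.0%} of IOIs")

# Shuffle test: permute IOIs across the pool, cut them back into
# pseudo-sequences of the original lengths, and re-assess how much each
# pseudo-sequence's own modal peak still captures once local ordering
# (and with it the sequence-level tempo) is destroyed.
splits = np.cumsum([len(s) for s in ioi_seqs])[:-1]
pseudo_seqs = np.split(rng.permutation(pooled), splits)
print(f"within real sequences:   {mean_sequence_coverage(ioi_seqs):.0%}")
print(f"within pseudo-sequences: {mean_sequence_coverage(pseudo_seqs):.0%}")
```

On this toy data the modal peak captures a substantially larger share of intervals than the mean, and per-sequence coverage drops once ordering is shuffled, mirroring the qualitative pattern reported above (the study's actual values and test statistics come from its own data and models, not from this sketch).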

Themes: Speech Motor Control, Prosody
Method: Behavioral
