This collection explores the pervasive variability in speech prosody and its role in linguistic representation and linguistic processing. It contains papers presented at the Third Experimental and Theoretical Approaches to Prosody workshop (supported by an NSF Workshop grant; BCS-1451751), as well as submitted papers reflecting the themes of this workshop. This collection was edited by Mara Breen, Chigusa Kurumada, Michael Wagner, Duane Watson, and Kristine M. Yu.


The papers in this special collection all focus on the question of prosodic variation, and demonstrate how language experience predicts such variation, and how previously unexplained prosodic variation can be explained by new ways of understanding the representations underlying prosodic structure. Included in this collection are discussions of how variability interacts with linguistic structure, linguistic planning, individual differences, and language comprehension.

Yu and Stabler demonstrate that what appears to be random variability in the use of high tones in Samoan can be explained through the postulation of a more complex, but systematic, relationship between morphosyntactic structure and tonal structure. This work further specifies the nature of the relationship between syntactic structure and phonological structure.

The process of planning linguistic structure can lead to variability in the prosodic signal. Evidence for this comes from work by Tanner et al., who present statistical analyses of new data from a corpus of spontaneous speech. They look at the factors affecting variability in coronal stop deletion, a classic case of variability in phonological processes. One of the main determinants of this process is the following phonological context, and more specifically whether the upcoming word begins with a vowel or a consonant, or whether a pause follows. The hypothesis they explore is that the effect of the phonological context is modulated by the locality of production planning: Speakers do not reliably plan out phonological and phonetic detail beyond the following word in advance. Similarly, Zerkle et al. investigate gradient effects of accessibility and predictability on prosodic realization, and how these effects relate to production planning. Previous research has explored whether thematic roles influence the choice between pronouns and full noun phrases. Their paper explores whether this factor can account for some of the acoustic variability in the prosodic realization of full noun phrases, and if so, which mechanism is responsible for such effects. The evidence suggests that referring expressions are acoustically reduced when the thematic role of their antecedent makes coreference between the two expressions more expected. A correlation with utterance initiation time suggests that the underlying mechanism of this effect is production difficulty: Production difficulty appears to decrease with the accessibility of the antecedent.

Of course, some aspects of prosodic variability are likely due to individual differences in experience. Boll-Avetisyan et al. investigate individual variability in rhythmic grouping across German speakers, and demonstrate that this variability can be predicted, in part, by the participants’ musical experience. This result demonstrates overlap in rhythmic processing between speech and music. Along similar lines, Warren examines the role of language change in prosodic variability, focusing on listeners’ sensitivity to variability in prosody and to an ongoing merger of vowel realizations in New Zealand English. Results from his mouse-tracking experiments reveal that listeners are exquisitely sensitive to the socially conditioned variability in realizations of diphthongs and modulate their interpretations of utterance-final boundary tones. This finding illuminates the role of rich social knowledge in supporting accurate and robust prosodic comprehension.

Finally, one of the largest challenges for the field is understanding how listeners are able to process prosody in spite of variability in the signal. Ito et al. present data from a study designed to investigate the production of contrastive accents by a naïve, and highly heterogeneous, participant pool. They show that, despite individual variability, these unconstrained productions lead to anticipatory looking patterns similar to those elicited by highly constrained laboratory productions, demonstrating fundamental agreement across listeners on intonational cues for contrast. Roy et al. also examine the ways in which naïve listeners process the prosodic signal. They present a large-scale, internet-based survey to elicit naïve listeners’ judgments on prosodic prominence and boundaries. They ask how well traditional analyses based on raw acoustic features, as well as experts’ annotations, can predict naïve listeners’ judgments. The data and statistical analysis showcase critical questions about effects of individual differences in the perception of prosody, which could not be addressed without the technical innovation.