Producing and perceiving speech involves the parallel transmission of numerous types of signs or categories, both linguistic (e.g., words and their constituent consonants and vowels) and indexical (social class, regional affiliation, gender, etc.). The production of speech also involves the coordinated activity of some hundred muscles, adapted to the speaking and situational context. While it has long been clear that the linguistic and social as well as the cognitive and physical aspects of speaking are tightly intertwined, quite how these multiple layers of semiotic and signal aspects of speech are connected, and how those connections may be manifested differently in the world’s languages and cultures, remains poorly understood. The aim of this special collection is to advance the discussion of these issues by bringing together papers from various research areas that model the association between discrete categories and continuous speech dynamics in both normal and disordered speech, between variability and abstraction, and between indexical and linguistic information.
The idea for the open call that led to the seven papers published in this special collection stems from a workshop on Abstraction, Diversity, and Speech Dynamics, hosted by the University of Munich in May 2017, which focused on how the lexicon, phonology, sociophonetic information, and speech signals are interconnected. This workshop was part of a two-year Research Focus at the Center for Advanced Studies of the University of Munich entitled “How Words Emerge and Dissolve.”
The focus in some of the papers in the special collection is on the mechanisms by which linguistic propositional and social information interact in speech signal dynamics. Zellou and Pycha (2018) examine the acoustic information that a vowel carries about both a following labial consonant and the physical height of the talker. Babel, Senior, and Bishop (2019) analyzed whether listeners’ adaptation to a new spoken accent was influenced by the pleasantness of a speaker’s voice. Bradlow, Blasingame, and Lee (2018) conducted a sentence recognition experiment in order to determine how first and second language processing are affected by speaker intelligibility. Levy and Hanulíková (2019) asked how variable input due to word frequency, dialect, and foreign accent affects school-aged children’s vowel production.
All four studies are compatible with the idea that certain types of speaker information (specifically physical height, pleasantness, intelligibility, and type of accent) are processed independently of the mechanisms linking phonological knowledge to speech signals, even though some of the cues are shared in the acoustic domain. Thus the new finding in Zellou and Pycha (2018) is that the duration between the coarticulatory source (in this case a post-vocalic labial consonant) and the effect (F2-lowering in the preceding vowel target) has an influence on the perception of both vowel quality and talker height. But there is no suggestion in their study that these phonological and speaker judgments influenced each other in perception. Indeed, as Zellou and Pycha (2018) suggest, whereas compensating for the labial-induced F2-lowering is likely to depend on dynamic VC relationships that have presumably been built up through experience of communicating with several speakers, adjusting for F2-lowering due to physical height draws upon quite different knowledge accumulated after exposure to extensive intervals of speech of a particular speaker. Relatedly, Babel et al. (2019) suggest that social biases do not necessarily warp perception but exert their influences post-perceptually: They found that the pleasantness of the voice has little influence on listener adaptation to a new accent. Levy and Hanulíková (2019) show that vowel production by school-aged children was unaffected by whether they grew up bilingually or in a monolingual German home. Rather, variability in their own production was predicted by the variability across both foreign and regional accents to which they were exposed. Bradlow et al. (2018) show that if a speaker is more intelligible than another in L1 then, assuming equivalent linguistic proficiency, that speaker will also be more intelligible than the other speaker in L2. This leads them to suggest that speaker intelligibility is a long-term production setting that operates independently of how the propositional linguistic content in either language is processed.
But of course, the linguistic propositional content and social information can and do become attached to each other. Needle and Pierrehumbert (2018) document several such instances at the phonetic level in which variants are indexical. As far as markers of gender are concerned, their study shows that these can attach not just to phonemes but also to entire words and to their morphological constituents. They use large existing written corpora to establish for words and suffixes a so-called gender bias, that is, the probability of words and morphological suffixes being used by male or female authors. In their experiment, they asked participants to choose one person from six photographs that they thought was most likely to have written a word or non-word that was simultaneously visually presented. Their results showed that the participants’ judgements correlated with gender bias for words and for morphemic constituents only in non-words. The findings are consistent with the idea that linguistic and social information is stored in memory and that language users gain knowledge of gender-bias in words through experience. As Babel et al. (2019) note, young children exhibit gender bias in their productions. Needle and Pierrehumbert’s (2018) study suggests that this might come about not just because they copy gender-specific attributes of speech sounds, but also because they learn that using certain words (perhaps in particular contexts) is linked to gender. That this might be possible is because children are typically initially exposed to more frequent words in which, as Needle and Pierrehumbert (2018) show, gender bias is more likely to be marked (all but disappearing for infrequent words).
Needle and Pierrehumbert (2018) find that their participants were sensitive to the gender bias in morphemes in non-words but not in real words. Thus, as the authors suggest, the gender bias in a derived non-word like glonitis is established analogically to similar words like arthritis and bronchitis, rather than from any gender bias that inheres in the suffix. This result is consistent with other findings (e.g., Hay & Baayen, 2005) showing that morphological generalizations emerge by analogy across words (as opposed to words being derived by combinatorial rules applied to morphemes).
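The notion of gender bias described above, and the idea that a non-word inherits its bias by analogy from similar real words, can be made concrete with a minimal sketch. It assumes the simplest possible estimate, namely the proportion of an item's tokens written by male authors, which may differ from the measure actually used by Needle and Pierrehumbert (2018). The corpus, the function names `gender_bias` and `analogical_bias`, and the counts are all hypothetical and invented for illustration only.

```python
from collections import Counter

def gender_bias(tokens):
    """Estimate P(male author | word) from (word, author_gender) pairs.

    Assumption: bias is the raw proportion of male-authored tokens.
    A real corpus study would also normalise for corpus size per gender.
    """
    male = Counter()
    total = Counter()
    for word, gender in tokens:
        total[word] += 1
        if gender == "M":
            male[word] += 1
    return {word: male[word] / total[word] for word in total}

def analogical_bias(word_bias, suffix):
    """Bias of a non-word, taken as the mean bias of real words sharing its suffix."""
    neighbours = [bias for word, bias in word_bias.items() if word.endswith(suffix)]
    return sum(neighbours) / len(neighbours) if neighbours else None

# Hypothetical toy corpus: the counts carry no empirical meaning.
toy_corpus = [
    ("arthritis", "F"), ("arthritis", "F"), ("arthritis", "M"),
    ("bronchitis", "F"), ("bronchitis", "M"),
    ("gearbox", "M"), ("gearbox", "M"), ("gearbox", "M"), ("gearbox", "F"),
]

word_bias = gender_bias(toy_corpus)
# A non-word such as "glonitis" has no corpus count of its own, so its bias
# is estimated from real words ending in the same string.
glonitis_bias = analogical_bias(word_bias, "itis")
```

In this sketch the suffix has no stored bias of its own: the value for a non-word is computed on demand from whole words, which is one way of expressing generalization by analogy rather than by rule.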
Both Carignan (2018) and Zellou and Pycha (2018) take up the idea that perceived ambiguity about how phonetic events are parsed phonologically is a potential path to sound change. The idea that sound change is as much in the ear of the listener as in the mouth of the speaker has a long history (see Beddor, 2009, for details) and forms a central part of current computational models of sound change (e.g., Todd, Pierrehumbert, & Hay, 2019). According to Ohala (1993), sound change can derive from a listener error in which a coarticulatory effect (e.g., vowel nasalization) is no longer associated, or parsed, with its source (a following nasal consonant). In Beddor (2012), there is by contrast no listener error: Sound change emerges instead from the flexibility that listeners have to weight acoustic cues differently. Thus some listeners might pay more attention to nasalization in the vowel and others more to the presence of the following nasal consonant. The path to sound change for Beddor (2012) is when such related sets of cues enter into a trading relationship. Carignan (2018) finds that his results are more consistent with Beddor’s (2012) than with Ohala’s (1993) model. Using a combination of ultrasound, acoustic nasalance, and electroglottographic techniques, Carignan (2018) analyzed English speakers’ imitations of oral and nasal vowels produced by speakers of Southern French, in which the oral/nasal vowel distinction is phonologically contrastive. The nasal vowels were found to be acoustically quite similar for both groups of speakers. However, the English imitators produced open nasal vowels with a much greater adjustment to tongue height. Thus, consistent with Beddor (2012), nasalance and tongue height traded in vowel nasalization: Whereas for the French speakers nasalance was weighted to a greater extent than tongue height changes, in the English imitators it was the other way round. Carignan’s (2018) study also provides a laboratory-based explanation for the well-known finding in several languages that nasal vowels are typically phonetically raised relative to their open oral counterparts.
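The trading relationship between cues described above can be illustrated with a toy calculation. This is a minimal sketch, assuming a simple logistic combination of two normalised cues; it is not the model of Beddor (2012) or the analysis of Carignan (2018). The function name `p_nasal`, the weights, the bias term, and the cue values are all hypothetical.

```python
import math

def p_nasal(nasalance, height_shift, w_nasalance, w_height, bias=-3.0):
    """Probability of a 'nasal vowel' outcome from two weighted cues.

    Assumption: cues are normalised to a 0-1 range and combined linearly
    before a logistic transform. All parameter values are illustrative.
    """
    z = bias + w_nasalance * nasalance + w_height * height_shift
    return 1.0 / (1.0 + math.exp(-z))

# Two hypothetical cue-weighting profiles.
nasalance_weighted = {"w_nasalance": 5.0, "w_height": 1.0}
height_weighted = {"w_nasalance": 1.0, "w_height": 5.0}

# Token A: strong nasalance, small tongue height change.
# Token B: weak nasalance, large tongue height change.
token_a = {"nasalance": 1.0, "height_shift": 0.2}
token_b = {"nasalance": 0.2, "height_shift": 1.0}

for name, weights in [("nasalance-weighted", nasalance_weighted),
                      ("height-weighted", height_weighted)]:
    print(name,
          round(p_nasal(**token_a, **weights), 3),
          round(p_nasal(**token_b, **weights), 3))
```

Under these assumed weights, the two profiles arrive at the same outcome from complementary cue combinations: a deficit in one cue is offset by a surplus in the other, which is the sense in which the cues trade.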
The type of ambiguity investigated in Zellou and Pycha (2018) is relevant to explaining the many instances reviewed in Needle and Pierrehumbert (2018) of sound changes by which phonetic variants also become indexical markers. Zellou and Pycha (2018) show that it becomes harder in perception to attribute a coarticulatory effect to its source when the interval between effect and source is large. They investigated talker height and vowel backing, but their results might also carry over to processing gender in, for instance, anticipatory V1CV2 coarticulation, in which the temporal interval between the V2 source and the coarticulatory effect on V1 is also large (the two being separated by an intervening C). As is well known, men and women often scarcely differ in high back vowels but do show quite marked differences in high front vowels like /i/ (Fant, 1973). Therefore, V2 = /i/ is likely to cause not just coarticulatory fronting but also the perception of gender differences in V1 = /u/. If /i/ is weakened or, following Zellou and Pycha (2018), is interpreted by the listener as being temporally too distant from /u/, then gender might become indexical as /u/-fronting is phonologized.
Another theme that is addressed by several papers (Carignan, 2018; Hermes, Mücke, Thies, & Barbe, 2019; Zellou & Pycha, 2018) is how dynamic events in the speech signal are related to phonological structures. These papers also share the viewpoint that the signal-phonology mapping can be more difficult for some phonetic events than for others and that such difficulties can lead either to sound change (see also Browman & Goldstein, 1991, for an interpretation in terms of articulatory phonology) or, as shown in Hermes et al. (2019) for the case of essential tremor patients, to an aberration in the motor commands that control how a sequence of consonants is synchronized with a following vowel within a syllable. The focus in Hermes et al. (2019) is on a comparison, using electromagnetic articulography, of the speech production of nine healthy subjects with that of nine patients who suffer from a condition known as essential tremor. When deep brain stimulation was used to control the tremor, patients’ speech was detrimentally affected and described as dysarthric. The control speakers and patients produced various CV and CCV syllables in real words. The authors’ prediction is that the patients should be more likely to show an aberrant pattern of coordination in CCV than in CV. This is because the coordination patterns of C1C2V can be language-specific and, according to Hermes et al. (2019), are not innate but have to be learned: For example, in some languages, C2 is synchronized more closely with the following V whereas in others this is not the case. The consonant and vowel in CV syllables, on the other hand, are assumed to be produced universally with an in-phase coupling in which the articulatory gestures of the consonant and vowel are synchronous. Compatibly with the prediction, Hermes et al. find no timing differences between the patients and controls in CV sequences. For CCV, the control subjects and patients exhibited partially different production patterns, with the patients’ pattern suggesting difficulties in the articulation of successive consonants. A consequence of this abnormal synchronization was a C2 that was unusually lengthened and variable, which Hermes et al. interpret as compensatory behaviour.
In summary, the seven papers are collectively broad in scope and go to the heart of many active issues in laboratory phonology, including how social information (Babel et al., 2019; Zellou & Pycha, 2018) and foreign accent (Bradlow et al., 2018; Levy & Hanulíková, 2019) are to be incorporated into phonological processing, the way that social and linguistic information become attached to each other (Needle & Pierrehumbert, 2018), the mechanisms by which sound change (Zellou & Pycha, 2018) and phonologization (Carignan, 2018) emerge, and how phonological signatures in articulatory patterns can be compromised in speech disorders (Hermes et al., 2019).
We gratefully acknowledge the support of the Center for Advanced Studies, Ludwig-Maximilians-University Munich and of the German Research Council (grant number: PO1269/4-1).
The authors have no competing interests to declare.
Babel, M., Senior, B., & Bishop, S. (2019). Do social preferences matter in lexical retuning? Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10(1): 4, 1–22. DOI: https://doi.org/10.5334/labphon.133
Beddor, P. (2009). A coarticulatory path to sound change. Language, 85, 785–821. DOI: https://doi.org/10.1353/lan.0.0165
Beddor, P. (2012). Perception grammars and sound change. In M.-J. Solé & D. Recasens (Eds.), The Initiation of Sound Change. Perception, Production, and Social Factors (pp. 37–55). John Benjamins: Amsterdam. DOI: https://doi.org/10.1075/cilt.323.06bed
Bradlow, A., Blasingame, M., & Lee, K. (2018). Language-independent talker-specificity in bilingual speech intelligibility: Individual traits persist across first-language and second-language speech. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 9(1): 17, 1–20. DOI: https://doi.org/10.5334/labphon.137
Browman, C., & Goldstein, L. (1991). Gestural structures: Distinctiveness, phonological processes, and historical change. In I. Mattingly, & M. Studdert-Kennedy (Eds.), Modularity and the Motor Theory of Speech Perception (pp. 313–338). Erlbaum: New Jersey.
Carignan, C. (2018). Using naïve listener imitations of native speaker productions to investigate mechanisms of listener-based sound change. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 9(1): 18, 1–31. DOI: https://doi.org/10.5334/labphon.136
Hay, J., & Baayen, R. (2005). Shifting paradigms: Gradient structure in morphology. Trends in Cognitive Sciences, 9(7), 342–348. DOI: https://doi.org/10.1016/j.tics.2005.04.002
Hermes, A., Mücke, D., Thies, T., & Barbe, M. (2019). Coordination patterns in Essential Tremor patients with Deep Brain Stimulation: Syllables with low and high complexity. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10(1): 6, 1–20. DOI: https://doi.org/10.5334/labphon.141
Levy, H., & Hanulíková, A. (2019). Variation in children’s vowel production: Effects of language exposure and lexical frequency. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10(1): 9. DOI: https://doi.org/10.5334/labphon.131
Needle, J., & Pierrehumbert, J. (2018). Gendered associations of English morphology. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 9(1): 14, 1–23. DOI: https://doi.org/10.5334/labphon.134
Todd, S., Pierrehumbert, J., & Hay, J. (2019). Word frequency effects in sound change as a consequence of perceptual asymmetries: An exemplar-based model. Cognition, 185, 1–20. DOI: https://doi.org/10.1016/j.cognition.2019.01.004
Zellou, G., & Pycha, A. (2018). The gradient influence of temporal extent of coarticulation on vowel and speaker perception. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 9(1): 12, 1–24. DOI: https://doi.org/10.5334/labphon.118