1 Introduction

The investigation of coarticulatory effects, that is, the overlap of articulatory units in spoken language, has served as a window onto speech planning and execution mechanisms in adults for the last 60 years (for a review see Recasens, 2018). Only in the last decade, however, have non-invasive measurement techniques such as ultrasound tongue imaging been used with young children, shedding new light on speech motor development as well as its interactions with cognitive aspects relevant for speech production (e.g., Barbier et al., 2020; Ménard & Noiray, 2011; Noiray, Ménard, & Iskarous, 2013; Song, Demuth, Shattuck-Hufnagel, & Ménard, 2013; Zharkova, 2017; Zharkova, Hewlett, & Hardcastle, 2011). The present study focuses on the development of lingual carryover coarticulation across childhood, that is, the overlap of a speech segment with following ones after its target has been reached. While anticipatory coarticulation has often been described as a sign of speech planning, carryover coarticulation has been ascribed to mechanical inertia constraints (e.g., Recasens, 1984b) and has therefore been largely understudied. We suggest that both anticipatory and carryover coarticulation are the consequence of the overlap of gestural activation. The parallelism between the development of carryover coarticulation as found in the present study and that of anticipatory coarticulation as previously reported provides evidence for this hypothesis.

1.1 The development of anticipatory coarticulation

In our previous kinematic analyses of lingual anticipation of a stressed vowel in German children, we provided evidence for a developmental decrease of coarticulation degree in intrasyllabic vowel-to-consonant coarticulation (Noiray, Abakarova, Rubertus, Krüger, & Tiede, 2018), in intersyllabic vowel-to-vowel coarticulation (Rubertus & Noiray, 2018), as well as in the temporal unfolding of the vocalic gesture within the left field of an utterance of the form ə.CV.Cə (C-consonant, V-vowel; Noiray, Wieling, Abakarova, Rubertus, & Tiede, 2019b). This finding is in line with several previous investigations on intra- (e.g., Katz, Kripke, & Tallal, 1991; Kent, 1983; Zharkova, Hewlett, & Hardcastle, 2012) and intersyllabic coarticulation (e.g., Goodell & Studdert-Kennedy, 1993; Nijland et al., 2002; Nittrouer, 1993; Nittrouer, Studdert-Kennedy, & Neely, 1996), but contrasts with others that found an increasing degree of coarticulation with age (intrasyllabic: Nijland et al., 2002; Nittrouer, Studdert-Kennedy, & McGowan, 1989; Nittrouer et al., 1996; intersyllabic: Barbier et al., 2020; Hodge, 1989; Repp, 1986).

1.2 The development of carryover coarticulation

Carryover coarticulation in children’s speech has been investigated in only very few studies that focused on different speech articulators. Neither Flege (1988), who examined nasal coarticulation, nor Goffman, Smith, Heisler, and Ho (2008), who focused on labial coarticulation, provide systematic evidence for a developmental decrease in carryover coarticulation degree. The only study addressing children’s lingual carryover coarticulation we know of is Baum and Waldstein (1991). Using three different types of measures, they compared coarticulation degree in VC syllables (/iʃ, uʃ, it, ut, ik, uk/) between English-speaking hearing-impaired and age-matched normally hearing children in two age groups: six to seven and nine to ten years of age. No difference between the age groups was found within the cohorts in any of the measures, so they were grouped in the analysis. The first measure of consonant durations did not differ significantly between the normally hearing and the hearing-impaired group. The measure of mean centroid values (in fricatives and stop bursts) demonstrated stronger carryover coarticulation in normally hearing as compared to hearing-impaired children at consonant onset. At consonant midpoint, however, both cohorts exhibited the same amount of coarticulation based on this measure. In the syllables /iʃ/ and /uʃ/ the third measure of F2 peaks at vowel offset and fricative revealed a higher coarticulation degree in normally hearing than in hearing-impaired children again. The authors concluded that it is not the temporal domain of carryover coarticulation but its magnitude within this time frame that differs between the two cohorts. Interestingly, measures of anticipatory coarticulation in the same group of children (Waldstein & Baum, 1991) had indicated shorter temporal domains of anticipation for hearing-impaired than normally hearing children. 
Baum and Waldstein (1991) interpreted this discrepancy, as well as an overall larger degree of carryover compared to anticipatory coarticulation, as evidence for different mechanisms underlying the two coarticulatory directions. According to the authors, a significant age difference found in anticipatory but not in carryover coarticulation may either be due to the close and relatively advanced ages studied or provide additional support for the view that carryover coarticulation depends on mechanical-inertial properties that need not be learned.

1.3 Decrease of coarticulation as compression of vocalic activation curves

According to the broad framework of articulatory phonology (Browman & Goldstein, 1986), articulatory gestures have invariant goals and are planned and phased to each other context-independently. In contrast to suggestions that context-dependency is part of the speech plan and actively changes articulatory goals (e.g., Henke, 1966; Keating, 1988; Wickelgren, 1969), articulatory phonology holds that contextual variation is introduced only upon execution, when the influences of individual gestures on the vocal tract blend with those of other ongoing gestures (e.g., Fowler, 1980; Fowler & Saltzman, 1993; Gafos & Goldstein, 2012). Here, coarticulation is seen as the coproduction of invariant articulatory gestures. The more the activation of gestures overlaps, the more coarticulation may take place. The higher degree of anticipatory coarticulation in children than in adults (e.g., Noiray et al., 2018; Noiray et al., 2019b; Rubertus & Noiray, 2018) can therefore be interpreted as greater overlap of vocalic gestures with preceding ones at a young age. The developmental decrease in coarticulation would in turn be a developmental compression of vocalic activation curves (cf. Nittrouer, 1993; Noiray et al., 2019b).

Following Nittrouer (1993, p. 961), Figure 1 sketches the prominence, that is, the strength of activation, of an utterance’s segments over time in the style of Fowler and Smith (1986). It illustrates the larger overlap of articulatory gestures for neighboring segments that would result from broader vocalic activation curves in children’s speech (left side) than in adults’ speech (right side). The segment with the highest prominence at a given time point dominates the acoustic signal. Changes in dominance, and therefore acoustic segment boundaries within the utterance, are indicated by vertical lines.

Figure 1

Segments’ hypothesized prominence over time in utterances of the form əCVCə.

A reason for children’s vocalic activation curves to be broader than adults’ may be the attractor or anchor function that multiple findings in language development have ascribed to stressed vowels. Cutler and Mehler (1993), for example, suggested that infants have a periodicity bias leading them to attend more to vowels than to consonants in the acoustic signal. This could in turn be one reason why native phonological categories for vowels are established earlier in development than those for consonants (Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992; Werker & Tees, 1984). The information carried by vowels and consonants was also suggested to differ: While vowels carry phonetic as well as prosodic information relevant for rhythm and syntax, consonants’ information is mainly lexical (Nespor, Peña, & Mehler, 2003). Young children were shown to focus on the vowel-inherent prosodic information to bootstrap the segmentation of first words (Gleitman & Wanner, 1982). Höhle et al. (2009), for example, provided evidence that young German-learning infants scan their input for stressed syllables to find trochaic patterns as a first strategy to detect words in the continuous signal. Also, in speech production, young children tend to reduce first words to the stressed CV syllable or a trochaic pattern (in German, e.g., Fox & Dodd, 1999). Fowler (1980) highlights the role of stressed vowels in the coproduction of speech segments and claims that not only consonants but also unstressed vowels are “superimposed on a trajectory of the shape of a vocal tract from one stressed vowel to another” (p. 131). This subsumption of segments in frames of stressed vowels might be responsible for the stress-timed speech rhythm of languages such as English and German (Fowler, 1981), a property of the speech signal to which even newborns are very attentive (Nazzi, Bertoncini, & Mehler, 1998).

An important consequence of the hypothesis of generally broader vocalic activation curves at a young age is that children’s vowels would overlap more not only with preceding speech segments but, as visualized in Figure 1, also with following ones in the right field of the utterance. Whether this is the case has never been explicitly tested because studies of coarticulation development have focused on the anticipatory direction.

1.4 The role of the articulatory demands of combined segments

In addition to the general decrease of gestural overlap, the role of articulatory parameters of the combined segments for coarticulation development must be considered. The simple overlap of gestural activation does not correspond one-to-one to the amount of coarticulation found in spoken utterances. During the execution of the context-insensitive speech plan, the different parameters of the coproduced gestures, most importantly their degree of coarticulatory resistance (Bladon & Al-Bamerni, 1976; Recasens, 1984a) and the corresponding vocal tract configurations affect the degree of gestural blending. A consonant that is highly resistant to coarticulation, for example the alveolar plosive /d/, employs the tongue body that is relevant for vowel production in a rather constrained way for its own production and therefore interrupts vocalic movements. A bilabial, on the other hand, does not share the primary articulator with vowels and is therefore produced without affecting the tongue body movement necessary for the vowel trajectory. A third case is comprised of velar consonants that share the primary articulator with the vowel but are blended with the vocalic production requirements resulting in different points of palatal contact depending on the frontness of the surrounding vowels. Effects of consonants’ coarticulatory resistance on vocalic coarticulation were widely demonstrated in adults (e.g., Fowler & Saltzman, 1993; Iskarous, Fowler, & Whalen, 2010; Recasens, 1985; Recasens & Rodríguez, 2016). During language development, how strongly a given consonant clamps the tongue dorsum and how much coarticulation it therefore allows, may change with a growing control over the functional subparts of the tongue. For intrasyllabic anticipatory coarticulation, Noiray et al. (2018) found adult-like coarticulation hierarchies of /b/>/g/>/d/ in children from three to seven years of age. 
However, CV syllables are the fundamental syllables that are best practiced in early childhood (e.g., Fikkert, 1994); different patterns may therefore be found in gestural combinations other than CV syllables and in carryover coarticulation.

1.5 A dichotomy of underlying mechanisms?

Many authors describe a dichotomy of underlying mechanisms for anticipatory and carryover coarticulation: While the former is described as part of a speech plan, the latter is attributed to mechanical inertia constraints. Recasens (1984b, 1987) and Parush, Ostry, and Munhall (1983), for example, provided data from Catalan and English VCV sequences, respectively, suggesting that while the consonant’s coarticulatory resistance affects the temporal extent of anticipatory coarticulation, it is the spatial extent of carryover coarticulation that is affected. They interpreted this as evidence for active speech planning controlling the amount of anticipatory but not that of carryover coarticulation with reference to the articulatory requirements of the intervocalic consonant. In German, Hertrich and Ackermann (1995) found that vocalic carryover but not anticipatory effects were smaller at slower speaking rates. According to the authors, stable or increased anticipatory effects at slow speaking rates are not compatible with a view of simple coproduction but indicate a (speaker-specific) planning component in anticipatory coarticulation. The decrease of carryover coarticulation, however, was interpreted to suggest that planning processes might be less relevant for this coarticulatory direction.

In a pure coproduction framework, on the other hand, the overlap of context-independent articulatory gestures can account for both anticipatory and carryover coarticulatory effects. In their comparison of empirical and modeling data, Ostry, Gribble, and Gracco (1996), for example, provide evidence that coarticulation in jaw movements is not centrally planned but arises as a by-product of execution. If there is no active planning of context effects, there is no reason to assume different mechanisms underlying the two coarticulatory directions.

1.6 What we can learn from carryover coarticulation development

If anticipatory and carryover coarticulation embody a common organization and it is indeed the width of activation curves that changes across childhood, carryover coarticulation should develop in parallel with anticipation and decrease with age. If, however, different mechanisms underlie the two coarticulatory directions, the developmental differences found in lingual anticipatory coarticulation may be absent in lingual carryover coarticulation. Under the hypothesis of a dichotomy of origins for the two coarticulatory directions, it was, for example, suggested that inertial properties of muscles, in contrast to planning processes, need not be learned (Baum & Waldstein, 1991; Flege, 1988). Following this idea, a developmental change of coarticulation degree would be expected for anticipation but not for perseveration. Investigating the development of carryover coarticulation across childhood may therefore provide additional support for one or the other assumption about the speech production mechanism.

1.7 Research questions and predictions

There is growing evidence that in the course of speech development, children’s degree of lingual anticipatory coarticulation progressively decreases. Regarding carryover coarticulation, however, data are scarce, and predictions differ based on the theoretical framework. The present study aims to provide the first large-scale kinematic investigation of children’s carryover coarticulation development. It builds upon previous findings of the same research group to test the hypothesis of a developmental decrease in lingual carryover coarticulation.

Our goal was to provide answers to the following two questions:

1. Does the degree of carryover coarticulation decrease with increasing age as we found for anticipatory coarticulation?

Applying the principles of articulatory phonology and the coproduction model of adult speech production to children’s development of coarticulation, we hypothesized that the vocalic activation curves underlying children’s speech are generally broader than adults’, which results in more overlap of a vowel with preceding as well as following gestures. We therefore predicted a decreasing degree of carryover coarticulation with increasing age. In light of differing findings of non-linear developments across age depending on the type of coarticulation studied, we did not make specific predictions about plateaus or spurts within this decrease.

2. Do the articulatory demands of the following consonant impact the perseveration of the vocalic gesture?

Based on previous findings including ours, we hypothesized that the degree of consonants’ coarticulatory resistance significantly affects the degree of vocalic carryover coarticulation. Consonants posing strong and specific articulatory demands on the tongue dorsum (e.g., /d/) were therefore predicted to be more intrusive on the vocalic gesture than those that can blend their gestural goals with the vowels’ (e.g., /g/) and those that do not employ the tongue dorsum as a primary articulator (e.g., /b/). Since the balance between clamping and blending depends on fine speech motor control, we expected developmental changes in the role of the consonant for coarticulation degree.

2 Method

2.1 Participants

Possibly non-linear cognitive as well as speech motor control developments occur in children before they enter school (Green, Nip, & Maassen, 2010; Noiray et al., 2019a). Therefore, we tested three age cohorts of preschool children in yearly increments, a cohort of first graders, and a group of adults, for a total of 75 participants: 19 three-year-old children (10 females [f] and nine males [m], age range: 3;05–3;09 [Y;MM], mean: 3;06), 14 four-year-old children (seven f and seven m, age range: 4;04–4;08, mean: 4;05), 14 five-year-old children (seven f and seven m, age range: 5;04–5;07, mean: 5;06), 15 seven-year-old children at the end of their first or beginning of their second grade in primary school (10 f and five m, age range: 7;00–7;06, mean: 7;02), and 13 adults (seven f and six m, age range: 19–28 years, mean: 23). None of the participants reported any language-, hearing-, or vision-related problem and all were monolingual speakers of German. Adult participants and parents of child participants gave written informed consent for participating in the study, while children gave oral consent. It was emphasized that they could interrupt or abort the recording session for any reason at any time. The study was approved by the Ethics Committee of the University of Potsdam (DFG project 1098).

2.2 Stimulus Material

Previously recorded disyllabic pseudowords with a trochaic stress pattern spoken by a native German female adult speaker served as model stimuli for a repetition task. They consisted of the consonants /b, d, g/ and the vowels /i, y, e, a, o, u/ in the form consonant1-vowel-consonant2-schwa (C1VC2ə), where C2 never equaled C1. Consonants were chosen to bear different degrees of lingual coarticulatory resistance. Vowels were chosen to represent the full front-to-back range of the German vowel space. Each pseudoword was recorded together with the German feminine indefinite article /aɪnə/, resulting in short utterances such as /aɪnə bi:də/. Vocalic carryover effects were measured at four different time points within C2 and the final schwa.

The crossed set of consonants and vowels resulted in 36 target words that were repeated at least three times in the test phase summing up to a total of 108 trials. For four- and seven-year-olds, and adults, additional stimuli were recorded that are not part of the present analysis. An overview of each cohort’s stimulus sets is presented in Table 1. The age cohorts will be referred to as C3 (three-year-olds), C4 (four-year-olds), C5 (five-year-olds), C7 (seven-year-olds), and A (adults) in the rest of the paper.
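The count of 36 target words follows directly from crossing the three consonants in each position with the six vowels while excluding words in which C2 equals C1. The following minimal sketch reproduces that arithmetic; the function name `build_stimuli` and the plain-text word labels (without vowel length marks) are illustrative assumptions, not the authors' stimulus list.

```python
from itertools import product

# Segment inventories as given in the text (three-year-olds' base set).
CONSONANTS = ["b", "d", "g"]
VOWELS = ["i", "y", "e", "a", "o", "u"]


def build_stimuli(consonants, vowels):
    """Return all C1-V-C2-schwa labels in which C2 differs from C1.

    Hypothetical helper for illustration only; labels are simplified
    and do not encode vowel length or stress.
    """
    return [
        f"{c1}{v}{c2}ə"
        for c1, v, c2 in product(consonants, vowels, consonants)
        if c1 != c2
    ]


stimuli = build_stimuli(CONSONANTS, VOWELS)

# At least three repetitions per word give the minimum number of trials.
min_trials = len(stimuli) * 3
```

With three consonants, each C1 can be followed by two different C2s, so the set contains 3 × 6 × 2 = 36 words and at least 108 trials.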

Table 1

Overview of each age cohort’s stimulus sets. Stimulus words had the form C1VC2ə with V=/i, y, e, a, o, u/. The total number of trials results from at least three repetitions of each stimulus word during the recording.

| | C3 | C4 | C5 | C7 | A |
|---|---|---|---|---|---|
| Consonant1 | /b, d, g/ | /b, d, g, z/ | /b, d, g/ | /b, d, g, z/ | /b, d, g, z/ |
| Consonant2 | /b, d, g/ | /b, d, g/ | /b, d, g/ | /b, d, g/ | /b, d, g, z/ |
| Number of stimulus words | 36 | 46 | 36 | 46 | 72 |
| Total number of trials | 108 | 138 | 108 | 138 | 216 |

For all children, stimuli were presented in six blocks, while adults’ larger stimulus set required nine blocks. The order of blocks was randomized for each participant, and trials within each block appeared in one of three random but pre-specified orders. We opted for this semi-randomization to be able to quickly take notes on specific trials, to improve synchronization between the two experimenters, and to allow a semi-automatic phonetic labeling procedure for adults’ data. During the recording, the experimenter made a note of mispronounced trials and played those again at the end of the block. Table 2 summarizes the number of trials used for the present analysis per C2 per age cohort.

Table 2

Summary of the number of analyzed trials per consonant context and age cohort.

| Consonant context | C3 | C4 | C5 | C7 | A |
|---|---|---|---|---|---|
| Vbə | 516 | 555 | 529 | 674 | 477 |
| Vdə | 463 | 542 | 503 | 647 | 479 |
| Vgə | 526 | 571 | 540 | 624 | 483 |
| Total | 1505 | 1668 | 1572 | 1945 | 1439 |

2.3 Experimental procedure

Participants, both children and adults, were asked to repeat acoustically presented stimuli within the SOLLAR platform (Sonographic and Optical Linguo-Labial Articulation Recording system, Noiray et al., in press) at the Laboratory for Oral Language Acquisition at the University of Potsdam (Germany). The SOLLAR platform provides a child-friendly environment allowing for simultaneous recordings of tongue motion (ultrasound imaging: Sonosite, sampling rate: 48 Hz), labial movement (video recording: SONY camera, sampling rate: 50 Hz), and the acoustic signal (Shure microphone, sampling rate: 48 kHz). The ultrasound probe was fixed in a custom-made probe holder that allowed movement in the vertical dimension to follow natural jaw movements but was rigid against lateral and horizontal translations. Participants sat in a comfortable, height-adjustable chair, and their head was positioned such that the probe touched their chin between the maxillary bones to record the tongue surface contour in the midsagittal plane. To make the platform as child-friendly as possible and to allow relatively natural speech, no additional head-to-probe stabilization was employed. Instead, a visual attention-getter (a glittering golden star) and, if necessary, the experimenter helped especially the young participants to keep their head stable and look straight towards the camera. Trials during which participants moved their head were discarded post hoc via visual inspection of the video data.

During each recording session two experimenters were present. The first experimenter’s initial task was to make the participant feel comfortable. She familiarized the participant with the SOLLAR platform and introduced the children to a universe-themed story in which the repetition task was embedded. Children were told to fly from one planet to the other in the SOLLAR spaceship and repeat foreign words from other planets’ languages. Between the blocks, children took a break and were distracted with a little sticker task. This game and the decoration stimulated their interest and engagement in the task. Adult participants were not introduced to the planet story but performed the same task on the same type of stimuli in the same setup as the children to ensure comparability. During each recording block, the experimenter prompted the audio stimuli while maintaining a face-to-face connection with the participant and checking head stability and correct pronunciation. The second experimenter operated SOLLAR’s recording equipment from a desk not visible to the participant. S/he closely monitored both the video and the audio streams to control the data quality.

2.4 Data processing

The acoustic signal was recorded both in relation to the ultrasound signal and the video, enabling the generation of a common time code for the three streams. Using a cross-correlation function within MATLAB (2016), the streams were then synchronized (cf. Noiray, Cathiard, Ménard, & Abry, 2011; Noiray, Ménard, & Iskarous, 2013).
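The general idea behind cross-correlation-based synchronization can be sketched as follows: two recordings of the same acoustic event are shifted against each other, and the shift with the highest correlation is taken as their temporal offset. This is a generic pure-Python illustration under assumed inputs, not the MATLAB routine used in the study; the function name `best_lag` and the toy signals are hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means that `other` is delayed relative to `reference`.
    Illustrative helper only; real audio would typically be handled with
    an optimized signal-processing routine.
    """
    def xcorr(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=xcorr)


# Toy example: the same short burst, delayed by two samples in `other`.
reference = [0, 0, 1, 3, 1, 0, 0, 0]
other = [0, 0, 0, 0, 1, 3, 1, 0]

lag_samples = best_lag(reference, other, max_lag=4)

# Convert to seconds using the audio sampling rate reported in the text.
AUDIO_RATE_HZ = 48_000
lag_seconds = lag_samples / AUDIO_RATE_HZ
```

Once the offset between two audio tracks is known, the streams recorded alongside them can be placed on a common time axis by shifting one by that offset.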

Correctly pronounced target utterances were first phonetically labeled in the acoustic signal using Praat (Boersma & Weenink, 2016). In adults’ data, WebMAUSBasic (Kisler, Schiel, & Sloetjes, 2012) detected target words and segments semi-automatically, with manual correction when necessary. Child data were labeled entirely by hand. A stable periodic cycle in the oscillogram as well as a stable formant pattern (especially a clearly detectable second formant) were used as indices for vocalic segments. Accordingly, the first ascending zero-crossing in the oscillogram at the beginning of the periodicity was set as the vocalic onset, and the first ascending zero-crossing after the end of periodicity and the disappearance of F2 as the beginning of the following consonant. From the resulting intervals, the time stamps relevant for the analysis were automatically extracted: the temporal midpoint of the vowel (V50), the end of the vowel (V100), the temporal midpoint of the consonant (C50), the end of the consonant (C100), and the temporal midpoint of the final schwa (schwa50).
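The five time stamps can be derived from the labeled intervals with simple arithmetic: midpoints are halfway between an interval's onset and offset, and the "100" points are the offsets. The sketch below assumes that each segment is available as an (onset, offset) pair in seconds; the function name `time_points` and the example values are hypothetical.

```python
def time_points(vowel, consonant, schwa):
    """Compute the five analysis time stamps from labeled intervals.

    Each argument is an (onset, offset) pair in seconds. The data layout
    is an assumption made for illustration.
    """
    def midpoint(interval):
        onset, offset = interval
        return onset + (offset - onset) / 2

    return {
        "V50": midpoint(vowel),
        "V100": vowel[1],
        "C50": midpoint(consonant),
        "C100": consonant[1],
        "schwa50": midpoint(schwa),
    }


# Invented interval boundaries for one trial.
stamps = time_points(
    vowel=(1.0, 1.5),
    consonant=(1.5, 1.75),
    schwa=(1.75, 2.25),
)
```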

Via these time stamps from the acoustic signal, ultrasound frames of interest were selected and the corresponding tongue contours were detected semi-automatically with custom-made scripts for MATLAB (2016) as part of the SOLLAR platform (see Figure 2). For each individual frame of interest, a spline (yellow line) was automatically fit to manually placed reference points (red dots in Figure 2) on the visible midsagittal tongue surface contour.

Figure 2

Example of ultrasound tongue image of a five-year-old boy’s [e] recorded within SOLLAR. The left panel presents the raw ultrasound image, the right panel shows the highlighted tongue contour resulting from SOLLAR’s semi-automatic tracking. In each image, the front part of the tongue is depicted towards the left.

X- and y-coordinates for each of the 100 points of these splines were automatically extracted. For the present analysis, we used the x-coordinate, hence the horizontal position, of the highest point of the tongue dorsum surface contour as a representation of frontness of the tongue body. To avoid including measures for which the highest point on the tongue surface contour lay on the tongue tip rather than on the tongue body, we visually inspected those /d/-closure contours whose highest point had a relatively low x-value. In this way, eight contours (four in C5 and one in each of the other cohorts) were identified and the corresponding trials were removed from the analysis.
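The frontness measure amounts to finding the contour point with the greatest height and reading off its horizontal coordinate. A minimal sketch, assuming that a contour is a list of (x, y) pairs in millimeters and that larger y-values mean higher positions (the orientation of the y-axis is an assumption here); `dorsum_frontness` is a hypothetical name.

```python
def dorsum_frontness(contour):
    """Return the x-coordinate of the highest point of a tongue contour.

    `contour` is a sequence of (x, y) points; larger y is assumed to be
    higher. Illustrative helper, not the SOLLAR implementation.
    """
    x, _ = max(contour, key=lambda point: point[1])
    return x


# Invented four-point contour; a real spline would have 100 points.
example_contour = [(-10.0, 20.0), (-5.0, 30.0), (0.0, 35.0), (5.0, 28.0)]
frontness = dorsum_frontness(example_contour)
```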

To compare coarticulatory behaviors across participants, we normalized each participant’s horizontal tongue dorsum positions to a common scale. Among all of a speaker’s trials, the most anterior tongue dorsum position at V50 was set to zero and the most posterior tongue dorsum position at V50 to one. The speaker’s tongue dorsum positions at all time points were then scaled in relation to this range.
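This by-speaker normalization is a min-max scaling in which the range is defined at V50 only and then applied to every time point. The sketch below assumes that smaller x-values are more anterior, in line with the coordinate system described for the tongue contours; the function name `normalize_speaker` and the example values are hypothetical.

```python
def normalize_speaker(v50_positions, positions):
    """Scale positions to a speaker's V50 range (0 = front, 1 = back).

    `v50_positions` holds one speaker's horizontal tongue dorsum
    positions at V50 across all trials; `positions` holds the values to
    be scaled (any time point). Smaller x is assumed to be more anterior.
    """
    front = min(v50_positions)
    back = max(v50_positions)
    span = back - front
    if span == 0:
        raise ValueError("V50 positions must cover a non-zero range")
    return [(x - front) / span for x in positions]


# Invented values in millimeters for one speaker.
v50 = [-12.0, -4.0, 4.0]
scaled = normalize_speaker(v50, [-12.0, -4.0, 4.0, 8.0])
```

Because the range is fixed at V50, positions at later time points can fall slightly below zero or above one, as the last value in the example does.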

2.5 Data analysis

We measured coarticulatory patterns as the horizontal positions of the highest point of the tongue dorsum during the consonant and the schwa depending on the frontness of the tongue dorsum position during the preceding vowel and compared these trajectories between consonant contexts and age cohorts. Figure 3 presents an example of tongue movement trajectories for ‘einebige’ (left) and ‘einebuge’ (right) illustrated by the tongue contours of a four-year-old boy at the five time points of interest. The highest point of each tongue contour is highlighted by a dot. The contours are presented in a coordinate system in millimeters where x = zero, y = zero is the position of the center of the ultrasound probe. X-values below zero indicate the area in front of the center of the probe (displayed towards the left), x-values above zero the back (displayed towards the right). For /igə/ (left plot), the tongue starts in a front position for /i/ at V50 (in green) and moves back towards a relatively central schwa (in pink) in the course of the utterance. For /ugə/ on the other hand (right plot), the tongue has a relatively back position during V50 and moves forward in the course of the utterance. The figure shows that 1) the horizontal position of the highest point of the tongue represents the frontness of the whole tongue, and 2) not only the tongue dorsum position at V50, but also the positions at later time points differ depending on the vowel. In the present study, the goal was not to illuminate the impact of specific vowels but rather to investigate context-induced spatial changes in tongue dorsum positions within an utterance. Vowel information is therefore not considered categorically but as a continuous variable ranging from front to back tongue dorsum positions. Most front positions correspond to phonologically front vowel categories (/i/, /y/, /e/) and back positions usually express phonologically back vowel categories (/o/, /u/).

Figure 3

Whole tongue surface contours of participant CM4_007, a four-year-old boy, for trials ‘einebige’ (left) and ‘einebuge’ (right) at time points V50 (green), V100 (light green), C50 (orange), C100 (red), and schwa50 (pink). The dots highlight the highest points of the respective contours. The front of the tongue is displayed towards the left of each plot.

To statistically assess these vowel-dependent tongue dorsum frontness trajectories, we used generalized additive modelling (GAM). A generalized additive model is a mixed effects regression model that, in contrast to the more familiar linear mixed effects model, also includes non-linear terms similar to polynomial curves, for example. GAMs can therefore detect linear as well as non-linear patterns in dynamically varying data while also taking into account subject- and item-related variability, as known from linear mixed effects models. This approach was previously applied to ultrasound data acquired from adults (Strycharczuk & Scobbie, 2017) and used for the analysis of anticipatory coarticulation in the present developmental data set by Noiray et al. (2019b).

We fit our models using the function bam of the mgcv package in R (version 1.8–28; Wood, 2011, 2017). For each model, the function gam.check was used to examine the normality of residuals’ distribution, heteroscedasticity, and adequacy of the k-parameter. This parameter specifies the maximal non-linearity by setting the size of basis dimensions for each predictor. It is limited to the number of the predictors’ unique points. For more detailed information on the application of GAMs on articulatory data, we recommend Wieling’s (2018) tutorial.

For the current analysis, we tested whether the horizontal position of the highest point on the tongue dorsum depended on the horizontal position of the tongue dorsum during the stressed vowel (V50) at the four target time points V100, C50, C100, and schwa50. To include both time and tongue dorsum position at V50 as well as their interaction as predictors, a tensor product (te) was used. It captures changes in the shape of the tongue dorsum frontness trajectory over time as a function of the frontness of the tongue dorsum during the stressed vowel separately for each age cohort and consonant context. In the random effects structure of the model, defined in two factor smooth terms (s), we included potentially non-linear patterns for each participant and consonant over time and for the different horizontal tongue dorsum positions at V50. The complete code for this model with explanations of single parameters can be found in the Appendix (model m).

This first model detected the frontness trajectories of the tongue dorsum and tested whether the patterns found are significantly different from zero, i.e., non-linear, for each age cohort and consonant context. To answer our two research questions, however, direct comparisons of these patterns between 1) age cohorts, and 2) consonant contexts are necessary. Within GAMs, binary difference tensors need to be included to assess the statistical significance of comparisons between two dynamical patterns. To answer our first research question addressing developmental differences, we therefore included binary difference tensors capturing whether the age cohorts differed significantly with respect to the influence of the horizontal tongue dorsum position during the vowel on the frontness trajectory of the tongue dorsum during the following segments. An example of a code for a corresponding model including binary difference tensors can be found in the Appendix (model mb7).

Consonantal differences in vocalic carryover effects within age cohorts, the core of research question two, were assessed similarly: The models here included binary difference tensors capturing whether the consonant contexts /b/, /d/, and /g/ differed significantly with respect to the influence of the horizontal tongue dorsum position during the stressed vowel on the frontness trajectory of the tongue dorsum during the following segments within each cohort.

Because a total of six models was necessary to address all relevant comparisons (by fitting the models with differing reference groups), we Bonferroni-corrected our significance threshold to 0.008 to account for multiple comparisons.
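The corrected threshold follows from dividing the significance level by the number of models. The short sketch below reproduces this arithmetic; the conventional uncorrected levels 0.1, 0.05, 0.01, and 0.001 are an assumption on our part, chosen because dividing them by six gives the thresholds listed with the significance codes in the output tables.

```python
N_MODELS = 6  # number of models reported in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide the uncorrected significance level by the number of tests."""
    return alpha / n_tests


# Assumed conventional levels; only the corrected values appear in the tables.
thresholds = {
    alpha: bonferroni_threshold(alpha, N_MODELS)
    for alpha in (0.1, 0.05, 0.01, 0.001)
}

for alpha, corrected in thresholds.items():
    print(f"alpha = {alpha}: corrected threshold = {corrected:.5f}")
```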

3 Results

3.1 Patterns of carryover coarticulation

For each age cohort and consonant context, the pattern of carryover coarticulation is described according to three parameters: the dependent variable horizontal tongue dorsum position in the course of the utterance, and the two independent variables time point and horizontal tongue dorsum position at the midpoint of the stressed vowel. To visualize these three dimensions, we present all 15 patterns (five age cohorts x three consonant contexts) in contour plots. Because these have not yet become a standard way of presenting data, we first explain how to read them with the example of 3-year-olds’ coarticulatory pattern in /b/ contexts (Figure 4).

Figure 4

Example of a contour plot that visualizes horizontal tongue dorsum positions over time (based on the four time points V100, C50, C100, and schwa50 that are represented on the x-axis) depending on the tongue dorsum position during the midpoint of the vowel (V50, y-axis). Tongue dorsum positions are indicated by color coding as shown in the small legend in the top right corner: from pink for front positions (values close to zero) to blue for back positions (values close to one). The dashed horizontal lines in the contour plot correspond to the two-dimensional graphs in the top row of the right side of the figure display and refer to the tongue dorsum position over time for a specific V50 position (0.2 and 0.7, respectively). The dashed vertical lines correspond to the lower two graphs that visualize the tongue dorsum position depending on the V50 position for a specific time point (C50 and schwa50, respectively).

In the contour plot on the left side of Figure 4, the predictors time point and horizontal tongue dorsum position at the vowel midpoint are presented on the x- and y-axis, respectively. Values close to zero on the y-axis correspond to anterior tongue dorsum positions, values closer to one to posterior positions. The horizontal position of the highest point of the tongue dorsum at a given time point for a given V50 frontness value is depicted by color shades from pink for anterior positions (values close to zero) to blue for posterior positions (values close to one) as indicated in the small legend in the top right corner of the plot. Black contour lines connect points with the same value to support legibility. The vertical bands at the four different time points are the actual data while the slightly shaded areas in between are what the model predicts on the basis of this data.

The contour plot in Figure 4 presents a pattern resembling a fan getting wider towards the right side with a variety of color shades at V100 but mostly purple shades at schwa50. What this implies is that at V100 there is a broad range of horizontal tongue dorsum positions (i.e., values spread from just above zero to just below one) reflecting roughly the position at the temporal midpoint of the vowel (y-axis). The further one moves away from the vowel on the x-axis, however, the fewer color shades referring to extreme tongue dorsum positions (i.e., far front or far back) are found. Instead, we note more central positions regardless of the previous V50 tongue dorsum positions.

Each of the four small graphs on the right-hand side of Figure 4 isolates an independent variable to illustrate relations in a more familiar two-dimensional plot. The two plots in the top row represent the horizontal green and orange dashed lines in the contour plot and depict the horizontal motion of the tongue dorsum over time when the tongue dorsum position at V50 is prespecified at 0.2 (left) and 0.7 (right). Starting from different positions, both lines move towards the center over time. In the two bottom plots, we fixed the time points C50 (black dashed vertical line) and schwa50 (red dashed vertical line). They depict how the tongue dorsum position at a given time point changes with the tongue dorsum position at V50. The fan pattern is reflected here by a stronger relationship at C50 than at schwa50.

Figure 5 presents the full matrix of contour plots for all age cohorts (from left to right: C3, C4, C5, C7, and A) and consonants (from top to bottom: /b/, /d/, and /g/). Three main observations can be drawn from this matrix: First, the fan-like gradual shift from vowel-specific to overall more central positions over time described above for three-year-olds’ /b/ context, is found in all cohorts’ /b/ and /g/ contexts. However, the temporal development towards more central tongue dorsum positions happens faster for older participants than for younger ones: The pattern is compressed in adults’ plots as compared to young children’s, indicating earlier central positions and therefore shorter vocalic impacts. Second, this developmental trend is more prominent in /g/ compared to /b/ contexts. And third, the pattern in /d/ contexts differs drastically from /b/ and /g/ contexts’: The pink hill in the plots indicates a forward movement of the tongue dorsum during the consonant. This is more salient in younger than in older participants. Our first model revealed that all of these carryover coarticulatory patterns are significantly different from zero (p < 0.00017).

Figure 5

Contour plots illustrating the horizontal movement of the tongue dorsum across the four time points V100, C50, C100 and schwa50 (x-axis) depending on the tongue dorsum’s horizontal position during V50 (y-axis). Color gradients as defined in the upper right corner indicate anterior (pink) to posterior (blue) positions. Patterns are plotted separately for each age cohort and consonant context (/b/, /d/, /g/). In each plot, the bright vertical bands indicate the time points we collected data for on the basis of which the model estimated the shaded areas.

Because the coarticulatory patterns of seven-, five-, and possibly four-year-olds appeared visually similar in Figure 5, we ran a model including binary difference tensors for age cohort comparisons to check whether we should group them. Using the seven-year-olds as reference, results did not reveal any differences between their coarticulatory patterns and those of age cohort C5. A difference was found, however, in comparison to cohort C4 in the /b/ and /g/ contexts as well as to cohorts C3 and A in all three consonant contexts. Cohorts C5 and C7 were therefore grouped for the subsequent analysis (C57 henceforth).

3.2 Comparison of coarticulatory patterns across age cohorts

To assess the statistical significance of the developmental differences in coarticulatory patterns as impressionistically displayed in Figure 5, we fit three binary difference models with varying reference groups. Tables 3 to 5 present the outcomes of the models with the reference groups C3, C4, and A respectively. In every table, the first three lines refer to the coarticulatory pattern (i.e., the interaction between time and tongue dorsum position at V50 [V50pos]) of the reference group for the three consonant contexts. Lines four to 12 present the differences between the reference and the indicated age cohort for a given consonant. In all output tables, the asterisks indicating significance levels adhere to the Bonferroni-corrected thresholds.
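The mapping from p-values to significance codes can be written out explicitly. The sketch below uses the Bonferroni-corrected cut-offs given in the table captions; the function name is hypothetical and the code only illustrates how the codes are assigned, it is not part of the analysis scripts.

```python
def significance_code(p: float) -> str:
    """Return the code for a p-value under the corrected thresholds."""
    if p < 0.00017:
        return "***"
    if p < 0.0017:
        return "**"
    if p < 0.008:
        return "*"
    if p < 0.017:
        return "."
    return ""


# p-values taken from the output tables (Tables 3 and 5).
for label, p in [
    ("C3 vs. C57, /d/", 0.00058),
    ("C3 vs. C4, /b/", 0.11224),
    ("A vs. C57, /b/", 0.00416),
    ("A vs. C4, /d/", 0.00198),
]:
    print(f"{label}: p = {p} {significance_code(p)}")
```

Only comparisons marked with at least one asterisk fall below the corrected threshold of 0.008 and are treated as significant in the text.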

Table 3

Output of binary difference smooth model testing for age differences within each consonant context with reference group C3. Significance codes ‘***’: p < .00017; ‘**’: p < .0017; ‘*’: p < .008; ‘.’: p < 0.017.

Tensor product              edf      F-value   p-value
(time, V50pos): b           9.932    68.618    <0.00017 ***
(time, V50pos): d           23.804   23.693    <0.00017 ***
(time, V50pos): g           10.977   54.796    <0.00017 ***
(time, V50pos): C4 /b/      6.207    1.688     0.11224
(time, V50pos): C57 /b/     7.385    9.751     <0.00017 ***
(time, V50pos): A /b/       18.635   3.307     <0.00017 ***
(time, V50pos): C4 /d/      5.739    0.712     0.67056
(time, V50pos): C57 /d/     6.323    3.577     0.00058 **
(time, V50pos): A /d/       18.654   2.341     0.00024 **
(time, V50pos): C4 /g/      9.796    3.335     <0.00017 ***
(time, V50pos): C57 /g/     11.753   5.259     <0.00017 ***
(time, V50pos): A /g/       9.951    5.384     <0.00017 ***

Table 4

Output of binary difference smooth model testing for age differences within each consonant context with reference group C4. Significance codes ‘***’: p < .00017; ‘**’: p < .0017; ‘*’: p < .008; ‘.’: p < 0.017.

Tensor product              edf      F-value   p-value
(time, V50pos): b           13.942   50.662    <0.00017 ***
(time, V50pos): d           24.782   22.063    <0.00017 ***
(time, V50pos): g           12.884   41.321    <0.00017 ***
(time, V50pos): C3 /b/      7.309    2.079     0.03133
(time, V50pos): C57 /b/     9.292    3.198     0.00029 **
(time, V50pos): A /b/       12.142   6.163     <0.00017 ***
(time, V50pos): C3 /d/      6.340    1.663     0.11784
(time, V50pos): C57 /d/     9.439    1.882     0.03155
(time, V50pos): A /d/       12.329   3.580     <0.00017 ***
(time, V50pos): C3 /g/      11.907   2.427     0.00144 **
(time, V50pos): C57 /g/     7.680    11.743    <0.00017 ***
(time, V50pos): A /g/       7.452    5.906     <0.00017 ***

Table 5

Output of binary difference smooth model testing for age differences within each consonant context with reference group A. Significance codes ‘***’: p < .00017; ‘**’: p < .0017; ‘*’: p < .008; ‘.’: p < 0.017.

Tensor product              edf      F-value   p-value
(time, V50pos): b           21.156   44.753    <0.00017 ***
(time, V50pos): d           22.815   33.594    <0.00017 ***
(time, V50pos): g           13.446   57.036    <0.00017 ***
(time, V50pos): C3 /b/      9.939    2.830     0.00069 **
(time, V50pos): C4 /b/      4.001    6.231     <0.00017 ***
(time, V50pos): C57 /b/     5.165    3.222     0.00416 *
(time, V50pos): C3 /d/      6.447    4.356     <0.00017 ***
(time, V50pos): C4 /d/      8.582    2.638     0.00198 *
(time, V50pos): C57 /d/     5.810    1.832     0.07487
(time, V50pos): C3 /g/      11.800   3.864     <0.00017 ***
(time, V50pos): C4 /g/      7.089    9.661     <0.00017 ***
(time, V50pos): C57 /g/     7.690    12.660    <0.00017 ***

Table 3 shows that three-year-olds differed from all other cohorts in the /g/ context, while they did not differ significantly from four-year-olds in the /b/ and /d/ contexts. Four-year-olds differed from adults in every consonant context but did not differ from the five- and seven-year-olds in the /d/ context (cf. Table 4). Table 5 finally completes the group comparisons by indicating that adults did not differ from cohort C57 in the /d/ context but did differ in /b/ and /g/ contexts. Figure 6 visualizes these age differences in two-dimensional plots. Similar to the black and red graphs in Figure 4, we fixed the independent variable time point for each plot (from top to bottom: V100, C50, C100, and schwa50). The horizontal tongue dorsum position at V50 is represented on the x-axis and its position at the indicated time point on the y-axis (both from zero = anterior positions to one = posterior positions). Each plot depicts every age cohort’s results for one consonant context (from left to right: /b/, /d/, /g/).

Figure 6

Relation between the tongue dorsum position at V50 (x-axis) and that at each of the four investigated time points (y-axis, per row: V100, C50, C100, schwa50) per age cohort. Each consonant context is plotted separately.

How to read the plots is demonstrated by means of the coarticulatory pattern in /g/ contexts at C100 (third plot in the right column). Let us focus on the light blue line that represents three-year-olds’ coarticulatory patterns. For each horizontal tongue dorsum position at the midpoint of the vowel (x-axis), the corresponding horizontal tongue dorsum position at the endpoint of the following /g/ (C100) is plotted (y-axis). For vowels produced with relatively front tongue dorsum positions, for example 0.2, which could characterize the categories /i/ or /e/, the tongue dorsum is at about 0.4 at C100. For posterior vowels, for example positions of 0.8 (i.e., /o/ or /u/), the tongue dorsum is at about 0.6 at C100. The tongue dorsum position at C100 therefore depended on the tongue dorsum position at V50. The steeper a cohort’s line in a specific consonant context is, the more closely the tongue dorsum position at the investigated time point resembles that at the midpoint of the previous vowel, i.e., the higher the coarticulation degree. The example plot (/g/ at C100) illustrates a higher coarticulation degree for younger than for older speakers, since lines flatten with increasing age.
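The link between line steepness and coarticulation degree can be made concrete with the two approximate values given above for three-year-olds. The sketch below computes a simple two-point slope; the function name is hypothetical, and the two-point slope is only a linear simplification of the relationship that the GAM estimates non-linearly.

```python
def carryover_slope(
    v50_front: float, pos_front: float, v50_back: float, pos_back: float
) -> float:
    """Change in later tongue dorsum position per unit change at V50."""
    return (pos_back - pos_front) / (v50_back - v50_front)


# Approximate values from the text for three-year-olds, /g/ at C100:
# V50 = 0.2 -> about 0.4 at C100; V50 = 0.8 -> about 0.6 at C100.
slope_c3_g_c100 = carryover_slope(0.2, 0.4, 0.8, 0.6)
print(f"slope = {slope_c3_g_c100:.2f}")
```

Under this simplification, about one third of a difference in vowel position is still present at the end of the consonant. A slope of one would correspond to a tongue dorsum position identical to that at V50, and a slope of zero to a position independent of the preceding vowel.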

GAMs allow us to detect linear as well as non-linear patterns in this relationship. Cohorts C4 and C57 in the same plot for example, display a slowly increasing line with higher slopes towards both ends of the V50 continuum. This implies that the tongue dorsum is in a central position (approximately 0.5) following vowels with all positions from 0.3 to 0.7; the correlation between V50 position and C100 position is therefore relatively weak here. When following vowels with extremely front or extremely back positions, however, the tongue dorsum position at time point C100 mirrors this direction resulting in higher correlations towards the edges. In adults’ /b/ context at C50 on the other hand, the non-linear pattern is reversed, with flat edges and a strongly increasing part in the middle. This implies that the tongue dorsum moved towards more central positions following extremely front and back vowel positions, while remaining approximately at the vocalic position for V50 values of 0.3 to 0.6.

Figure 6 illustrates that the development of coarticulation degree is consonant-specific. The smallest age differences in coarticulation degree were found in the /b/ context. Adults’ lines representing /b/ contexts stand apart from those of the child cohorts at every time point in being flatter and less linear. It also becomes apparent that the statistically confirmed difference between the five- and seven-year-olds and the younger children mainly results from a difference in coarticulation degree at schwa50 where cohort C57 displays a lower slope. For the /g/ context, Figure 6 illustrates a growing differentiation between the cohorts across time points in the utterance: While at V100 and C50 adults only differed from children in slightly more central productions following back vowel positions, at C100 and schwa50 the lines spread apart. Adults produced the schwa in /g/ contexts with a relatively central tongue dorsum position independent of the previous vowel, whereas children’s tongue dorsum positions still resembled that of the preceding vowel. Finally, in /d/ contexts, tongue dorsum positions at C50 were relatively front, especially for young children, as already indicated by the pink hills in their contour plots (Figure 5), and coarticulation degree increased with age. Interestingly, however, children’s patterns at schwa50 again suggest a higher coarticulation degree than found in adults. Similar to adults, their tongue positions were mostly central, but the observed non-linearity displays vowel-induced shifts towards more anterior and posterior positions, respectively, that were not found in adults.

3.3 Comparison of consonantal impact within age group

To assess consonant-induced differences in coarticulatory patterns within age cohorts, another binary difference smooth model was fit. Because Figure 5 suggested a high similarity between coarticulatory patterns in /b/ and /g/ contexts while /d/ contexts seemed to stand apart, the /b/ context was used as a reference. The output of the model comparing the interaction between time and the horizontal position of the tongue dorsum during V50 between consonant contexts within age cohorts is displayed in Table 6. While the first four lines represent the interaction between time and tongue dorsum position at V50 in the /b/ context (reference level) for each cohort, lines five to eight provide information on the difference between /b/ and /d/, and lines nine to 12 between /b/ and /g/ contexts within each cohort.

Table 6

Output of the model testing for consonantal differences within age cohorts. The reference consonant is /b/. Significance codes ‘***’: p < .00017; ‘**’: p < .0017; ‘*’: p < .008; ‘.’: p < 0.017.

Tensor product              edf      F-value   p-value
(time, V50pos): C3          3.001    243.207   <0.00017 ***
(time, V50pos): C4          18.465   34.896    <0.00017 ***
(time, V50pos): C57         23.248   75.983    <0.00017 ***
(time, V50pos): A           19.750   47.268    <0.00017 ***
(time, V50pos): C3 /d/      7.830    32.389    <0.00017 ***
(time, V50pos): C4 /d/      8.876    13.653    <0.00017 ***
(time, V50pos): C57 /d/     7.826    26.629    <0.00017 ***
(time, V50pos): A /d/       8.253    9.348     <0.00017 ***
(time, V50pos): C3 /g/      7.783    1.118     0.34066
(time, V50pos): C4 /g/      6.886    3.655     0.00042 **
(time, V50pos): C57 /g/     7.397    9.584     <0.00017 ***
(time, V50pos): A /g/       6.196    5.647     <0.00017 ***

For three-year-olds, there was no significant difference between the /b/ and /g/ context patterns. In all other cohorts however, both the coarticulatory pattern for /d/ and for /g/ contexts differed significantly from that in the /b/ context. Similar to Figure 6, Figure 7 visualizes these consonantal differences in two dimensions. Each plot depicts one age cohort’s tongue dorsum positions in all three consonant contexts.

Figure 7

Relation between the tongue dorsum position at V50 (x-axis) and that at the four investigated time points (y-axis, per row: V100, C50, C100, schwa50) per consonant context. Each age cohort is plotted separately.

Figure 7 illustrates that /b/ and /g/ did not differ in their coarticulatory pattern for three-year-olds, as was indicated by the model. Here, we get the additional information that the coarticulation degree of these consonants was higher than that of /d/. In addition to this difference, older cohorts’ /b/ contexts allowed an even higher coarticulation degree than /g/ contexts. For adults, however, the consonantal differences do not seem as pronounced as for children. In each cohort, V100 was characterized by a high vowel dependency in each consonant context; the consonant-related differences of the vowel’s coarticulation degree were strongest at C50 and C100 and decreased again at schwa50.

To summarize our main results, consonant-context-dependent differences in vocalic carryover coarticulation were found in every age cohort. Except for three-year-olds’ /b/ and /g/ patterns, the three consonant contexts differed significantly for every age cohort. The across-cohort analysis testing for developmental differences revealed developmental trajectories of coarticulation degree to be consonant-specific, with a clear decrease in coarticulation degree with age for /g/ contexts, a slight decrease for /b/ contexts, and a special pattern for /d/ contexts indicating less coarticulation for children than adults in the domain of the consonant but slightly more coarticulation for younger participants again during the final schwa.

4 Discussion

The present study is the first one using kinematic data to assess children’s lingual carryover patterns as well as the first one ever investigating carryover coarticulation in German children. Because previous analyses within the same group of participants suggested a strong decrease of vocalic anticipation with increasing age, we asked whether vowel-related movements would also overlap more with following gestures in children’s than in adults’ speech. In a cross-sectional design we recorded speech movements via ultrasound tongue imaging to follow the development of coarticulatory processes in children from as young as three years of age until adulthood. The results confirm a developmental decrease in carryover coarticulation degree and therefore support our hypothesis of broader vocalic activation not only in the left (Noiray et al., 2019b; Rubertus & Noiray, 2018) but also in the right field of the utterance. The articulatory demands of the consonantal context were shown to impact the coarticulatory pattern within as well as across cohorts. The following sections discuss origins and implications of 1) the general developmental decrease of coarticulation, 2) the consonantal impact within age cohorts, and 3) the consonant-dependent developments across cohorts.

4.1 Carryover coarticulation decreases with age

The present study uncovered a developmental decrease of carryover coarticulation of stressed vowels in VCə sequences. In utterances with the consonants /b/ and /g/, this decrease was evident through the gradual shift from vowel-specific tongue dorsum positions towards a central position being significantly slower for young children than for older children and especially for adults (cf. Figure 5). Our interpretation within the coproduction framework (Fowler, 1980) is that the overlap of vocalic activation with following gestures is larger in younger than in older speakers. Although utterances containing the alveolar stop /d/ displayed a pattern differing tremendously from that of /b/ and /g/ contexts, we found evidence for a developmental decrease of vocalic activation here as well: While during the domain of the consonant the vocalic impact was lower in children than in adults, children produced the final schwa with tongue dorsum positions resembling more those of the vowel than adults’. The special coarticulatory pattern noted in alveolar contexts and the resulting discontinuous coarticulatory effects will be discussed in Sections 4.2 and 4.3.

The finding that not only young children’s anticipatory but also their carryover coarticulation of stressed vowels is stronger than older participants’, supports the hypothesis of a compression of activation curves with age that was implied by Nittrouer (1993) and revisited here. While our illustration of the hypothesized activation curves in Figure 1 is just a simplified sketch not integrating articulatory demands of the combined gestures, it depicts developmental differences in speech production strategies coherent with those identified in the present study. The symmetry of developments in both coarticulatory directions suggests anticipation and perseveration of a stressed vowel’s articulatory gestures to have a common origin in the way gestures are phased to each other and overlap with their neighbors instead of being the result of two distinct processes.

A possible origin of broader vocalic activation in child than in adult speech may be related to the finding that a well-balanced degree of inhibition of temporarily irrelevant information in various cognitive domains only matures in the course of childhood (e.g., Bjorklund & Harnishfeger, 1990). Elements that are for some reason prominent or hyperactive would therefore be harder to inhibit for children than for adults. In Section 1.3, various arguments for children to ascribe stressed vowels a special status in perception as well as production were summarized. Fowler (1980) proposed that exactly these stressed vowels serve as the basis for the organization of gestural activation and phasing. It therefore seems likely that a hyperactivation of vowels could in turn lead to broader gestural activation and execution, as for example suggested in Tilsen’s (2016) selection-coordination theory of speech production. Accordingly, children’s difficulty with inhibiting especially prominent parts of the speech plan would result in higher activation of vowels at a selection stage before execution leading to earlier initiation of the vocalic gesture and therefore to a higher degree of anticipatory coarticulation. It would, however, also delay the de-selection of the gesture after the vowel target was reached leading to broader overlap with following gestures, i.e., to a higher degree of carryover coarticulation. According to Tilsen (2018), the development of the speech production system across childhood could be driven by a change of the kind of feedback accessible to speakers as is known from other motor processes (e.g., Butz, Sigaud, & Gérard, 2003). While children would first rely on relatively slow external feedback from peripheral sensory organs, with experience, the faster internal feedback that works with predictions of sensory consequences of outgoing motor commands becomes accessible. 
It seems possible that the change in the use of feedback with increasing language experience and the maturation of inhibitory processes are closely related. Tilsen (2018) explicitly relates findings of a higher degree of anticipatory vowel-to-consonant coarticulation to an immaturity of the coordination of gestures. More precisely, he ascribes the CV hyper-coarticulation in child speech to an “asymmetry such that closure is more strongly coupled to V than release” (p. 33). At least in this form, his reasoning does not provide a direct explanation either for children’s greater degree of anticipatory long-distance coarticulation (Rubertus & Noiray, 2018) or for the greater degree of carryover coarticulation found in the present study. We hope our findings stimulate a revival of empirical interest in carryover coarticulation to feed speech production (development) models.

While the maturation of inhibitory control possibly triggered by a change in the use of feedback can account for the anticipation and the perseveration of a vocalic gesture as well as their development across childhood, it would certainly be premature to exclude other scenarios. The decrease of coarticulation in the two directions might only resemble each other on the surface while being driven by different maturational processes: Can Flege’s (1988) conclusion that inertial properties of the speech motor system do not change with age for example be replicated with articulatory measures and be transferred from velar to lingual motion? Only additional systematic investigations of carryover coarticulation in children’s speech, for example via different speech rate conditions, as well as a direct comparison of anticipatory and carryover coarticulation can enlighten our understanding of the speech production mechanism and its development.

In order to investigate coarticulation development across childhood, the present study focused on age differences only, for reasons of simplicity. We would like to emphasize, however, that age itself should not be mistaken as the driving force for changes in spoken language. Instead, age is a mediating variable reflecting various cognitive and motor developments not illuminated here. In a recent study, it was for instance found that children with greater knowledge of the phonological structure of their language show more mature coarticulatory patterns than children with poor phonological awareness (Noiray et al., 2019a). In addition, we have started examining possible influences of literacy acquisition on speech organization.

4.2 The impact of the consonant within cohorts

While we did find evidence for generally broader vocalic activation in children than in adults in all three consonant contexts, our results suggest that the consonant affects the coarticulatory trajectories within each cohort as well as the development of vocalic carryover coarticulation across cohorts. Within each age cohort, there is a clear distinction of vowel-dependent tongue dorsum movement patterns as a function of consonants’ degree of coarticulatory resistance. Utterances with the low-resistance consonants /b/ and /g/ are characterized by a gradual shift from vowel-specific towards central tongue dorsum positions, while tongue dorsum movements in utterances with the high-resistance consonant /d/ are clearly discontinuous, moving front in the domain of the consonant to then resume a rather central position during schwa. Except for the three-year-olds, a significant difference in coarticulation degree was found between the two low-resistance consonants /b/ and /g/, with the bilabial allowing for more vocalic perseveration than the velar.

This set of findings is in line with Fowler and Brancazio’s (2000) hypothesis of temporary resistance-dependent consonantal clamps of the tongue dorsum during continuous vowel productions. Because the tongue dorsum is not recruited for the bilabial plosive, it can follow its trajectory from vowel-specific towards central positions without being disturbed while the lips form the closure for the consonant. There is therefore no need for the tongue dorsum to reach a rather central position soon after the vocalic target because the bilabial plosive is intelligible independent of the tongue position. In contrast, the velar plosive /g/ shares the primary articulator with the vowels. However, due to its low resistance, vocalic and consonantal movements are blended, resulting in vowel-dependent locations of the palatal constriction. Consequently, there is a gradual shift from the preceding vowel towards the center resembling that of the bilabial but reaching a central position earlier because of the relatively central position of the necessary palatal closure. Last, the highly resistant plosive /d/ needs a front movement of the tongue dorsum to support the tongue tip in forming the alveolar constriction, which results in a strong temporal clamping of the tongue dorsum.

4.3 The impact of articulatory demands on the development of coarticulation

In terms of development across cohorts, we found that the decrease of coarticulation degree is strongest in /g/ contexts. The contour plots in Figure 5 suggest that the point of palatal contact for /g/ is more variable in younger than in older speakers. While tongue dorsum positions during the consonant’s mid and end point are distributed widely from front to back positions in the young cohorts, the older the speakers are, the more central the tongue is during /g/. Again, this speaks for a strong blending of vocalic and consonantal gestures in young children resulting in very vowel-dependent points of palatal contact and therefore more variability in the constriction location of /g/. Older speakers on the other hand, display less vowel-dependency of the consonant which we interpret as evidence for less coproduction due to compressed vocalic activation curves compared to younger speakers. For utterances including the bilabial /b/, the decrease in coarticulation degree is significant as well although the difference between the youngest and the oldest cohort is not as strong as in the /g/ context. Presumably, this is because the strength of coproduction of vocalic and consonantal gestures does not result in changes of a point of lingual contact but only determines how long the vowel ‘fades out.’

The developmental results from the alveolar context shed light on an aspect of the maturation of speech motor control across the age cohorts tested. This finding supplements existing research suggesting that the development of speech motor control is protracted and continues until adolescence at least in some aspects (e.g., Noiray, Cathiard, Abry, & Ménard, 2010; Noiray et al., 2013; Smith & Zelaznik, 2004). To produce an alveolar stop, the tongue tip forms a constriction at the alveolar ridge. The strong forward movement of the tongue dorsum that we see during the production of /d/ in young children but much less in adults (cf. Figure 5) indicates that children move not only the tongue tip forward but the tongue dorsum as well. We interpret this as evidence for a tighter coupling or lack of independence between the tongue tip and the tongue dorsum in young children. Interestingly, results in Noiray et al. (2019b, p. 3043) had indicated a more anterior tongue dorsum position for adults than for children during the production of /d/ as C1 instead of C2. A closer look at the data revealed that children’s tongue dorsum positions during the production of the alveolar stop are approximately the same in C1 and C2 while adults’ positions differ tremendously. This pattern is predicted by Tilsen’s (2018) hypothesis of immature coordinative control: In C1 = d contexts on the one hand, the vowel is coupled to the consonant’s release for adults but to the consonant’s closure for children. Vocalic and consonantal demands would therefore be blended more strongly in young participants, while adults’ vocal tract is dominated by the alveolar constriction gesture. In C2 = d contexts on the other hand, the vocal tract is in shape for the vowel when the consonantal gesture is initiated. Vocalic and consonantal gestures would therefore blend immediately from the beginning of C2 in all cohorts.
Regardless of the reasons behind the adults’ pattern, however, this observation shows that independence of the tongue tip from the tongue dorsum creates the possibility for independent movements on the one hand and articulatory synergies on the other (cf. Noiray et al., 2013). Importantly, children in the younger cohorts have not yet mastered the independent functional control of tongue tip and tongue dorsum (cf. Nittrouer, Studdert-Kennedy, & Neely, 1996), but with increasing age and speech motor experience the independence of the two articulatory organs increases, approximating an adult-like pattern in five- and seven-year-olds. How these developmental changes in articulatory independence and synergies can be simulated in speech production models like the task dynamic model of inter-articulator speech coordination (TaDA) is the focus of another project currently under way in our laboratory. While the tighter coupling of tongue tip and tongue dorsum prevents a higher activation of children’s vocalic gestures from being measurable during the consonantal domain, it becomes apparent again during the schwa. Assuming Fowler and Brancazio’s (2000) notion of consonants clamping the tongue dorsum temporarily during continuous vowel productions, this suggests that children’s vocalic gestures are active until the final schwa but temporarily hidden by the consonantal requirements, while adults’ vowel activation seems to decrease to a minimum before schwa production.
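The contrast between wide and compressed vocalic activation curves can be illustrated with a toy calculation. The following Python sketch is not the statistical model used in this study and not an implementation of TaDA; it simply assumes that vowel activation decays with a Gaussian shape after the vowel's offset and compares two hypothetical decay widths. The function name, the timeline, and all parameter values are illustrative assumptions, and the temporary clamping of the tongue dorsum by the consonant is deliberately left out.

```python
import math


def vowel_activation(t, offset, width):
    """Toy activation of a vocalic gesture at time t (arbitrary units).

    Activation is 1.0 up to the vowel's offset and then decays with a
    Gaussian shape whose spread is set by `width` (an assumed parameter).
    """
    if t <= offset:
        return 1.0
    return math.exp(-0.5 * ((t - offset) / width) ** 2)


# Hypothetical timeline for a V1-C2-schwa sequence (arbitrary time units).
VOWEL_OFFSET = 0.0
C2_MIDPOINT = 1.0
SCHWA_MIDPOINT = 2.0

# Hypothetical decay widths: wide for children, compressed for adults.
WIDTHS = {"child": 1.5, "adult": 0.5}


def residual_activation(width):
    """Vowel activation remaining at the C2 and schwa midpoints."""
    return {
        "C2": vowel_activation(C2_MIDPOINT, VOWEL_OFFSET, width),
        "schwa": vowel_activation(SCHWA_MIDPOINT, VOWEL_OFFSET, width),
    }


results = {cohort: residual_activation(w) for cohort, w in WIDTHS.items()}

for cohort, values in results.items():
    print(f"{cohort}: C2={values['C2']:.3f}, schwa={values['schwa']:.3f}")
```

Under these assumed values, the wide curve leaves a substantial share of vowel activation at the schwa, whereas the compressed curve has decayed to nearly zero by then, which mirrors the qualitative pattern described above. The sketch makes no claim about the actual shape or width of gestural activation in either group.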

5 Conclusion

The present study provides first empirical evidence that vocalic carryover coarticulation decreases with increasing age and therefore develops similarly to anticipatory coarticulation. Although for now we cannot rule out other possible scenarios, this finding does not point to a discrepancy in the underlying mechanisms between the two coarticulatory directions. Instead, we interpret our results as suggesting one common mechanism underlying anticipatory and carryover coarticulation: the coproduction of simultaneously active speech gestures, which decreases across childhood in both directions because of a maturation of inhibitory control mechanisms responsible for accurate selection and de-selection of gestures. In addition to the width of gestural activation, our results support the notion that the degree of vocalic carryover coarticulation depends on the compatibility of articulatory demands of active speech gestures. Because of speech motor control maturation during childhood, this dependency is another source of developmental differences in coarticulatory patterns.

Additional File

The additional file for this article can be found as follows:

A PDF file containing examples and explanations for the model code. DOI: https://doi.org/10.5334/labphon.228.s1


Acknowledgements

This research was supported by two grants from the Deutsche Forschungsgemeinschaft (255676067 and 1098, recipient: Aude Noiray). Our study would not have been possible without a number of people to whom we are very grateful: Mark Tiede and Jan Ries for assisting in developing the SOLLAR platform, the BabyLAB team at the University of Potsdam (in particular Barbara Höhle and Tom Fritzsche) for their support in participant recruitment, the team at the Laboratory for Oral Language Acquisition for diligent help in data collection and processing, Martijn Wieling for providing valuable insights as well as example scripts for the statistical analysis, and Dzhuma Abakarova and Stella Krüger for fruitful discussions throughout the project and for giving helpful feedback on earlier versions of this paper. Last but not least, we are grateful to all our participants, adults and children (as well as their parents), for their time and patience and for challenging but enjoyable data collection sessions.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Aude Noiray provided the administration, the funding, and the technical resources for the project. The present study was conceptualized and designed by Aude Noiray and Elina Rubertus. Both authors contributed to data collection and processing together with a team of students. Elina Rubertus is responsible for the formal analysis and the writing of this manuscript, which was subsequently edited jointly by both authors.


References

Barbier, G., Perrier, P., Payan, Y., Tiede, M. K., Gerber, S., Perkell, J. S., & Ménard, L. (2020). What anticipatory coarticulation in children tells us about speech motor control maturity. PLoS ONE, 15(4), e0231484. DOI:  http://doi.org/10.1371/journal.pone.0231484

Baum, S. R., & Waldstein, R. S. (1991). Perseveratory coarticulation in the speech of profoundly hearing-impaired and normally hearing children. Journal of Speech and Hearing Research, 34, 1286–1292. DOI:  http://doi.org/10.1044/jshr.3406.1286

Bjorklund, D. F., & Harnishfeger, K. K. (1990). The resources construct in cognitive development: Diverse sources of evidence and a theory of inefficient inhibition. Developmental Review, 10(1), 48–71. DOI:  http://doi.org/10.1016/0273-2297(90)90004-N

Bladon, R. A. W., & Al-Bamerni, A. (1976). Coarticulation Resistance in English /l/. Journal of Phonetics, 4(2), 137–150. DOI:  http://doi.org/10.1016/S0095-4470(19)31234-3

Boersma, P., & Weenink, D. (2016). Praat: Doing phonetics by computer (Version 6.0.20) [Computer program]. Available from http://www.praat.org/

Browman, C. P., & Goldstein, L. M. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219–252. DOI:  http://doi.org/10.1017/S0952675700000658

Butz, M. V., Sigaud, O., & Gérard, P. (2003). Anticipatory behavior: Exploiting knowledge about the future to improve current behavior. In Anticipatory behavior in adaptive learning systems (pp. 1–10). Springer. DOI:  http://doi.org/10.1007/978-3-540-45002-3_1

Cutler, A., & Mehler, J. (1993). The periodicity bias. Journal of Phonetics, 21, 101–108. DOI:  http://doi.org/10.1016/S0095-4470(19)31323-3

Fikkert, P. (1994). On the acquisition of prosodic structure. Leiden, The Netherlands: Leiden University.

Flege, J. E. (1988). Anticipatory and carry-over nasal coarticulation in the speech of children and adults. Journal of Speech and Hearing Research, 31(4), 525–536. DOI:  http://doi.org/10.1044/jshr.3104.525

Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8(1), 113–133. DOI:  http://doi.org/10.1016/S0095-4470(19)31446-9

Fowler, C. A. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech, Language, and Hearing Research, 24, 127–139. DOI:  http://doi.org/10.1044/jshr.2401.127

Fowler, C. A., & Brancazio, L. (2000). Coarticulation resistance of American English consonants and its effects on transconsonantal vowel-to-vowel coarticulation. Language and Speech, 43(1), 1–41. DOI:  http://doi.org/10.1177/00238309000430010101

Fowler, C. A., & Saltzman, E. (1993). Coordination and coarticulation in speech production. Language and Speech, 36(2–3), 171–195. DOI:  http://doi.org/10.1177/002383099303600304

Fowler, C. A., & Smith, M. R. (1986). Speech Perception as “Vector Analysis”: An Approach to the Problems of Invariance and Segmentation. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability in speech processes (pp. 123–139). Hillsdale, NJ: Lawrence Erlbaum Associates.

Fox, A. V., & Dodd, B. J. (1999). Der Erwerb des phonologischen Systems in der deutschen Sprache [The phonological acquisition of German]. Sprache-Stimme-Gehör, 23(4), 183.

Gafos, A., & Goldstein, L. (2012). Articulatory representation and organization. In A. C. Cohn, C. Fougeron & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 220–231). Oxford: Oxford University Press.

Gleitman, L. R., & Wanner, E. (1982). The state of the art. In E. Wanner & L. R. Gleitman (Eds.), Language acquisition: The state of the art (pp. 3–48). Cambridge, UK: Cambridge University Press.

Goffman, L., Smith, A., Heisler, L., & Ho, M. (2008). The breadth of coarticulatory units in children and adults. Journal of Speech, Language, and Hearing Research. DOI:  http://doi.org/10.1044/1092-4388(2008/07-0020)

Goodell, E. W., & Studdert-Kennedy, M. (1993). Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: A longitudinal study. Journal of Speech, Language, and Hearing Research, 36(4), 707–727. DOI:  http://doi.org/10.1044/jshr.3604.707

Green, J. R., Nip, I. S. B., & Maassen, B. (2010). Some organization principles in early speech development. Speech Motor Control: New Developments in Basic and Applied Research, 10, 171–188. DOI:  http://doi.org/10.1093/acprof:oso/9780199235797.003.0010

Henke, W. L. (1966). Dynamic articulatory model of speech production using computer simulation. (Doctoral Dissertation). Cambridge, MA: Massachusetts Institute of Technology.

Hertrich, I., & Ackermann, H. (1995). Coarticulation in slow speech: Durational and spectral analysis. Language and Speech, 38(2), 159–187. DOI:  http://doi.org/10.1177/002383099503800203

Hodge, M. M. (1989). A comparison of spectral temporal measures across speaker age: Implications for an acoustic characterization of speech maturation. (Doctoral dissertation). University of Wisconsin-Madison.

Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). Language specific prosodic preferences during the first half year of life: Evidence from German and French infants. Infant Behavior and Development, 32(3), 262–274. DOI:  http://doi.org/10.1016/j.infbeh.2009.03.004

Iskarous, K., Fowler, C. A., & Whalen, D. H. (2010). Locus equations are an acoustic expression of articulator synergy. The Journal of the Acoustical Society of America, 128(4), 2021–2032. DOI:  http://doi.org/10.1121/1.3479538

Katz, W. F., Kripke, C., & Tallal, P. (1991). Anticipatory coarticulation in the speech of adults and young children: Acoustic, perceptual, and video data. Journal of Speech, Language, and Hearing Research, 34(6), 1222–1232. DOI:  http://doi.org/10.1044/jshr.3406.1222

Keating, P. A. (1988). The window model of coarticulation: Articulatory evidence. UCLA Working Papers in Phonetics, 69, 3–29.

Kent, R. D. (1983). The segmental organization of speech. In P. MacNeilage (Ed.), The production of speech (pp. 57–89). New York: Springer. DOI:  http://doi.org/10.1007/978-1-4613-8202-7_4

Kisler, T., Schiel, F., & Sloetjes, H. (2012). Signal processing via web services: The use case WebMAUS. Talk presented at Digital Humanities Conference. Hamburg, Germany.

Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255(5044), 606–608. DOI:  http://doi.org/10.1126/science.1736364

MATLAB. (2016). MATLAB and Statistics Toolbox Release. Natick, Massachusetts, United States: The MathWorks, Inc.

Ménard, L., & Noiray, A. (2011). The development of lingual gestures in speech: Experimental approach to language development. Faits de Langues, 37, 189–202. DOI:  http://doi.org/10.1163/19589514-037-01-900000011

Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24(3), 756. DOI:  http://doi.org/10.1037/0096-1523.24.3.756

Nespor, M., Peña, M., & Mehler, J. (2003). On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e Linguaggio, 2(2), 203–230. DOI:  http://doi.org/10.1418/10879

Nijland, L., Maassen, B., Van der Meulen, S., Gabreëls, F., Kraaimaat, F. W., & Schreuder, R. (2002). Coarticulation patterns in children with developmental apraxia of speech. Clinical Linguistics & Phonetics, 16(6), 461–483. DOI:  http://doi.org/10.1080/02699200210159103

Nittrouer, S. (1993). The emergence of mature gestural patterns is not uniform: Evidence from an acoustic study. Journal of Speech, Language, and Hearing Research, 36(5), 959–972. DOI:  http://doi.org/10.1044/jshr.3605.959

Nittrouer, S., Studdert-Kennedy, M., & McGowan, R. S. (1989). The emergence of phonetic segments: Evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. Journal of Speech, Language, and Hearing Research, 32(1), 120–132. DOI:  http://doi.org/10.1044/jshr.3201.120

Nittrouer, S., Studdert-Kennedy, M., & Neely, S. T. (1996). How children learn to organize their speech gestures: Further evidence from fricative-vowel syllables. Journal of Speech, Language, and Hearing Research, 39(2), 379–389. DOI:  http://doi.org/10.1044/jshr.3902.379

Noiray, A., Abakarova, D., Rubertus, E., Krüger, S., & Tiede, M. (2018). How do children organize their speech in the first years of life? Insight from ultrasound imaging. Journal of Speech, Language, and Hearing Research, 61(6), 1355–1368. DOI:  http://doi.org/10.1044/2018_JSLHR-S-17-0148

Noiray, A., Cathiard, M.-A., Abry, C., & Ménard, L. (2010). Lip rounding anticipatory control: Crosslinguistically lawful and ontogenetically attuned. Speech Motor Control: New Developments in Basic and Applied Research, 153. DOI:  http://doi.org/10.1093/acprof:oso/9780199235797.003.0009

Noiray, A., Cathiard, M.-A., Ménard, L., & Abry, C. (2011). Test of the movement expansion model: Anticipatory vowel lip protrusion and constriction in French and English speakers. The Journal of the Acoustical Society of America, 129(1), 340–349. DOI:  http://doi.org/10.1121/1.3518452

Noiray, A., Ménard, L., & Iskarous, K. (2013). The development of motor synergies in children: Ultrasound and acoustic measurements. Journal of the Acoustical Society of America, 133(1), 444–452. DOI:  http://doi.org/10.1121/1.4763983

Noiray, A., Popescu, A., Killmer, H., Rubertus, E., Krüger, S., & Hintermeier, L. (2019a). Spoken language development and the challenge of skill integration. Frontiers in Psychology, 10, 2777. DOI:  http://doi.org/10.3389/fpsyg.2019.02777

Noiray, A., Ries, J., Tiede, M., Rubertus, E., Laporte, C., & Ménard, L. (in press). Recording and analyzing kinematic data in children and adults with SOLLAR: Sonographic & Optical Lingui-Labial Articulation Recording system. Journal of Laboratory Phonology.

Noiray, A., Wieling, M., Abakarova, D., Rubertus, E., & Tiede, M. (2019b). Back from the future: Non-linear anticipation in adults’ and children’s speech. Journal of Speech, Language, and Hearing Research, 62(8S), 3033–3054. DOI:  http://doi.org/10.1044/2019_JSLHR-S-CSMC7-18-0208

Ostry, D. J., Gribble, P. L., & Gracco, V. L. (1996). Coarticulation of jaw movements in speech production: Is context sensitivity in speech kinematics centrally planned? Journal of Neuroscience, 16(4), 1570–1579. DOI:  http://doi.org/10.1523/JNEUROSCI.16-04-01570.1996

Parush, A., Ostry, D. J., & Munhall, K. G. (1983). A kinematic study of lingual coarticulation in VCV sequences. The Journal of the Acoustical Society of America, 74(4), 1115–1125. DOI:  http://doi.org/10.1121/1.390035

Recasens, D. (1984a). V-to-C coarticulation in Catalan VCV sequences: An articulatory and acoustical study. Journal of Phonetics, 12(1), 61–73. DOI:  http://doi.org/10.1016/S0095-4470(19)30851-4

Recasens, D. (1984b). Vowel-to-vowel coarticulation in Catalan VCV sequences. The Journal of the Acoustical Society of America, 76(6), 1624–1635. DOI:  http://doi.org/10.1121/1.391609

Recasens, D. (1985). Coarticulatory patterns and degrees of coarticulatory resistance in Catalan CV sequences. Language and Speech, 28(2), 97–114. DOI:  http://doi.org/10.1177/002383098502800201

Recasens, D. (1987). An acoustic analysis of V-to-C and V-to-V: Coarticulatory effects in Catalan and Spanish VCV sequences. Journal of Phonetics, 15, 299–312. DOI:  http://doi.org/10.1016/S0095-4470(19)30580-7

Recasens, D. (2018). Coarticulation. In Oxford Research Encyclopedia of Linguistics. DOI:  http://doi.org/10.1093/acrefore/9780199384655.013.416

Recasens, D., & Rodríguez, C. (2016). A study on coarticulatory resistance and aggressiveness for front lingual consonants and vowels using ultrasound. Journal of Phonetics, 59, 58–75. DOI:  http://doi.org/10.1016/j.wocn.2016.09.002

Repp, B. H. (1986). Some observations on the development of anticipatory coarticulation. The Journal of the Acoustical Society of America, 79(5), 1616–1619. DOI:  http://doi.org/10.1121/1.393298

Rubertus, E., & Noiray, A. (2018). On the development of gestural organization: A cross-sectional study of vowel-to-vowel anticipatory coarticulation. PLoS ONE, 13(9), e0203562. DOI:  http://doi.org/10.1371/journal.pone.0203562

Smith, A., & Zelaznik, H. N. (2004). Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology, 45(1), 22–33. DOI:  http://doi.org/10.1002/dev.20009

Song, J. Y., Demuth, K., Shattuck-Hufnagel, S., & Ménard, L. (2013). The effects of coarticulation and morphological complexity on the production of English coda clusters: Acoustic and articulatory evidence from 2-year-olds and adults using ultrasound. Journal of Phonetics, 41(3–4), 281–295. DOI:  http://doi.org/10.1016/j.wocn.2013.03.004

Strycharczuk, P., & Scobbie, J. M. (2017). Fronting of Southern British English high-back vowels in articulation and acoustics. The Journal of the Acoustical Society of America, 142(1), 322–331. DOI:  http://doi.org/10.1121/1.4991010

Tilsen, S. (2016). Selection and coordination: The articulatory basis for the emergence of phonological structure. Journal of Phonetics, 55, 53–77. DOI:  http://doi.org/10.1016/j.wocn.2015.11.005

Tilsen, S. (2018). Three mechanisms for modeling articulation: selection, coordination, and intention. Cornell Working Papers in Phonetics and Phonology, 1–50.

Waldstein, R. S., & Baum, S. R. (1991). Anticipatory coarticulation in the speech of profoundly hearing-impaired and normally hearing children. Journal of Speech, Language, and Hearing Research, 34(6), 1276–1285. DOI:  http://doi.org/10.1044/jshr.3406.1276

Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7(1), 49–63. DOI:  http://doi.org/10.1016/S0163-6383(84)80022-3

Wickelgren, W. A. (1969). Context-Sensitive Coding in Speech Recognition, Articulation and Developments. In Information processing in the nervous system (pp. 85–96). DOI:  http://doi.org/10.1037/h0026823

Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics, 70, 86–116. DOI:  http://doi.org/10.1016/j.wocn.2018.03.002

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36. DOI:  http://doi.org/10.1111/j.1467-9868.2010.00749.x

Wood, S. N. (2017). Generalized additive models: An introduction with R. Chapman and Hall/CRC. DOI:  http://doi.org/10.1201/9781315370279

Zharkova, N. (2017). Voiceless alveolar stop coarticulation in typically developing 5-year-olds and 13-year-olds. Clinical Linguistics & Phonetics, 31(7–9), 503–513. DOI:  http://doi.org/10.1080/02699206.2016.1268209

Zharkova, N., Hewlett, N., & Hardcastle, W. J. (2011). Coarticulation as an indicator of speech motor control development in children: An ultrasound study. Motor Control, 15(1), 118–140. DOI:  http://doi.org/10.1123/mcj.15.1.118

Zharkova, N., Hewlett, N., & Hardcastle, W. J. (2012). An ultrasound study of lingual coarticulation in /sV/ syllables produced by adults and typically developing children. Journal of the International Phonetic Association, 42(2), 193–208. DOI:  http://doi.org/10.1017/S0025100312000060