1. Introduction

Foundational theoretical proposals for modelling laterals postulate that laterals contain an intrinsic vowel-like dorsal gesture, whose presence is systematically related to the formation of the lateral channel, allowing airflow along the side of the tongue (Browman & Goldstein, 1995; Ladefoged & Maddieson, 1996; Sproat & Fujimura, 1993). While the nature of the relationship between dorsal retraction and tongue lateralization has received different interpretations, there is a general consensus that dorsal retraction and lateralization are closely linked. This presents us with a puzzle when we consider the common diachronic pathway affecting syllable-final /l/. Syllable-final /l/ is prone to so-called /l/-darkening, which involves increased tongue dorsum retraction. This, however, can lead to /l/-vocalization, a process through which /l/ loses its consonantal status and becomes a vocoid. Lateralization is inherently a consonantal feature, since it is not known to be contrastive in vowels. Therefore, we can expect that vocalization of /l/ involves loss of lateralization. Thus, it would seem that the very component of /l/ production hypothesized either to drive lateralization or to follow directly from it (tongue dorsum retraction) is, in fact, a step towards its demise. Note that the term ‘lateralization’ can be used to describe a specific articulatory posture (lowering of the sides of the tongue), or a diachronic process that involves a change from a [-lateral] sound to a [+lateral] one. In this paper, we generally use ‘lateralization’ in the former sense, although in the title ‘de-lateralization’ is meant diachronically.

A possible solution to the puzzle involves severing the link between lateralization and tongue dorsum retraction. Empirical evidence documenting the link between these articulatory dimensions of /l/ is lacking, in part because it is difficult to simultaneously track lateral channel formation at the sides of the tongue, and the other articulatory movements, in the midsagittal plane. Therefore, it is not yet known whether the dependency between lateralization and tongue dorsum retraction, assumed in most phonological characterizations of /l/, can be broken, and if so, when this happens in the life cycle of /l/-darkening and /l/-vocalization. This empirical gap is the major barrier to understanding the nature of tongue dorsum retraction in /l/ and to reconciling competing hypotheses on the pathway to /l/-vocalization. The present paper provides new empirical evidence addressing both of these interrelated issues, by investigating the lateral characteristics of different types of /l/ in New Zealand English and relating them to articulatory variation in the midsagittal plane.

Before introducing our hypotheses in Section 1.4, we first review some of the terminology typically used to describe different types of /l/ variants and how these relate to the articulatory characteristics under study (Section 1.1) and elaborate on the observation that /l/-darkening serves as a precursor to /l/-vocalization (Section 1.2), and on the phonological nature of the tongue dorsum gesture (Section 1.3).

1.1. Types of /l/: Light, dark, and vocalized

Midsagittal variation in /l/ production is related to the opposition between ‘light’ or ‘clear’ onset /l/, and ‘dark’ coda /l/, as observed in multiple varieties of English. Further variation is related to the process of ‘/l/-vocalization.’ The categories of ‘light,’ ‘dark,’ and ‘vocalized,’ in relation to /l/, are primarily perceptual. However, systematic perceptual classification of /l/ into these three categories has proven problematic, and there is considerable disagreement in classification judgement (Hall-Lew & Fix, 2012). Given the substantial body of instrumental research documenting the acoustic and articulatory characteristics that contribute to the percept of /l/-darkening and of /l/-vocalization, in this paper, we use ‘darkening’ and ‘vocalization’ to refer to groups of acoustic and articulatory measurements that have been established to correspond to the traditional labels for perceptual properties, as discussed below.

The main known articulatory correlate of /l/-darkening is retraction or raising of the tongue dorsum in relatively darker /l/ (De Decker & Mackenzie, 2017; Giles & Moll, 1975; Lee-Kim, Davidson, & Hwang, 2013; Narayanan, Alwan, & Haker, 1997; Sproat & Fujimura, 1993; Strycharczuk & Scobbie, 2016; Turton, 2015, 2017). Sproat and Fujimura (1993) also observe that this dorsal gesture systematically corresponds to lowering in the mid part of the tongue. Acoustically, dark /l/ tends to have lower F2 compared to light /l/, as well as reduced F2-F1 difference (Carter, 2002; Carter & Local, 2007; De Decker & Mackenzie, 2017; Kirkham, Nance, Littlewood, Lightfoot, & Groarke, 2019; Kirkham, Turton, & Leemann, 2020; Lehiste, 1964; Mackenzie, Olson, Clayards, & Wagner, 2018; Sproat & Fujimura, 1993; Turton, 2015). Interestingly, while midsagittal tongue retraction and F2 lowering are both correlates of /l/-darkening, and they are generally considered to be correlated with each other, the correlation between these two measures, in the context of /l/-darkening, is not direct. Some studies report /l/-darkening differences realized along the F2 dimensions with no detectable midsagittal articulatory correlates (De Decker & Mackenzie, 2017; Turton, 2014). Furthermore, Ying, Shaw, Kroos, and Best (2012) report a correlation between tongue dorsum advancement and F2 lowering. In addition, Strycharczuk and Scobbie (2017a) show that the correlation between dorsal retraction and F2 lowering depends on segmental context. These apparent discrepancies between articulation and acoustics suggest that not all articulatory information relevant to /l/-darkening can be captured in the midsagittal plane.

While /l/-darkening can be operationalized in terms of tongue body gestures, the percept of /l/-vocalization is linked to articulatory reduction in tongue tip raising. Both light and dark /l/ involve an apical gesture, in which the tongue tip comes into contact with the alveolar ridge. In contrast, in vocalized /l/, this gesture is reduced, or it may be altogether absent. The link between a perception of vocalization and reduction in apicality of /l/ is somewhat indirect. The two are known to coincide. For the varieties of English where vocalized /l/ had been reported based on auditory observations, subsequent articulatory studies confirm the presence of reduced apical gesture in coda /l/. For instance, auditory reports of /l/-vocalization in Southern British English go back to Gimson (1980) and Wells (1982). Electropalatography studies by Wright (1987) and Hardcastle and Barry (1989) confirm partial loss of alveolar contact in coda /l/ in Southern British English, whereas later articulatory research documents further stages of apical reduction in the same variety, up to a point where the apical gesture cannot be clearly identified (Scobbie & Pouplier, 2010; Strycharczuk & Scobbie, 2020; Turton, 2014, 2017). Similarly, we have auditory evidence of /l/-vocalization in Australian English and New Zealand English (Borowsky & Horvath, 1997; Horvath & Horvath, 2002), as well as American English (Ash, 1982), and later articulatory evidence of the presence of a variable reduced apical gesture in coda /l/ (Lin, Beddor, & Coetzee, 2014; Szalay, Benders, Cox, & Proctor, 2019). However, we are not aware of evidence establishing a direct link between reduction in the apical gestures and the perception of a vocalized /l/. Presumably, vocalized /l/ also involves the reduction/absence of lateral airflow, although there is not yet any direct evidence confirming this presumption either (see Section 1.4).

1.2. Darkening as a precursor to vocalization

From the articulatory characterization of /l/-darkening versus /l/-vocalization above, it would appear that the two processes are distinct: The currently dominant theoretical framing of /l/-darkening focuses on dorsal retraction, whereas the framing of /l/-vocalization tends to highlight apical reduction. However, there is evidence that darkening serves as a precursor to /l/-vocalization, suggesting a kind of progressionary dependency. Firstly, dorsal retraction, which is a defining feature of /l/-darkening, is also present, prominently so, in vocalized /l/ (Smith & Lammert, 2013). In addition, the main known acoustic correlate of /l/-vocalization is the same as in /l/-darkening, i.e., F2 lowering and the first two vowel formants coming closer together (Lin et al., 2014). The F2 lowering effect is greater for vocalized /l/, compared to dark /l/. Acoustic variation between dark- and vocalized /l/ along the same acoustic dimension highlights commonalities between the two processes, and it potentially explains the perceptual difficulty of coding the different variants, as reported by Hall-Lew and Fix (2012). Finally, /l/-vocalization tends to follow diachronically from an earlier process of /l/-darkening. Johnson and Britain (2007) note that /l/-vocalization is known to have emerged in those varieties of English that already have a dark /l/, and it is not attested in varieties such as Irish English, where /l/ is relatively lighter. Apart from English, diachronic vocalization of dark /l/ has been reported for Catalan, Provençal, and Italian dialects (Recasens, 1996), Occitan (Müller, 2011), Dutch (Jongkind & van Reenen, 2007), Swiss German (Leemann, Kolly, Werlen, Britain, & Studer-Joho, 2014), and Polish (Koneczna, 1965; Nagórko, 1996), inter alia.

One possible explanation for why darkening seems to feed vocalization involves the timing of the dorsal and apical gestures. Sproat and Fujimura (1993) develop a framework for this, proposing that the primary difference between onset and coda /l/ stems from the temporal coordination of the dorsal and the apical gesture. In light onset /l/, the apical gesture comes first, and the dorsal gesture follows, or the two can be simultaneous. In dark coda /l/, the dorsal gesture is timed relatively earlier (corresponding to what is acoustically the preceding vowel), whereas the tongue tip gesture is delayed. Variation in gestural magnitude, as discussed above, can follow from these relative timing patterns. If we assume a fixed temporal window for a syllable, the earlier a gesture begins within that window the greater chance it has to achieve its target (Moon & Lindblom, 1994). In contrast, a delayed onset of a gesture may result in gestural undershoot, i.e., failure to reach the gestural target. The apical gesture of coda /l/ is temporally disadvantaged in this respect, since it occurs at the end of the syllable. The temporal disadvantage is exacerbated for darker /l/s—increasing the delay between the dorsal and apical gestures further reduces the time available to achieve the apical target. In this way, gestural weakening, as seen in /l/-vocalization, can follow from extreme delay of the apical gesture. The predictions from this framework are borne out by multiple findings on the realization of English /l/. It has been confirmed that different degrees of /l/-darkening, as well as /l/-vocalization are systematically correlated with tip delay, i.e., the temporal lag between the dorsal and the apical gesture (Sproat & Fujimura, 1993, for American English; replicated by Browman & Goldstein, 1995; Gick, 2003; and Harper, 2019, for American English; by Strycharczuk & Scobbie, 2015, for Southern British English; and by Ying et al., 2017, for Australian English). 
Furthermore, the spatio-temporal view of /l/-darkening and vocalization predicts a systematic correlation between the degree of spatial reduction of the apical gesture and the temporal delay of that gesture. Such a correlation has been confirmed for Southern British English (Strycharczuk & Scobbie, 2020) and for American English (Harper, 2019). In addition, Strycharczuk and Scobbie (2015) report an apparent time increase in tip delay in Southern British English, which suggests that dark /l/ is gradually becoming darker. This is simultaneous with word-final vocalized /l/ being on the rise in this variety, consistent with a transition from production-based variation to incipient sound change.

In sum, we observe that /l/-darkening appears to be a precursor of /l/-vocalization. The basis for this pattern may be related to acoustic/perceptual similarity of dark and vocalized /l/ and/or to the time constraints that dark /l/ may place on achieving the apical constriction. How the loss of lateral channel formation also assumed to accompany /l/-vocalization enters into the picture is not clear, in part because data is lacking, but there are a number of theoretical claims that can be tested.

1.3. The role of the dorsal gesture

One aim of the current study is to assess the status of the dorsal gesture and the degree to which it is dependent on lateralization. Gestural characteristics of laterals, as in the model by Sproat and Fujimura (1993), are thought to follow from universal phonetic constraints, evidenced by gestural similarities between laterals in different languages. The presence of the two midsagittal /l/ gestures has been linked in different ways to lateralization. According to Sproat and Fujimura (1993), the dorsal gesture is a direct consequence of tongue lateralization, because tongue lateralization, in their view, involves narrowing of the tongue blade. Since the volume of the tongue is incompressible, such narrowing causes displacement of the anterior and posterior end of the blade. This explanation finds a reflection in the phonological specification of laterals as [+coronal] and [+lateral]. There is no phonological specification for [+dorsal], because this aspect of lateral production is proposed not to be under direct phonological control. Alternatively, in the Articulatory Phonology characterization by Browman and Goldstein (1995), tongue lateralization is controlled indirectly by retracting the tongue dorsum. In this case, lateralization cannot happen without control of both the apical and dorsal gestures. Both of these theories, although they differ in details, predict a dependency between lateralization and darkening: Greater degrees of darkening should correspond to greater degrees of lateralization. Typologically, the link between lateralization and the presence of the dorsal gesture is confirmed by a dorsal gesture being observed in multiple languages, including Quebec French, Serbo-Croatian, and Squamish Salish (Gick, Campbell, Oh, & Tamburri-Watt, 2006), as well as in Russian and Spanish (Proctor, 2011). 
To the theoretical predictions of dependency and the typological observations (the ubiquity of the dorsal gesture for /l/), we add in this study direct observation of lateral channel formation across variation in dorsal gesture magnitude.

One may ask how clear /l/ fits into the picture just outlined, given that clear /l/ is of course lateralized and, by definition, not dark. In the proposals discussed above, light /l/ is modelled as containing a dorsal gesture, which triggers some dorsal retraction compared to a plain coronal, although the dorsal gesture may be less apparent in light /l/ because it is shorter in duration and temporally overlapped with the apical gesture. We return to this issue in Section 4, in the light of our own findings.

1.4. Pathways to change

The main aim of our study is to establish the pathway of change involved in /l/-vocalization and, in particular, the diachronic relationship between the loss of the apical gesture and the loss of lateralization. We pursue two competing hypotheses.

The first hypothesis is that loss of the apical gesture precipitates loss of lateralization, yielding /l/-vocalization. On this hypothesis, reduction of the apical gesture leads the change. Ladefoged and Maddieson (1996, pp. 182–185) argue that laterals do not necessarily require a midsagittal constriction, but they normally involve a central occlusion achieved by means of an apical gesture. In addition, a simulated 3D vocal tract model by Zhou (2009) predicts that a midsagittal constriction is necessary to produce acoustic zeros. We can therefore expect that the absence of central occlusion reduces the acoustic-auditory effect of lateral airflow. This would be consistent with the observation that no language seems to use lateralization as a contrastive feature for vowels. If the coronal constriction is lost in /l/, the acoustic effect of lateralization may then be reduced, triggering subsequent loss of lateralization through perceptual reinterpretation. Thus, under this first hypothesis, loss of lateralization follows from reduction of the apical gesture, and it is somewhat abrupt. Empirical support for this scenario would come from reduced lateralization in vocalized /l/, but not in dark /l/, since the specific event triggering reduction in lateralization is loss of apical contact.

The second hypothesis is that reduction of lateralization precipitates reduction and eventual loss of the apical gesture. While the synchronic theoretical accounts described above predict that /l/-darkening correlates with enhanced (greater magnitude) lateralization, sound change may disrupt this link. The process of /l/-vocalization may entail an intermediate stage in which the dorsal gesture is reinterpreted as being under phonological control. Reduction of lateralization may then follow from this shift in the burden of contrast from lateralization to tongue dorsum retraction. Thus, under hypothesis two, lateralization is reduced en route to /l/-vocalization, which would manifest itself as loss of lateralization in dark /l/, compared to light /l/. Vocalized /l/ is also predicted to show reduction in lateralization.

Previous evidence on lateralization in darkening or vocalization context is inconclusive. Narayanan et al. (1997) analyzed 3D tongue shapes in sustained light and dark /l/ productions by four speakers of American English. They report considerable midsagittal differences between light and dark /l/, as expected. The lateral channel area and lateral compression, on the other hand, were similar across the two sounds, although one subject showed greater lateral compression for dark /l/. One of the participants in the study by Narayanan et al. produced a few instances of vocalized /l/. The effect of this on lateralization is not discussed beyond noting that the loss of linguo-palatal contact complicated the calculations of lateral channel area. In a dynamic study of lateral channel formation in Australian English, Ying et al. (2017) report that the timing of lateral channel formation (relative to the tongue dorsum gesture) is relatively stable across vowel contexts and syllable positions even as the magnitude of lateralization varies. Articulatory-acoustic models of lateralization show that the combination of linguo-palatal contact and the presence of lateral channels generates zeros in the acoustic output (Charles & Lulich, 2019; Narayanan et al., 1997; Zhang & Espy-Wilson, 2004; Zhou, 2009). The zeros are observed for both light and dark /l/ in American English, but their location is highly sensitive to the length and relative symmetry of the lateral channels.

To bring new evidence to bear on the pathways of change involved in /l/-vocalization, we study the relationship between midsagittal and lateral aspects of /l/-articulation, based on co-registered electromagnetic (EMA) and ultrasound data from seven speakers of New Zealand English. We ask whether there is any loss of lateralization in vocalized /l/, and if so, what is the diachronic relationship between loss of lateralization and vocalization in sound change. We address the latter question through an analysis of synchronic variation in the realization of /l/. Across environments expected to show various degrees of /l/-darkening and /l/-vocalization, we report the magnitude of lateralization. In exposing correlations between articulatory components of /l/, variation is crucial. One source of variation, in this respect, is individual. New Zealand English is generally described as showing systematic /l/-darkening in codas, and it has also been reported to show variable /l/-vocalization (Bauer, 2008; Gordon & Maclagan, 2008; Hay, 2008; Horvath & Horvath, 2002). In addition, we examine variation in lateralization related to varying degrees of /l/-darkening, induced by systematically varying the vocalic and morphosyntactic context for /l/. Pursuing these relations through a combination of EMA and ultrasound allows us to take advantage of the strengths of both techniques: 3D EMA allows us to investigate lateral tongue displacement and to make across-speaker generalizations, whereas ultrasound gives us a holistic view of tongue shape under various segmental and morphosyntactic manipulations while facilitating comparison to past work.

2. Materials and methods

2.1. Stimuli

The stimuli were words and short phrases containing /l/ in systematically varied vocalic and morphosyntactic contexts. The preceding vowels were [iː] (FLEECE), [ə.] (KIT), and [oː] (THOUGHT). These vowel transcriptions for the three lexical sets are based on Bauer and Warren (2008), and are chosen as the most representative of the vowel quality we find in our data. For each vowel context, we included /l/ in the following morphosyntactic contexts: word-initial (#lV), word-medial morpheme-internal (VlV), word-medial morpheme-final (Vl-V), word-final pre-vocalic (Vl#V), and word-final pre-consonantal (Vl#C). We know from previous research that the degree of /l/-darkening varies gradiently, depending on the strength of the following morphosyntactic boundary (Lee-Kim et al., 2013; Sproat & Fujimura, 1993; Turton, 2015, 2017), and furthermore, that the darkening effects interact with the quality of the preceding vowel (Mackenzie et al., 2018; Strycharczuk & Scobbie, 2017b). Therefore, by manipulating these two factors we expected to elicit a broader spectrum of /l/-darkening from light to dark/vocalized /l/. Since the KIT vowel does not occur in open syllables in English, we used a schwa preceding the initial /l/ in the #lV context. In the word-initial context, stress fell on the second syllable (the vowel following the /l/), whereas in the remaining contexts, the stress fell on the vowel preceding the /l/. The stimuli are listed in Table 1. Note that New Zealand English is typically non-rhotic, although there is variation. None of our participants pronounced coda /r/.

Table 1

Experimental stimuli.

Context   FLEECE      KIT         THOUGHT
#lV       he licks    the lip     more law
VlV       helix       fillet      Paula
Vl-V      heal-ing    fill-er     maul-er
Vl#V      heal it     fill it     maul on her
Vl#C      heal Mick   fill Finn   maul Bob

2.2. Participants

Seven native speakers (three males) of New Zealand English from the South Island participated. Research by Horvath and Horvath (2002) on New Zealand English shows that South Island speakers have relatively high levels of /l/-vocalization, compared to other New Zealand and Australian varieties. Speaker sex is not known to be strongly predictive of /l/-vocalization rates. Our participants were all university students, aged 20–35. They all signed a consent form and received a thank-you voucher worth 40 New Zealand dollars.

2.3. Procedure

We acquired simultaneous ultrasound, EMA, and audio data, following the procedure in Derrick, Best, and Fiasson (2015).

The ultrasound data were acquired with a GE Logiq-E (version 11) portable ultrasound machine, using a GE 8C-RS probe. The ultrasound machine was connected to an Epiphan VGA2USB Pro frame grabber plugged into a USB port of a 15-inch late-2013 MacBook Pro with a 2.6 GHz quad-core i7 processor and 16 GB of RAM. The data were recorded on the MacBook Pro with FFmpeg, using an x264 encoder, under Windows 7 booted through Boot Camp. Video data were sampled at 60 Hz. The ultrasound probe was placed under the participant’s chin and stabilized using a non-metallic probe holder (see Derrick et al., 2015, for technical details about the probe holder).

The audio data were collected using a Sennheiser MKH 416 microphone, mounted on a Manfrotto 244 variable grip arm positioned on a table in front of the participant and to the right, at a distance of circa 5–10 cm from the participant’s mouth. The time alignment was controlled via the NDI software. The audio data were sampled at 22,050 Hz. An additional audio track was collected to synchronize the ultrasound data, using a Sennheiser MKH 416 microphone, connected to the MacBook Pro via a Sound Devices LLC USBPre 2.

The EMA data were acquired using the NDI Wave system on a PC, and sampled at 100 Hz. We glued five sensors on the participant’s tongue. Three sensors were glued along the midsagittal line: on the tongue tip (TT), on the tongue dorsum (TD) as far back as was comfortable for the participant, and on the tongue mid (TM), circa halfway between TT and TD. In addition, we glued two sensors parasagittally, as close as possible to the sides of the tongue (TR and TL). In relation to the midsagittal sensors, the parasagittal ones were placed circa halfway between tongue tip and tongue body. Sensor placement is schematized in Figure 1. We also glued a sensor to the gum directly under the inner lower left incisor, and to the upper and lower lip along the midsagittal line. Furthermore, we taped sensors to the participant’s nasion, left mastoid, and right mastoid.

Figure 1

Placement of EMA sensors on the tongue.

Participants sat in front of a laptop computer screen, wearing the headset. The participant’s head was positioned next to the NDI Wave EMA field projector, on the right-hand side. Participants read prompts off a screen. The prompts appeared one by one in a random order within each block. The experiment was self-timed, and the participants moved to the next prompt by pressing the space bar on the computer. At the beginning and end of each block, participants read a tatatatata sequence, which we used to check and adjust time-alignment between the ultrasound and audio signals (see Section 2.4 below). Each participant read 20 repetitions of the experimental material, except S7, who read ten. For the first ten blocks, the ultrasound probe was placed under the participant’s chin in the midsagittal plane. For the last ten blocks, the ultrasound probe was positioned under the participant’s chin in the coronal plane. For S7, we only recorded ten midsagittal blocks, and no coronal data. In some cases, sensors detached from the tongue in the course of the experiment. We re-glued the sensors whenever we noticed this. We identified instances involving loose sensors and sensor errors post hoc, through analysis of data ranges, and discarded the affected blocks. This procedure yielded, for analysis, 1784 tokens of EMA data, 1050 tokens of midsagittal ultrasound data, and 900 tokens of coronal ultrasound data. The coronal ultrasound data are not reported in this paper.

At the end of the experiment, we collected palate data, using the NDI Wave palate probe, and instructing the participant to trace along the midsagittal plane of the hard palate to the back of the upper incisors. We then recorded the position of the occlusal plane, by instructing the participant to hold a protractor in their mouth. Three EMA sensors had been taped to the protractor, and were subsequently used to calibrate head position. We did not collect palate or occlusal data that could be used to interpret and rotate the ultrasound data.

The experiment received ethical approval from the University of Canterbury Human Ethics Committee (Ref. HEC 2015/127).

2.4. Data processing

We used the sensors on the protractor and the sensors on the mastoid/nasion to rotate the EMA data in MATLAB. We confirmed the rotation visually in MVIEW (Tiede, 2010). We then exported the X, Y, and Z coordinates of rotated EMA data, including all the lingual and lip sensors, for further analysis in R.

We used the time-aligned audio signal for acoustic analysis in Praat (Boersma & Weenink, 2009). For each token, we identified the acoustic onset of the vowel, using the onset of formant structure as a landmark for vowels preceded by obstruents, and using amplitude-based cues for vowels preceded by nasals. For nasal + vowel sequences, the onset of the vowel is associated with a sharp rise in amplitude, visible in the waveform, which is where we placed the boundary. We also identified the end of /l/, relying on amplitude-based cues. For pre-vocalic /l/, we generally see an abrupt rise in amplitude when the following vowel begins (Skarnitzl, 2009). When /l/ is followed by another consonant (in our case, it was /m/ or /f/), we find an abrupt drop in amplitude marking the offset of /l/. The acoustic segmentation was done manually in Praat by the first author. We then exported the times of Vl boundaries and filtered the EMA data accordingly in R, such that the remainder of articulatory analysis was done on sensor displacement data corresponding to the acoustic Vl boundaries.
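The filtering step described above was carried out in R; purely as an illustration, a minimal Python sketch of the same idea is given here. It assumes a sorted vector of sample times and a matching vector of sensor values, and the function name, variable names, and boundary values are hypothetical rather than taken from our scripts.

```python
from bisect import bisect_left, bisect_right

def samples_in_interval(times, values, start, end):
    """Keep the (time, value) pairs with start <= time <= end.

    `times` must be sorted in ascending order (as in a sampled EMA track).
    """
    lo = bisect_left(times, start)   # first sample at or after the vowel onset
    hi = bisect_right(times, end)    # one past the last sample at or before /l/ offset
    return list(zip(times[lo:hi], values[lo:hi]))

# Hypothetical 100 Hz track of TT height (arbitrary units).
times = [i / 100 for i in range(10)]   # 0.00, 0.01, ..., 0.09 s
tt_z = [float(i) for i in range(10)]

# Hypothetical acoustic Vl boundaries exported from the segmentation.
window = samples_in_interval(times, tt_z, 0.015, 0.055)
```

Only the samples falling between the two acoustic landmarks are retained, so all later measures are computed over the Vl interval alone.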

The ultrasound videos were converted and stored in an AVI container, and manually time-aligned with the corresponding audio signal. The time alignment was done based on the tatatatata sequences. For each sequence, we identified the release of closure in the ultrasound signal, and aligned it with the release burst in the corresponding acoustic signal. We only did this for midsagittal ultrasound data, as we could not align the coronal data using the same approach. The alignment approach is not precise, and we only used it to identify the boundaries of each recording in a broad sense. We do not rely on the alignment in our ultrasound analysis. We segmented the time-aligned ultrasound videos into chunks corresponding to individual tokens. We imported those into Articulate Assistant Advanced v.16 (AAA, Articulate Instruments Ltd 2014) for analysis of tongue contours.
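The alignment itself was done by hand. The arithmetic behind it can nevertheless be sketched as follows, assuming the 60 Hz video frame rate reported in Section 2.3. Averaging the offset over several releases in a tatatatata sequence is an assumption made for this illustration, and all names and values are hypothetical.

```python
from statistics import mean

FRAME_RATE_HZ = 60.0  # ultrasound video sampling rate reported in Section 2.3

def estimate_offset(release_frames, burst_times):
    """Mean lag (s) between audio release bursts and ultrasound closure releases.

    release_frames: video frame indices at which closure release is visible.
    burst_times:    times (s) of the corresponding bursts in the audio signal.
    Averaging over releases is an assumption of this sketch.
    """
    lags = [b - f / FRAME_RATE_HZ for f, b in zip(release_frames, burst_times)]
    return mean(lags)

def frame_to_audio_time(frame, offset):
    """Map a video frame index onto the audio timeline."""
    return frame / FRAME_RATE_HZ + offset

# Hypothetical landmarks from one tatatatata sequence.
offset = estimate_offset([60, 120], [1.25, 2.25])
t_audio = frame_to_audio_time(30, offset)
```

With a 60 Hz frame rate, each frame spans roughly 17 ms, which is one reason an alignment of this kind can only be approximate.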

In AAA, we overlaid a fan onto each recording. The fan is a reference frame, consisting of 42 equidistant radials, as specified by the standard AAA settings. The fan was fitted manually for each speaker, with the origin halfway along the bottom border of the image, and the outermost fanlines overlapping with the boundaries of the ultrasound image. We tracked the outline of the tongue contour throughout, using the automatic tongue contour tracking function in AAA. For each token, we identified the ultrasound image corresponding to the maximum tongue tip raising for /l/, following the method by Strycharczuk and Scobbie (2015). We excluded the data from speaker S7 at this stage, as we did not have a good enough image of the tongue tip to reliably identify the point of maximum tongue tip raising. We exported the tongue contour data for these selected time points in Cartesian coordinates for further analysis in R.

2.5. Analysis

2.5.1. Ultrasound analysis

In the ultrasound analysis, we compared the tongue contours within each speaker, depending on the interaction between the vowel and the morphosyntactic context. The tongue contours were averaged and smoothed using Generalized Additive Mixed Models (GAMMs) fitted using a polar conversion of the Cartesian coordinates. This was implemented using the rticulate package (Coretta, 2019a). The input to each model included an interaction between vowel and morphosyntactic condition, a smooth for the X coordinate of tongue position (anterior-posterior), and a smooth for X by vowel and morphosyntactic condition. The dependent variable was tongue height. While the models themselves were fitted in polar coordinates, the results in Figures 2 and 3 are visualized using Cartesian transforms. In each case, we present the model with an AR1 error term, which corrects for residual autocorrelation (Baayen, van Rij, de Cat, & Wood, 2018). This analysis is mainly exploratory, and we use it to inform a more systematic analysis based on EMA data reported in Sections 2.5.2 and 3.2.
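The polar conversion is handled internally by the modelling package. For readers unfamiliar with the idea, the following Python sketch shows the underlying coordinate transform only. It is not the rticulate interface: the function names and the choice of origin are hypothetical, and in practice the origin would correspond to the virtual origin of the ultrasound fan.

```python
import math

def to_polar(x, y, origin=(0.0, 0.0)):
    """Convert a Cartesian contour point to (angle in radians, radius) about `origin`."""
    dx, dy = x - origin[0], y - origin[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)

def to_cartesian(theta, r, origin=(0.0, 0.0)):
    """Inverse transform, used when plotting fitted contours."""
    return origin[0] + r * math.cos(theta), origin[1] + r * math.sin(theta)

# Hypothetical contour point, 3 units right of and 4 units above the origin.
theta, radius = to_polar(3.0, 4.0)
x_back, y_back = to_cartesian(theta, radius)
```

Fitting in polar coordinates means that the smooth is estimated over the angle, with radius as the response, which matches the radial geometry of the ultrasound image better than a fit of height over X.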

Figure 2

GAMM-smoothed average tongue contours for /l/ at the point of maximum tongue tip raising, as a function of preceding vowel and morphosyntactic context. Data from speakers S1, S2, and S3. Tongue tip is on the right. The images are not rotated on the occlusal plane.

Figure 3

GAMM-smoothed average tongue contours for /l/ at the point of maximum tongue tip raising, as a function of preceding vowel and morphosyntactic context. Data from speakers S4, S5, and S6. Tongue tip is on the right. The images are not rotated on the occlusal plane.

2.5.2. EMA analysis

In our EMA analysis, we focused on two measures. As a measure of /l/-vocalization, we used the vertical displacement of the TT sensor (tongue tip raising). We tracked this for each token, over the duration of the /l/ and the preceding vowel. As a measure of lateralization, we used the tongue lateralization index, developed by Ying et al. (2017). The lateralization index is the vertical displacement of the sides of the tongue, normalized for the vertical displacement along the midsagittal line. It is defined as the difference in height between the relatively more lowered parasagittal sensor (TR.z or TL.z), and the estimated height of the tongue blade in the midsagittal plane along the same horizontal location as TR/TL. For each speaker, we determined which parasagittal sensor was typically lower, based on visual inspection of mean TR.z and TL.z curves. The estimated height of the tongue blade (within the coronal plane) along the same horizontal location as TR/TL was based on a second-order polynomial fit to the three sensors in the midsagittal plane. We z-scored both measures, tongue tip raising and the lateralization index, within speaker (Blackwood Ximenes, Shaw, & Carignan, 2017), since the absolute values are affected by the anatomical differences between individuals, the placement of the sensors, and the positioning of the EMA field projector.
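The computation of the lateralization index can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the sensor coordinates are invented, the sign convention (midsagittal height minus parasagittal height, so that higher values mean more lateralization) is our reading of the definition, and the use of the sample standard deviation for z-scoring is an assumption. With exactly three midsagittal sensors (TT, TM, TD), a second-order polynomial fit passes through all three points, so it can be written in Lagrange form.

```python
from statistics import mean, stdev

def quadratic_through(p1, p2, p3):
    """Return the parabola z = f(x) passing through three (x, z) sensor positions."""
    (x1, z1), (x2, z2), (x3, z3) = p1, p2, p3
    def f(x):
        return (z1 * (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3))
                + z2 * (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3))
                + z3 * (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2)))
    return f

def lateralization_index(tt, tm, td, side):
    """Estimated midsagittal height at the side sensor's x position, minus the side sensor's height.

    Assumed sign convention: larger values = side lowered further below the midline.
    """
    midline = quadratic_through(tt, tm, td)
    side_x, side_z = side
    return midline(side_x) - side_z

def zscore(values):
    """Z-score a list of measurements (meant to be applied within one speaker)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical (x, z) positions in mm for one time point.
index = lateralization_index(tt=(60.0, 10.0), tm=(45.0, 12.0), td=(30.0, 8.0),
                             side=(45.0, 7.0))
scaled = zscore([1.0, 2.0, 3.0])
```

In this toy case the side sensor sits at the same horizontal position as TM, so the estimated midsagittal height is 12 mm and the index is 5 mm.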

Time normalization was obtained by calculating the ratio of each measurement time point to the overall Vl duration. The measures were then analyzed dynamically, using GAMMs. The GAMM models were fitted using the mgcv and itsadug packages in R (van Rij, Wieling, Baayen, & van Rijn, 2015; Wood, 2017). The smoothing terms were estimated using the ML (Maximum Likelihood) method. We corrected for autocorrelation using an AR1 error model (Baayen et al., 2018). The models were visualized using the tidymv package (Coretta, 2019b). Statements about significance are based on model comparisons. For a detailed tutorial on fitting and interpreting GAMM results, the reader is referred to Sóskuthy (2017) and Wieling (2018).
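Two of the steps above can be sketched in a few lines. The first function expresses time normalization as a proportion of Vl duration, assuming that time is measured from the Vl onset. The second computes the lag-1 autocorrelation of a residual series, which is one common way of choosing the AR1 parameter; the text does not state how the parameter was set here, so this is an assumption, and all values below are invented.

```python
def normalize_time(sample_times, vl_onset, vl_offset):
    """Express each sample time as a proportion (0 to 1) of the Vl interval."""
    duration = vl_offset - vl_onset
    if duration <= 0:
        raise ValueError("Vl offset must follow Vl onset")
    return [(t - vl_onset) / duration for t in sample_times]

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of a residual series (a candidate AR1 parameter)."""
    n = len(residuals)
    centre = sum(residuals) / n
    centred = [r - centre for r in residuals]
    numerator = sum(a * b for a, b in zip(centred, centred[1:]))
    denominator = sum(c * c for c in centred)
    return numerator / denominator

# Hypothetical token: Vl spans 0.200 s to 0.450 s, with three sample times.
proportions = normalize_time([0.200, 0.325, 0.450], vl_onset=0.200, vl_offset=0.450)

# Hypothetical residuals that alternate in sign, giving a negative autocorrelation.
rho = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0])
```

Normalizing time in this way lets trajectories from tokens of different durations be compared on a common scale, while Vl duration itself can still enter the model as a separate predictor.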

2.5.3. Reproducibility

The data presented in this paper are available through The Open Science Framework (DOI 10.17605/OSF.IO/9Y6QV), along with the R code we used. The analysis file also contains all the stepwise GAMM model comparisons, and final model summaries.

3. Results

3.1. Ultrasound results

We begin by presenting the ultrasound results, which give us a snapshot view of /l/ for each participant, depending on the vocalic and morphosyntactic context. We focus on a specific time-point, the maximum tongue tip raising, as it can be consistently identified across different speakers and different contexts, and it also provides key information about /l/-vocalization.

Figures 2 and 3 show the effect of morphosyntactic condition on the shape of the tongue, within each vowel context within speaker. Based on the ultrasound images, two speakers, S3 and S6, show a clear tendency for /l/-vocalization, manifested as reduction in the degree of tongue tip raising. Vocalization is limited to word-final pre-consonantal context (Vl#C), and it also varies depending on the vowel. S3 does not show clear vocalization in the context of KIT. Furthermore, we observe a subtle reduction in the magnitude of tongue tip raising for S1 and S2 in the word-final pre-consonantal context, Vl#C.

For all remaining speakers, word-final pre-consonantal /l/ (Vl#C) appears to be dark. /l/-darkening is manifested as tongue root retraction and lowering of the tongue mid when /l/ is preceded by the FLEECE vowel.1 In the other vowel contexts, /l/-darkening effects are limited to tongue root retraction and velarization (tongue dorsum raising).

The FLEECE context stands out in showing the most variation induced by manipulation of the morphosyntactic context. The differences between the relatively light, word-initial, /l/ (#lV) and the relatively darkest, word-final pre-consonantal, /l/ (Vl#C) are greater within the FLEECE series, compared to KIT or THOUGHT (speaker S1 is an exception). Furthermore, the FLEECE series generally shows more morphosyntactic gradience. A manifestation of this is the morpheme-final /l/ patterning in between light and dark /l/ for speakers S2 and S5, when /l/ is preceded by FLEECE. Note that this type of gradience is speaker-dependent. Speaker S4 shows a clearly categorical light versus dark /l/ allophony pattern.

In contrast, morphosyntactic manipulations have a much smaller and less systematic effect on tongue shape when /l/ is preceded by KIT or THOUGHT. For these two vowel contexts, only word-final pre-consonantal /l/ is systematically different from the other morphosyntactic contexts.

Figure 4 shows the result of the same models, this time comparing /l/ in two selected morphosyntactic contexts: #lV and Vl#C. These are the contexts for a canonically light and a canonically dark /l/ respectively, and the aim of this comparison is to show the effect of the preceding vowel on the degree of /l/-darkening, once the morphosyntax is controlled for. The preceding vowel clearly affects the overall tongue shape, which is expected. However, the differences are much larger within the word-initial /l/. Word-initial /l/ when preceded by FLEECE shows relatively more front tongue position, and considerable tongue body raising. Since both of these are parameters along which /l/-darkening effects manifest themselves, we can generalize that word-initial /l/ is darker when preceded by KIT or THOUGHT than when it is preceded by FLEECE. Pre-consonantal coda /l/ (Vl#C) is relatively more stable across different vowel contexts.

Figure 4

GAMM-smoothed average tongue contours for /l/ at the point of maximum tongue tip raising in two selected contexts: #lV and Vl#C, by vowel and by speaker. Tongue tip is on the right. The images are not rotated on the occlusal plane.

3.2. EMA results

3.2.1. Lateralization as a function of /l/-vocalization

The ultrasound results presented in Section 3.1 confirm the presence of variable /l/-vocalization in our participants, which allows us to examine the effect of vocalization on lateralization. The presence of vocalized variants in our sample is further confirmed by the patterns in the vertical displacement of the Tongue Tip EMA sensor (TT.z).

We analyzed tongue tip raising, using GAMM, with normalized TT.z as the dependent variable. We built a model of the displacement, based on the following predictors:

  • an interaction between vowel and morphosyntactic condition,

  • a smooth for normalized time, as well as normalized time by vowel and morphosyntactic condition,

  • a smooth for duration of Vl,

  • a tensor product interaction between normalized time and Vl duration,

  • by-speaker random intercepts,

  • by-speaker random slopes for interaction between vowel and morphosyntactic condition,

  • by-speaker random smooths for normalized time.

Figure 5 shows the fitted values of tongue tip raising over the normalized duration of Vl, depending on the vowel and the morphosyntactic context. In each case, the tongue tip goes up steadily, reaching a maximum between 60 and 75% of Vl duration, and then goes down again. The word-final pre-consonantal context (Vl#C) stands out in each vocalic context, showing a reduced peak of tongue tip raising. This signals the presence of /l/-vocalization in the data, as the difference in peak displacement between Vl#C and the other contexts is considerable. There is also some reduction in tongue tip raising for the Vl#V context within the THOUGHT set, although it is much more limited than in Vl#C, and it does not generalize to other vocalic contexts.

Figure 5

GAMM-smoothed values of tongue tip raising over the normalized duration of Vl.

The relative reduction in tongue tip raising maximum for the Vl#C contexts confirms that we have cases of /l/-vocalization. However, this mean result can be produced by different types of variation in /l/-vocalization. In theory, vocalization can be categorical, gradient, or a mixture of both. The difference is important for operationalizing /l/-vocalization as a predictor of lateralization. If vocalization is gradient, we would want to analyze maximum vertical TT displacement as a continuous predictor of lateralization. However, if there is a categorical effect, and we are dealing with hidden subpopulations in our data, a categorical measure (vocalized or not) is more appropriate.

In order to establish whether there are any latent categories when it comes to vocalization, we used Gaussian mixture modelling. The input data were normalized TT maxima (maxima of vertical TT displacement), one for each token. We only considered the word-final pre-consonantal context (Vl#C) here, as this is the only context where /l/-vocalization is expected to occur, which is also confirmed by the ultrasound data discussed in Section 3.1, and by the EMA data in Figure 5. The mixture modelling was implemented using the mclust package (Fraley & Raftery, 2002; Fraley, Raftery, Murphy, & Scrucca, 2012). For an accessible introduction to Gaussian mixture models in linguistic analysis, see Chapter 2 in Kirby (2011). We used the model to determine the number of categories in the distribution of TT maxima.
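The logic of using a mixture model to decide on the number of categories can be sketched as follows. This is a simplified stand-in for the mclust analysis, not a reproduction of it: it fits one- and two-component univariate Gaussian mixtures with a basic EM loop, compares them using BIC in the common "lower is better" form, and runs on invented, clearly bimodal data. The function names, the initialization scheme, and the variance floor are all illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(data, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, sds, loglik)."""
    xs = sorted(data)
    n = len(xs)
    # Initialize means from k equal-sized chunks of the sorted data.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand = sum(xs) / n
    sds = [math.sqrt(sum((x - grand) ** 2 for x in xs) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor guards against collapse
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)))
        for x in xs
    )
    return weights, means, sds, loglik

def bic(loglik, k, n):
    """BIC with k means, k variances, and k - 1 free mixing weights (lower is better)."""
    return (3 * k - 1) * math.log(n) - 2 * loglik

# Invented normalized TT maxima: 8 reduced tokens and 28 unreduced tokens.
tt_maxima = [-1.9 + 0.1 * i for i in range(8)] + [0.55 + 0.03 * i for i in range(28)]

fits = {k: fit_gmm_1d(tt_maxima, k) for k in (1, 2)}
bics = {k: bic(fits[k][3], k, len(tt_maxima)) for k in fits}
best_k = min(bics, key=bics.get)

# Assign each token to its most probable component in the two-component fit.
weights, means, sds, _ = fits[2]
low = means.index(min(means))
n_low = sum(
    1 for x in tt_maxima
    if max(range(2), key=lambda j: weights[j] * normal_pdf(x, means[j], sds[j])) == low
)
```

With data this clearly separated, the two-component model is preferred and the lower component collects the reduced tokens; with real data, the preferred number of components depends on how distinct the subpopulations are.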

The best fitting model, based on BIC (Bayesian Information Criterion), identified two categories, plotted in Figure 6. Most tokens (279 out of 357) cluster around the subcategory mean of 0.97 maximum TT raising, which we can interpret as non-vocalized, cross-referencing the values with Figure 5. In addition, there is a sub-category including more extreme values of TT maxima (N = 78). This category consists mostly of tokens with a reduction in TT maximum, although there are also two tokens of extreme TT maximum. We classed the tokens in this category as vocalized, and discarded the two outliers representing extreme raising.

Figure 6

Two subcategories identified in the distribution of TT raising for the Vl#C context.

Since the mixture modelling suggests that there are two categories in our measure of vocalization, we proceed to treat vocalization as categorical. We fitted a GAMM predicting lateralization within the word-final pre-consonantal context (Vl#C), operationalized as the tongue lateralization index (see Section 2.5.2 above). Higher values indicate more lateralization. The predictors included in the model were:

  • an interaction between vowel (FLEECE, KIT, and THOUGHT) and vocalization (vocalized or not),

  • a smooth for normalized time,

  • a smooth for normalized time by the interaction between vowel and vocalization,

  • a smooth for duration of Vl,

  • a tensor product interaction between normalized time and Vl duration,

  • by-speaker random intercepts,

  • by-speaker random slopes for interaction between vowel and vocalization,

  • by-speaker random smooths for normalized time.

Cross-model comparisons of Maximum Likelihood values confirmed that there was a significant effect of vocalization on the dynamic realization of lateralization, and that this effect interacted with the vowel context. The by-vowel dynamic effect of vocalization is presented in Figure 7. The confidence intervals were obtained from the AR1 version of the model.

Figure 7

The effect of /l/-vocalization on the tongue lateralization index in Vl#C over the normalized duration of Vl. Higher Y-values indicate more lateralization.

Figure 7 shows the by-vowel trajectories of the lateralization index in the Vl#C context, depending on whether the /l/ was vocalized or not. For the KIT and THOUGHT vowels, lateralization increases steadily from the beginning of the vowel, to peak between 50 and 75% through Vl, which is around the maximum tongue tip raising for the /l/ (see Figure 5 for comparison of tongue tip raising). For the FLEECE context, the lateralization is greatest at the start of the vowel. Then it becomes reduced, and it reaches a second peak around the 75% time point. Regardless of the vowel-specific differences in the realization of lateralization, we can generalize that /l/-vocalization involves loss of lateralization. This effect is non-linear, and generally, the greatest reduction in lateralization occurs around the lateralization peak.

3.2.2. Lateralization as a function of /l/-darkening

We now turn to the question of how /l/-darkening affects lateralization. The by-speaker ultrasound results presented in Section 3.1 confirm that the experimental manipulations in our data trigger a whole spectrum of /l/-darkening effects. It is not entirely straightforward to define dark /l/s from an articulatory point of view across different vocalic environments. We can, however, generalize that /l/-darkening involves dorsal retraction and lowering of the tongue mid.

Exploratory analysis of our data suggests that displacement of the tongue dorsum sensor does not capture the full spectrum of darkening effects we see in the ultrasound data. This is likely due to the relatively anterior placement of the TD sensor for some participants, as dictated by considerations of participant comfort. We therefore focus on the vertical displacement of the tongue mid (TM) sensor in this part of the analysis, treating tongue-mid lowering as a measure of /l/-darkening. In doing so, we follow Sproat and Fujimura (1993), who show that dorsal retraction and tongue mid lowering are highly correlated (see Figures 2, 3, and 4 for further confirmation). Note that Sproat and Fujimura essentially treat the two measures as equivalent: Their Tip Delay measure is defined as the temporal lag between the tongue mid lowering extremum and the maximum tongue tip raising, and they talk of a ‘dorsal retraction/lowering component’ [p. 304], but the theoretical proposal labels the relevant gestural event as a ‘dorsal gesture.’ While the theoretical emphasis in Sproat and Fujimura’s work is on dorsal retraction, they justify measuring the temporal properties of tongue mid lowering on the grounds that it is empirically easier to measure than dorsal retraction. A similar observation follows from the data presented by Proctor et al. (2019), who show that the /l/-darkening effect on tongue mid lowering can be larger than the effect on dorsal retraction.

The displacement values were scaled within speaker. We excluded vocalized /l/s from this part of the analysis, so that we could tease apart the effects of vocalization and darkening on lateralization. Vocalized /l/s were defined based on the procedure described in Section 3.2.1 above. Apart from vocalized /l/, we included all types of /l/ recorded in our study, including word-initial light /l/.

In order to investigate the relationship between tongue mid lowering (darkening) and lateralization, we built a GAMM model with lateralization as a dependent variable. The predictors were as follows:

  • main effect of the vowel context,

  • a smooth for normalized time,

  • a smooth for normalized time by vowel,

  • a smooth for tongue mid lowering,

  • a smooth for duration of Vl,

  • a tensor product interaction between normalized time and Vl duration,

  • a tensor product interaction between tongue-mid lowering and normalized time,

  • by-speaker random intercepts,

  • by-speaker random slopes for the vowel context,

  • by-speaker random smooths for tongue mid lowering,

  • by-speaker random smooths for normalized time.

Based on ML comparisons, the model above was a significant improvement on a model without tongue mid lowering as a predictor, and it was also an improvement on a model with no dynamic interaction between tongue mid lowering and normalized time (both at p < .001). This suggests that tongue lateralization is indeed affected by the vertical displacement of the tongue mid, but the correlation varies in time. The effect of tongue mid lowering on lateralization did not interact significantly with the vowel context, as established by model comparisons.

Figure 8 shows the by-vowel time-normalized trajectories of the lateralization index curve, as a function of tongue-mid lowering. Although tongue mid lowering was modelled as a continuous predictor, we show here selected values to visualize the interaction. The overall shape of the lateralization curves was similar to the lateralization trajectories for the Vl#C context, presented in Figure 7. Tongue mid lowering has an effect of reducing lateralization, as relatively lower TM.z values are associated with lower values for the tongue lateralization index. This effect is not linear through the Vl duration, and it is largest around the peak lateralization, and through the latter part of Vl. Thus, we can generalize that /l/-darkening, operationalized as tongue mid lowering, is correlated with overall loss of /l/-lateralization, and especially maximum lateralization.

Figure 8

GAMM-smoothed values of tongue lateralization index over the normalized duration of Vl, depending on vowel and on degree of tongue-mid lowering.

3.3. Summary of results

The analysis of the ultrasound data confirms that our method elicited a broad range of /l/-types. We observe variation in the degree of retraction and/or raising of the tongue dorsum, which is also correlated with lowering of the mid EMA sensor (between the tip and the dorsum). We interpret this articulatory variation as an expression of degree of /l/-darkening. We also find variation in the degree of vertical TT displacement, which corresponds with various degrees of /l/-vocalization. Neither darkening nor vocalization can be straightforwardly predicted from the morpho-syntactic or vocalic manipulation alone: The two factors interact in how they condition the realization of /l/, and this is subject to further individual variation. For /l/-vocalization, this is expected, based on previous results that show stochastic variation in the rates at which this process applies in New Zealand (Horvath & Horvath, 2002). The interaction between vowel and morpho-syntactic environment in conditioning /l/-darkening is reminiscent of similar interactions reported for British English (Strycharczuk & Scobbie, 2017b) and American English (Mackenzie et al., 2018).

The instances of /l/-vocalization in our data are generally limited to word-final pre-consonantal /l/. This is consistent with previous sociolinguistic auditory research that reports considerably higher rates of /l/-vocalization by New Zealand speakers before a following consonant, compared to a following vowel (Horvath & Horvath, 2002). Based on unsupervised clustering, 21.8% of the Vl#C tokens were vocalized. This rate is noticeably lower than in previous sociolinguistic research: Horvath and Horvath report a vocalization rate of over 60% for pre-consonantal /l/. This discrepancy is potentially explained by style, since our data represent lab speech, which is less natural and likely more formal. The vocalized tokens of /l/ showed, on average, less lateralization, and this difference persisted through the combined duration of vowel and /l/.

We further analyzed the effect of /l/-darkening on lateralization in non-vocalized /l/s, this time including tokens from all morpho-syntactic environments. As a measure of /l/-darkening, we used the lowering of the tongue mid. We found that lowering of the tongue mid was correlated with reduction in lateralization, which suggests partial reduction in lateralization for relatively darker /l/, and the greatest degree of lateralization in light /l/. The reduction was especially evident at peak lateralization. We also note that the lateralization curves varied, depending on the vowel context. For KIT and THOUGHT vowels, the lateralization rose through the vowel and the initial part of the /l/, and then got reduced again. For the FLEECE context, we note high lateralization values at the onset of the vowel.

4. Discussion

We expected that vocalized /l/ may involve loss of lateralization, based on previous research concerning the articulation-acoustics relationship in /l/, which indicates that the presence of zeros in the acoustic spectrum requires a midsagittal constriction. Our prediction is indeed confirmed by the data, which show considerably reduced levels of lateralization in word-final pre-consonantal /l/s that are vocalized. The finding concerning reduced lateralization in vocalized /l/ sets us up to investigate the time course of loss of lateralization in sound change, reconstructed through synchronic variation.

The question is whether loss of lateralization precedes or follows the loss of midsagittal constriction in syllable-final /l/, and the crucial evidence comes from levels of lateralization in dark /l/. If lateralization were reduced in vocalized /l/, but not in dark /l/, we could conclude that loss of lateralization is a relatively late development in the time course of change from dark to vocalized /l/, perhaps one that is triggered by loss of central occlusion and perceptual reanalysis of /l/ as a non-lateralized vocoid. However, comparison of synchronic variation in lateralization levels presents a different pattern: Reduction in lateralization is a feature of dark /l/ and it correlates with the degree of /l/-darkening. This would suggest that, diachronically, change in lateralization precedes apical reduction.

A further question that arises at this point is: Which factors could condition dark /l/ to be less lateral compared to light /l/? Part of the answer might lie in the observation that /l/-darkening involves lowering of the tongue mid, correlated with dorsal retraction, previously noted by Sproat and Fujimura (1993), and replicated in our study. Tongue mid raising could be mechanically connected to the degree of lateralization, because tongue raising in that region along the midsagittal line will contribute to increasing the vertical distance from the lowered sides of the tongue. By the same token, midsagittal lowering of the tongue mid could have a limiting effect on the degree of lateralization. This mechanism could contribute to the loss of lateralization. Only vocalized /l/ shows reduction in the raising of the tongue tip, but both vocalized and dark /l/ are realized with increasing degrees of tongue mid lowering. It is not entirely clear whether the effect of /l/-darkening on lateralization is mechanical. It may alternatively reflect a shift in phonological control of /l/ from active control of lateral channel formation (passive dorsal movements) to active phonological control of the tongue dorsum (and passive lateral movements). A reviewer proposes that the gesture under active control could in fact be the tongue mid lowering, and not dorsal retraction. This is certainly a possibility, and one that is consistent with the observation that tongue mid lowering is more measurable across different types of /l/, compared to dorsal retraction, as seen in our own data, and as discussed by Sproat and Fujimura (1993) and Proctor et al. (2019). One hint that the effect of /l/-darkening on lateralization is not entirely mechanical comes from the timing of tongue mid lowering relative to lateralization peaks. 
In the KIT and THOUGHT contexts, the effect of tongue mid height on lateralization starts relatively weak in our data, and it increases over the Vl sequence, having its maximal impact near the peak of lateralization. That is, maximal tongue-mid lowering occurs earlier in time than the maximal effect of lowering on lateralization. If the effect of tongue mid lowering on lateralization were purely mechanical, then we would expect to see the greatest effect around the time when the TM sensor is in its maximally lowered position, which considerably precedes the maximum tongue tip raising in coda /l/. Instead, we do not observe the effect of lowering on lateralization until the tongue tip achieves its target. This pattern of timing is thus consistent with a shift in phonological control from active control of lateralization, as in the [+lateral] proposal, to passive control, as in Browman and Goldstein (1995).

Let us now consider whether our findings challenge the models that link the presence of a dorsal gesture in /l/ to lateralization, a second aim of our study. Recall the proposal of Sproat and Fujimura (1993) that dorsal retraction is a mechanical consequence of tongue blade narrowing involved in lateralization, one that remains phonologically unspecified. In a similar vein, Browman and Goldstein (1995) propose that lateralization is controlled via stretching the tongue in the midsagittal plane. Our data do not necessarily speak against there being a link between lateralization and dorsal displacement, but they suggest that the degree of dorsal retraction triggered by lateralization or required to maintain it is relatively small. The empirical argument for that comes from high lateralization levels in light /l/s, for which the dorsal gesture, if at all present, is limited in magnitude. As the dorsal gesture increases in magnitude, which we find in dark /l/, the dependency between retraction and lateralization is apparently broken and even reversed, considering the tongue mid lowering mechanism and suppression of lateralization, discussed above. In other words, lateralization may be a reason why the dorsal gesture occurs in the first place, thus contributing to initiation of /l/-darkening as sound change. However, propagation of the same change happens for reasons independent of lateralization. What remains to be explained is what those independent reasons may be. While our data do not bring direct evidence on this issue, we suggest two broad sets of possible explanations, drawing on previous proposals concerning factors that may contribute to diachronic vocalization of /l/.

One possibility is that there is a more general production bias that favours consonantal onsets and vocalic codas. This type of bias has been proposed to be a factor in conditioning /l/-vocalization through more general processes such as final reduction and position-specific gestural timing (Browman & Goldstein, 1995; Gick, 1999; Sproat & Fujimura, 1993). Diachronic enhancement of the dorsal gesture in coda /l/ may arise from such bias, because it makes coda /l/ more vocalic. Under this view, /l/-darkening can be seen as gestural re-organization that functions as gradient transition to /l/-vocalization. Our findings complement previous proposals suggesting links between /l/-darkening and /l/-vocalization. In Section 1.2, we observed that the two might be interpreted as stages in the same continuum of change. From a gestural timing perspective, /l/-darkening and /l/-vocalization can both be characterized by delay in raising of the tongue tip, relative to dorsal retraction (Sproat & Fujimura, 1993). An additional shared characteristic between the two processes, brought forth by our data, is the reduction in lateralization accompanying /l/-darkening and /l/-vocalization. We have already made the argument that loss of lateralization makes a sound relatively less consonantal, since lateralization is a consonantal feature. In this context, the loss of lateralization we observe is consistent with a shift from a consonantal variant to a vocalic one. Thus, the term ‘vocalization’ is an adequate characterization of a multi-dimensional shift in manner of articulation. However, an increase in vocalic characteristics is already present for dark /l/. This observation supports a relatively gradient interpretation of classes such as ‘consonantal’ and ‘vocalic.’ Note also that loss of lateralization is a gradual process, as we observe intermediate degrees of such loss. In this sense, lateralization behaves similarly to other articulatory dimensions involved in the vocalization of /l/.
Midsagittally, we also observed varying degrees of dorsal retraction, and varying degrees of reduction in tongue tip raising, both of which make /l/ more vocalic. Both observations are consistent with previous literature (Lee-Kim et al., 2013; Sproat & Fujimura, 1993; Turton, 2015, 2017).

Another potential explanation behind an increase in /l/-darkening is perceptual. The presence of a dorsal gesture in /l/ has specific acoustic consequences, including F2 lowering among other spectral cues. For instance, pre-/l/ vowels differ systematically from vowels in other segmental contexts in Australian English, a variety closely related to New Zealand English (Palethorpe & Cox, 2003). Once vowel formants become affected by the following coda /l/, they have the potential to become more robust in cuing the presence of the /l/ than spectral information associated with the consonant itself. As acoustic consequences of dorsal retraction become enhanced, this could contribute to increased dorsal retraction arising through a perception-production loop. Simultaneously, acoustic cues associated with the presence of a central occlusion may become perceptually less prominent, creating conditions for apical reduction, as part of a cue trade-off. Similarly, the acoustic consequences of lateralization may give way to other cues in signalling coda /l/, contributing to reduction in lateralization as /l/-darkening increases.

In effect, although the dorsal gesture may originally condition the presence of lateralization or follow from it, it eventually contributes to its reduction. While such a chain of events may seem odd, it is not completely unusual. Consider, for instance, what happens in cases of phonetic cue restructuring for a phonological voicing contrast. Many languages show a so-called voicing effect which involves lengthening of the vowel before a lenis stop. This typological tendency has been ascribed to a combination of factors, one of which is laryngeal adjustment: The achievement of stop closure is slower in voiced stops compared to voiceless stops, which has to do with the time required to produce the aerodynamic conditions for voicing during closure (Halle & Stevens, 1967). As a result, the preceding vowel lengthens. In many languages, including English, the vowel lengthening increases over time, and takes over as the primary voicing cue in certain positions, such as word-final (see Coretta, 2020 for a recent overview of the voicing effect). In this way, one of the phonetic factors that contributes to the origins of the voicing effect, the presence of vocal fold vibration, may disappear over time as a less prominent perceptual cue.

We also note that the lateralization profiles differ by vowel. We generally find that lateralization peaks around the time point of the maximum tongue tip raising. However, in the context of a preceding FLEECE vowel, lateralization values are highest at the onset of the vowel, followed by a dip and another peak that is timed more closely with the tongue tip raising maximum. It is not obvious why there should be a vowel-initial rise in lateralization values. One possibility is that this vowel context enables a higher degree of lateral bracing than the other vowels, which are all lower. Gick, Allen, Roewer-Després, and Stavness (2017) show that most segments involve lateral bracing but that /l/, particularly post-vocalic /l/, involves asymmetric lateral release of bracing, whereby one side of the tongue lowers. Proximity of the tongue blade to the palate in the FLEECE context may facilitate asymmetric lowering through bracing to a greater degree than the other vowels, which are lower. Notably, the facilitatory effect of bracing one side of the tongue to lower the other would be lost with increased distance from the palate, later in the Vl sequence. An alternative explanation, suggested to us by a reviewer, is that early lateralization in /iːl/ arises from the gestural antagonism between the front vowel and the back coda gesture, which creates specific perceptual constraints on the presence of the coda /l/.

Finally, a comment is due regarding the vowel-morphosyntax interactions we find in the ultrasound data. If we consider /l/-darkening to be a strictly segmental process, the interaction would not be expected. However, previous work finds interactions like this. For instance, in a study of Southern British English, Strycharczuk and Scobbie (2017b) find robust morphosyntactic differences in the degree of vowel fronting before /l/ for pairs such as hula and fool-ing (when the preceding vowel is GOOSE), but much smaller differences for pairs such as bully and wool-ly (when the preceding vowel is FOOT). Similarly, Mackenzie et al. (2018) report differences in the acoustic effect of /l/-darkening when the preceding vowel is FLEECE (ceiling versus kneel-ing), but they find no difference between morpheme-final and morpheme-initial /l/ when the preceding vowel is DRESS (cellar versus sell-er), or KIT (skillet versus kill-it). Note also other works that report differences between morpheme-medial and morpheme-final /l/, using FLEECE as the vocalic context (Sproat & Fujimura, 1993; Turton, 2015, 2017). Across the literature, morphologically conditioned /l/-darkening effects are most robust for /l/ preceded by FLEECE or GOOSE.

We propose that the observed vowel-morphosyntax interactions are due to a form of ceiling effect. FLEECE and GOOSE are high-front vowels in most present-day English accents, and as such, the coarticulatory effects they exert on the neighbouring /l/ go against /l/-darkening. Mackenzie et al. (2018) call these ‘lightening’ effects. When the segmental and morphosyntactic influences conspire, light /l/ is very light. Our data confirm this, showing that word-initial /l/ is much lighter following /iː/ than in the two other vocalic contexts. Canonical coda /l/, on the other hand, shows less variation conditioned by the vocalic context.

An increased difference between the lightest and darkest /l/ following FLEECE increases the overall darkening scale, making potential intermediate effects more detectable. In contrast, ‘light’ word-initial /l/ preceded by KIT or THOUGHT is in fact relatively dark, due to the coarticulatory influence of the vowel. Consequently, any potential intermediate darkening effects that we might expect in words such as fill-er or maul-er may be small, and even undetectable.

5. Conclusion

Based on new articulatory results on the parasagittal articulation of /l/, we have shown that lateralization is gradually reduced in /l/-darkening and /l/-vocalization. This provides a new perspective on a well-established diachronic link between /l/-darkening and /l/-vocalization. We have argued that incipient reduction in /l/-lateralization might contribute to the innovation of /l/-vocalization. The reduction in /l/-lateralization could follow mechanically from tongue-mid lowering or from a shift from direct lateral control (passive dorsal control) to direct dorsal control (passive lateral control). While this scenario finds partial support from the articulatory patterns in our data, further acoustic analysis and perceptual experimentation could give us a more complete view, since the account involves perceptual reinterpretation. In addition, a large-scale study into how /l/-vocalization spreads through a speech community would shed light on how perceptual reinterpretation affects the propagation of this change.


  1. Speaker S2 shows some unexpected patterns in having the greatest tongue root retraction for Vl#V in the FLEECE context and for #lV in the KIT context. However, the average tongue shape in these two cases would suggest tracking errors rather than a systematic effect.


We acknowledge financial support from the New Zealand Institute for Language, Brain, and Behaviour and support from the New Zealand Marsden Fund grant “Saving energy versus making yourself understood during speech production” to the second author. We thank the experiment participants. We are also grateful to Alan Wrench for his help with data processing. Many thanks to our Laboratory Phonology editor Susanne Gahl, and two anonymous reviewers, for their feedback on the earlier versions of the manuscript. Any remaining errors are our own.

We dedicate this paper to the memory of Romain Fiasson, who provided much kind and helpful assistance to us all during this project.

Competing Interests

The authors have no competing interests to declare.


Articulate Instruments Ltd. (2014). Articulate Assistant Advanced ultrasound module user manual, revision 2.16. Edinburgh, UK: Articulate Instruments Ltd.

Ash, S. (1982). The vocalization of /l/ in Philadelphia (Unpublished doctoral dissertation). University of Pennsylvania.

Baayen, R. H., van Rij, J., de Cat, C., & Wood, S. (2018). Autocorrelated errors in experimental data in the language sciences: Some solutions offered by Generalized Additive Mixed Models. In D. Speelman, K. Heylen, & D. Geeraerts (Eds.), Mixed-effects regression models in linguistics (pp. 49–69). Springer. DOI:  http://doi.org/10.1007/978-3-319-69830-4_4

Bauer, L. (2008). Lenition revisited. Journal of Linguistics, 44, 605–624. DOI:  http://doi.org/10.1017/S0022226708005331

Bauer, L., & Warren, P. (2008). New Zealand English: Phonology. In B. Kortmann, C. Upton, E. W. Schneider, K. Burridge, & R. Mesthrie (Eds.), A handbook of varieties of English (Vol. 3: The Pacific and Australasia, pp. 580–602). Berlin and New York: Mouton de Gruyter.

Blackwood Ximenes, A., Shaw, J. A., & Carignan, C. (2017). A comparison of acoustic and articulatory methods for analyzing vowel differences across dialects: Data from American and Australian English. The Journal of the Acoustical Society of America, 142, 363–377. DOI:  http://doi.org/10.1121/1.4991346

Boersma, P., & Weenink, D. (2009). Praat: Doing phonetics by computer [Computer programme]. Retrieved from http://www.praat.org/

Borowsky, T., & Horvath, B. (1997). L-vocalisation in Australian English. In F. Hinskens, R. van Hout, & W. L. Wetzels (Eds.), Variation, change and phonological theory (pp. 101–123). Amsterdam: Benjamins.

Browman, C. P., & Goldstein, L. (1995). Gestural syllable position effects in American English. In L. R. F. Bell-Berti (Ed.), Producing speech: Contemporary issues (pp. 19–33).

Carter, P. (2002). Structured variation in British English liquids (Unpublished doctoral dissertation). University of York.

Carter, P., & Local, J. (2007). F2 variation in Newcastle and Leeds English liquid systems. Journal of the International Phonetic Association, 37, 183–199. DOI:  http://doi.org/10.1017/S0025100307002939

Charles, S., & Lulich, S. M. (2019). Articulatory-acoustic relations in the production of alveolar and palatal lateral sounds in Brazilian Portuguese. The Journal of the Acoustical Society of America, 145, 3269–3288. DOI:  http://doi.org/10.1121/1.5109565

Coretta, S. (2019a). rticulate: Ultrasound tongue imaging in R [Computer software manual]. Retrieved from https://github.com/stefanocoretta/rticulate (R package version 1.4.0).

Coretta, S. (2019b). tidymv: Plotting for generalised additive models [Computer software manual]. Retrieved from https://github.com/stefanocoretta/tidymv (R package version 2.0.0).

Coretta, S. (2020). Vowel duration and consonant voicing: A production study (Unpublished doctoral dissertation). University of Manchester.

De Decker, P., & Mackenzie, S. (2017). Tracking the phonological status of /l/ in Newfoundland English: Experiments in articulation and acoustics. The Journal of the Acoustical Society of America, 142, 350–362. DOI:  http://doi.org/10.1121/1.4991349

Derrick, D., Best, C., & Fiasson, R. (2015). Non-metallic ultrasound probe holder for co-collection and co-registration with EMA. In Proceedings of the 18th International Congress on Phonetic Sciences, Glasgow, Scotland 2015. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/proceedings.html

Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97, 611–631. DOI:  http://doi.org/10.1198/016214502760047131

Fraley, C., Raftery, A. E., Murphy, T. B., & Scrucca, L. (2012). mclust version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation (Technical Report No. 597).

Gick, B. (1999). A gesture-based account of intrusive consonants in English. Phonology, 16, 29–54. DOI:  http://doi.org/10.1017/S0952675799003693

Gick, B. (2003). Articulatory correlates of ambisyllabicity in English glides and liquids. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in laboratory phonology VI: Constraints on phonetic interpretation (pp. 222–236). Cambridge: Cambridge University Press.

Gick, B., Allen, B., Roewer-Després, F., & Stavness, I. (2017). Speaking tongues are actively braced. Journal of Speech, Language, and Hearing Research, 60, 494–506. DOI:  http://doi.org/10.1044/2016_JSLHR-S-15-0141

Gick, B., Campbell, F., Oh, S., & Tamburri-Watt, L. (2006). Toward universals in the gestural organization of syllables: A cross-linguistic study of liquids. Journal of Phonetics, 34, 49–72. DOI:  http://doi.org/10.1016/j.wocn.2005.03.005

Giles, S. B., & Moll, K. L. (1975). Cinefluorographic study of selected allophones of English /l/. Phonetica, 31, 206–227. DOI:  http://doi.org/10.1159/000259670

Gimson, A. (1980). An introduction to the pronunciation of English (3rd ed.). London: Arnold.

Gordon, E., & Maclagan, M. (2008). Regional and social differences in New Zealand: Phonology. In B. Kortmann, C. Upton, E. W. Schneider, K. Burridge, & R. Mesthrie (Eds.), A handbook of varieties of English (Vol. 3: The Pacific and Australasia, pp. 64–76).

Hall-Lew, L., & Fix, S. (2012). Perceptual coding reliability of (L)-vocalization in casual speech data. Lingua, 122, 794–809. DOI:  http://doi.org/10.1016/j.lingua.2011.12.005

Halle, M., & Stevens, K. (1967). Mechanism of glottal vibration for vowels and consonants. The Journal of the Acoustical Society of America, 41, 1613–1613. DOI:  http://doi.org/10.1121/1.2143736

Hardcastle, W., & Barry, W. (1989). Articulatory and perceptual factors in /l/ vocalisations in English. Journal of the International Phonetic Association, 15, 3–17. DOI:  http://doi.org/10.1017/S0025100300002930

Harper, S. (2019). The relationship between gestural timing and magnitude for American English /l/ across speech tasks. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 (pp. 830–834).

Hay, J. (2008). New Zealand English. Edinburgh University Press. DOI:  http://doi.org/10.3366/edinburgh/9780748625291.001.0001

Horvath, B. M., & Horvath, R. J. (2002). The geolinguistics of /l/ vocalization in Australia and New Zealand. Journal of Sociolinguistics, 6, 319–346. DOI:  http://doi.org/10.1111/1467-9481.00191

Johnson, W., & Britain, D. (2007). L-vocalisation as a natural phenomenon: Explorations in sociophonology. Language Sciences, 29, 294–315. DOI:  http://doi.org/10.1016/j.langsci.2006.12.022

Jongkind, A. P., & van Reenen, P. (2007). The vocalization of /l/ in standard Dutch. In A. Timuska (Ed.), Proceedings of the IVth international conference of dialectologists and geolinguists, University of Latvia, Riga, 28.7.2003 (pp. 1–6). Riga: Latvian Language Institute.

Kirby, J. P. (2011). Cue selection and category restructuring in sound change (Unpublished doctoral dissertation). The University of Chicago.

Kirkham, S., Nance, C., Littlewood, B., Lightfoot, K., & Groarke, E. (2019). Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English. The Journal of the Acoustical Society of America, 145, 784–794. DOI:  http://doi.org/10.1121/1.5089886

Kirkham, S., Turton, D., & Leemann, A. (2020). A typology of laterals in twelve English dialects. The Journal of the Acoustical Society of America, 148(1), EL72–EL76. DOI:  http://doi.org/10.1121/10.0001587

Koneczna, H. (1965). Charakterystyka fonetyczna języka polskiego na tle innych języków słowiańskich [Phonetic characterization of Polish in comparison to other Slavic languages]. Warszawa: Wydawnictwo Naukowe PWN.

Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Cambridge, MA: Blackwell.

Lee-Kim, S.-I., Davidson, L., & Hwang, S. (2013). Morphological effects on the darkness of English intervocalic /l/. Laboratory Phonology, 4, 475–511. DOI:  http://doi.org/10.1515/lp-2013-0015

Leemann, A., Kolly, M.-J., Werlen, I., Britain, D., & Studer-Joho, D. (2014). The diffusion of /l/-vocalization in Swiss German. Language Variation and Change, 26, 191–218. DOI:  http://doi.org/10.1017/S0954394514000076

Lehiste, I. (1964). Acoustical characteristics of selected English consonants. The Hague: Mouton.

Lin, S., Beddor, P. S., & Coetzee, A. W. (2014). Gestural reduction, lexical frequency, and sound change: A study of post-vocalic /l/. Laboratory Phonology, 5, 9–36. DOI:  http://doi.org/10.1515/lp-2014-0002

Mackenzie, S., Olson, E., Clayards, M., & Wagner, M. (2018). North American /l/ both darkens and lightens depending on morphological constituency and segmental context. Laboratory Phonology, 9. DOI:  http://doi.org/10.5334/labphon.104

Moon, S.-J., & Lindblom, B. (1994). Interaction between duration, context, and speaking style in English stressed vowels. The Journal of the Acoustical Society of America, 96, 40–55. DOI:  http://doi.org/10.1121/1.410492

Müller, D. (2011). Developments of the lateral in Occitan dialects and their Romance and cross-linguistic context (Unpublished doctoral dissertation). Universitat de Tolosa 2 - Lo Miralh & Ruprecht-Karls-Universität Heidelberg.

Nagórko, A. (1996). Zarys gramatyki polskiej [An outline of Polish grammar]. Warszawa: Wydawnictwo Naukowe PWN.

Narayanan, S. S., Alwan, A. A., & Haker, K. (1997). Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part I. The laterals. The Journal of the Acoustical Society of America, 101, 1064–1077. DOI:  http://doi.org/10.1121/1.418030

Palethorpe, S., & Cox, E. M. (2003). Vowel modification in pre-lateral environments. Poster presented at the International Seminars on Speech Production, Sydney.

Proctor, M. (2011). Towards a gestural characterization of liquids: Evidence from Spanish and Russian. Laboratory Phonology, 2, 451–485. DOI:  http://doi.org/10.1515/labphon.2011.017

Proctor, M., Walker, R., Smith, C., Szalay, T., Goldstein, L., & Narayanan, S. (2019). Articulatory characterization of English liquid-final rimes. Journal of Phonetics, 77, 100921. DOI:  http://doi.org/10.1016/j.wocn.2019.100921

Recasens, D. (1996). An articulatory-perceptual account of vocalization and elision of dark /l/ in the Romance languages. Language and Speech, 39, 63–89. DOI:  http://doi.org/10.1177/002383099603900104

Scobbie, J. M., & Pouplier, M. (2010). The role of syllable structure in external sandhi: An EPG study of vocalisation and retraction in word-final English /l/. Journal of Phonetics, 38, 240–259. DOI:  http://doi.org/10.1016/j.wocn.2009.10.005

Skarnitzl, R. (2009). Challenges in segmenting the Czech lateral liquid. In A. Esposito & R. Vích (Eds.), Cross-modal analysis of speech, gestures, gaze and facial expressions (pp. 162–172). Springer. DOI:  http://doi.org/10.1007/978-3-642-03320-9_16

Smith, C., & Lammert, A. C. (2013). Identifying consonantal tasks via measures of tongue shaping: A real-time MRI investigation of the production of vocalized syllabic /l/ in American English. In Interspeech (pp. 3230–3233).

Sóskuthy, M. (2017). Generalised Additive Mixed Models for dynamic analysis in linguistics: A practical introduction. arXiv:1703.05339.

Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics, 21, 291–311. DOI:  http://doi.org/10.1016/S0095-4470(19)31340-3

Strycharczuk, P., & Scobbie, J. (2015). Velocity measures in ultrasound data. Gestural timing of post-vocalic /l/ in English. In Proceedings of the 18th International Congress on Phonetic Sciences, Glasgow, Scotland 2015. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/proceedings.html

Strycharczuk, P., & Scobbie, J. (2016). Gradual or abrupt? The phonetic path to morphologisation. Journal of Phonetics, 59, 76–91. DOI:  http://doi.org/10.1016/j.wocn.2016.09.003

Strycharczuk, P., & Scobbie, J. (2017b). Whence the fuzziness? Morphological effects in interacting sound changes in Southern British English. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 8, 7. DOI:  http://doi.org/10.5334/labphon.24

Strycharczuk, P., & Scobbie, J. M. (2017a). Fronting of Southern British English high-back vowels in articulation and acoustics. The Journal of the Acoustical Society of America, 142, 322–331. DOI:  http://doi.org/10.1121/1.4991010

Strycharczuk, P., & Scobbie, J. M. (2020). Gestural delay and gestural reduction. Articulatory variation in /l/-vocalisation in Southern British English. In A. Przewozny, C. Viollain, & S. Navarro (Eds.), The corpus phonology of English: Multifocal analyses of variation (pp. 9–29). Edinburgh University Press.

Szalay, T., Benders, T., Cox, F., & Proctor, M. (2019). Lingual configuration of Australian English /l/. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 (pp. 2816–2820).

Tiede, M. (2010). MVIEW: Multi-channel visualization application for displaying dynamic sensor movements. Development.

Turton, D. (2014). Variation in English /l/: Synchronic reflections of the life cycle of phonological processes (Unpublished doctoral dissertation). University of Manchester.

Turton, D. (2015). Determining categoricity in English /l/-darkening: A principal component analysis of ultrasound spline data. In Proceedings of the 18th International Congress on Phonetic Sciences, Glasgow, Scotland 2015.

Turton, D. (2017). Categorical or gradient? An ultrasound investigation of /l/-darkening and vocalization in varieties of English. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 8. DOI:  http://doi.org/10.5334/labphon.35

van Rij, J., Wieling, M., Baayen, R. H., & van Rijn, H. (2015). itsadug: Interpreting time series and autocorrelated data using GAMMs. (R package version 1.0.3).

Wells, J. (1982). Accents of English. 3 vols. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511611766

Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics, 70, 86–116. DOI:  http://doi.org/10.1016/j.wocn.2018.03.002

Wood, S. (2017). Generalized additive models: An introduction with R (2nd ed.). Chapman and Hall/CRC. DOI:  http://doi.org/10.1201/9781315370279

Wright, S. (1987). The interaction of sociolinguistic and phonetically-conditioned CSPs in Cambridge English: Auditory and electropalatographic evidence. Cambridge Papers in Phonetics and Experimental Linguistics, 5.

Ying, J., Carignan, C., Shaw, J. A., Proctor, M., Derrick, D., & Best, C. T. (2017). Temporal dynamics of lateral channel formation in /l/: 3D EMA data from Australian English. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 (pp. 2978–2982). DOI:  http://doi.org/10.21437/Interspeech.2017-765

Ying, J., Shaw, J., Kroos, C., & Best, C. T. (2012). Relations between acoustic and articulatory measurements of /l/. In Proceedings of the 14th Australasian International Conference on Speech Science and Technology (pp. 109–112).

Zhang, Z., & Espy-Wilson, C. Y. (2004). A vocal-tract model of American English /l/. The Journal of the Acoustical Society of America, 115, 1274–1280. DOI:  http://doi.org/10.1121/1.1645248

Zhou, X. (2009). An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/ (Unpublished doctoral dissertation). University of Maryland, College Park. DOI:  http://doi.org/10.1109/ICASSP.2010.5495710