A fundamental issue in speech perception is how fine and varied phonetic details affect the identification and categorization of speech into higher-level units. An intrinsic and pervasive source of variation in speech is coarticulation (Lindblom, 1963; Iskarous et al., 2013). In order to map coarticulated speech to higher level units, listeners must recognize which cues or properties are the result of coarticulation, and take these effects into account when perceiving speech (Mann & Repp, 1980; Fowler, 1986; Gaskell & Marslen-Wilson, 1998; Fowler, 2005; Beddor, McGowan, Boland, Coetzee, & Brasher, 2013; Harrington, Kleber, & Stevens, 2016; Zellou, 2017).
Several studies have examined how the coarticulatory effects of nasal consonants on vowels are perceived in English (e.g., Beddor & Strange, 1982; Beddor, 2009; Beddor et al., 2013; Zellou, 2017). These studies found that on the one hand, listeners can perceive fine-grained phonetic details, as they can differentiate between oral and nasal vowels, and between degrees of nasalization (Beddor & Strange, 1982; Beddor et al., 2013). On the other hand, listeners can compensate for the coarticulatory influence of nasals by attributing vowel nasalization to its consonantal source. This ensures that nasal coarticulation does not hamper vowel perception (Beddor, 2009; Beddor et al., 2013; Zellou, 2017). However, English does not have a phonemic contrast between oral and nasal vowels, and a nasalized vowel typically only appears in a predictable pre-nasal or post-nasal environment. Therefore, vowel-nasal coarticulation does not affect contrastive vowel cues in English, and English listeners can treat vowel nasalization as a cue to the following consonant instead of interpreting it as a cue to vowel identity.
Another segment that has been shown to have a strong coarticulatory influence on the preceding vowel is dark coda /l/ (Recasens, 2002; Cox & Palethorpe, 2007). Unlike nasals, coda /l/ affects cues that are contrastive in the English vowel inventory, as it reduces spectral cues to vowel contrast (Palethorpe & Cox, 2003; Szalay, Benders, Cox, Palethorpe, & Proctor, 2021; Wade, 2017). For instance, in Australian English, acoustic vowel contrast is reduced in the prelateral context, in particular between the vowels /iː-ɪ/ (heel-hill), /ʉː-ʊ/ (fool-full), /æɔ-æ/ (howl-Hal), and /əʉ-ɔ/ (dole-doll)1 (Palethorpe & Cox, 2003; Szalay et al., 2021). Therefore the quality of the nucleus in these words can be attributed to the coda but may also be interpreted as an intrinsic quality of the vowel. That is, coda /l/ potentially has the ability to mask acoustic cues used by listeners in vowel identification and word recognition.
The goal of this study is to investigate in more detail how vowel-lateral coarticulation affects speech perception by examining listeners’ ability to disambiguate coarticulated phonemes and, in particular, whether vowel-lateral coarticulation affects the perception of phonologically contrastive cues. We hypothesized that if coarticulation with coda /l/ reduces perceptually contrastive vowel cues, listeners’ ability to discriminate prelateral vowels would be hindered, and that this would be evident as increased difficulty in vowel disambiguation in the pre-/l/ context compared to a pre-/d/ context. We expected that the vowels that are most spectrally similar in the prelateral context would be the most difficult to disambiguate.
We tested this hypothesis in two experiments. In the first experiment we found that /l/-final rimes were disambiguated less easily than /d/-final rimes; in particular, the spectrally similar pairs /ʉːl-ʊl, æɔl-æl, əʉl-ɔl/ (e.g., fool-full, howl-Hal, dole-doll) were poorly discriminated compared to other /l/-final target-competitor pairs and to /d/-final minimal pairs contrasting the same vowels. A limitation of our first experiment was that it used a combination of real words and non-words.
Because identification of ambiguous phonemes is facilitated when the target is contained in a lexical item rather than a non-word (Ganong, 1980; Magnuson, McMurray, Tanenhaus, & Aslin, 2003), we conducted a second experiment using only real words to examine if the contrast-reducing influence of lateral codas also affects lexical access to /l/-final words. We hypothesized that if coda /l/ reduces the contrastive cues necessary for vowel disambiguation, listeners will remain unable to disambiguate vowels even when presented with real words. For example, they may map the acoustic signal of pool to the lexical item pull. We found that listeners were less accurate and slower at accessing monosyllabic words within the pairs /iːl-ɪl, ʉːl-ʊl, æɔl-æl/ and /əʉl-ɔl/ compared to their /d/-final counterparts. These results suggest that some members of /l/-final minimal pairs may be inherently acoustically ambiguous, which limits listeners’ ability to disambiguate between them.
1.1. Effect of phonetic context on phoneme identification
The identification of speech segments requires listeners to factor out the influence of surrounding segments in order to recover the intended speech sound (Mann, 1980; Fowler, 1986; Zellou, 2017). Listeners may interpret cues according to their contexts and may perceive the same ambiguous signal as different segments under different contextual conditions (Mann & Repp, 1980; Fowler, 1984; Gaskell & Marslen-Wilson, 1998; Kleber, Harrington, & Reubold, 2012; Zellou, 2017).
In the perception of consonants, when hearing a fricative that is ambiguous between /s/ and /ʃ/, listeners reported perceiving /s/ when the fricative was followed by a rounded vowel, and reported perceiving /ʃ/ when it was followed by an unrounded vowel (Mann & Repp, 1980). This is because in a fricative+unrounded vowel sequence listeners attribute the low frequencies to the fricative and categorize it as /ʃ/, whereas in a fricative+rounded vowel sequence listeners attribute the same low frequencies to lip rounding and categorize the fricative as /s/ (Mann & Repp, 1980; Smits, 2001; Mitterer, 2006). Similarly, a segment that is ambiguous between /d/ (with high F3 onset) and /ɡ/ (with low F3 onset) is more likely to be perceived as /ɡ/ when it is preceded by /l/ than when it is preceded by /ɹ/ (Mann, 1980). If listeners attribute the lowered F3 to the stop in the /l/+stop sequence they would categorize the stop as /ɡ/, whereas a lowered F3 attributed to /ɹ/ in a /ɹ/+stop sequence may lead listeners to classify the stop as /d/ (Mann, 1980). These effects might not be specific to speech, as under certain circumstances, a preceding low tone (corresponding to /ɹ/) or high tone (corresponding to /l/) has the same effect (Lotto & Kluender, 1998; Fowler, Brown, & Mann, 2000).
Consonantal context has also been shown to affect vowel categorization. For example, listeners accept a vowel with a relatively high F2 as /ʊ/ in the fronting /s_t/ context, whereas they categorize the same vowel as /ɪ/ in the non-fronting /w_l/ context despite the fact that prototypical /ʊ/ has a low F2 and prototypical /ɪ/ has a high F2 (Kleber et al., 2012). These studies suggest that listeners attribute coarticulatory information to the influencing segment and factor coarticulatory effects out in the perception of the affected segment.
There are instances of coarticulation that lead to assimilation; for example, /p/ in the phrase top tag can be realized as [t] (K. N. Stevens & Keyser, 2010). Listeners are better able to attribute segmental variation to context in real words, such as freight bearer, realized with a final /p/ instead of a /t/ in freight, than in nonwords, such as preip bearer (Gaskell & Marslen-Wilson, 1998). Dutch listeners also compensate for coarticulation in the Dutch phrase tuin bank [garden bench] pronounced with a final /m/ (Mitterer & Blomert, 2003). However, this cannot be attributed to lexical effects, as German listeners also compensated for coarticulation: They perceived an /n/ in tuim bank despite not knowing that tuin means garden and that tuim is not a word (Mitterer & Blomert, 2003). These studies show contrasting results as to whether listeners integrate top-down lexical information in phoneme perception.
The ability to factor out the influence of surrounding segments allows listeners to recover phoneme categories and category membership despite contextual change to the signal. For instance, /ɡ/ has an acoustically different release burst between /ɡi/ and /ɡu/, but listeners perceived acoustically different /ɡ/ sounds in the appropriate coarticulatory context as more similar to each other than acoustically identical /ɡ/ sounds when one of them was originally produced in a different phonetic context (Fowler, 1984). Similarly, English listeners perceived oral and nasal vowels as different when nasality cannot be attributed to context (e.g., nasal vowels in the context of oral consonants or in isolation) and as similar when nasality can be attributed to context (e.g., nasal vowels in the context of nasal consonants) (Beddor & Krakow, 1999).
Coarticulation also has the potential to affect contrastive cues to vowel identity and reduce acoustic contrast. It is not clear from these studies whether listeners can disambiguate coarticulated phonemes in which coarticulation has affected contrastive cues to phoneme identity. This may create a perceptual target that is inherently ambiguous, as listeners may attribute cues to either the segment undergoing coarticulation or to the segment causing it. An environment where these interactions can be explored in more detail is lateral-final rimes, because vowel-lateral coarticulation affects contrastive vowel cues, reducing acoustic vowel contrast and making vowels potentially perceptually ambiguous in the prelateral context in natural speech.
1.2. The effect of coda /l/ on Australian English vowels
General Australian English (AusE) uses a large vowel inventory consisting of 18 stressed vowels and schwa (Figure 1) (Cox & Fletcher, 2017). The AusE vowel inventory utilizes both spectral and durational contrasts, with phonemic vowel length contrast for spectrally similar pairs (Harrington, Cox, & Evans, 1997; Cox & Palethorpe, 2007). For instance, the vowel pairs /ɐː-ɐ, eː-e/ (e.g., card-cud, shared-shed) primarily contrast in length (Cox & Palethorpe, 2007), and /iː-ɪ, ʉː-ʊ/ are realized with both durational and spectral contrast (Cox, 2006). In addition, there are spectrally similar diphthong-monophthong pairs in which one of the diphthongal targets coincides with a monophthong, such as /æɔ-æ, æɪ-æ/ in loud-lad, laid-lad, and /əʉ-ʉː, æɔ-ɔ/ in boat-boot, pout-pot (Cox, 1999). As a result, some AusE vowel pairs share spectral features.
English coda /l/ is typically realized as a dark [ɫ], articulated with a lowered and retracted tongue dorsum, and an alveolar tongue tip gesture (Sproat & Fujimura, 1993). As the tongue dorsum gesture of [ɫ] may start during the vowel production, [ɫ] favours anticipatory V-[ɫ] coarticulation, leading to the backing and the lowering of the vowel (Recasens, 2002; Lin, Palethorpe, & Cox, 2012).
There are large acoustic differences between preobstruent and prelateral vowel allophones in AusE. Diminished vowel dispersion in the F1-F2 plane and reduced contrast between certain vowel pairs are characteristic of prelateral allophones (Figure 2). Diminished vowel dispersion is the result of the backing of the front vowels: Significantly lowered F2 was found for /iː, ɪ, e, ɐ, ɔ, ʉː, ɜː/ in prelateral environments (Palethorpe & Cox, 2003; Cox & Palethorpe, 2004). Prelateral vowels show overall reduced acoustic contrast compared to their preobstruent counterparts. In particular, the members of the pairs /iː-ɪ, ʉː-ʊ, æɔ-æ/ and /əʉ-ɔ/ might be inherently acoustically ambiguous in the prelateral environment, as machine learning algorithms trained on dynamic formant data and duration data were unable to discriminate between the members of these pairs (Szalay et al., 2021). Acoustic contrast between /ʉː-ʊ/ and /əʉ-ɔ/ is partially neutralized before a coda /l/, due to the lowering of the second formant of /ʉː/ (Palethorpe & Cox, 2003; Cox & Palethorpe, 2004; Szalay et al., 2021). Contrast between /æɔ-æ/ is partially neutralized before coda laterals, as /l/ and the second element of the diphthong overlap substantially. However, Palethorpe and Cox (2003) and Szalay, Benders, Cox, and Proctor (2018) found that durational contrast was maintained between the vowel pairs in the prelateral context. Cox (2006) did not find acoustic contrast reduction between the targets of the vowels /iː-ɪ/; however, acoustic contrast between the formant trajectories of /iː-ɪ/ was reduced due to reduction in the onglide of /iː/, which is one of the differentiating features between the two vowels. In addition, both /iː/ and /ɪ/ gain a schwa-like offglide in the prelateral context, which reduces acoustic contrast between their formant trajectories (Palethorpe & Cox, 2003; Szalay et al., 2021).
As a result of the vowel-lateral coarticulation, prelateral allophones of AusE vowels differ substantially from their preobstruent counterparts in both spectral and durational characteristics (Palethorpe & Cox, 2003; Szalay et al., 2021).
Reduced dispersion of vowels in the F1-F2 plane in the prelateral environment may hinder vowel perception, as a more dispersed F1-F2 vowel space has been demonstrated to facilitate intelligibility in clear speech (Bradlow, Torretta, & Pisoni, 1996; Ferguson & Kewley-Port, 2007; Neel, 2008; but see J. C. Krause & Braida, 2004 for evidence to the contrary). Reduced dispersion may diminish spectral contrast and reduce intelligibility; for example, American English listeners confused the spectral neighbours /ɑ-ʌ/ and /ɛ-æ/ but never /i-ʌ/ or /ɛ-u/ (Neel, 2008). Therefore reduced vowel dispersion caused by vowel-/l/ interactions might also increase the difficulty of vowel, and thus potentially word identification.
Studies on the perception of English lateral-final rimes have shown that vowel-lateral coarticulation helps listeners identify /l/, but hinders identification of certain vowels. Anticipatory vowel-lateral coarticulation allowed British English listeners to reliably identify /l/ in belly when /l/ and the following sounds were replaced by white noise (West, 1999). In contrast, listeners could not identify /l/, when /l/ and the preceding vowel were replaced by white noise: Listeners could identify belly from [be##], but not from [b##i] (West, 1999). Vowel identification has been examined in /l/-triggered vowel mergers in several dialects of English (Thomas & Hay, 2005; Loakes, Clothier, Hajek, & Fletcher, 2014b; Wade, 2017). Listeners from Melbourne, Australia showed a limited ability to distinguish /el/ from /æl/ in a word identification task with minimal pairs (e.g., Alan-Ellen) (Loakes, Hajek, & Fletcher, 2010a, 2010b, 2010c, 2011; Loakes, Graetzer, Hajek, & Fletcher, 2012; Loakes, Clothier, Hajek, & Fletcher, 2014a; Loakes et al., 2014b). Some speakers of New Zealand English were able to distinguish minimal pairs differing in /el/ and /æl/ despite merging /el-æl/ in production (Thomas & Hay, 2005). In Ohio English, listeners could distinguish spectrally merged /oʊl-ul/ (e.g., pole-pull) and /ul-ʊl/ (e.g., pool-pull) using durational cues, but listeners from Vermont could not (Wade, 2017).
1.3. Experimental considerations
Production and perception studies have demonstrated that vowel-lateral coarticulation reduces acoustic contrast between certain vowels in unmanipulated speech. However, it is not clear if and how listeners can disambiguate vowels when acoustic contrast is reduced. AusE lateral-final rimes may provide insights into the issue of whether reduced acoustic contrast leads to a perceptually ambiguous vowel signal or whether listeners perceive acoustically ambiguous vowels according to their coda-context.
Previous questions regarding compensation for coarticulation and context-dependent perception have been successfully addressed with spliced stimuli and phonetic vowel continua (e.g., Mann & Repp, 1980; Fowler, 1984; Kleber et al., 2012; Zellou, 2017). However, both splicing and creating a synthetic continuum are near-impossible to implement for AusE vowel-lateral coarticulation, due to the large acoustic differences between preobstruent and prelateral vowel allophones unique to AusE (Appendices B and D).
Previous work has used splicing to show that identical acoustic signals may be interpreted differently given the context (e.g., Fowler, 1984; Zellou, 2017). One could similarly aim to investigate whether the interpretation of prelateral and preobstruent vowel allophones is dependent on the coda, by presenting listeners with a prelateral vowel spliced into a preobstruent context and a preobstruent vowel spliced before a lateral coda. The first step to creating such spliced stimuli would be to identify the vowel-coda boundaries in speakers’ natural productions. While identifying a vowel-obstruent boundary is straightforward, there is no discernible boundary between vowels and the following coda /l/. This is especially true for back vowels and backing diphthongs, whose formant characteristics are similar to those of dark /l/. Moreover, even if prelateral vowels are successfully isolated, the acoustic differences between prelateral and preobstruent vowels are so large that a prelateral allophone spliced into the preobstruent context would sound noticeably different from the standard AusE production, and therefore, phonetically unnatural to AusE listeners. Listeners’ response to spliced stimuli would, therefore, not inform our understanding of the perception of prelateral vowels in AusE.
Other work has employed synthetic continua to address questions of context-dependent boundary shifts (e.g., Mann & Repp, 1980; Kleber et al., 2012). A potentially interesting question regarding vowel-lateral coarticulation would be whether the perceptual boundary location between two vowels depends on whether the vowels precede an obstruent or a lateral coda. As acoustic contrast is reduced between prelateral compared to preobstruent vowels, the unambiguous endpoints of the synthetic vowel continua would need to be the preobstruent vowel allophones, to be followed by either a synthesized lateral or a synthesized obstruent coda. However, continua based on preobstruent allophones would not appropriately represent the prelateral vowels. Firstly, synthesized preobstruent allophones would sound phonetically unnatural in the prelateral context. For example, in adult female speech /ʉː/ (e.g., food) is realized with an F2 of approximately 2200 Hz in the preobstruent position, but with an F2 of approximately 980 Hz in the prelateral position (Szalay et al., 2021). An /ʉː/ with a high F2 is typically not followed by a lateral coda in standard AusE and the synthetic stimuli would sound geographically marked or even phonetically unnatural to listeners. Secondly, some prelateral allophones would not even occur on the continuum between endpoints that represent their preobstruent counterparts. For example, the preobstruent allophones [iː] and [ɪ] as well as the linearly interpolated exemplars between these two endpoints would never occur before [ɫ], as /iː-ɪ/ are always realized with a schwa-like offglide before coda /l/, as [iːəɫ] and [ɪᵊɫ]. Such a schwa-offglide would not be present in a continuum between preobstruent vowels. In addition to prelateral /iː-ɪ/ vowels without a schwa-offglide sounding phonetically unnatural, such stimuli would not actually test the perception of prelateral allophones. 
To appropriately represent prelateral vowels in a synthesized continuum, one would need to synthesize different endpoints for the preobstruent and prelateral contexts: Preobstruent vowel allophones would be used as unambiguous endpoints in the obstruent-final continua and prelateral vowel allophones in the lateral-final continua, resulting in an [iːd-ɪd] and an [iːəɫ-ɪəɫ] continuum. However, this would make the endpoints of the prelateral and preobstruent continua incomparable and therefore would not inform our understanding of boundary shifts.
Thus, we chose not to address questions of compensation for coarticulation and not to use manipulated stimuli. Instead, we focus on whether acoustic contrast reduction leads to perceptual contrast reduction in naturally occurring productions. We address these questions by comparing listeners’ ability to discriminate prelateral vowels to their ability to discriminate preobstruent vowels in two experiments using unmanipulated stimuli in which acoustic vowel contrast is naturally reduced in prelateral context. We discuss our results in light of compensation for coarticulation.
2. Experiment 1: Disambiguation of /l/-final rimes
We tested perceptual contrast reduction in prelateral vowels using a rime disambiguation task. Participants were asked to identify an aurally-presented target by selecting one of two orthographic representations. Candidate pairs consisted of an exhaustive pairing of all 16 possible stressed /l/-final rimes in AusE and an exhaustive pairing of the same 16 stressed vowels in /d/-final rimes. Comparing accuracy and reaction time (RT) of responses to /d/- and /l/-final target words allowed us to test the extent to which vowel-lateral coarticulation affects vowel disambiguation. This task also allowed us to identify the most easily confused vowel pairs. We hypothesized that if vowel-lateral coarticulation masks cues that are vital to vowel disambiguation, listeners would perform worse on /l/-final rimes than on /d/-final rimes. We also predicted that /l/-final contexts would have a particularly strong negative effect on accuracy and reaction time compared to /d/-final contexts for vowel pairs that have been shown to exhibit reduced contrast in /l/-final contexts, namely /ʉː-ʊ, æɔ-æ, əʉ-ɔ/ (e.g., fool-full, howl-Hal, dole-doll).
Thirty (F = 29, M = 1, bilingual = 19, age = 19–56, mean = 24.16) listeners of AusE (born in Australia or migrated to Australia before the age of 2) participated in the experiment. Participants were undergraduate students of linguistics at Macquarie University and received course credit for participation. All participants had linguistic training but were naive to the purpose of the experiment. None of the participants reported any current hearing, speaking, or reading difficulties.
The stimuli consisted of 16 AusE vowels embedded in /hVd/ and /hVl/ words. The vowels /ɪə/ and /eː/ were excluded as they never appear before final /l/. When a combination of /h/+V+/d/ or /h/+V+/l/ did not yield an existing word, the corresponding nonword was used. The two alternatives in the forced-choice task were the orthographic representations of the candidates spelled uniformly with an initial h. Nonwords were spelled according to English spelling and judged by native speakers of AusE for transparency (Appendix A).
Stimulus materials were elicited from a 21-year-old monolingual female university student born in Australia to Australian-born parents and recorded with an AKG C535 EB microphone at 44.1 kHz sampling rate in a sound-treated studio in 2006. The stimuli were amplitude-normalized, digitized as 16-bit WAV files, and truncated to have one second of silence before and after the word. Mean duration of target words in the /d/ condition was 486 ms (range = 320–650 ms), and 528 ms (range = 450–640 ms) in the /l/ condition.
Participants familiarized themselves with the targets and they were introduced to the experiment with a short practice session, disambiguating the nonword targets. Feedback was provided after each trial. Familiarization and practice were followed immediately by the experimental phase.
Participants were seated in front of a computer monitor located at eye height at a distance of 50 cm and wore Sennheiser 380 Pro headphones adjusted to their comfortable listening level. Participants were instructed to respond as quickly and accurately as possible. To begin each trial, a fixation cross was displayed in the centre of the screen. After 500 ms the two candidate items were displayed in lower case orthography, arranged horizontally, and presented in different coloured boxes. After 1500 ms the target word started playing, while the candidates remained on screen. Participants had 2000 ms from audio onset to select the candidate they heard (Figure 3). Selections were made with a Chronos button box whose input keys mapped to the colours on the screen. The experiment moved on to the next trial when participants responded. If participants did not answer within 2000 ms, a warning message let them know that they were too slow and they were instructed to press a button to continue. The experiment did not proceed to the next trial until the participants responded.
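The trial timeline described above can be summarized in a short sketch. It is illustrative only and is not the presentation software used in the experiment. The class and method names are hypothetical, and the sketch assumes that the 1500 ms preview is counted from the onset of the candidate display; the text does not state this explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialTiming:
    """Durations (ms) taken from the procedure description."""
    fixation_ms: int = 500          # fixation cross shown alone
    preview_ms: int = 1500          # candidates visible before audio (assumed to start at candidate onset)
    response_window_ms: int = 2000  # measured from audio onset

    def audio_onset(self) -> int:
        """Time from trial start to audio onset under the assumption above."""
        return self.fixation_ms + self.preview_ms

    def is_too_slow(self, rt_from_audio_onset_ms: float) -> bool:
        """True if the response falls outside the response window."""
        return rt_from_audio_onset_ms > self.response_window_ms


timing = TrialTiming()
print(timing.audio_onset(), timing.is_too_slow(850), timing.is_too_slow(2100))
```

Under these assumptions a response 2100 ms after audio onset would trigger the warning message, whereas a response at 850 ms would not.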
Each participant was tested either on 16 /d/-final targets and 15 competitor candidates or on 16 /l/-final targets and 15 competitor candidates, creating a between-participant design. Target and competitor pairs were repeated in three blocks, once per block, with a 10-second forced break between the blocks. Each participant was exposed to 240 (items) × 3 (repetitions) = 720 trials. In half of the trials, the target candidate was presented on the right, and in the other half on the left. Trials were randomized within the blocks. After the experiment, participants reported whether they found any of the words ‘unusual’ or ‘difficult’.
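The item and trial counts follow from pairing each of the 16 targets with each of the 15 remaining vowels as competitor. The sketch below reproduces this arithmetic; the vowel labels are placeholders rather than the actual stimuli, and the participant count is the one reported for this experiment.

```python
from itertools import permutations

# Placeholder labels standing in for the 16 vowels (not the actual stimuli).
VOWELS = [f"V{i:02d}" for i in range(1, 17)]

# Ordered (target, competitor) pairs: every target with each of the other 15 vowels.
pairs = list(permutations(VOWELS, 2))

REPETITIONS = 3     # one presentation per block, three blocks
PARTICIPANTS = 30   # participants tested in Experiment 1

items_per_block = len(pairs)
trials_per_participant = items_per_block * REPETITIONS
total_trials = trials_per_participant * PARTICIPANTS

print(items_per_block, trials_per_participant, total_trials)  # 240 720 21600
```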
Responses to 30 (participants) × 720 (trials) = 21,600 trials were collected. Sixty-three observations, including all 45 trials with hill as target and heel as competitor, were excluded from the analysis due to errors in stimulus presentation. Trials in which response times were faster than 210 ms (Woods, Wyma, Yund, Herron, & Reed, 2015) or beyond mean±2 SD of the participant (Ratcliff, 1993) were excluded, leaving a total of 20,413 trials (94.8%) for the analysis.
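The two RT exclusion rules can be sketched as follows. This is a minimal illustration, not the analysis script used in the study. The function name and the example RTs are invented, and the sketch assumes that each participant's mean and sample SD are computed over all of that participant's RTs before any trial is removed, which the text does not specify.

```python
from statistics import mean, stdev

MIN_RT_MS = 210   # lower cutoff reported in the text
SD_LIMIT = 2      # trials beyond mean ± 2 SD of the participant are dropped


def filter_rts(rts_by_participant):
    """Return, per participant, the RTs that survive both exclusion rules.

    Assumption: the mean and (sample) SD are computed over all of a
    participant's RTs before any trial is removed.
    """
    kept = {}
    for participant, rts in rts_by_participant.items():
        centre = mean(rts)
        spread = stdev(rts)
        lower = centre - SD_LIMIT * spread
        upper = centre + SD_LIMIT * spread
        kept[participant] = [
            rt for rt in rts if rt >= MIN_RT_MS and lower <= rt <= upper
        ]
    return kept


# Invented RTs (ms) for one hypothetical participant.
example = {"p01": [150, 600, 620, 640, 660, 680, 700, 720, 740, 2000]}
filtered = filter_rts(example)

# Retention rate implied by the counts reported in the text.
collected = 30 * 720                         # 21,600 trials
after_presentation_errors = collected - 63   # 21,537 trials
retained = 20413
retention_pct = round(100 * retained / after_presentation_errors, 1)
print(filtered["p01"], retention_pct)
```

In the invented example, the 150 ms response is removed by the 210 ms cutoff and the 2000 ms response by the SD criterion. The reported 94.8% is reproduced when the retained trials are expressed relative to the 21,537 trials that remain after the presentation errors are removed; relative to all 21,600 collected trials the figure would be 94.5%.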
Generalized Linear Mixed-Effect Models (GLMMs) were implemented using the glmer() function in the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015; R Core Team, 2018). Response accuracy was analyzed using the logistic link function, and reaction time data were analyzed using the logarithmic link function because the distribution of RT was right-skewed and followed a log-normal distribution (Figure 4). To aid convergence, models were fitted using the BOBYQA (Bound Optimization BY Quadratic Approximation) optimizer with an increased maximum number of iterations (Powell, 2009). The lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017) was used to calculate p-values using Satterthwaite’s degrees of freedom method.
To examine the effect of coda /l/ on accuracy and speed of rime disambiguation, we constructed two GLMMs, one with the dependent variable Accuracy and another with RT of correct responses. The independent variables were Coda (treatment coded, comparing /l/ to the baseline /d/) and Lexical Status of Target (deviation coded, comparing real words and non-words to the grand mean), together with their interaction. The models included a random by-participant intercept but not a by-participant random slope for the effect of Coda, as the experiment had a between-participant design. To examine the speed-accuracy trade-off between the two coda conditions, we constructed a third GLMM with the dependent variable RT, including RT of both correct and incorrect responses. The independent variables were Coda and Response Accuracy, which were allowed to interact, and Lexical Status of Target, which did not enter any interaction; the model included a random by-participant intercept but not a by-participant random slope, as the experiment had a between-participant design. Coda and Response Accuracy were treatment-coded so that the intercept was the RT of incorrect responses in the /d/ condition. To shed light on how the pairing of target and competitor vowels affects rime disambiguation, we used agglomerative hierarchical cluster analysis with Ward’s method (Ward, 1963). Hierarchical cluster analysis takes the individual vowels as single-element clusters and at each step merges two clusters into a larger cluster in such a way that the members of one cluster are maximally similar and the members of two separate clusters are maximally dissimilar.
2.2.1. Effects of Coda
Rimes ending in /l/ were disambiguated significantly less accurately (β = –0.58, z0.28 = –2.8, p = 0.04) and non-significantly more slowly (β = 0.09, t0.05 = 1.67, p = 0.1) than /d/-final rimes (Figure 5).2 Real words were disambiguated more accurately (β = 0.18, z0.07 = 2.65, p < 0.001) and more quickly (β = –0.01, t0.002 = –6.07, p < 0.001) than the grand mean.
The exploration of the speed-accuracy trade-off showed that RT of incorrect responses was slower in the /l/ condition than in the /d/ condition (β = 0.13, t0.06 = 2.17, p = 0.02). RT was slower for correct responses than for incorrect responses within the /d/ condition (β = 0.04, t0.02 = 2.19, p = 0.038). The difference between the RT of correct and incorrect responses was significantly smaller in the /l/ condition than in the /d/ condition (β = –0.04, t0.02 = –1.98, p = 0.047) (Figure 6). Real words were disambiguated more quickly (β = –0.02, t0.002 = –11.57, p < 0.001) than nonwords.
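Because Coda and Response Accuracy were treatment-coded, the reported coefficients can be combined to obtain the difference between correct and incorrect responses within each coda condition. The sketch below works through this arithmetic using the rounded estimates given above; the function name is hypothetical, Lexical Status is held constant, and the model intercept is not needed because only differences are computed.

```python
from math import exp

# Fixed-effect estimates reported in the text (log-RT scale, rounded).
B_CODA_L = 0.13        # /l/ vs. /d/, for incorrect responses
B_CORRECT = 0.04       # correct vs. incorrect, within the /d/ condition
B_INTERACTION = -0.04  # extra change for correct responses in the /l/ condition


def log_rt_offset(coda, correct):
    """Offset from the intercept (incorrect responses, /d/ condition)."""
    offset = 0.0
    if coda == "l":
        offset += B_CODA_L
    if correct:
        offset += B_CORRECT
    if coda == "l" and correct:
        offset += B_INTERACTION
    return offset


# Correct-minus-incorrect difference within each coda condition.
diff_d = log_rt_offset("d", True) - log_rt_offset("d", False)
diff_l = log_rt_offset("l", True) - log_rt_offset("l", False)

# With a log link, exponentiating a difference gives an RT ratio.
ratio_d = exp(diff_d)
ratio_l = exp(diff_l)
print(round(ratio_d, 3), round(ratio_l, 3))
```

With these rounded estimates, correct responses are about 4% slower than incorrect responses in the /d/ condition, whereas the difference in the /l/ condition is approximately zero, because the interaction term cancels the effect of Response Accuracy.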
2.3. Effect of Target and Competitor vowels
The effect of Target and Competitor vowels was examined using agglomerative hierarchical cluster analysis with Ward’s method (Ward, 1963), based on a confusion matrix of target- and competitor vowels (Figure 7). Vowels that form a dyad in Figure 7 are vowels which were confused the most often when paired as target and competitor. The vertical location of the nodes indicates confusability: The lower a node is located, the higher the percentage of incorrect responses.
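The clustering step can be illustrated with a minimal pure-Python sketch of agglomerative clustering under Ward's criterion, implemented with the Lance-Williams update. It is not the analysis code used in the study. The function name, the four placeholder items, and their error rates are invented, and the conversion from error rate to dissimilarity (1 minus the error rate) is an assumption, since the text does not specify how the confusion matrix was transformed.

```python
from itertools import combinations


def ward_merges(labels, dissim):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    labels: item names
    dissim: dict mapping frozenset({a, b}) to the dissimilarity of a and b
    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(label,) for label in labels]  # start from singletons
    # Squared dissimilarities between clusters, keyed by the unordered pair.
    d2 = {
        frozenset(((a,), (b,))): dissim[frozenset((a, b))] ** 2
        for a, b in combinations(labels, 2)
    }
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest squared dissimilarity.
        ci, cj = min(combinations(clusters, 2), key=lambda p: d2[frozenset(p)])
        d_ij = d2[frozenset((ci, cj))]
        merged = tuple(sorted(ci + cj))
        rest = [c for c in clusters if c not in (ci, cj)]
        # Update the distance from the new cluster to every remaining cluster.
        for ck in rest:
            ni, nj, nk = len(ci), len(cj), len(ck)
            d2[frozenset((merged, ck))] = (
                (ni + nk) * d2[frozenset((ci, ck))]
                + (nj + nk) * d2[frozenset((cj, ck))]
                - nk * d_ij
            ) / (ni + nj + nk)
        merges.append((ci, cj, d_ij ** 0.5))
        clusters = rest + [merged]
    return merges


# Invented error rates for four placeholder items (not the study's data).
error_rate = {
    frozenset(("A", "B")): 0.30,
    frozenset(("C", "D")): 0.20,
    frozenset(("A", "C")): 0.02,
    frozenset(("A", "D")): 0.02,
    frozenset(("B", "C")): 0.02,
    frozenset(("B", "D")): 0.02,
}
# Assumed conversion: frequently confused pairs are treated as close together.
dissim = {pair: 1.0 - rate for pair, rate in error_rate.items()}

for a, b, height in ward_merges(["A", "B", "C", "D"], dissim):
    print(a, b, round(height, 3))
```

In this toy example the most frequently confused pair (A and B) is merged first and therefore forms the lowest node, followed by C and D, mirroring the way dyads and node heights are read in the dendrogram.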
Target and competitor vowels were most frequently confused when the two vowels shared a similar place of articulation (vowel frontness and height). Long-short vowel pairs were the hardest to disambiguate (e.g., /iː-ɪ/ and /ɐː-ɐ/) in the /d/ condition. In the /l/ condition, the vowel pairs /ʉː-ʊ, æɔ-æ/ and /əʉ-ɔ/ were easily confused; however, this analysis does not establish whether articulatory similarity has a statistically significant effect on vowel disambiguation. Comparing the clusters between the /d/ and the /l/ condition shows that the rimes were harder to disambiguate in the /l/ condition, as two-member vowel clusters are separated earlier from other clusters, that is, the nodes are located lower in the /l/ condition.
The aim of Experiment 1 was to examine the influence of coda lateral coarticulation on listeners’ ability to disambiguate vowel contrasts. As predicted, these data revealed that vowel discrimination is significantly less accurate in prelateral than in preobstruent environments. Lower accuracy in the /l/ condition is consistent with the hypothesis that coda /l/ reduces perceptual vowel contrast. Listeners were not overall slower in disambiguating /l/-final rimes. However, incorrect responses were faster than correct responses in the /d/ condition but not in the /l/ condition, indicating a speed-accuracy trade-off for the former but not for the latter. The presence of a speed-accuracy trade-off is consistent with a ‘fast-guess’ model of RT that argues that decisions made quickly are guesses and therefore less likely to be accurate, whereas decisions based on evidence are slow and highly accurate (Ollman, 1966; Yellott Jr, 1971). That is, incorrect answers are likely to be the result of fast guesses in the /d/ condition; however, when listeners allocated more time to make a decision they could disambiguate the vowels correctly. In contrast, we found no evidence for a speed-accuracy trade-off in the /l/ condition due to the increased RT of the incorrect responses, indicating that the incorrect answers were the result of processing difficulties, not of insufficient time taken to process the input. This suggests that when not opting for a fast guess, listeners allocated the same amount of time to disambiguate the rime in both conditions; however, this time was not sufficient to make accurate decisions in the /l/ condition. That is, listeners’ incorrect responses are the result of insufficient time in the /d/ condition, whereas in the /l/ condition they are the result of increased difficulties in vowel disambiguation.
We attribute the increased difficulty in vowel disambiguation in the pre-/l/ context to the coarticulatory influence of /l/ on the vowel. In the stimuli, vowel-/l/ coarticulation led to spectral contrast reduction, consistent with findings of Palethorpe and Cox (2003) and Szalay et al. (2021) (see Appendix B for the formant trajectories of the most confused rimes). The overall negative effect of coda /l/ on vowel disambiguation indicates that coda /l/ masks some of the acoustic cues listeners rely on for discriminating between members of several prelateral vowel pairs.
We also examined the effects of Target and Competitor Vowel, expecting that spectrally similar vowels would be more likely to be confused with each other. This expectation was borne out both in the /d/ and the /l/ condition, as the most confused vowel pairs are similar to each other in place of articulation and formant trajectories, such as /ʉː-ʊ, æɔ-æ, əʉ-ɔ, oɪ-ɔ/. This is not surprising in the /d/ condition, as English listeners are only likely to confuse spectrally similar vowels (Neel, 2008). However, English listeners have been shown to give more weight to length cues when spectral differences are inherently smaller (Bennett, 1968) or not available any more due to a contextual merger (Wade, 2017). This does not seem to be the case in our data: Perceptual similarity between /ʉː-ʊ, æɔ-æ, əʉ-ɔ/ increased as spectral differences became smaller in the /l/ condition, even though the vowels within these pairs differed in length (Appendix B). The high confusion rate of /ʉː-ʊ, æɔ-æ/ and /əʉ-ɔ/ shows that listeners interpret the coarticulatory effects of /l/ as an intrinsic property of the vowel and as a vowel cue, and not as a cue to the following consonant. This shows that vowel-/l/ coarticulation interferes with listeners’ ability to map the signal to higher level units and disambiguate the rime.
A limitation of Experiment 1 was the nature of the task, which required listeners to map an auditory signal to a mixture of orthographically presented real words and non-words. Real-word status, word frequency, and familiarity all affect word recognition (Rubenstein, Garfield, & Millikan, 1970; Forster & Chambers, 1973; Segui, Mehler, Frauenfelder, & Morton, 1982; Meunier & Segui, 1999). In addition, listeners were not exposed to variation in the coda, as each listener was assigned to either the /l/ or the /d/ condition. A lack of attention to the codas, which were predictable for all items, might thus have been partly responsible for the observed inefficient compensation. In Experiment 2, we used a word recognition paradigm to test whether vowel-lateral coarticulation affects how listeners compensate for context during lexical processing of words. Experiment 2 required the processing of the entire word and also presented words ending in /d/ and /l/ to all participants to draw participants’ attention to the coda.
3. Experiment 2: Word recognition
We examined listeners’ recognition of /l/-final words contrasting /iː-ɪ, ʉː-ʊ, æɔ-æ/ and /əʉ-ɔ/ to assess whether listeners can identify prelateral vowels when required to process the information lexically. Participants listened to words contrasting the vowel pairs that had been identified as the most confusable in Experiment 1 (i.e., /ʉː-ʊ, æɔ-æ, əʉ-ɔ/, and in addition /iː-ɪ/, as their pre-/l/ allophones have acoustically similar offglides; Palethorpe & Cox, 2003) to determine how listeners map the acoustic signal of /CVl/ minimal pairs to lexical items.
Forty-six female native speakers of Australian English, born in Australia to Australian-born parents (monolingual = 33, age = 18–40 years, mean = 21.5) participated in the experiment. Participants received course credit or $15 for participation. None of the participants reported any current hearing, speaking, or reading difficulties.
The stimuli consisted of 32 unique CVC targets and 38 unique (C)V(C) fillers. For the 32 targets, 16 minimal pairs were chosen which contrasted the four vowel pairs (/iː-ɪ, ʉː-ʊ, æɔ-æ, əʉ-ɔ/), with two minimal pairs per coda and per vowel pair. Due to the limited number of available minimal pairs, the target words varied in word class and lexical frequency. Frequency was measured in the AusE part of the GloWbE corpus (Davies, 2013); mean frequency in the /d/ condition was 312.5 per million words (range = 0.3–2,415), and 48.8 (range = 0.2–446) in the /l/ condition. Fillers were (C)V(C) words that did not contain /d/ or /l/ or the target vowels in any position. Fillers matched the targets in part of speech and onset consonants and were chosen from the 5,000 most frequent words of the COCA database (Davies, 2008). Mean frequency of fillers was 397 per million words (range = 10–2,048).
Two sets of recordings of the stimulus materials were elicited, from a 57- and a 25-year-old female speaker of AusE. Stimuli were recorded with an AKG C535EB Condenser Microphone onto an iMac using Presonus Studio Live 16.2.4 AI Mixer at 44.1 kHz sampling rate in a sound-treated studio. The stimuli were amplitude-normalized and truncated to have one second of silence before the onset and after the offset of the word. Formant change over time for the stimulus words is shown in Appendix D. Mean duration of target words in the /d/ condition was 593 ms (range = 425–727 ms), and 644 ms (range = 474–841 ms) in the /l/ condition. Mean duration of the fillers was 662 ms (range = 474–844 ms).
Prior to the experiment, participants familiarized themselves with the stimulus materials by reading them out loud as they were presented in random order on a computer monitor. Participants were introduced to the experiment with a short practice session, listening to audio recordings of ten words, and typing what they heard. Feedback was provided after each trial on spelling alternatives and acceptable responses. Familiarization and practice were followed immediately by the experimental phase.
Participants were seated in front of a computer monitor located at eye height at a distance of 50 cm and wore Sennheiser 380 Pro headphones adjusted to their comfortable listening level. Participants were instructed to respond as quickly and accurately as possible. To begin each trial, a fixation cross was displayed in the centre of the screen. After 500 ms, the target word started playing and participants typed what they heard. Participants were allowed to use backspace but did not receive feedback on their responses.
Each participant was tested on 32 targets and 32 fillers, all repeated in four blocks, once per block, with a 30-second forced break between the blocks. The first two blocks were spoken by the 57-year-old informant and the last two by the 25-year-old informant. The first and the third block were preceded by an additional six fillers at the beginning to habituate the listeners to the voice of the speaker. The 32 targets and the remaining 32 fillers were presented in a pseudo-random order. Each participant was exposed to 64 (items) × 2 (informants) × 2 (repetitions) + 12 (habituation) = 268 trials. The stimuli were presented with the software Expyriment (F. Krause & Lindemann, 2014). After the word recognition experiment, participants reported whether they found any of the words ‘unusual’ or ‘difficult.’
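The block structure described above can be sketched in a few lines of Python. This is only an illustrative reconstruction under stated assumptions, not the Expyriment script used in the study: the speaker labels and the function name build_session are hypothetical, and a plain shuffle stands in for the pseudo-random ordering, whose constraints are not reported.

```python
import random

N_TARGETS = 32       # target words per block (from the text)
N_FILLERS = 32       # fillers per block (from the text)
N_HABITUATION = 6    # extra fillers before blocks 1 and 3 (from the text)
BLOCK_SPEAKERS = ["57yo", "57yo", "25yo", "25yo"]  # hypothetical labels


def build_session(seed=0):
    """Return (block, speaker, kind, item_id) tuples for one participant."""
    rng = random.Random(seed)
    trials = []
    for block, speaker in enumerate(BLOCK_SPEAKERS, start=1):
        # Habituation fillers precede the first block of each speaker.
        if block in (1, 3):
            trials += [(block, speaker, "habituation", i)
                       for i in range(N_HABITUATION)]
        items = [("target", i) for i in range(N_TARGETS)]
        items += [("filler", i) for i in range(N_FILLERS)]
        rng.shuffle(items)  # assumption: stands in for the pseudo-random order
        trials += [(block, speaker, kind, i) for kind, i in items]
    return trials


session = build_session()
print(len(session))  # 64 * 2 * 2 + 12 = 268
```

Counting the tuples by kind gives 128 target trials, 128 filler trials, and 12 habituation trials per participant under this design.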
Responses to 46 (participants) × 268 (trials) = 12,328 trials were collected. Responses from the habituation trials (552 items) and from fillers (5,628 items) were excluded prior to any analysis. Nineteen tokens were excluded due to technical difficulties and coding errors.
The remaining 6,129 responses were rated for accuracy. Participants’ responses were compared to the target and classified as Intended Answer, Phonetic Respelling, Typo, Minimal Pair Error, or Other Error. Responses were classified as Intended Answer if spelled as the target or its homophone (e.g., both would and wood were classified as Intended Answer for /wʊd/). In addition, proper nouns spelled with lower case letters and contractions spelled without apostrophes were classified as Intended Answer. Unambiguous, phonetic, but nonstandard spellings of target words (e.g., knowed for node) were classified as Phonetic Respellings. Single letter deletions, additions, letter transpositions, and substitutions within one key distance of the target letter were classified as Typos (Luce & Pisoni, 1998). Responses in which participants confused members of the minimal pairs (e.g., answered fool when the target was full) were classified as Minimal Pair Errors. Any other errors, such as misheard words (e.g., cool for pool), were classified as Other Errors. Responses that were ambiguous between Typos and Other Errors, such as how for howl, were also classified as Other Errors. Fifteen of the 31 Other Errors were ambiguous between Typos and Other Errors in the /d/ condition and 40 of 84 Other Errors were ambiguous in the /l/ condition. For the purposes of the analysis of accuracy, Intended Answers, Phonetic Respellings, and Typos were accepted as Correct; Minimal Pair Errors and Other Errors were rejected as Incorrect.
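The classification scheme can be made concrete with a small Python sketch. It is an approximation under several assumptions rather than the coding procedure actually used: the simplified QWERTY grid, the function names, and the homophones, respellings, and real_words arguments are all hypothetical. Treating any real-word response as an Other Error is one way to capture the rule that responses ambiguous between Typos and Other Errors (how for howl) count as Other Errors.

```python
# Simplified QWERTY grid (assumption: ignores the physical stagger of key rows).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS)
           for c, ch in enumerate(row)}


def adjacent_keys(a, b):
    """True if two different letters are at most one key apart on the grid."""
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def is_typo(response, target):
    """Single deletion, addition, adjacent transposition, or near-key substitution."""
    if response == target:
        return False
    if len(response) == len(target):
        diffs = [i for i, (x, y) in enumerate(zip(response, target)) if x != y]
        if len(diffs) == 1:
            return adjacent_keys(response[diffs[0]], target[diffs[0]])
        if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
            i, j = diffs
            return response[i] == target[j] and response[j] == target[i]
        return False
    longer, shorter = sorted((response, target), key=len, reverse=True)
    if len(longer) - len(shorter) != 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def normalise(word):
    """Ignore case and apostrophes, as for proper nouns and contractions."""
    return word.strip().lower().replace("'", "").replace("’", "")


def classify(response, target, homophones=(), respellings=(),
             competitor=None, real_words=()):
    response, target = normalise(response), normalise(target)
    if response == target or response in homophones:
        return "Intended Answer"
    if response in respellings:
        return "Phonetic Respelling"
    if competitor is not None and response == normalise(competitor):
        return "Minimal Pair Error"
    if response in real_words:  # ambiguous cases count as Other Errors
        return "Other Error"
    if is_typo(response, target):
        return "Typo"
    return "Other Error"


print(classify("fool", "full", competitor="fool"))
print(classify("how", "howl", real_words={"how"}))
```

In practice the lists of homophones, accepted respellings, and real words would have to be supplied by the coder; none of them is given in the text.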
RT was measured from the onset of the stimulus to the first key-press. RTs within 210 ms of stimulus onset (Woods et al., 2015) or more than 5,000 ms after stimulus onset (Baayen & Milin, 2010) were excluded from further analysis (0.06% of responses), as were responses beyond the mean ± 2 SD for each participant by coda condition (Ratcliff, 1993), leaving a total of 5,591 trials (91%) for the analysis.
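The two-step trimming procedure can be illustrated as follows. This is a minimal sketch, not the analysis script distributed with the article: the dictionary keys, the function name trim_rts, the treatment of responses falling exactly on a cut-off, and the use of the sample standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(trials, low=210.0, high=5000.0, n_sd=2.0):
    """Apply absolute cut-offs, then a per-participant, per-coda SD criterion.

    `trials` is a list of dicts with hypothetical keys
    'participant', 'coda', and 'rt' (in ms).
    """
    # Step 1: drop anticipatory (<= low) and very late (> high) responses.
    in_range = [t for t in trials if low < t["rt"] <= high]

    # Step 2: group the remaining trials by participant and coda condition.
    cells = defaultdict(list)
    for t in in_range:
        cells[(t["participant"], t["coda"])].append(t)

    kept = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:  # SD is undefined for a single observation
            kept.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)  # assumption: sample SD
        kept.extend(t for t in cell if abs(t["rt"] - m) <= n_sd * sd)
    return kept


demo = [{"participant": "p01", "coda": "d", "rt": rt}
        for rt in (100, 600, 610, 620, 630, 640, 650, 660, 670, 3000, 6000)]
trimmed = trim_rts(demo)
print(sorted(t["rt"] for t in trimmed))
```

In the toy data, 100 ms and 6,000 ms fall outside the absolute window, and 3,000 ms lies more than 2 SD above the cell mean computed from the remaining trials, so only the eight responses between 600 and 670 ms survive.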
To measure the effect of coda /l/ on accuracy and speed of word recognition, we constructed two GLMMs: one with Accuracy as the dependent variable and another with RT. To analyze binary accuracy data, we used glmer() with the logistic link function in the lme4 package in R (Bates et al., 2015; R Core Team, 2018). To analyze RT data, we used glmer() with the logarithmic link function because the response time distribution was right-skewed and followed a log-normal distribution (Figure 8). To aid convergence, the models were fitted with the BOBYQA (Bound Optimization BY Quadratic Approximation) optimizer and an increased maximum number of iterations (Powell, 2009). The lmerTest package (Kuznetsova et al., 2017) was used to calculate p-values using Satterthwaite’s degrees of freedom method. The independent variables were Coda (interacting), Vowel (interacting), and Target Frequency (non-interacting). The models included a by-participant random intercept and a by-participant random slope for the effect of Coda to account for inter-listener variation. Coda was treatment-coded, comparing /l/ to the baseline /d/. Vowel was deviation-coded, and the main effect of Vowel was investigated by comparing results for each vowel to the grand mean (instead of selecting one vowel as a baseline). Target Frequency was coded as a continuous variable: the log-normalized per-million-word frequency of the target, taken from the AusE section of the GloWbE corpus (Davies, 2013).
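To make the predictor coding concrete, the following Python sketch builds the fixed-effects design row implied by this description: treatment coding for Coda, deviation (sum-to-zero) coding for Vowel, their interaction, and a log frequency term. It illustrates the coding schemes only and does not fit a mixed model. The function names, the choice of the last vowel in the list as the omitted level, and the use of the natural logarithm are assumptions.

```python
import math

# The eight target vowels; the order (and thus the omitted level) is an assumption.
VOWELS = ["iː", "ɪ", "ʉː", "ʊ", "æɔ", "æ", "əʉ", "ɔ"]


def treatment_code(coda, baseline="d"):
    """Treatment coding: 0 for the baseline /d/, 1 for /l/."""
    return 0 if coda == baseline else 1


def deviation_code(level, levels=VOWELS):
    """Sum-to-zero coding: k levels give k-1 columns; the last level is -1 throughout."""
    k = len(levels)
    if level == levels[-1]:
        return [-1] * (k - 1)
    return [1 if level == other else 0 for other in levels[:-1]]


def design_row(coda, vowel, freq_per_million):
    """Intercept, Coda, Vowel columns, Coda x Vowel columns, log frequency."""
    c = treatment_code(coda)
    v = deviation_code(vowel)
    interaction = [c * x for x in v]
    return [1, c] + v + interaction + [math.log(freq_per_million)]


row = design_row("l", "ɪ", 48.8)
print(len(row))  # 1 + 1 + 7 + 7 + 1 = 17
```

Because every deviation-coded column sums to zero across the eight vowels, each Vowel coefficient is interpreted relative to the mean of the vowel levels rather than relative to a single baseline vowel, which is the comparison described above.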
Rimes ending in /l/ were disambiguated less accurately (β = –5.07, SE = 2.53, z = –2.01, p = 0.0144) and more slowly (β = 0.06, SE = 0.00, t = 13.39, p < 0.001) than /d/-final words (Figure 9).3 To test whether the accuracy results are due to confusion of minimal pairs, we repeated the analysis of accuracy data after removing responses classified as Other Errors and retaining only the responses classified as Correct and Minimal Pair Errors, in a model with Coda, Vowel, and Lexical Frequency as non-interacting factors. Rimes ending in /l/ were again disambiguated less accurately (β = –6, SE = 1.14, z = –5.27, p < 0.001) when only Minimal Pair Errors were counted as incorrect (Figure 10).
Target vowels had no significant main effect on accuracy and did not show any significant interactions with Coda /l/. The lack of significant Vowel effects on accuracy is probably due to the fact that participants were at ceiling in the /d/ condition, so that there was no variation between target Vowels in the /d/ condition.
Target vowels significantly affected RT (Table 1). Response times for words containing the short target vowels /ɪ, ʊ, ɔ/ were significantly quicker than the grand mean, and response times to words containing long target vowels /iː, ʉː, æɔ/, but not /əʉ/, were slower than the grand mean (Table 1). Response times for words containing phonemically long vowels may have been slower because they were on average 132 ms longer than words containing short vowels, and RT was measured from acoustic stimulus onset.
Our models contained an interaction between Coda and Vowel, but not between Coda and Lexical Frequency. These models suggested that more frequent words were disambiguated more accurately (β = 0.18, SE = 0.06, z = 3.21, p = 0.001) and more slowly (β = 0.01, SE = 0.002, t = 4.2, p < 0.001), contrary to the established results on faster RT to more frequent words (Meunier & Segui, 1999). Although a detailed analysis of the effects of lexical frequency on the accuracy and speed of recognition of /l/-final words is not possible due to the limited number of words (four for each vowel pair), an exploratory analysis suggests that listeners prefer the more frequent competitor within the pair for coal-Col and mole-moll (Figure 12).
The goal of this experiment was to gauge how listeners use word-level information when they identify /l/-final words. We found significantly less accurate and slower word recognition in /l/-final words compared to /d/-final words. The lower accuracy rates in the /l/ condition were driven by listeners’ tendency to confuse minimal pair competitors, a pattern that did not occur in the /d/ condition.
These findings that listeners sometimes map the acoustic signal inefficiently and even incorrectly may suggest that listeners are not able to recover the intended vowel phoneme under the coarticulatory influence of /l/. Instead of attributing the coarticulatory effects to coda /l/, listeners may sometimes interpret coarticulatory effects as a characteristic of the vowel. Despite the fact that typed responses showed listeners identified the words as /l/-final and therefore perceived the motivating environment for coarticulation, this information was not always useful for correct identification of the word. In this context there may be insufficient information available to listeners to allow for accurate recovery of the intended vowel.
The only two target words in which listeners sometimes failed to perceive the motivating environment were howl (/hæɔl/) and Hal (/hæl/), which were perceived as how (/hæɔ/) in 22% and 13% of trials, respectively. Confusion of [æɔɫ#] and [æɫ#] with /æɔ#/ is not unexpected, given that the dorsal articulation of coda /l/ is inherently similar to that of a back vowel (Gick, Kang, & Whalen, 2002) and that acoustically, final /l/ can be absorbed in the preceding /æɔ/ (Palethorpe & Cox, 2003). Furthermore, /l/-vocalization is common after back vowels in AusE (Horvath & Horvath, 1997; Borowsky, 2001), which further increases the similarity between howl and how. In contrast, the low front /æ/ in Hal facilitates vocalization to a lesser extent (Horvath & Horvath, 1997; Borowsky, 2001), but if listeners perceive final /l/ as a vowel (i.e., vocalized), it is very likely to be perceived as /ɔ/ due to the correspondence between /æɔ/ and /æl/ (Palethorpe & Cox, 2003).
We did not find that the effect of /l/ on accuracy differed between words with different target vowels, despite an apparent difference in recognition accuracy (Figure 11). We detected a difference in the slowing effect of /l/ between words with different target vowels: The effect was smaller for /iː, ɪ, ʉː, əʉ/ and larger for /ɔ/, indicating increased difficulty for targets with /ɔ/. Overall vowel effects showed that words with short vowels were recognized more quickly compared to words with long vowels. The vowel effect could be the result of listeners waiting until stimulus offset, therefore taking longer to respond to stimuli with long vowels (mean length = 588 ms) compared to short vowels (mean length = 456 ms).
In the models that included the interaction between Coda and Target Vowel, frequent words were recognized more accurately but more slowly, which is only partly consistent with previous findings (Morton, 1969; Meunier & Segui, 1999). The apparent slowing effect of increased lexical frequency might be an artefact of the stimuli not being balanced for lexical frequency. Nevertheless, an exploratory analysis revealed that only two minimal pairs in the /l/ condition were characterized by a listener preference for the frequent member of the minimal pair in the case of large frequency discrepancies: mole-moll (2.42 versus 0.2 occurrences per million words) (Davies, 2013) and coal-Col (66.96 versus 3.3 occurrences per million words) (Davies, 2013). That is, when the target was very infrequent (Col or moll), listeners defaulted to the more frequent minimal pair competitor (coal or mole). This could be related to participants’ unfamiliarity with the targets Col and moll (Connine, Mullennix, Shernoff, & Yelen, 1990), as some participants flagged the words moll, Col, Hal, Val as ‘unknown’ or even ‘nonsense’ words in the exit interview, but did not flag their minimally differing competitors.
Lower accuracy and slower speed of recognition of lateral-final words indicate increased processing difficulty, which we attribute to the reduced acoustic contrast between the members of the minimal pairs. Reduced acoustic contrast can make word recognition harder by making the acoustic signal inherently ambiguous in perception. Furthermore, reduced acoustic contrast can also increase lexical activation of minimal pair competitors in the /l/-context compared to the /d/-context, which inhibits the recognition of the target (Luce & Pisoni, 1998). That is, vowel-/l/ coarticulation does not only lead to increased processing difficulty, as shown in Experiment 1, but also hinders lexical access. Listeners’ minimal pair errors show that they mapped the acoustic signal to the competitor word instead of the target, indicating that CVl minimal pairs ending in /ʉːl-ʊl, əʉl-ɔl, æɔl-æl/ are inherently ambiguous between two lexical items.
The results of Experiments 1 and 2 combined show that the acoustic signal in prelateral vowels is inherently ambiguous, in particular between the members of the vowel pairs /ʉː-ʊ, æɔ-æ, əʉ-ɔ/. In Experiment 1, we found reduced perceptual contrast between the vowel pairs /ʉː-ʊ, æɔ-æ, əʉ-ɔ/, which we attribute to the ambiguity of the acoustic signal. This is supported by the fact that vowels with similar place of articulation are confused with each other in the /d/ condition and increasingly so in the /l/ condition. Vowel cues are modified by the coarticulatory influence of the coda /l/ in such a way that contrastive cues are masked and the signal becomes ambiguous between two elements in the vowel inventory. In Experiment 2, we found that reduced perceptual vowel contrast and vowel ambiguity caused by the coarticulatory effects of /l/ also hinder lexical access and recognition of /l/-final minimal pairs contrasting /iː-ɪ, ʉː-ʊ, æɔ-æ/ and /əʉ-ɔ/. Listeners’ ability to recognize words might be limited by their familiarity with the word: For unfamiliar words, listeners may map an ambiguous signal to a frequent competitor; however, more research is needed on the effects of frequency and familiarity. The two experiments together show that vowel-lateral coarticulation reduces perceptual vowel contrast both in vowel disambiguation and in word recognition.
Reduced perceptual vowel contrast in the prelateral context can potentially indicate limited compensation for coarticulation. The acoustic cues in lateral-final rimes are inherently ambiguous between cueing coda identity and vowel identity. If listeners attribute these cues to the coda /l/, they should be able to compensate for its coarticulatory influence and correctly identify the prelateral vowel. If, however, listeners attribute the cues to the vowel itself, they will fail to compensate for coarticulation and will misidentify the vowel. The finding that listeners cannot always identify prelateral vowels as they were intended by the speaker despite perceiving /l/ itself is consistent with limited compensation for the coarticulatory effects of /l/.
Perceptual vowel contrast reduction has implications for theories of sound change, as sound change is often related to how coarticulation is produced by the speaker and perceived by the listener (Ohala, 1993; Beddor, 2009; Solé & Ohala, 2010; Ohala, 2012; Garrett & Johnson, 2013; Harrington, Kleber, Reubold, Schiel, & Stevens, 2018). Coarticulation provides systematic and directional variation which may become the input for sound change (Garrett & Johnson, 2013). Ohala’s (1981, 1993, 2012) model of sound change identifies insufficient compensation for coarticulation, not its production, as a process implicated in the initiation of sound change. In contrast, in the interactive phonetic (IP) sound change model by Harrington et al. (2018), insufficient compensation for coarticulation is not the cause, but the effect of and evidence for sound change. In the IP model, the prerequisite of sound change is that typical realizations of two phonemes are acoustically distinct, but highly coarticulated realizations of one phoneme become acoustically similar to the other phoneme (Harrington et al., 2018). Viewed through these models, perceptual contrast reduction observed in these data may be a precursor to a sound change, as listeners do not always retrace the acoustic signal to the speakers’ intended form. Perceptual vowel contrast reduction indicates that the prelateral allophones of the vowel pairs /ʉː-ʊ, æɔ-æ, əʉ-ɔ/ may have merged, although we did not find the perception of pre-/l/ allophones of the vowel pairs to be skewed towards one phonemic category within the pair.
However, not all allophonic variation in production leads to sound change (Ohala, 1993) and failed compensation or miscategorization of items does not always indicate sound change (M. Stevens & Harrington, 2014; Harrington et al., 2018). In order to explore this question, an apparent time or a sociolinguistic study is needed to better understand the implications for the actuation of sound change in the prelateral vowels of Australian English.
Data Accessibility Statement
The authors are committed to making the data accessible in line with the ethics approval from Macquarie University Human Research Ethics Committee for the publication of the data.
The additional files for this article can be found as follows:
PDF file with the Target words used in Experiment 1. DOI: https://doi.org/10.5334/labphon.185.s1
PDF file with the Formant trajectories of /ʉː-ʊ, æɔ-æ, əʉ-ɔ/ in target words in Experiment 1. DOI: https://doi.org/10.5334/labphon.185.s2
PDF file with the Acoustic durations of /ʉː-ʊ, æɔ-æ, əʉ-ɔ/ in target words in Experiment 1. DOI: https://doi.org/10.5334/labphon.185.s3
PDF file with the Formant trajectories of target words in Experiment 2. DOI: https://doi.org/10.5334/labphon.185.s4
ZIP file containing the dataframes and the R script to reproduce statistical analysis. DOI: https://doi.org/10.5334/labphon.185.s5
1. The phonemic symbols used in this work are based on the system outlined in Cox and Palethorpe (2007) for describing Australian English.
2. RT estimates are reported as log-normalized ms.
3. RT estimates are reported as log-normalized ms.
Ethics and Consent
This research has been approved by the Macquarie University Human Research Ethics Committee (reference number: 5201600061 and 5201700256).
Acknowledgements
This research was supported in part by ARC DE150100318, ARC FT180100462, iMQRTP 2015144, and MQSIS 9201501719 grants. We thank Peter Humburg for his help in the statistical analysis, the Phonetics Lab, the Centre for Language Sciences, and the ARC Centre of Excellence in Cognition and its Disorders at Macquarie University for their comments, feedback, and support. We would also like to thank our participants without whom this research would not have been possible.
Competing Interests
The authors have no competing interests to declare.
References
Baayen, H. R., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28. DOI: http://doi.org/10.21500/20112084.807
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. DOI: http://doi.org/10.18637/jss.v067.i01
Beddor, P. S. (2009). A coarticulatory path to sound change. Language, 85(4), 785–821. DOI: http://doi.org/10.1353/lan.0.0165
Beddor, P. S., & Krakow, R. A. (1999). Perception of coarticulatory nasalization by speakers of English and Thai: Evidence for partial compensation. The Journal of the Acoustical Society of America, 106(5), 2868–2887. DOI: http://doi.org/10.1121/1.428111
Beddor, P. S., McGowan, K. B., Boland, J. E., Coetzee, A. W., & Brasher, A. (2013). The time course of perception of coarticulation. Journal of the Acoustical Society of America, 133(4), 2350–2366. DOI: http://doi.org/10.1121/1.4794366
Beddor, P. S., & Strange, W. (1982). Cross-language study of perception of the oral–nasal distinction. The Journal of the Acoustical Society of America, 71(6), 1551–1561. DOI: http://doi.org/10.1121/1.387809
Bennett, D. C. (1968). Spectral form and duration as cues in the recognition of English and German vowels. Language and Speech, 11(2), 65–85. DOI: http://doi.org/10.1177/002383096801100201
Borowsky, T. (2001). The vocalisation of dark l in Australian English. In D. Blair & P. Collins (Eds.), English in Australia (pp. 69–87). Philadelphia, Amsterdam: John Benjamins Publishing Company. DOI: http://doi.org/10.1075/veaw.g26.07bor
Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996). Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication, 20(3–4), 255–272. DOI: http://doi.org/10.1016/S0167-6393(96)00063-5
Connine, C. M., Mullennix, J., Shernoff, E., & Yelen, J. (1990). Word familiarity and frequency in visual and auditory word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(6), 1084–1096. DOI: http://doi.org/10.1037/0278-7393.16.6.1084
Cox, F. (1999). Vowel change in Australian English. Phonetica, 56, 1–27. DOI: http://doi.org/10.1159/000028438
Cox, F. (2006). The acoustic characteristics of /hVd/ vowels in the speech of some Australian teenagers. Australian Journal of Linguistics, 26(2), 147–179. DOI: http://doi.org/10.1080/07268600600885494
Cox, F., & Fletcher, J. (2017). Australian English pronunciation and transcription. Cambridge: Cambridge University Press. DOI: http://doi.org/10.1017/9781316995631
Cox, F., & Palethorpe, S. (2004). The border effect: Vowel differences across the NSW-Victorian border. In C. Moskovsky (Ed.), Proceedings of the Conference of Australian Linguistics Society. Newcastle, Australia.
Cox, F., & Palethorpe, S. (2007). Australian English. Journal of the International Phonetic Association, 37(3), 341–350. DOI: http://doi.org/10.1017/S0025100307003192
Davies, M. (2008). The corpus of contemporary American English. BYU, Brigham Young University.
Davies, M. (2013). Corpus of global web-based English: 1.9 billion words from speakers in 20 countries.
Ferguson, S. H., & Kewley-Port, D. (2007). Talker differences in clear and conversational speech: Acoustic characteristics of vowels. Journal of Speech, Language, and Hearing Research, 50(5), 1241–1255. DOI: http://doi.org/10.1044/1092-4388(2007/087)
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior, 12, 627–635. DOI: http://doi.org/10.1016/S0022-5371(73)80042-8
Fowler, C. A. (1984). Segmentation of coarticulated speech in perception. Perception & Psychophysics, 36(4), 359–368. DOI: http://doi.org/10.3758/BF03202790
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3–28. DOI: http://doi.org/10.1016/S0095-4470(19)30607-2
Fowler, C. A. (2005). Parsing coarticulated speech in perception: Effects of coarticulation resistance. Journal of Phonetics, 33(2), 199–213. DOI: http://doi.org/10.1016/j.wocn.2004.10.003
Fowler, C. A., Brown, J. M., & Mann, V. A. (2000). Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans. Journal of Experimental Psychology: Human Perception and Performance, 26(3), 877. DOI: http://doi.org/10.1037/0096-1523.26.3.877
Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 6(1), 110. DOI: http://doi.org/10.1037/0096-1523.6.1.110
Garrett, A., & Johnson, K. (2013). Phonetic bias in sound change. In A. C. L. Yu (Ed.), Origins of sound change (pp. 51–97). Oxford, UK: Oxford University Press. DOI: http://doi.org/10.1093/acprof:oso/9780199573745.003.0003
Gaskell, M. G., & Marslen-Wilson, W. (1998). Mechanisms of phonological inference in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 24(2), 380–396. DOI: http://doi.org/10.1037/0096-1523.24.2.380
Gick, B., Kang, M. A., & Whalen, D. H. (2002). MRI evidence for commonality in the post-oral articulations of English vowels and liquids. Journal of Phonetics, 30(3), 357–371. DOI: http://doi.org/10.1006/jpho.2001.0161
Harrington, J., Cox, F., & Evans, Z. (1997). An acoustic phonetic study of broad, general, and cultivated Australian English vowels. Australian Journal of Linguistics, 17, 155–184. DOI: http://doi.org/10.1080/07268609708599550
Harrington, J., Kleber, F., Reubold, U., Schiel, F., & Stevens, M. (2018). Linking cognitive and social aspects of sound change using agent-based modeling. Topics in Cognitive Science, 10(4), 707–728. DOI: http://doi.org/10.1111/tops.12329
Harrington, J., Kleber, F., & Stevens, M. (2016). The relationship between the (mis)-parsing of coarticulation in perception and sound change: Evidence from dissimilation and language acquisition. In Recent advances in nonlinear speech processing (pp. 15–34). Cham: Springer International Publishing. DOI: http://doi.org/10.1007/978-3-319-28109-4_3
Horvath, B. M., & Horvath, R. J. (1997). The geolinguistics of a sound change in progress: /l/ vocalization in Australia. U. Penn Working Papers in Linguistics, 4(1), 109–124.
Iskarous, K., Mooshammer, C., Hoole, P., Recasens, D., Shadle, C. H., Saltzman, E., & Whalen, D. (2013). The coarticulation/invariance scale: Mutual information as a measure of coarticulation resistance, motor synergy, and articulatory invariance. The Journal of the Acoustical Society of America, 134(2), 1271–1282. DOI: http://doi.org/10.1121/1.4812855
Kleber, F., Harrington, J., & Reubold, U. (2012). The relationship between the perception and production of coarticulation during a sound change in progress. Language and Speech, 55(3), 383–405. DOI: http://doi.org/10.1177/0023830911422194
Krause, F., & Lindemann, O. (2014). Expyriment: A python library for cognitive and neuroscientific experiments. Behavior Research Methods, 46(2), 416–428. DOI: http://doi.org/10.3758/s13428-013-0390-6
Krause, J. C., & Braida, L. D. (2004). Acoustic properties of naturally produced clear speech at normal speaking rates. The Journal of the Acoustical Society of America, 115(1), 362–378. DOI: http://doi.org/10.1121/1.1635842
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. DOI: http://doi.org/10.18637/jss.v082.i13
Lin, S., Palethorpe, S., & Cox, F. (2012). An ultrasound exploration of Australian English /CVl/ words. In 14th Australasian International Conference on Speech Science and Technology (pp. 105–108). Sydney, Australia.
Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35(11), 1773–1781. DOI: http://doi.org/10.1121/1.1918816
Loakes, D., Clothier, J., Hajek, J., & Fletcher, J. (2014a). Identifying /el/-/æl/: A comparison between two regional Australian towns. In J. Hay & E. Parnell (Eds.), 15th Australasian International Conference on Speech Science and Technology. Christchurch, New Zealand.
Loakes, D., Clothier, J., Hajek, J., & Fletcher, J. (2014b). An investigation of the /el/-/æl/ merger in Australian English: A pilot study on production and perception in South-West Victoria. Australian Journal of Linguistics, 34(4), 436–452. DOI: http://doi.org/10.1080/07268602.2014.929078
Loakes, D., Graetzer, N., Hajek, J., & Fletcher, J. (2012). Vowel perception in Victoria: Variability, confusability and listener expectation. In Proceedings of the 14th Australasian International Conference on Speech Science and Technology (Vol. 14). Macquarie University, Sydney, Australia.
Loakes, D., Hajek, J., & Fletcher, J. (2010a). The /el/-/æl/ sound change in Australian English: A preliminary perception experiment. In Y. Treis & R. De Busser (Eds.), Selected papers from the 2009 conference of the Australian Linguistic Society. Melbourne, Australia.
Loakes, D., Hajek, J., & Fletcher, J. (2010b). Issues in the perception of the /el/-/æl/ contrast in Melbourne: Perception, production and lexical frequency effects. In Proceedings of 13th Australasian International Conference on Speech Science and Technology. Melbourne, Australia.
Loakes, D., Hajek, J., & Fletcher, J. (2010c). (Mis)perceiving /el/-/æl/ in Melbourne English: A micro-analysis of sound perception and change. In Proceedings of 13th Australasian International Conference on Speech Science and Technology. Melbourne, Australia.
Loakes, D., Hajek, J., & Fletcher, J. (2011). /æl/-/el/ transposition in Australian English: Hypercorrection or a competing sound change? In Proceedings of the 17th International Congress of Phonetic Sciences. City University of Hong Kong.
Lotto, A. J., & Kluender, K. R. (1998). General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception & Psychophysics, 60(4), 602–619. DOI: http://doi.org/10.3758/BF03206049
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19, 1–36. DOI: http://doi.org/10.1097/00003446-199802000-00001
Magnuson, J. S., McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2003). Lexical effects on compensation for coarticulation: The ghost of Christmash past. Cognitive Science, 27(2), 285–298. DOI: http://doi.org/10.1207/s15516709cog2702_6
Mann, V. A. (1980). Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics, 28(5), 407–412. DOI: http://doi.org/10.3758/BF03204884
Mann, V. A., & Repp, B. H. (1980). Influence of vocalic context on perception of the [ʃ]-[s] distinction. Perception & Psychophysics, 28(3), 213–228. DOI: http://doi.org/10.3758/BF03204377
Meunier, F., & Segui, J. (1999). Frequency effects in auditory word recognition: The case of suffixed words. Journal of Memory and Language, 41, 327–344. DOI: http://doi.org/10.1006/jmla.1999.2642
Mitterer, H. (2006). On the causes of compensation for coarticulation: Evidence for phonological mediation. Perception & Psychophysics, 68(7), 1227–1240. DOI: http://doi.org/10.3758/BF03193723
Mitterer, H., & Blomert, L. (2003). Coping with phonological assimilation in speech perception: Evidence for early compensation. Perception & Psychophysics, 65(6), 956–969. DOI: http://doi.org/10.3758/BF03194826
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76(2), 165. DOI: http://doi.org/10.1037/h0027366
Neel, A. T. (2008). Vowel space characteristics and vowel identification accuracy. Journal of Speech, Language, and Hearing Research, 51(3), 574–585. DOI: http://doi.org/10.1044/1092-4388(2008/041)
Ohala, J. J. (1981). The listener as a source of sound change. In C. S. Masek, R. A. Hendrick, & M. F. Miller (Eds.), Papers from the parasession on language and behaviour. Chicago, Illinois.
Ohala, J. J. (1993). The phonetics of sound change. In C. Jones (Ed.), Historical linguistics: Problems and perspectives (pp. 237–278). London: Longman.
Ohala, J. J. (2012). The listener as a source of sound change: An update. In M.-J. Solé & D. Recasens (Eds.), The initiation of sound change (pp. 21–36). Amsterdam; Philadelphia: John Benjamins Publishing Company. DOI: http://doi.org/10.1075/cilt.323.05oha
Ollman, R. (1966). Fast guesses in choice reaction time. Psychonomic Science, 6(4), 155–156. DOI: http://doi.org/10.3758/BF03328004
Palethorpe, S., & Cox, F. (2003). Vowel modification in pre-lateral environments. In International seminar on speech production. Macquarie University, Sydney, Australia.
Powell, M. J. D. (2009). The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, 26–46.
R Core Team. (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510–532. DOI: http://doi.org/10.1037/0033-2909.114.3.510
Recasens, D. (2002). An EMA study of VCV coarticulatory direction. The Journal of the Acoustical Society of America, 111(6), 2828–2841. DOI: http://doi.org/10.1121/1.1479146
Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 9(5), 487–494. DOI: http://doi.org/10.1016/S0022-5371(70)80091-3
Segui, J., Mehler, J., Frauenfelder, U., & Morton, J. (1982). The word frequency effect and lexical access. Neuropsychologia, 20(6), 615–627. DOI: http://doi.org/10.1016/0028-3932(82)90061-6
Smits, R. (2001). Evidence for hierarchical categorization of coarticulated phonemes. Journal of Experimental Psychology, 27(5), 1145–1162. DOI: http://doi.org/10.1037/0096-1523.27.5.1145
Solé, M.-J., & Ohala, J. J. (2010). What is and what is not under the control of the speaker: Intrinsic vowel duration. Papers in laboratory phonology, 10, 607–655.
Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics, 21(3), 291–311. DOI: http://doi.org/10.1016/S0095-4470(19)31340-3
Stevens, K. N., & Keyser, S. J. (2010). Quantal theory, enhancement and overlap. Journal of Phonetics, 38(1), 10–19. DOI: http://doi.org/10.1016/j.wocn.2008.10.004
Stevens, M., & Harrington, J. (2014). The individual and the actuation of sound change. Loquens, 1(1), e003. DOI: http://doi.org/10.3989/loquens.2014.003
Szalay, T., Benders, T., Cox, F., Palethorpe, S., & Proctor, M. (2021). Spectral contrast reduction in Australian English /l/-final rimes. The Journal of the Acoustical Society of America, 149(2), 1183–1197. DOI: http://doi.org/10.1121/10.0003499
Szalay, T., Benders, T., Cox, F., & Proctor, M. (2018). Production and perception of length contrast in lateral-final rimes. In J. Epps, J. Wolfe, J. Smith, & C. Jones (Eds.), Proceedings of the 17th Australasian International Conference on Speech Science and Technology (pp. 127–132). Sydney, Australia.
Thomas, B., & Hay, J. (2005). A pleasant malady: The Ellen/Allan merger in New Zealand English. Te Reo, 48, 69–93.
Wade, L. (2017). The role of duration in the perception of vowel merger. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 8(1). DOI: http://doi.org/10.5334/labphon.54
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. DOI: http://doi.org/10.1080/01621459.1963.10500845
West, P. (1999). Perception of distributed coarticulatory properties of English /l/ and /r/. Journal of Phonetics, 27(4), 405–426. DOI: http://doi.org/10.1006/jpho.1999.0102
Woods, D. L., Wyma, J. M., Yund, E. W., Herron, T. J., & Reed, B. (2015). Factors influencing the latency of simple reaction time. Frontiers in Human Neuroscience, 9(131), 1–12. DOI: http://doi.org/10.3389/fnhum.2015.00131
Yellott, J. I., Jr. (1971). Correction for fast guessing and the speed-accuracy tradeoff in choice reaction time. Journal of Mathematical Psychology, 8(2), 159–199. DOI: http://doi.org/10.1016/0022-2496(71)90011-3
Zellou, G. (2017). Individual differences in the production of nasal coarticulation and perceptual compensation. Journal of Phonetics, 61, 13–29. DOI: http://doi.org/10.1016/j.wocn.2016.12.002