1. Introduction

Phonological contrasts are not maintained equally in all positions in the syllable and word. Whereas languages typically allow their entire set of phonemic oppositions in syllable onsets, the inventory may be reduced in codas (e.g., Gurevich, 2011; Vandam, 2004). Moreover, there are cross-linguistic differences in the types of contrasts permitted in weak positions such as codas even in closely related languages. For example, in both French and Spanish onsets, obstruents may contrast in voicing, manner, and place. However, whereas French allows the same consonants syllable finally, Spanish obstruent codas are restricted to /s x d/ with /s/ being the most frequent (e.g., Hualde, 2005). Given such cross-linguistic variation, acquiring phonemic contrasts in weak positions such as syllable codas presents at least two challenges for second language (L2) learners. First, once they have determined a given language’s phonemic inventory more generally, they must establish the subset of phonological contrasts permitted in codas. Second, they must determine how this subset of consonants is phonetically realized syllable finally.

To date, research on the L2 acquisition of phonemic contrasts in codas has primarily focused on L1-L2 pairings where L1 contrasts constitute a subset of those attested in the target language. Most of this research has investigated the production of voicing in obstruents, particularly by learners of English from a wide variety of L1s (e.g., Flege & Port, 1981, for L1 Arabic; Simon, 2010, for L1 Dutch; Laeufer, 1996, for L1 French; Smith, Hayes-Harb, Bruss, & Harker, 2009, for L1 German; Crowther & Mann, 1992, for L1 Japanese; Baker, 2010, for L1 Korean; Flege, Munro, & Skelton, 1992, for L1 Mandarin). Research on other phonological contrasts including consonant place is limited (but see Steele, 2005; Wang, 1995, for French and English coda nasal place, respectively).¹ Research on the second learning problem – that is, how learners phonetically realize phonological contrasts in weak positions including word-final codas – is even more limited. With the exception of acoustic studies on coda voicing (e.g., see the above references) and a few studies on the acquisition of English word-final nasals (Mizoguchi, 2019, for Japanese; Goodin-Mayeda, Renaud, & Rothman, 2011, for Spanish), we know relatively little about how learners come to implement the different phonetic parameters in a more or less target-like way.

The present study seeks to expand our understanding of the L2 acquisition of positionally sensitive contrasts absent from learners’ L1 via an electropalatography (EPG; Gibbon & Nicolaidis, 1999) study of the production of the English word-final coda nasal place /m n ŋ/ contrast (e.g., ram /ɹæm/, ran /ɹæn/, rang /ɹæŋ/) by Japanese- and Spanish-speaking learners. Our study makes two novel contributions. First, it provides insights into the acquisition of place of articulation. Japanese allows for the most restricted set of word-final codas, with no contrastive place in this position and nasals restricted to /N/. In Japanese, there is a two-way /m n/ contrast in onsets, while the word-final nasal /N/ is considered to be phonologically ‘placeless’ and is phonetically realized in utterance-final position with a variable place ranging from uvular (Vance, 1987; Yamane, 2013) to coronal (Mizoguchi, 2019; Maekawa, 2021; see §1.2 for details). Spanish, while less complex than English, allows for a wider range of word-final coda contrasts (e.g., sol /sol/ ‘sun’, sor /soɾ/ ‘sister/nun’, son /son/ ‘they are’, sos /sos/ ‘you are’). As for Spanish nasals, the three-way /m n ɲ/ contrast observed in onsets (e.g., cama /kama/ ‘bed’, cana /kana/ ‘gray hair’, caña /kaɲa/ ‘cane/pole’) is neutralized to either [n] or [ŋ] in final codas, depending on the dialect, with most varieties neutralizing to the alveolar nasal and Caribbean varieties, among others, neutralizing to velars (e.g., Hualde, 2015). As such, Japanese, Spanish, and English represent three different points on the cross-linguistic complexity hierarchy of nasal coda place contrasts: Coda nasals do not have a fixed place target in Japanese (Level 1); neutralize to a single place in Spanish (Level 2); or exhibit a full range of place contrasts in English (Level 3). The differences between Japanese, Spanish, and English result in an interesting learning scenario. 
To acquire the English nasal coda contrast, both Japanese- and Spanish-speaking learners need to suppress a categorical neutralization process in their L1s and learn a place contrast for a set of consonants whose acoustic cues are weak in utterance-final position (see Miller & Nicely, 1955, for English nasal place cues). If we base our predictions on L2 theories targeting the role of perception of phonological contrasts (e.g., Best, 1995; Best & Tyler, 2007), we could expect that both groups of learners should behave similarly, since this would be a case of single-category assimilation (i.e., the three contrastive nasals in the target language would be assimilated to one nasal in the L1). As such, poor discrimination is expected with a lack of contrast in production. In contrast, if we take into account the phonetic implementation of nasal neutralization in each language, predictions based on theories that compute acoustic similarity (e.g., Flege, 1995) are unclear. Both L1 Japanese and Spanish learners have nasal variants that are similar to at least one but not all of the English nasals. Moreover, as we will see in §1.2, in both languages, nasal codas are realized with a high degree of individual variation. Thus, we would have to make predictions based on each individual’s idiolect. For example, we could predict that Japanese learners for whom the most frequent L1 realization is a velar nasal would be more accurate with English velar nasals, whereas Spanish speakers who neutralize to an alveolar nasal would be more accurate with alveolars. Finally, if we look at the overall phonotactic similarity across the languages, as concerns the type and frequency of word-final coda consonants, then we could predict that Spanish-speaking learners will be more accurate in realizing English coda nasal place than their Japanese peers of similar L2 proficiency, as Spanish is typologically closer to English in the variety of word-final codas allowed. 
Our second contribution is to expand our currently limited understanding of the acquisition of the phonetic implementation of new coda contrasts, in particular place of articulation. Place differences in coda nasals are not easily captured auditorily (Jun, 1995; Kawahara & Garvey, 2014; Miller & Nicely, 1955), and the acoustic models used to study nasal place are complex (e.g., Stevens, 1998). Electropalatography (EPG), in contrast, is a reliable technique for capturing place contrasts and constriction degree. To date, very few studies have used this technique for the investigation of L2 speech despite its ability to capture fine-grained articulatory differences in lingual consonant articulations in a way that makes cross-speaker comparisons relatively straightforward (given palate grid normalization; see Mennen, Scobbie, de Leeuw, Schaeffler, & Schaeffler, 2010, p. 23).

Having outlined the general research problem as well as the particulars of our study, we now turn to a review of previous research on the L2 acquisition of coda contrasts.

1.1. L2 acquisition of coda contrasts

In this section, we review previous research on the L2 acquisition of coda contrasts, both as concerns the acquisition of new phonemic contrasts and their target language-specific phonetic realization. As highlighted earlier, these constitute two separate learning problems for non-native speakers – indeed, learners may realize a perceptually distinct voicing, place, or manner contrast without their phonetic implementation being (absolutely) target-like. Given that most previous research has looked at voicing contrasts, we will first discuss these, followed by studies of the L2 acquisition of consonant place.

Research on L2 learners’ acquisition of word-final obstruent voicing has revealed several consistent patterns. In terms of the acquisition of the phonological opposition, at the developmental stage where learners are able to produce a coda obstruent but are unable to realize it in a phonetically target-like manner, the voiced member of the opposition is variably realized as voiceless (e.g., Young-Scholten, 2004) or as voiced with the help of a following epenthetic vowel (e.g., Cardoso, 2007; Sekiya & Jo, 1997). As concerns the acquisition of the phonetic parameters of voicing, studies have looked first and foremost at learners’ mastery of laryngeal voicing and the duration of the preceding vowel.² Almost without exception, learners are more target-like with the latter parameter (e.g., Baker, 2010; Crowther & Mann, 1992; Flege, McCutcheon, & Smith, 1987; Patience & Steele, 2022). Learners’ accuracy with the production of voicing is conditioned by various factors including universal constraints on its realization. For example, Yavas (1997), in a study of the production of English final voiced stops by L1 Japanese, Mandarin, and Portuguese speakers, found effects for vowel height and stop place of articulation: Learners’ voicing of /d ɡ/ but not /b/ was affected by preceding vowel height.

There is a dearth of research on the L2 acquisition of coda place and what little research has been done has focused exclusively on nasals. Moreover, most of these studies rely on auditory transcriptions, with fewer than a handful of instrumental and perception studies. The first of these, Wang (1995), investigated the production of the same three-way /m n ŋ/ contrast examined in the current study by L1 Mandarin-L2 English speakers. Wang’s intermediate proficiency learners were highly accurate with words ending with their L1 /n ŋ/ (90% and 100% accuracy, respectively). In contrast, accuracy with /m/-final forms was much lower (47%), which can be attributed to the absence of this consonant in coda in their L1. Non-target-like forms did not involve deletion or epenthesis but rather substitution of [n]. Steele (2005) looked at the syllabification of French word-final nasal vowel+stop sequences (e.g., lampe [lɑ̃p] ‘lamp’, conte [kɔ̃t] ‘tale’, banque [bɑ̃k] ‘bank’) by five beginner Mandarin-speaking learners. While such forms involve no coda nasals in native French, Steele’s learners often realized the nasal vowel as a vowel+nasal consonant sequence. Consequently, their production of such words involved final nasal+stop clusters. In contrast to the patterns observed in the L1 Mandarin-L2 English data in Wang (1995), no place-based accuracy asymmetry was observed. Indeed, these beginner learners were able to realize the three-way French place contrast equally well including with highly target-like syllable-final [m], the coda nasal absent from their L1 ([mC#] 92%, [nC#] 86%, [ŋC#] 96%). The greater accuracy of Steele’s relatively less proficient learners – recall that the participants in Wang’s (1995) study were of intermediate not beginner proficiency – may be due to the fact that homorganic nasal-obstruent sequences are attested in languages including Japanese that do not allow independent coda place contrasts.
In the same way that learners may use epenthesis to facilitate the production of voicing in the preceding obstruent, at intermediate stages of development, they may be more accurate when coda nasal place is co-realized with that of the following obstruent. Two other studies (Goodin-Mayeda et al., 2011; Mizoguchi, 2019), which focused on L1 Spanish and Japanese learners of English, respectively, shed light on the linguistic structure under analysis here. Mizoguchi (2019) conducted an ultrasound and acoustic study of eight L1 Japanese-L2 English learners, who were asked to produce the utterance-final nasals in the words bum, bun, and bung as well as several control items with same-place oral stops. A large amount of inter-speaker variation was attested. Only one of the speakers consistently produced the three consonants as bilabial, alveolar, and velar as well as being distinct from Japanese /N/ (as produced in a separate experiment). Three speakers distinguished the three places (and differentiated them from /N/) for the most part, although not always producing the expected target consonants (e.g., there were instances of place substitutions including of palatals for target velar realizations). One less target-like speaker consistently substituted their L1 /N/ for English /n/, while producing the other consonants in a target-like manner. Two other speakers’ production involved /N/ substitution for both target /n/ and /ŋ/, while distinguishing the bilabial /m/. The remaining speaker substituted /N/ for both /m/ and /n/, while producing /ŋ/ distinctly but as the nasal+stop sequence [ŋɡ] (as was manifested in the consistent presence of an acoustic stop burst). Based on these results, Mizoguchi (2019) proposed a developmental path for L1 Japanese learners of English: Namely, the bilabial nasal is acquired first, the alveolar nasal last. 
The performance of the speakers roughly corresponded to their self-reported English proficiency levels (beginner to advanced), while showing little correlation with their length of residence in the U.S. (which ranged from zero months to nine years). As concerns Spanish-speaking learners, Goodin-Mayeda et al. (2011) reported on a perception and production study on the acquisition of word-final nasal contrasts by L1 Spanish-L2 English speakers. The study included 24 self-reported advanced learners of English living in the US as well as 25 native speaker controls. Participants performed an AX discrimination task with 18 experimental tokens (nine minimal pairs; three tokens per consonant). After listening to each minimal pair, participants indicated whether the stimuli were same/different or whether they were unsure (the latter option was scored as incorrect). The production task was presented as a grammaticality judgment task and involved the reading of declarative sentences, the last word of which contained one of the three target nasals (six tokens per condition). Responses were coded by two transcribers. Results of the discrimination task revealed that, although the learners differed from the controls, they were highly accurate (/n-ŋ/: 80%, /m-ŋ/: 88%). In production, participants were also highly accurate (/n/: 100%; /m/: 94%; /ŋ/: 92%), and, for all consonants, substitutions involved [n], which is consistent with L1-based neutralization patterns. These results, which should be interpreted with caution given the type of task (AX discrimination) and the lack of instrumental analysis of the production stimuli, suggested that L1 Spanish speakers may eventually be able to acquire the three-way English nasal contrast.

In summary, previous research on the L2 acquisition of obstruent coda contrasts has focused mainly on voicing, particularly in stops. Such studies have revealed developmental sequences – voiceless obstruents are acquired before their voiced counterparts, and the latter are often initially realized as devoiced or voiced with the assistance of a following epenthetic vowel (i.e., via resyllabification). Learners’ phonetic implementation of the voicing contrast is more accurate in terms of preceding vowel duration than the voicing of the stop. Studies of place contrasts, while few in number, have revealed strong L1-based influence (Wang, 1995; Mizoguchi, 2019) or learners’ ability to realize coda place early on when the feature is co-realized with a following stop (Steele, 2005). In contrast to research on coda voicing, we are aware of only one recent ultrasound study (Mizoguchi, 2019) that has analyzed the acoustic and articulatory implementation of place in English nasals. The current study thus makes a novel contribution in comparing Japanese- and Spanish-speaking learners’ phonetic realization of the English coda /m n ŋ/ contrast using EPG, which is ideal for capturing place differences. Before presenting our EPG study, we further review the set of contrasts permitted in Japanese and Spanish in order to be able to formulate L1-influence-based hypotheses.

1.2. Coda nasals in English, Japanese, and Spanish

The target language in this study, English, presents a three-way coda nasal /m n ŋ/ contrast (Table 1). English syllable-final coronal nasals have been reported to assimilate categorically to following labials and gradiently to other coronals and velars (Jones, 1962). Several articulatory studies using EPG have shown that, in coronal-velar sequences, categorical assimilation is most frequent (e.g., Hardcastle, 1995; Kochetov, Colantoni, & Steele, 2021) but realizations with overlapping alveolar and velar gestures have also been observed (Barry, 1985; 1991; Hardcastle, 1995; Kochetov et al., 2021). These studies have uncovered a series of contextual (e.g., word-internal versus cross-word clusters; speech style; type of text) and demographic variables (e.g., dialect) that condition the types of assimilation, including reports of individual variability related to speech rate (e.g., no assimilation in slow speech versus individual variability in fast speech; Ellis & Hardcastle, 2002).³ Variability in the realization of word-final nasals has been observed (Cruttenden, 2014) and has been noted to be partly conditioned by the sex of the speaker (e.g., Byrd, 1994). The most widely studied feature is the alveolar realization of nasals in -ing sequences (e.g., talking [tɑkɪŋ], [tɑkɪn]; Fischer, 1958; Labov, 1966; 2001); deletion of alveolar nasals in words like on is also attested (e.g., Thomas & Bayley, 2011). In spite of the variability, there are a number of minimal pairs in very frequent words (e.g., sun-sung, thin-thing).

Table 1

Summary of the nasal inventory and type of contrasts in the L1s and target language (L2) of the study.

Language Place of articulation (onset/coda) C# Contrast
Bilabial Dental/Alveolar Palatal Velar Uvular
English mom/ram nun/ran rang /m n ŋ/
Japanese mo ‘also’ (myō ‘strange’) no ‘field’ (nyō ‘urine’) hoN ‘book’ No place contrast; neutralization to velar-uvular place; oral constriction location conditioned by preceding vowel; great degree of individual variability in terms of place and manner (vocalization)
Spanish mota ‘speck’ nota ‘note’/pan ‘bread’ ñoqui ‘gnocchi’ pan No contrast; neutralization to a single place that varies by dialect (alveolar in most dialects; velar in Caribbean dialects, among others).

As shown in Table 1, Japanese has a two-way /m n/ contrast in onsets, as well as their palatal(ized) variants before /j/ and /i/ ([mʲ] and [ɲ]). In word-final position, the nasal /N/ (the so-called ‘mora nasal’) is considered phonologically ‘placeless’, assimilating to the place of articulation of adjacent consonants and to the place and stricture of vowels (e.g., /hoN mo/ [homː mo] ‘book too’, /hoN o/ [hoõo] ‘book (direct object)’; Vance, 1987). In neutral, utterance-final (pre-pausal) position following non-high vowels, the consonant is produced as a weakly articulated uvular nasal [ɴ̞] or nasalized glide [ɰ̃] (Vance, 1987). Yamane’s (2013) ultrasound study of the consonant found that /N/ was produced by her six speakers with significant dorsum raising, yet the precise target for this raising varied considerably across speakers. Mizoguchi’s (2019) ultrasound study, in which stimuli involved the mora nasal preceded by the vowel [a], also revealed variability among the eight L1 Japanese participants, with three speakers realizing a uvular constriction, one speaker a constriction ranging from velar to uvular, one participant each with either a velar or coronal constriction, and two participants with no visible oral constriction. Maekawa’s (2021) real-time MRI study of the production of utterance-final /N/ by 11 Tokyo Japanese speakers identified an additional source of variation. While the closure location of /N/ ranged from the hard palate to the uvula, it was argued that place could be predicted statistically based on the preceding vowel. Although the analysis focused on the tokens for which an oral closure was identified, Maekawa reported that 12% of the tokens in the complete data set were realized as a vowel.
Kochetov’s (2014) EPG study of /N/ as produced by five Japanese speakers (three of whom are participants in the current study) revealed that the consonant’s constriction in utterance-final position could not be fully captured by the artificial palate and, thus, is likely a posterior velar or uvular. When occurring word finally before another segment, the nasal was found to take on the place and constriction of the latter, mainly in a categorical fashion (Kochetov, 2014; Mizoguchi, 2019; Stephenson & Harrington, 2002). It should be noted that /N/ is typically used to render final /n/ in numerous English loanwords (e.g., /na.pu.kiN/ ‘napkin’). In contrast, the bilabial /m/ is rendered as the syllable /mu/ (e.g., /ku.riː.mu/ ‘cream’), and /ŋ/ as the sequence /Nɡu/ that is phonetically realized as a homorganic nasal+stop (e.g., /soN.ɡu/, [soŋɡu] ‘song’; Heffernan, 2005). In a perceptual study, Aoyama (2003) found that low-proficiency Japanese-speaking learners of English had no difficulty discriminating between English final /m/ and the other two final nasals, but often confused final /n/ and /ŋ/. In a follow-up categorization study with intermediate proficiency learners, the same author found that English final /m/ was consistently mapped onto the Japanese syllable /mu/; there were more errors involving the other two nasals, yet the more common patterns involved the mapping of the English /n/ and /ŋ/ onto the Japanese segment /N/ and the sequence [ŋɡu] (/Nɡu/), respectively. Based on Aoyama’s (2003) results, one may predict that Japanese speakers would have less difficulty producing final /m/ and more difficulty with /n/ and /ŋ/, with the alveolar nasal in particular being subject to the substitution of L1 /N/. This was largely confirmed in the ultrasound study by Mizoguchi (2019) summarized in the previous section.

In Spanish, the three-way /m n ɲ/ contrast observed in onsets is neutralized to either [n] or [ŋ] in word-final codas, depending on the dialect (Hualde, 2015; Ramsammy, 2011; Colantoni & Kochetov, 2012⁴). Even if variable realizations of bilabial nasals are reported in words spelled with <m> (e.g., álbum, referendum) where the nasal may be realized as [n]/[ŋ] or [m], most speakers in most varieties neutralize word-final nasals to a single nasal.⁵ In particular, and as concerns our speakers (see §2.1), alveolar nasals are the norm in Madrid and Buenos Aires (Colantoni & Hualde, 2013), whereas velar nasals are expected in Cuban Spanish, since velarization is a general process in Caribbean varieties (Hualde, 2015; Kochetov & Colantoni, 2010). Kochetov and Colantoni (2010) described the realization of word-final nasals in utterance-final position for all of the Spanish-speaking participants in the current study. Results confirmed previous descriptions for Argentine Spanish. The Cuban participant, however, displayed an alternation between the expected velar nasal and alveolar realizations, which was conditioned by vowel type and stress.

It is important to point out that Spanish nasals categorically assimilate in place to the following consonant in word-internal or word-final position (campo [kampo] ‘field’, tango [t̪aŋɡo] ‘tango’, tan bien [t̪am bjen] ‘so well’, tan caro [t̪aŋ kaɾo] ‘so expensive’). This was also the case for Kochetov and Colantoni’s (2010) participants, independent of the type of nasal that they had in utterance-final position. However, when a word-final nasal is followed by a vowel-initial word, it may be variably realized as alveolar even in velarizing dialects (Hualde, 2015; Kochetov & Colantoni, 2010).

In summary, Japanese and Spanish clearly differ from English in the presence versus absence of word-final nasal place contrasts. Although Japanese and Spanish may seem very similar in permitting versus lacking place contrasts in onsets and codas, respectively, and in that neutralization and assimilation to following segments are the norm, there are key phonological and phonetic differences between the two languages. From a phonological point of view, Spanish allows a greater number of word-final sonorant and obstruent contrasts than Japanese. Conversely, Japanese allows for nasals to assimilate to following vowels, which is not attested in Spanish. From a phonetic point of view, the languages also differ. The Japanese word-final mora nasal does not have a fixed place of articulation. Although articulatory studies have corroborated previous phonological characterizations of nasals as being placeless (at least for some participants), the consensus seems to be that /N/ does not have a fixed place of articulation and can be realized as a vowel or with a constriction at any place along the vocal tract. Spanish, in contrast, has a fixed neutralization target that may vary from dialect to dialect, and which allows only for a handful of lexical exceptions. Thus, based on the phonological properties of final coda nasals in English, Japanese, and Spanish, we propose that the three languages are at three different points along the coda nasal complexity hierarchy. Japanese is characterized by the complete lack of independent place (Level 1); Spanish neutralizes the three-way contrast observed in onsets to a single place (Level 2); and English exhibits the full range of place contrasts observed in the language (Level 3).

Given the absence of a three-way word-final nasal place contrast in Japanese and Spanish, we expect both groups to have difficulty mastering the English place contrast in word-final position. If coda complexity is the most important factor guiding the acquisition of coda nasals, then we would expect Spanish learners to outperform their Japanese counterparts. However, as summarized in the introduction, if perceptual factors are guiding the acquisition of contrasts, both groups should perform in a similar way, since the weak cues to nasal place contrast in coda position should impact learners’ ability to acquire the nasal. Finally, if individual idiolectal realizations are transferred from the L1, we should expect variability in utterance-final nasals in the participants’ L2.

2. Method

2.1. Participants

Eight speakers (seven females) participated in the study: Six L2 English learners whose first language was either Japanese or Spanish (three of each language), and two L1 English controls. Table 2 provides participants’ profiles: Their gender, place of birth, and age at the time of the testing; and, in the case of the L2 learners, their age at onset of acquisition (AOA), reported daily % use of their L2 at the time of data collection, and mean English oral proficiency as measured by accentedness scores.

Table 2

Participant profiles (AOA = age at onset of acquisition).

L1 | Speaker | Sex | Place of birth | Age | AOA | Daily use of L2 | Mean English accentedness score
Japanese | JP1 | f | Shizuoka, Japan | 46 | 24 | 40% | 2.6
Japanese | JP2 | f | Ibaraki, Japan | 41 | 26 | 40% | 2.3
Japanese | JP3 | f | Kyoto, Japan | 46 | 29 | 50% | 2.6
Spanish | SP1 | f | Havana, Cuba | 40 | 28 | 30% | 2.4
Spanish | SP2 | f | Buenos Aires, Argentina | 56 | 44 | 40% | 2.9
Spanish | SP3 | f | Madrid, Spain | 47 | 25 | 30% | 1.4
English | EN1 | f | Ontario, Canada | 31 | (NA) | 100% | 5.0
English | EN2 | m | Ontario, Canada | 44 | (NA) | 100% | 5.0

The Japanese speakers were from various locations on the island of Honshu (Shizuoka, JP1; Ibaraki, JP2; Kyoto, JP3). The Spanish speakers were from Cuba (Havana, SP1), Argentina (Buenos Aires, SP2), and Spain (Madrid, SP3). All participants had lived in Canada for 12 or more years, and were residing in Toronto at the time of testing. They reported using both English and their L1 on a regular basis, with the former employed in 30% to 50% of their daily interactions. None of the participants reported any history of speech or language disorders.

Accentedness scores were used as a measure of overall spoken English proficiency (see, e.g., Bongaerts, Mennen, & Slik, 2000; Colantoni & Steele, 2007) and were calculated as follows. The speakers’ readings of the North Wind and the Sun passage (approximately one minute of speech each) were intermixed with those of eight other L2 speakers of different L1s (four French; two Korean; one each of Punjabi and Russian), as well as that of another native English speaker. The 17 recordings were randomized and presented to 10 native speakers in one of five randomized orders. The judges were asked to rate each of the speech samples on a 5-point scale, ranging from ‘1 – The speaker that you have just heard has a very strong accent and is definitely not a native speaker of English’ to ‘5 – The speaker has no foreign accent at all and is without a question a native speaker of English’. Two of the judges’ scores were not used because they failed to assign scores of ‘5’ to two of the three English native speakers. The accentedness scores in Table 2 correspond to the mean of the remaining eight judges. As can be seen in this table, accentedness scores for the L1 Japanese (range 2.3-2.6; mean 2.5) and Spanish speakers (range 1.4-2.9; mean 2.2) are considerably lower than for the English controls (mean 5.0), and, with the exception of Spanish speaker SP3 whose accent was considered to be strong, can be regarded as showing intermediate accentedness. In spite of the presence of an accent, all of the L2 speakers were fluent English speakers, had lived in Canada for 12 to 22 years (mean 17), and used the language on a daily basis. Given their mainly similar accentedness scores – with the exception of SP3, all of the participants’ mean accentedness scores fell between ‘2 – The speaker has a strong accent’ and ‘3 – The speaker has a medium accent’ – we do not expect any proficiency-based differences between the Japanese and Spanish groups.
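
The scoring procedure just described can be sketched in code. This is a minimal illustration under stated assumptions, not the script used in the study: the ratings, the judge and speaker labels, and the function names `retained_judges` and `mean_accentedness` are hypothetical, and the exclusion rule is read here as dropping any judge who withholds a score of 5 from two or more of the native-speaker recordings.

```python
from statistics import mean

TOP_SCORE = 5  # "no foreign accent at all" on the 5-point scale


def retained_judges(ratings, native_speakers, max_misses=1):
    """Keep judges who deny a TOP_SCORE to at most `max_misses` native speakers.

    The threshold is an assumed reading of the exclusion criterion.
    """
    kept = []
    for judge, scores in ratings.items():
        misses = sum(1 for spk in native_speakers if scores[spk] != TOP_SCORE)
        if misses <= max_misses:
            kept.append(judge)
    return kept


def mean_accentedness(ratings, native_speakers, max_misses=1):
    """Mean score per speaker over the retained judges, rounded to one decimal."""
    judges = retained_judges(ratings, native_speakers, max_misses)
    speakers = next(iter(ratings.values()))
    return {spk: round(mean(ratings[j][spk] for j in judges), 1) for spk in speakers}


# Hypothetical ratings (judge -> speaker -> score); these are NOT the study's data.
natives = ["N1", "N2", "N3"]
ratings = {
    "J1": {"N1": 5, "N2": 5, "N3": 5, "L2a": 3, "L2b": 1},
    "J2": {"N1": 5, "N2": 5, "N3": 4, "L2a": 2, "L2b": 2},
    "J3": {"N1": 4, "N2": 4, "N3": 5, "L2a": 4, "L2b": 3},  # two misses: excluded
    "J4": {"N1": 5, "N2": 5, "N3": 5, "L2a": 3, "L2b": 1},
}

scores = mean_accentedness(ratings, natives)
print(retained_judges(ratings, natives))
print(scores)
```

In this toy data set, judge J3 is excluded, so each speaker's mean is taken over the three remaining judges, in the same way that the scores in Table 2 are means over the eight retained judges.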

2.2. Materials

The study materials consisted of several datasets, as summarized in Table 3. The ‘balanced set’ (a) included the word-final target nasals /m/, /n/, and /ŋ/ in the words awesome, common, and charming produced nine to 10 times by each speaker in the carrier phrase That’s a _______ answer (207 tokens). This set was specifically designed to test L1 influence on the production of nasals. Our preliminary investigation of the data, however, revealed relatively little within-speaker and within-group variation, which contrasted with our observations based on other data collected from the speakers. We therefore decided to include all of the available read materials containing final nasals, recorded over several sessions. These additional materials (referred to in the table as the ‘corpus set’ (b)) were read by all eight speakers and consisted of words with (i) word-final prevocalic nasals (e.g., cream, queen, and nothing) produced in the carrier sentences Say ___ again and Say __ aloud; (ii) word-final prevocalic nasals in a text (a paragraph from George Orwell’s novel ‘1984’); (iii) utterance-final nasals in isolated words (the same as the words in the carrier sentence Say ___ again in (i) above); (iv) utterance-final nasals (e.g., napkin, again) produced in carrier sentences (That’s an extra ___ and Say hVd ___); and (v) utterance-final nasals occurring in the same text as in (ii). In addition, we included control items (c) with nasals occurring preconsonantally (e.g., lamp, sprint, rethink) in isolated words, one of the two carrier sentences mentioned above, and the same text. Nasals in these items were expected to be produced the same way regardless of the L1, as homorganic bilabial, alveolar, and velar nasals are permitted in English, Japanese, and Spanish. Altogether, this resulted in a total of 121 items and 4973 tokens among the speakers. The full list of materials is provided in the Appendix. 
In summary, the corpus set included read speech produced under three reading task conditions – as isolated words, in a carrier sentence, and in a novel excerpt. The inclusion of different reading tasks (as well as a range of lexical items) was expected to provide more insight into the variation in the production of final nasals (as was previously observed in Colantoni & Kochetov, 2012, for Spanish). Note that, with the exception of the balanced set, the number of items per condition varied from set to set and factors such as stress, adjacent vowel quality, and the lexical status of words were not controlled for. With most items involving /m/ and /n/, the consonants occurred within (mostly nominal) roots (87%) and in stressed syllables (64%). In contrast, all but four items with /ŋ/ occurred in the unstressed verbal suffix –ing. In all items, the velar nasal occurred after the high front lax vowel [ɪ], while the other nasals occurred in a variety of vocalic contexts.

Table 3

Overview of the materials: Numbers of items and tokens (summed across speakers) by nasal consonant and dataset (see the text for details).

| Stimulus type | Items /m/ | Items /n/ | Items /ŋ/ | Items total | Tokens /m/ | Tokens /n/ | Tokens /ŋ/ | Tokens total |
|---|---|---|---|---|---|---|---|---|
| a. Balanced Set: target word-final prevocalic nasals in a carrier sentence | 1 | 1 | 1 | 3 | 69 | 68 | 70 | 207 |
| b. i. Corpus: target word-final prevocalic nasals in a carrier sentence | 9 | 21 | 14 | 44 | 314 | 683 | 704 | 1701 |
| b. ii. Corpus: target word-final prevocalic nasals in a text | 4 | 11 | 6 | 21 | 250 | 692 | 378 | 1320 |
| b. iii. Corpus: target utterance-final nasals in isolated words | 7 | 16 | 2 | 25 | 196 | 447 | 56 | 699 |
| b. iv. Corpus: target utterance-final nasals in a carrier sentence | 1 | 6 | 0 | 7 | 72 | 296 | 0 | 368 |
| b. v. Corpus: utterance-final nasals in a text | 1 | 7 | 3 | 11 | 63 | 440 | 189 | 692 |
| c. Control: preconsonantal nasals in isolated words/carrier sentence/text | 3 | 3 | 4 | 10 | 119 | 116 | 119 | 354 |
| Total | 26 | 65 | 30 | 121 | 1011 | 2446 | 1516 | 4973 |

As mentioned above, all of the materials were read by all eight speakers. Apart from the balanced set, the number of repetitions for the other datasets varied somewhat across speakers. On average, words were repeated three times by EN1, four times by SP3, and six times by the remaining speakers.

2.3. Instrumentation and procedure

The participants wore custom-made acrylic palates with 62 EPG electrodes. Most speakers (EN1, JP1, JP2, SP1, SP2, and SP3) had the traditional style ‘old’ palates manufactured by InciDental; the other two had ‘new’ palates manufactured by Articulate Instruments (see Wrench, 2007, on both designs). Both palate designs make use of a grid of 62 electrodes (see Figure 1c below) but differ mainly in the extent of coverage of the velar region (which is typically greater for the new palates; Tabain, 2011; see also Kochetov, Colantoni, & Steele, 2017). Differences in the palate design were not expected to affect the way the palates capture the general categories of constrictions (as defined in Section 2.4 below). This was further confirmed by comparing the patterns of three nasals in the control items, as illustrated in Figure A1 in the Appendix. Articulatory data were collected using a WinEPG system by Articulate Instruments (Wrench, Gibbon, McNeill, & Wood, 2002) at a sample rate of 100 Hz. Audio data were collected simultaneously at 22,050 Hz.

Prior to each recording, the participants took time to accommodate to the palate by reading the ‘The North Wind and the Sun’ passage used for accentedness assessment and by speaking to the experimenter. All of the speakers were familiar with the EPG recording procedure and accustomed to wearing the palate, as they were participants in a larger cross-language articulatory study involving multiple recording sessions.

2.4. Annotation and analysis

The data analysis was performed using the software Articulatory Assistant (Wrench et al., 2002), which simultaneously captures acoustic and articulatory information. For each utterance, the nasal interval was annotated based on the acoustic signal (the waveform with reference to the spectrogram) as a period of weak formant structure/nasal murmur. Sample annotations for /n/ and /ŋ/ in common answer and charming answer are shown in the top panel of Figure 1 (a). Each of the two images contains a waveform, a spectrogram, and a sequence of palate frames (sampled every 10 ms), which indicate the contact between the tongue and the artificial palate. We can see from the enlarged palate frame sequences in (b) that the preceding and following vowels show some side contact (darker purple cells on the left and the right; more for [ɪ] than [ə]), while the nasal in each case shows one or two complete rows of contact – either in the alveolar region (the first three rows of the artificial palate) or the velar region (the last three rows). The display on the top right of each image provides an average contact profile for the entire nasal interval, with numbers in each cell indicating the mean percentage of contact for each electrode over the duration of the nasal (0%/white = no contact at all, 100%/dark purple = contact through the entire interval). The zoning of the palate is further displayed in (c), indicating all columns (C1-C8) and rows (R1-R8) of electrodes.

Figure 1

Sample tokens of ‘common answer’ and ‘charming answer’ as realized by JP2 (token 1 of each), illustrating (a) the annotation of the nasal interval and the following consonant closure, (b) a sequence of palate frames during the nasal, and (c) the zoning of the artificial palate.

Based on the linguopalatal contact data obtained, two kinds of analysis were performed. The quantitative analysis made use of the variables ‘Alveolar Closure’ and ‘Velar Closure’ calculated automatically by the Articulatory Assistant software. These variables are based on the amount and horizontal extent of contact in the first three (alveolar) and last three (velar) rows, respectively (see Figure 1c). The measurements were extracted and averaged over two points – the midpoint of the nasal interval and the point of maximum contact (PMC).6 In general, an /n/ produced with a complete closure (as in the top panel in Figure 1b) is characterized by high Alveolar Closure values (around 1.00) and low Velar Closure values (approaching 0.00, but typically higher in front vowel contexts). In contrast, an /ŋ/ produced with a complete closure (as in the bottom panel in Figure 1b) would show low Alveolar Closure (near 0.00) and high Velar Closure values (approaching 1.00). As /m/ is not produced with an active lingual gesture, both variables are expected to be at their minimum (albeit with above zero Velar Closure values next to non-low vowels). The two variables are thus expected to clearly distinguish the three-way place contrast in nasals. This is illustrated in Figure 2 for the control items produced by our English speakers. Similar quantitative analyses of nasal constrictions have been used extensively in previous EPG studies (e.g., Celata, Calamai, Ricci, & Bertini, 2013; Kochetov & Colantoni, 2011; Stephenson & Harrington, 2002; among others).
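The logic of such region-based measures can be illustrated with a minimal Python sketch. It is not the algorithm implemented in the analysis software, whose exact formula is not given here. The index definition (the largest proportion of activated electrodes in any single row of the region), the choice of the PMC as the frame with the most total contact, and all function names are assumptions made purely for illustration. A frame is represented as eight rows of 0/1 values, with six electrodes in the first row and eight in the others (62 in total).

```python
from typing import List, Sequence

Frame = List[List[int]]  # 8 rows (R1-R8); 1 = contact, 0 = no contact

ALVEOLAR_ROWS = (0, 1, 2)  # rows R1-R3 (zero-based indices)
VELAR_ROWS = (5, 6, 7)     # rows R6-R8


def closure_index(frame: Frame, rows: Sequence[int]) -> float:
    """Hypothetical index: the largest row-wise proportion of contact
    within the given region (1.0 = at least one fully contacted row)."""
    return max(sum(frame[r]) / len(frame[r]) for r in rows)


def total_contact(frame: Frame) -> int:
    """Number of activated electrodes over the whole palate."""
    return sum(sum(row) for row in frame)


def nasal_measure(frames: List[Frame], rows: Sequence[int]) -> float:
    """Average of the index at the temporal midpoint of the nasal and at
    the point of maximum contact (assumed: frame with most total contact)."""
    midpoint = frames[len(frames) // 2]
    pmc = max(frames, key=total_contact)
    return (closure_index(midpoint, rows) + closure_index(pmc, rows)) / 2
```

Under these assumptions, a nasal with a fully contacted row in R1–R3 and no contact in R6–R8 yields an alveolar value of 1.0 and a velar value of 0.0, mirroring the pattern described for /n/.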

Figure 2

Scatterplot of Alveolar Closure and Velar Closure values by consonant and speaker (L1 English participants, EN1 and EN2) in the control items (individual tokens) (see Table 3).

The second analysis involved a classification of alveolar and velar closure patterns through qualitative visual inspection of temporal sequences of palate frames, following criteria established in a number of previous EPG studies of assimilation and deletion (Barry, 1991; Hardcastle, 1995; Shockey, 1991; Wright & Kerswill, 1989). A nasal was considered to have a ‘full alveolar closure’ if at least one frame involved a full row of activated electrodes in the anterior portion of the palate (rows 1–3; see Figure 3a; cf. Figure 1). It was considered to have a ‘full velar closure’ if at least one frame involved a full row of activated electrodes in the posterior portion of the palate (rows 6–8; see Figure 3b). Alveolar or velar constrictions that lacked at least one cell in a row (yet showed more contact than adjacent vowels) were considered ‘partial’ closures (not shown in the figure; but see Figures 7, 8, 9). Nasals that did not show an active lingual constriction (independent of an adjacent vowel) were classified as ‘neither’ (see Figure 3c). Finally, there were occasional instances of nasals that contained both alveolar and velar constrictions, produced either simultaneously (see Figure 3d) or asynchronously. Such tokens were labeled ‘both’ and could contain either ‘full’ or ‘partial’ closures. Overall, the use of closure categories allowed us to provide additional characterization of nasal production patterns, as some categories (such as complex alveolar-velar nasals) were not easily discernible via the quantitative analysis.
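The classification criteria above can be approximated programmatically, as in the following Python sketch. The classification reported here was done by visual inspection, so this is only an illustrative operationalization. In particular, treating ‘more contact than adjacent vowels’ as a comparison of the largest per-row electrode count in the region is an assumption, as are all function names. Frames are lists of eight rows of 0/1 values (six electrodes in row 1, eight in the others).

```python
from typing import List, Sequence

Frame = List[List[int]]

ALVEOLAR_ROWS = (0, 1, 2)  # rows 1-3 (zero-based indices)
VELAR_ROWS = (5, 6, 7)     # rows 6-8


def peak_row_contact(frames: List[Frame], rows: Sequence[int]) -> int:
    """Largest number of activated electrodes in any single row of the region."""
    return max((sum(f[r]) for f in frames for r in rows), default=0)


def region_status(nasal: List[Frame], vowels: List[Frame],
                  rows: Sequence[int]) -> str:
    """Return 'full', 'partial', or 'none' for one palate region."""
    # Full closure: at least one frame with a completely activated row.
    if any(all(f[r]) for f in nasal for r in rows):
        return "full"
    # Partial closure (assumed criterion): more contact than adjacent vowels.
    if peak_row_contact(nasal, rows) > peak_row_contact(vowels, rows):
        return "partial"
    return "none"


def classify(nasal: List[Frame], vowels: List[Frame]) -> str:
    """Assign a nasal token to one of the closure categories."""
    alv = region_status(nasal, vowels, ALVEOLAR_ROWS)
    vel = region_status(nasal, vowels, VELAR_ROWS)
    if alv != "none" and vel != "none":
        return "both"  # simultaneous or asynchronous alveolar and velar constrictions
    if alv != "none":
        return f"alveolar {alv}"
    if vel != "none":
        return f"velar {vel}"
    return "neither"
```

Because the function looks across all frames of the nasal interval, simultaneous and asynchronous alveolar-velar tokens both fall under ‘both’, as in the manual classification.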

Figure 3

Sample tokens (temporal sequences of palates) illustrating the four main categories of nasal consonant closure observed based on the qualitative classification of the nasals.

The quantitative analysis measures (Alveolar and Velar Closure values) for target items were submitted to linear mixed effects regression (LMER) models implemented with the lme4 package (Bates, Maechler, Bolker, Walker, Christensen, Singmann, & Grothendieck, 2017) using R (R Core Team, 2014). The first set of models was run on the balanced set, which fully controls for phonetic context and stress (yet is limited to a single position and a single lexical item per consonant). The models examined the contribution of the target consonant and participants’ L1 on the realization of the nasal. The second set of models was run on the entire corpus of target items, considering consonant, L1, and position (word-final prevocalic and utterance-final). Details of these models are presented in the corresponding Results sections.

3. Results

3.1. Quantitative analysis

3.1.1. Balanced dataset

We will begin with the results for the fully balanced dataset, which includes the nasals /m/, /n/, and /ŋ/ in the words awesome, common, and charming produced 9–10 times by all speakers in the carrier phrase That’s a _______ answer. To examine these results statistically, LMER models were fit for both Alveolar Closure and Velar Closure with the fixed effects Consonant and Language and their interaction. The random effect was Speaker, with random intercepts and slopes.7 The results of these models for both variables are summarized in Table 4, in (a) and (b) respectively. We can see that Alveolar Closure values were significantly higher for /n/ than for the baseline (English) /m/, while /ŋ/ did not differ from the baseline. The learner groups’ values were not significantly different from those of the control group, except for /ŋ/ with the L1 Spanish speakers (the Cng x L1SP interaction), whose Alveolar Closure values were considerably higher. For Velar Closure, values were significantly higher for /ŋ/ than for the baseline English /m/, and higher for the Spanish group than for the baseline English group. The significant negative Cng x L1SP interaction, however, indicated that Velar Closure values were considerably lower for the Spanish speakers’ /ŋ/ than for that of the English speakers.

Table 4

Summary of a linear mixed model for Consonant and L1 fit for the (a) Alveolar Closure and (b) Velar Closure data; the intercept is English /m/; significance levels: ‘***’ <.001, ‘**’ <.01, ‘*’ <.05.

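| Model | Term | Estimate | Std. Error | df | t value | p value | Sig. |
|---|---|---|---|---|---|---|---|
| a. Alveolar Closure | (Intercept) | 0.07 | 0.11 | 7.93 | 0.61 | 0.5617 | |
| | Cn | 0.71 | 0.12 | 7.98 | 6.16 | 0.0003 | *** |
| | Cng | –0.03 | 0.09 | 7.87 | –0.35 | 0.7325 | |
| | L1JP | 0.08 | 0.15 | 7.96 | 0.56 | 0.5890 | |
| | L1SP | 0.16 | 0.15 | 7.94 | 1.05 | 0.3230 | |
| | Cn:L1JP | 0.13 | 0.15 | 8.04 | 0.85 | 0.4180 | |
| | Cng:L1JP | 0.01 | 0.12 | 7.98 | 0.10 | 0.9240 | |
| | Cn:L1SP | 0.06 | 0.15 | 7.97 | 0.39 | 0.7107 | |
| | Cng:L1SP | 0.77 | 0.12 | 7.90 | 6.36 | 0.0002 | *** |
| b. Velar Closure | (Intercept) | 0.15 | 0.02 | 7.90 | 6.90 | 0.0001 | *** |
| | Cn | 0.02 | 0.03 | 8.60 | 0.91 | 0.3888 | |
| | Cng | 0.67 | 0.04 | 7.98 | 16.65 | <0.0001 | *** |
| | L1JP | 0.06 | 0.03 | 8.07 | 2.22 | 0.0572 | |
| | L1SP | 0.09 | 0.03 | 7.98 | 3.24 | 0.0120 | * |
| | Cn:L1JP | 0.01 | 0.03 | 8.80 | 0.32 | 0.7546 | |
| | Cng:L1JP | 0.04 | 0.05 | 8.08 | 0.72 | 0.4947 | |
| | Cn:L1SP | –0.06 | 0.03 | 8.57 | –1.68 | 0.1296 | |
| | Cng:L1SP | –0.57 | 0.05 | 8.00 | –11.00 | <0.0001 | *** |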
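Because the models use treatment coding with English /m/ as the intercept, the model-predicted mean for any consonant-by-group cell is obtained by adding the relevant main effects and interaction term to the intercept. The following Python sketch illustrates this arithmetic using the rounded estimates reported in Table 4. The dictionary layout and the function name are illustrative assumptions, and the resulting values are approximate because they are based on rounded coefficients.

```python
# Fixed-effect estimates copied from Table 4 (rounded to two decimals).
ALVEOLAR = {
    "(Intercept)": 0.07, "Cn": 0.71, "Cng": -0.03,
    "L1JP": 0.08, "L1SP": 0.16,
    "Cn:L1JP": 0.13, "Cng:L1JP": 0.01,
    "Cn:L1SP": 0.06, "Cng:L1SP": 0.77,
}
VELAR = {
    "(Intercept)": 0.15, "Cn": 0.02, "Cng": 0.67,
    "L1JP": 0.06, "L1SP": 0.09,
    "Cn:L1JP": 0.01, "Cng:L1JP": 0.04,
    "Cn:L1SP": -0.06, "Cng:L1SP": -0.57,
}


def predicted_mean(coefs: dict, consonant: str, l1: str) -> float:
    """Sum the treatment-coded terms that apply to one cell.

    consonant: 'm' (baseline), 'n', or 'ng'; l1: 'EN' (baseline), 'JP', or 'SP'.
    """
    c_term = {"m": None, "n": "Cn", "ng": "Cng"}[consonant]
    g_term = {"EN": None, "JP": "L1JP", "SP": "L1SP"}[l1]
    total = coefs["(Intercept)"]
    if c_term:
        total += coefs[c_term]
    if g_term:
        total += coefs[g_term]
    if c_term and g_term:
        total += coefs[f"{c_term}:{g_term}"]
    return round(total, 2)


for group in ("EN", "JP", "SP"):
    print(group, "/ŋ/ alveolar:", predicted_mean(ALVEOLAR, "ng", group),
          "velar:", predicted_mean(VELAR, "ng", group))
```

For /ŋ/, this gives predicted Alveolar Closure values of 0.04 (English), 0.13 (Japanese), and 0.97 (Spanish), and predicted Velar Closure values of 0.82, 0.92, and 0.34, respectively.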

The significant differences in Consonant and L1 revealed by the models are illustrated in Figure 4a and 4b. Note that for all three groups, labial /m/ tended to have Alveolar Closure and Velar Closure values close to zero, indicative of no active lingual contact in either of the palate areas. This was most clearly true of the L1 English group, while the L2 groups’ productions were characterized by slightly increased values (higher Alveolar Closure for Japanese and higher Velar Closure for both Japanese- and Spanish-speaking learners). This suggests that some tokens of L2 English /m/ were produced with a partly increased alveolar or velar contact. Note that there were also a few clear outlier ([n]-like) tokens for the Spanish group. For the alveolar /n/, Alveolar Closure values were relatively high for all three groups, while being somewhat lower for the English speakers. This was due to some reduction of final prevocalic /n/ in this group’s data. Velar Closure values for /n/ were overall low, except for a slight (non-significant) increase for the Japanese speakers. The main difference in the data was for the velar /ŋ/, which was produced with the expectedly high Velar Closure and low Alveolar Closure by the L1 English and Japanese speakers, but with the reverse pattern – high Alveolar Closure and low Velar Closure – by the native Spanish speakers. Note that Alveolar Closure values for /ŋ/ produced by Spanish speakers were essentially the same as for their /n/, indicative of consonant substitution. Yet, it should be noted that Velar Closure values for /ŋ/ were somewhat higher than for /n/, suggesting a simultaneously increased velar contact in at least some tokens. This can be attributed to some simultaneous alveolar/partial velar productions, or to the coarticulatory effect of the preceding high front /ɪ/ (which may have been realized as [i] by our L2 speakers).

The latter possibility is supported by the scatterplot in (c) (the bottom panel of Figure 4), where an overall shift of /ŋ/ tokens to the right (higher Velar Closure) is visible. Note also the existence of a few tokens where the alveolar contact for /ŋ/ was considerably reduced. Interestingly, the other two groups also produced a few tokens with incomplete alveolar closures, yet for different consonants (/n/ for L1 English speakers, /m/ for L1 Japanese). For individual results (boxplots and scatterplots by speaker), the reader is referred to the Supplementary Materials file.

Figure 4

Boxplots of (a) Alveolar Closure and (b) Velar Closure values by consonant and language group for the balanced dataset; (c) scatterplot of Alveolar and Velar Closure values by consonant and language group (individual tokens).

The differences between the L1 Spanish and other speakers in the production of /ŋ/ are further illustrated by the average linguopalatal contact profiles in Figure 5. Note that all three L1 Spanish speakers produced this consonant with an alveolar closure, similar to that for /n/ (yet with slightly greater velar contact). L1 Japanese speakers produced velars similar to those of the English controls but with a somewhat fronted constriction (and correspondingly more velar contact). There were no major differences among the language groups for the other consonants: /m/ was produced with no active lingual constriction, while /n/ was produced with a clear alveolar constriction. One exception was produced by SP3, who sporadically realized /m/ with an alveolar constriction. Apart from these differences, we can see that nasal place contrasts were clearly distinguished by our speakers.

Figure 5

Average linguopalatal contact profiles (taken over the entire nasal interval and averaged over 9–10 tokens) for /m n ŋ/ in the balanced set by speaker.

In sum, the results for the balanced dataset showed target-like realizations for word-final prevocalic /m/ and /n/ by all three groups, as well as for /ŋ/ by the native English and Japanese speakers. In contrast, the realization of /ŋ/ by L1 Spanish speakers was consistently alveolar. In addition, the L2 groups showed more variability than the control group.

3.1.2. The entire corpus

We will now turn to the analysis of the entire corpus of data, which includes the nasals /m n ŋ/ in both word-final prevocalic and utterance-final positions, as well as in a variety of lexical items and kinds of materials – carrier sentences, words in isolation, and the passage (see Table 3). Importantly, all of these items were produced by all eight speakers.

LMER models were fit to the data with the fixed effects and interactions of Consonant, L1, and Position. The random effects were Speaker (with random intercepts and slopes) and Word (with random intercepts only).8 The results are summarized in Table 5 for Alveolar Closure and Table 6 for Velar Closure. Starting with Alveolar Closure, we can see that, as with the balanced set, values were significantly higher for /n/ than for baseline /m/, while /ŋ/ did not differ from the baseline. Alveolar Closure values for the L1 Spanish group were significantly higher than for the native English group. There was also a significant interaction of Consonant and L1, although only for the L1 Spanish participants’ /ŋ/. Position was not significant on its own but participated in significant interactions with Consonant (/n/ only) and with Consonant and L1 (/n/ for L1 Japanese speakers). Some of these differences can be observed in Figure 6a. As expected, /n/ was produced with a near-maximum Alveolar Closure, in contrast to near-zero values for /m/. The overall higher Alveolar Closure values for the L1 Spanish speakers were due to the near-ceiling values for both /n/ and /ŋ/, regardless of the position. The predominantly alveolar realization of /ŋ/ was also the source of the Consonant and L1 interaction. The interactions involving Position appeared to be due to the lower values of word-final intervocalic /n/ for L1 English speakers, relative to their Japanese peers (in the same position) as well as relative to the same consonant in utterance-final position as produced by the English speakers.

Table 5

Summary of a linear mixed model for Consonant, L1, and Position fit for the Alveolar Closure data; the intercept is English /m/ in word-final prevocalic position; significance levels: ‘***’ <.001, ‘**’ <.01, ‘*’ <.05.

| Term | Estimate | Std. Error | df | t value | p value | Sig. |
|---|---|---|---|---|---|---|
| (Intercept) | 0.04 | 0.03 | 22.19 | 1.27 | 0.2171 | |
| Cn | 0.83 | 0.07 | 9.81 | 11.67 | <0.0001 | *** |
| Cng | –0.01 | 0.04 | 19.23 | –0.25 | 0.8067 | |
| L1JP | 0.03 | 0.03 | 11.52 | 1.01 | 0.3337 | |
| L1SP | 0.13 | 0.03 | 12.01 | 4.18 | 0.0013 | ** |
| Positionu_final | 0.00 | 0.04 | 339.60 | –0.06 | 0.9551 | |
| Cn:L1JP | –0.05 | 0.09 | 8.55 | –0.54 | 0.6023 | |
| Cng:L1JP | 0.00 | 0.04 | 10.45 | 0.00 | 0.9988 | |
| Cn:L1SP | –0.07 | 0.09 | 8.62 | –0.73 | 0.4828 | |
| Cng:L1SP | 0.72 | 0.04 | 10.81 | 16.84 | <0.0001 | *** |
| Cn:Positionu_final | 0.11 | 0.04 | 334.30 | 2.55 | 0.0111 | * |
| Cng:Positionu_final | –0.01 | 0.05 | 295.10 | –0.18 | 0.8588 | |
| L1JP:Positionu_final | 0.00 | 0.03 | 4919.00 | 0.14 | 0.8888 | |
| L1SP:Positionu_final | –0.02 | 0.03 | 4916.00 | –0.61 | 0.5417 | |
| Cn:L1JP:Positionu_final | –0.09 | 0.04 | 4918.00 | –2.42 | 0.0155 | * |
| Cng:L1JP:Positionu_final | 0.02 | 0.05 | 4918.00 | 0.33 | 0.7443 | |
| Cn:L1SP:Positionu_final | –0.07 | 0.04 | 4916.00 | –1.74 | 0.0819 | |
| Cng:L1SP:Positionu_final | –0.04 | 0.05 | 4915.00 | –0.95 | 0.3436 | |

Table 6

Summary of a linear mixed model for Consonant, L1, and Position fit for the Velar Closure data; the intercept is English /m/ in word-final prevocalic position; significance levels: ‘***’ <.001, ‘**’ <.01, ‘*’ <.05.

| Term | Estimate | Std. Error | df | t value | p value | Sig. |
|---|---|---|---|---|---|---|
| (Intercept) | 0.24 | 0.03 | 43.76 | 8.352 | <0.0001 | *** |
| Cn | 0.00 | 0.03 | 117.60 | 0.1680 | 0.8668 | |
| Cng | 0.35 | 0.04 | 31.68 | 8.5560 | <0.0001 | *** |
| L1JP | 0.02 | 0.02 | 9.27 | 0.6420 | 0.5367 | |
| L1SP | 0.07 | 0.02 | 9.48 | 2.7280 | 0.0222 | * |
| Positionu_final | –0.04 | 0.04 | 148.30 | –1.0580 | 0.2919 | |
| Cn:L1JP | 0.03 | 0.02 | 18.29 | 1.4750 | 0.1572 | |
| Cng:L1JP | 0.29 | 0.04 | 8.75 | 7.6820 | <0.0001 | *** |
| Cn:L1SP | 0.01 | 0.02 | 19.43 | 0.4270 | 0.6738 | |
| Cng:L1SP | –0.27 | 0.04 | 8.90 | –7.2640 | <0.0001 | *** |
| Cn:Positionu_final | 0.09 | 0.04 | 147.60 | 1.9660 | 0.0512 | |
| Cng:Positionu_final | 0.28 | 0.06 | 141.80 | 4.9490 | <0.0001 | *** |
| L1JP:Positionu_final | 0.03 | 0.02 | 4930.00 | 1.2910 | 0.1968 | |
| L1SP:Positionu_final | 0.04 | 0.02 | 4929.00 | 1.7680 | 0.0771 | |
| Cn:L1JP:Positionu_final | 0.01 | 0.02 | 4930.00 | 0.5650 | 0.5719 | |
| Cng:L1JP:Positionu_final | –0.25 | 0.03 | 4930.00 | –9.1270 | <0.0001 | *** |
| Cn:L1SP:Positionu_final | –0.01 | 0.02 | 4929.00 | –0.5360 | 0.5918 | |
| Cng:L1SP:Positionu_final | –0.16 | 0.03 | 4929.00 | –5.5450 | <0.0001 | *** |

Turning to Velar Closure (Table 6), values were significantly higher for /ŋ/ than for the baseline /m/, while /n/ did not differ from the baseline. The main effect of L1 was significant for the Spanish group, whose values for baseline /m/ were slightly higher than those of the English group; for /ŋ/, however, the Spanish speakers’ values were considerably lower, as indicated by the significant negative Cng:L1SP interaction. There was also a significant interaction of Consonant and L1 involving the L1 Japanese participants’ /ŋ/. As seen in Figure 6b, this was due to higher values for this consonant as produced by the native Japanese speakers compared to their English peers. That is, unlike the L1 Japanese speakers, the controls often produced /ŋ/ with an incomplete velar closure, especially in word-final prevocalic position. The latter difference was also a likely cause of the significant Cng:Positionu_final and Cng:L1JP:Positionu_final interactions. Finally, there was a significant three-way interaction of Consonant, L1, and Position involving the L1 Spanish participants’ /ŋ/. This interaction appears to be due to the smaller difference between the English- and Spanish-speaking participants’ /ŋ/ in word-final prevocalic position than in utterance-final position.
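The position-dependent group differences for /ŋ/ can be made concrete by summing the treatment-coded estimates in Table 6. The Python sketch below adds to the intercept every term whose factor levels all apply to a given cell. The function name and level labels are illustrative assumptions, and the values are approximate because the published estimates are rounded.

```python
# Fixed-effect estimates for Velar Closure copied from Table 6.
VELAR_CORPUS = {
    "(Intercept)": 0.24, "Cn": 0.00, "Cng": 0.35,
    "L1JP": 0.02, "L1SP": 0.07, "Positionu_final": -0.04,
    "Cn:L1JP": 0.03, "Cng:L1JP": 0.29,
    "Cn:L1SP": 0.01, "Cng:L1SP": -0.27,
    "Cn:Positionu_final": 0.09, "Cng:Positionu_final": 0.28,
    "L1JP:Positionu_final": 0.03, "L1SP:Positionu_final": 0.04,
    "Cn:L1JP:Positionu_final": 0.01, "Cng:L1JP:Positionu_final": -0.25,
    "Cn:L1SP:Positionu_final": -0.01, "Cng:L1SP:Positionu_final": -0.16,
}


def predicted_mean(coefs: dict, consonant: str, l1: str, position: str) -> float:
    """consonant: 'm', 'n', 'ng'; l1: 'EN', 'JP', 'SP';
    position: 'prevocalic' (baseline) or 'u_final'."""
    active = set()
    if consonant != "m":
        active.add("C" + consonant)       # 'Cn' or 'Cng'
    if l1 != "EN":
        active.add("L1" + l1)             # 'L1JP' or 'L1SP'
    if position == "u_final":
        active.add("Positionu_final")
    # Add every main effect or interaction whose components are all active.
    total = coefs["(Intercept)"] + sum(
        value for name, value in coefs.items()
        if name != "(Intercept)" and set(name.split(":")) <= active
    )
    return round(total, 2)


for pos in ("prevocalic", "u_final"):
    en = predicted_mean(VELAR_CORPUS, "ng", "EN", pos)
    jp = predicted_mean(VELAR_CORPUS, "ng", "JP", pos)
    sp = predicted_mean(VELAR_CORPUS, "ng", "SP", pos)
    print(pos, "EN:", en, "JP:", jp, "SP:", sp, "EN-SP gap:", round(en - sp, 2))
```

On these rounded estimates, predicted Velar Closure for /ŋ/ is 0.59 (English), 0.90 (Japanese), and 0.39 (Spanish) in word-final prevocalic position, and 0.83, 0.92, and 0.51 in utterance-final position. The English-Spanish gap thus grows from about 0.20 to about 0.32.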

The scatterplot in Figure 6c combines the two variables by plotting individual tokens. It is of interest to compare these results to those for the balanced set in Figure 4c. In the entire corpus data, notice the overall much greater variation for all three groups, especially for the L2 learners. For the L1 English speakers, variation mainly involved some weakening of the alveolar closure for /n/ and of the velar closure for /ŋ/, primarily in word-final prevocalic position. For the L1 Japanese speakers, there was considerable weakening of alveolar closures for /n/ in both positions, accompanied by clearly velar realizations of /n/ and some simultaneous alveolar-velar realizations (tokens with both high Alveolar Closure and Velar Closure). The latter two types were mainly limited to utterance-final position. Like the other two groups, the L1 Spanish speakers showed some alveolar weakening for /n/, which was absent in the balanced set. In addition, they showed considerable variation with /m/ (with tokens having partial alveolar constrictions) and especially /ŋ/. While the latter consonant was predominantly realized as alveolar (similarly as in the balanced dataset), there were many tokens of incomplete alveolar closures, and a sizable number of target-like velar realizations. Note that the latter were completely absent in the balanced set. Furthermore, L1 Spanish speakers produced a number of /n/ tokens as velars or simultaneous alveolar-velar consonants (for individual results – boxplots and scatterplots by speaker – the reader is referred to the Supplementary Materials file).

Figure 6

Boxplots of (a) Alveolar Closure and (b) Velar Closure values by consonant and language group for the complete dataset; (c) scatterplot of Alveolar and Velar Closure values by consonant and language group (individual tokens).

In sum, the results for the complete corpus dataset confirm the key observations made concerning the balanced set: The overall target-like realization of nasals by the L1 Japanese speakers versus neutralization of alveolar and velar nasals in favour of the former by the native Spanish speakers. In addition, the results for the complete dataset were characterized by overall greater variation in the realization of all consonants, especially with the L2 learners. In what follows, we will examine different types of realizations of nasals in terms of their constrictions separately for each target consonant and by speaker.

3.2. Qualitative observations

3.2.1. Production of /m/

Percentages of /m/ tokens organized by closure location (alveolar, velar, both, neither) and type (full, partial) are summarized in Table 7. These categories are based on the procedure described in Section 2.4. Overall, an overwhelming majority of realizations had neither an alveolar nor a velar constriction, as would be expected of a bilabial consonant. Two L1 Spanish speakers, however, showed relatively high percentages of alveolar realizations of /m/ – 16% for SP1 and 10% for SP3 (including full and partial closures). Almost all alveolar tokens for the first speaker involved words with a preceding /ɹ/ (e.g., from (entering), scream, uniform (of)), and thus could be attributed to the coarticulatory (or assimilatory) influence of the preceding rhotic. Alveolar realizations for SP3 included /m/ in a variety of contexts (uniform (of), awesome (answer), ransom, shame), and thus were not clearly influenced by coarticulation. Interestingly, the few tokens of the alveolar /m/ produced by Japanese speakers JP2 and JP3 involved some of the same words with a preceding /ɹ/ as for SP1.

Table 7

Percentages of /m/ tokens by closure type, separately by speaker; above-zero percentages are shaded, with darker shading for dominant patterns.

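| Closure location | Type | EN1 (L1 English) | EN2 (L1 English) | JP1 (L1 Japanese) | JP2 (L1 Japanese) | JP3 (L1 Japanese) | SP1 (L1 Spanish) | SP2 (L1 Spanish) | SP3 (L1 Spanish) |
|---|---|---|---|---|---|---|---|---|---|
| a. alveolar | full | 0% | 0% | 0% | 3% | 2% | 14% | 1% | 9% |
| | partial | 0% | 0% | 0% | 1% | 0% | 1% | 0% | 1% |
| b. velar | partial | 0% | 0% | 0% | 0% | 0% | 1% | 0% | 0% |
| c. both | (all: alv. full & vel. full/partial) | 0% | 0% | 0% | 0% | 1% | 1% | 0% | 0% |
| d. neither | neither | 99% | 100% | 100% | 96% | 98% | 83% | 99% | 90% |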

Figure 7 provides examples of within- and across-speaker variation in the realization of /m/. Temporal sequences of palates in (a) show a target-like realization without lingual contact and an alveolar realization of the nasal by SP3. The palates in (b) show the realization of the nasal as alveolar or as a complex alveolar-velar articulation by SP1. Note that the annotated nasal intervals are preceded by partial alveolar constrictions corresponding to [ɹ] (in from, realized as [fɹ̩n] or [fɹ̩n͡ŋ]), which presumably condition (or at least favour) the alveolar closure. Finally, the frame sequences in (c) and (d) show alveolar realizations of /m/ by two of the L1 Japanese speakers, without an apparent influence of the rhotic ([fɹən]).

Figure 7

Temporal sequences of palates for individual tokens of words with /m/ (with the corresponding palates indicated as ‘N’) showing target-like and non-target-like realizations.

3.2.2. Production of /n/

As would be expected, /n/ was overwhelmingly produced with an alveolar constriction (see Table 8), typically with a full closure (87% or more for all speakers except JP3). Velar realizations were notable for JP3, accounting for 11% of this speaker’s productions. The majority of these tokens were in words with the preceding front vowels /i/ and /e/ (e.g., between (aloud), telescreen (it), again, drain (again), vain) and/or in utterance-final position (e.g., afternoon, groan, train). Note that this speaker also showed a large percentage of nasal tokens without any closure, and these tended to occur in similar contexts – after front vowels and/or utterance finally. The few instances of velarization or vocalization observed for the other two L1 Japanese speakers conformed to these observations. Altogether, these patterns are reminiscent of the speakers’ L1 Japanese /N/, which is susceptible to adjacent vowel stricture assimilation/coarticulation and is mostly realized as uvular utterance finally (see Section 1.2).

Among the Spanish speakers, only SP1 showed somewhat increased velarization or deletion of /n/. Notably, as highlighted earlier, this participant is a speaker of Cuban Spanish, where the default realization of final nasals is velar. The patterning of SP1 with the L1 Japanese speakers may, therefore, not be coincidental, given the phonotactic restrictions on place in final codas in both varieties. As stated, this is also consistent with this participant’s idiolectal patterns.

The partial closure realization by English speakers (most clearly shown by EN1) is not unexpected. Shockey’s (1991) EPG study, for example, reported that approximately a quarter of /n/ tokens in conversational speech were produced by her two British English participants without a complete closure.

Table 8

Percentages of /n/ tokens by closure type, separately by speaker; above-zero percentages are shaded, with darker shading for dominant patterns.

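| Closure location | Type | EN1 (L1 English) | EN2 (L1 English) | JP1 (L1 Japanese) | JP2 (L1 Japanese) | JP3 (L1 Japanese) | SP1 (L1 Spanish) | SP2 (L1 Spanish) | SP3 (L1 Spanish) |
|---|---|---|---|---|---|---|---|---|---|
| a. alveolar | full | 90% | 97% | 94% | 93% | 59% | 87% | 92% | 94% |
| | partial | 10% | 2% | 3% | 1% | 14% | 5% | 4% | 3% |
| b. velar | full | 0% | 0% | 1% | 0% | 7% | 1% | 0% | 0% |
| | partial | 0% | 0% | 0% | 1% | 4% | 2% | 0% | 0% |
| c. both | (all: alv. full & vel. full/partial) | 0% | 0% | 2% | 2% | 2% | 3% | 3% | 0% |
| d. neither | neither | 0% | 1% | 1% | 5% | 13% | 4% | 2% | 2% |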

Figure 8 shows specific examples of /n/ realizations: With a target-like alveolar closure and a non-target-like velar closure by JP1 in (a), with a velar closure and without a consonant constriction by JP2 in (b) and (c), and with an incomplete velar closure by SP1 in (d).

Figure 8

Temporal sequences of palates for individual tokens of words with /n/ (with the corresponding palates indicated as ‘N’) showing target-like and non-target-like realizations.

3.2.3. Production of /ŋ/

As shown in Table 9, the vast majority of /ŋ/ tokens produced by the L1 English and Japanese speakers involved a velar closure. Approximately half of these closures, however, were partial for English speakers. In contrast, full closures were much more common for the Japanese-speaking participants (and near 100% for JP2 and JP3). In addition, the L1 English speakers occasionally produced velars without any constriction (4-8%; i.e., vocalized). Notably, almost all partial and deleted closures among all speakers were observed in word-final intervocalic position (e.g., claiming (aloud), reading (out)), rather than in utterance-final position. Certain alveolar realizations of /ŋ/ by JP1 involved instances of the word nothing, and thus could be considered specific to this lexical item. As discussed extensively above, L1 Spanish speakers predominantly realized /ŋ/ as alveolar, typically with a complete closure. There was, however, a sizable percentage of velar realizations as well as of full deletions. Interestingly, 78% of these (32 out of 41 tokens) were limited to three lexical items, all nouns – anything, nothing, and ring (occurring in isolation or in a carrier phrase or the text). In contrast, alveolar realizations were attested for all 24 words with /ŋ/, most of which were -ing participles or related adjectives (see Appendix, Table A1). This raises the possibility that the L1 Spanish speakers’ production of /ŋ/ was at least partly conditioned by a word’s grammatical status.

Table 9

Percentages of /ŋ/ tokens by closure type, separately by speaker; above-zero percentages are shaded, with darker shading for dominant patterns.

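| Closure location | Type | EN1 (L1 English) | EN2 (L1 English) | JP1 (L1 Japanese) | JP2 (L1 Japanese) | JP3 (L1 Japanese) | SP1 (L1 Spanish) | SP2 (L1 Spanish) | SP3 (L1 Spanish) |
|---|---|---|---|---|---|---|---|---|---|
| a. alveolar | full | 0% | 0% | 2% | 0% | 0% | 82% | 84% | 86% |
| | partial | 0% | 0% | 1% | 0% | 0% | 7% | 2% | 1% |
| b. velar | full | 48% | 48% | 83% | 97% | 97% | 3% | 7% | 11% |
| | partial | 43% | 47% | 14% | 3% | 2% | 1% | 1% | 2% |
| c. both | (all: alv. full & vel. full) | 0% | 0% | 0% | 0% | 0% | 0% | 2% | 0% |
| d. neither | neither | 8% | 4% | 1% | 0% | 0% | 4% | 3% | 0% |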

Figure 9 illustrates the range of variation in the realizations of /ŋ/ by the L1 Spanish speakers: With a velar closure and an alveolar closure by SP3 in (a), with a partial alveolar and full velar closure by SP1 in (b) and (c), and with a complex alveolar-velar closure and a vocalized nasal by SP2 in (d) and (e).

Figure 9

Temporal sequences of palates for individual tokens of words with /ŋ/ (with the corresponding palates indicated as ‘N’) showing target-like and non-target-like realizations by Spanish speakers.

Referring back to Table 9, it is somewhat surprising that the L1 Spanish speaker who exhibited the lowest percentage of velar realizations of /ŋ/ was the Cuban speaker SP1. Given the presence of this realization in her idiolect, we would expect to find the opposite, namely a much higher percentage of [ŋ]. Moreover, her production involved a higher percentage of velarization of /n/ (see Table 8). This can possibly be attributed to the effect of the preceding vowel: In all items, /ŋ/ occurred after high front /ɪ/. As mentioned, our previous investigation of L1 Spanish final nasals (Colantoni & Kochetov, 2012) revealed that this speaker showed resistance to velarization next to front vowels. Once again, we see strong L1 effects, in particular idiolectal ones, in this speaker’s L2 English production. In sum, the production of the velar nasal by this Spanish speaker suggests some grammatical and contextual conditioning.

As we noted, the performance of our L1 Spanish speakers with respect to the production of /ŋ/ contrasts with the (near-)perfect performance of the native Japanese speakers. A closer examination of the Japanese speakers’ production of this sound revealed that it was consistently produced with a stop-like burst, both utterance finally and word finally prevocalically, and with an optional voiceless vocoid utterance finally. That is, items like claiming aloud /ˈkleɪmɪŋ əˈlaʊd/ and ring /ɹɪŋ/ were produced as [ˈkleɪmɪŋɡ əˈlaʊd]~[ˈkleɪmɪŋɡ əˈlaʊd] and [ɹɪŋɡə̥]~[ɹɪŋɡə̥] (see Figure 10 below). Given this, the performance of the L1 Japanese speakers is only partially target-like, and this is achieved by re-using the gestural pattern associated with their L1 nasal + velar sequence, rather than acquiring a novel contrast.

Figure 10

Sample spectrograms and sequences of palate frames of utterance-final released nasals including those with a following devoiced vocoid, as produced by L1 Japanese speakers JP1 (a) and JP3 (b) and (c).

3.3. Epenthetic vocoids

As mentioned above, the velar nasal was realized by the L1 Japanese speakers as a nasal + stop sequence, which was sometimes followed in utterance-final position by a devoiced vocoid (i.e., /ŋ/ produced as [ŋɡə̥] or [ŋɡə̥]) (as shown in Figure 10 (a)). The production of voiceless vocoids, which can be regarded as a case of partial epenthesis, was also observed for the other utterance-final nasals, albeit less frequently. Thus, the items atom and twin were occasionally produced as [ˈæɾəmə̥]~[ˈæɾəmə̥] and [tw̥ɪnə̥]~[tw̥ɪnə̥], as illustrated in Figure 10 (b) and Figure 10 (c) with the production of JP3.

A closer examination of acoustic data from JP3 revealed that this speaker produced utterance-final /m/ with such vocoids 72% of the time. Vocoids after utterance-final /n/ were much less frequent, observed in approximately 15% of tokens. Notably, however, none of the ‘velarized’ cases of /n/ (produced as [ŋ̞] by the speaker) exhibited epenthetic vocoids, which suggests that partial epenthesis is a possible strategy to counter velarization. Epenthesis of this kind was less frequent, yet still conspicuous, in the productions of /m/ and /n/ by the other two L1 Japanese speakers, but seemed to be absent in the data from their Spanish peers and the English controls. As discussed in § 1.2, both strategies have been previously attested in the acquisition of coda voicing (e.g., Cardoso, 2007; Sekiya & Jo, 1997) and place (Steele, 2005).

In summary, phonetic details such as consonant releases and partial epenthesis add to our understanding of the more target-like performance of Japanese speakers.

4. Discussion

4.1. Hypothesis evaluation

The present study set out to explore the acquisition of nasal coda contrasts and document the L2 articulatory patterns of L1 Japanese and Spanish learners of English. As discussed both in the Introduction and in Section 1.2, based on theories of L2 perceptual categorization, one would predict that both groups of learners should behave identically. However, as highlighted earlier, acquisition of the contrast not only involves perception – learners must also learn to produce it word finally. L2 acquisition theories that compute positionally-based interlanguage similarity do not prove particularly fruitful when formulating hypotheses for this acquisition scenario, given the L1 variability in the realization of utterance-final coda nasals. Thus, we tested the hypothesis that, while both the L1 Japanese and L1 Spanish learners would have some difficulty acquiring the three-way English /m n ŋ/ coda contrast, based on continued L1-based influence, the Spanish-speaking learners would be relatively more accurate given that their L1 is typologically more similar to the L2 than is Japanese. While the results of our EPG study support the presence of some difficulty in realizing coda nasals by all speakers, they strongly refute the anticipated greater accuracy of the native Spanish speakers. This was most obvious with the production of velar /ŋ/, for which the Japanese-speaking learners’ mean velar rate (83–97% of full velar closures; Table 9) contrasted starkly with, and was significantly different from, that of the Spanish speakers, who produced alveolars in the majority of cases (82–86%; Table 9), particularly in word-final prevocalic position.
The superior performance of the former group was also minimally present with the bilabial nasal: While both groups included speakers who had some alveolar realizations (namely, JP2 and SP1, in word-final prevocalic position and SP3 in both utterance-final and word-final prevocalic positions), the percentage was slightly higher with the Spanish-speaking group (1–17% versus 0–4%; Table 7). As concerns the learners’ production of /n/, non-target-like velar and placeless realizations were attested with both groups (L1 Spanish: 6–13% non-target realizations; L1 Japanese: 7–41% non-target realizations; Table 8), but were much more frequent in the L1 Japanese group, mainly due to the behavior of JP3 for whom only 59% of realizations were target-like. Recall that this speaker frequently produced /n/ with a velar closure or no closure at all, especially next to front vowels and utterance finally. This is reminiscent of the Japanese nasal assimilation to vowels in stricture and the uvular realization of /N/ utterance finally. Overall, although the L1 Spanish speakers in this study differed from the controls to a larger extent than the participants in Goodin-Mayeda et al.’s (2011) study, the same hierarchy of difficulty was observed, with more accuracy with bilabials than velars.

A word of caution is warranted here. While their rate of velar articulations was significantly higher than that of the L1 Spanish speakers, recall that the Japanese-speaking learners consistently produced velar codas with a stop-like burst, both utterance finally and word finally prevocalically, and with an optional voiceless vocoid utterance finally. Neither of these features was observed with the native English speaker controls, whose production instead sometimes involved incomplete closure (i.e., lenition) of the velar gesture. As such, as highlighted earlier, the performance of our L1 Japanese speakers was only partially target-like, achieved by re-using the gestural pattern associated with their L1 nasal + velar sequence, rather than acquiring a novel contrast.

These results are consistent with the patterns of adaptation of the English /ŋ/ in Japanese (Heffernan, 2005) and with the findings of Aoyama’s (2003) perception study, as both point to the mapping of the L2 /ŋ/ onto the L1 /Nɡu/ rather than /N/, as well as with the articulatory findings reported by Mizoguchi (2019). Interestingly, Aoyama also observed near-perfect discrimination of English final /m/, even though this consonant does not occur in the same position in Japanese, but Heffernan (2005) showed that Japanese speakers introduced an epenthetic vowel when adapting English loanwords ending with /m/. This result parallels, on the one hand, the overall highly accurate production of /m/ by our Japanese speakers; on the other hand, this result is consistent with the high rate of epenthesis with /m/ observed in the production of JP3. At the same time, Aoyama’s (2003) finding that English /n/ is predominantly mapped onto /N/ by Japanese listeners is only partly supported by our production results. It appears that, apart from the cases of utterance-final velarization (most often produced by JP3), our Japanese speakers have largely learned to produce English final /n/. Consistent with Mizoguchi’s predictions, however, alveolar place seems to be the hardest to acquire, since one of our participants (JP3) had only a 59% accuracy rate, an overall relatively non-target-like performance not attested with the other places of articulation under analysis. The high accuracy of our L1 Japanese participants may not be surprising, as our speakers had spent over 12 years in an English-speaking environment and this was also the pattern observed with the most advanced speaker in Mizoguchi’s study. In contrast, Aoyama’s categorization experiment involved participants who were less advanced learners of English who had lived in the U.S. for less than three years.

4.2. Insights into L2 speech learning

Many of the phenomena observed in our Japanese- and Spanish-speaking learners’ production of the English coda nasal place contrasts parallel findings of previous L2 speech research. First, cross-linguistic influence was observed at the level of a speaker’s first language, dialect, and even idiolect. As concerns the latter, consider L1 Spanish speaker SP1. While coda nasals neutralize to velars in her Cuban dialect, we saw evidence for an idiolectal feature overruling this tendency: When producing English /ŋ/, rather than having the highest rate of accuracy compared to her Argentine and Madrid Spanish-speaking peers, she had the lowest rate of velarization. This was explained by the observed resistance to velarization next to front vowels, a feature detected in a previous study of this speaker’s Spanish nasal production (Colantoni & Kochetov, 2012). In summary, as has been observed for both L2 speech perception (e.g., Chládková & Podlipský, 2011; Escudero, Simon, & Mitterer, 2012; Escudero & Williams, 2011) and production (e.g., Brannen, 2002; Picard, 2002; Trofimovich, Gatbonton, & Segalowitz, 2007), L1-based predictions may be better informed by looking at a given speaker’s idiolect as opposed to first language or dialect.

Second, our study attests to the high degree of inter-learner variation that may be observed, even with speakers of the same L1 and similar L2 proficiency level. Consider, for example, Japanese-speaking learners JP1 and JP3. This pair of learners was of identical oral proficiency based on their accentedness scores (2.6 for both). However, we observed differences between them. For example, while their production of /n/ and /ŋ/ was highly similar, JP3 had a significant number of velarized realizations essentially absent from JP1’s production (16% versus 1% of tokens). JP3 also differed from JP1 (as well as from all the other learners) in having the lowest rate (59%) of complete alveolar closures. The less target-like performance of JP3 is particularly surprising given that she was the only L1 Japanese speaker whose spouse was a native English speaker and she reported speaking English at home 50% of the time (as opposed to the 40% indicated by the two other Japanese-speaking learners).

Third, our study speaks to the importance of supplementing designed experiments with corpus data, as we would not have been able to uncover the variation reported in §§ 3.1.2 and 3.2 had we limited ourselves to the balanced set.

Fourth, our EPG study provides insights into developmental sequences, both as concerns relative difficulty and learner strategies when acquiring new contrasts. As concerns the former, in the case of our Japanese-speaking learners whose L1 neutralizes all independent place in coda nasals, acquiring /m/ and /ŋ/ was relatively easier than /n/. Related to the acquisition of new contrasts is also our finding of doubly articulated nasals, produced occasionally by all of our L2 English speakers. Such realizations are possibly indicative of the interference between L1 and L2 phonological representations or phonetic targets, particularly if the sounds are auditorily similar (Flege, 1987). The occurrence of consonants with complex closures for /ŋ/ and other nasals was unexpected. While the overall number of complex closure tokens was relatively small, they occurred in the productions of all six L2 speakers (with the highest numbers for SP1 and JP3), and involved all three consonants (while occurring most frequently for /n/). This finding is indicative of apparent occasional breakdowns in L2 speech planning or production of final consonants. That is, the speakers in such cases appear to be uncertain about selecting/activating the correct target gestures and, consequently, produce both of them. This result presents some interesting parallels with findings of complex articulations in L1 speech error studies. In an electromagnetic articulography study of elicited speech errors, Goldstein, Pouplier, Chen, Saltzman, and Byrd (2007) reported a frequent occurrence of ‘gestural intrusions’ – conflicting articulatory gestures such as, for example, a velar gesture for /k/ with a partial tongue tip gesture resulting from an anticipatory activation of /t/ from the following syllable (cop top as [ktɑp tɑp]). Although frequently found, such intrusions were either not perceived by listeners (who heard a well-formed [kɑp tɑp]) or perceived as complete segmental substitutions (as [tɑp tɑp]).
This, the authors note, attests to the value of detailed articulatory investigations of speech production patterns. Similarly, complex nasal realizations observed in our study may not have been noted based on auditory impressions or acoustic analysis.

Finally, the results obtained from our L1 Spanish speakers teach us that not all L1 phonological processes are equally easy/difficult to unlearn. For example, previous research has demonstrated that Spanish-speaking learners of English have difficulty blocking the application of their L1 voiced stop intervocalic approximantization process (Zampini, 1993; 1996), with the exception of /d/, which contrasts in English with /ð/. Our findings, in contrast, mirror those obtained for the acquisition of coda obstruent voicing by learners with either an L1 devoicing rule or regressive assimilation process. Young-Scholten (2004) reported that American English learners failed to acquire German final devoicing. According to the author, the difficulty in acquiring the rule can be attributed not only to transfer of the L1 final voicing contrast but also to the fact that, in the L1, final obstruents usually resyllabify with the following vowel. If L1 English-L2 German learners transfer their resyllabification rules, then, they create environments in which voicing is expected. Monteleone (2009), who studied the acquisition of voicing in English obstruent-obstruent clusters by Polish and Hungarian speakers, observed that the former transferred their L1 voicing neutralization processes to English. Thus, it is possible that neutralization processes are more difficult to unlearn than other phonological processes. Two further properties of the target language may cause additional difficulties for native Spanish speakers. First, there is some variability in the production of English word-final nasals: For example, a word like something can be realized as [sʌmθɪŋ ~ sʌmθɪn ~ sʌmθn̩] (e.g., Byrd, 1994). Second, nasals in absolute final position do not have reliable cues for place.
The Spanish-speaking learners’ contrastingly high rates of accuracy with bilabial and alveolar nasals may be related to orthography: Both of these nasals have one-to-one phoneme-grapheme mappings in Spanish (e.g., álbum /album/ ‘album’, canción /kansion/ ‘song’), whereas velars do not. Not all hope is lost, though! Our learners were more target-like with certain lexical items, such as nothing, ring, and anything. It may well be that the path to the acquisition of nasal place contrast is paved with the use of certain frequent lexical items.

4.3. Avenues for future research

Given that the present study is the first EPG investigation of the L2 acquisition of coda place contrasts, there is much fertile ground to be explored in future research. First and foremost, we highlight the interest of a replication study involving a larger number of participants that would allow further exploration of the possible range of inter-learner variation conditioned by idiolectal differences. Such a study could employ a more controlled set of words than in the complete corpus in order to explore various effects proposed to explain patterns in our own data, including the potential coarticulatory influence of the preceding vowel and of word class; recall that our Spanish speakers’ production of /ŋ/ was possibly conditioned by a word’s grammatical status. Were a replication production study to be coupled with a perceptual study, it would be possible to investigate whether between-language and idiolectal differences also have some roots in perception. If a learner’s first language, dialect, or idiolect does not have the complete range of coda place contrasts, this also has consequences for the individual’s experience in perceptually distinguishing some contrasts. Moreover, we have highlighted that the variable patterning of coda nasals as well as the neutralization of nasal codas in some languages may be related to the weaker perceptual cues to nasal place syllable finally. If speakers differ in their perceptual sensitivity to certain contrasts (Perkell et al., 2004; 2006), such differences may well have consequences for their production.


Table A1

Full list of words used in the study by consonant and dataset.

Reading task and carrier, followed by the target words for /m/, /n/, and /ŋ/

a. Balanced Set: target word-final prevocalic nasals in a carrier sentence
(That’s a(n) __ answer)
/m/: awesome
/n/: common
/ŋ/: charming

b. i. Corpus: target word-final prevocalic nasals in a carrier sentence
(Say __ again.)
/m/: atom, broom, cream, kingdom, scream, shame, warm
/n/: afternoon, Canadian, captain, carton, contain, drain, groan
/ŋ/: nothing, ring
(Say __ aloud)
/m/: misname, proclaim
/n/: between, crayon, misplan, question
/ŋ/: claiming, fraying, glazing, slapping

ii. Corpus: target word-final prevocalic nasals in a text (a passage from ‘1984’)
/m/: from (entering), from (an), uniform (of), from (every)
/n/: man (of), one (end), than (a), even (at), thirty-nine (and), production (of), telescreen (it), down (in), one (on), down (at), torn (at)
/ŋ/: entering (along), reading (out), shutting (it), shining (and), covering (and), snooping (into)

iii. Corpus: target utterance-final nasals in isolated words
/m/: atom, broom, cream, kingdom, scream, shame, warm
/n/: afternoon, Canadian, captain, carton, contain, drain, groan
/ŋ/: nothing, ring

iv. Corpus: target utterance-final nasals in a carrier sentence
(Say ___)
/n/: (hid) again, (hood) again, (‘hud’) again
(That’s an extra __)
/m/: ransom
/n/: caption

v. Corpus: utterance-final nasals in a text (a passage from ‘1984’)
/m/: him
/n/: thirteen, Winston, ran
/ŋ/: working, landing, anything

c. Control: preconsonantal nasals in isolated words/carrier sentence/text
-- (isolated words)
/m/: lamp
/n/: sprint
/ŋ/: rethink
(Say __ again)
/m/: lamp
/n/: sprint
/ŋ/: rethink
(Passage ‘1984’)
/m/: simply
/n/: blunt
/ŋ/: sank

Figure A1

Boxplots of (a) Alveolar Closure and (b) Velar Closure values by consonant, speaker, and palate type in the control nasals in lamp, sprint, and think produced in a carrier phrase; note that there is no obvious relation between the patterns and palate types (old and new), while there are some individual differences (e.g., higher Alveolar Closure values for /ŋ/ by SP1, old palate) or possibly L1-related differences (higher Velar Closure values for /n/ by Spanish speakers).


  1. There are exceptional case studies that have looked at the acquisition of the whole consonantal inventory (e.g., Hansen, 2004).
  2. A few studies have investigated other parameters including F1 onset (Flege, Munro, & Skelton, 1992) and stop release (Laeufer, 1996).
  3. Categorical place assimilation of English /n/ was reported primarily before labials and, to a lesser extent, before velars; gradient assimilation in place and stricture was found before other coronals (Hardcastle, 1995; Kochetov et al., 2021).
  4. The speakers in the present study also participated in this earlier study and consistently produced either alveolar final nasals (SP2 and SP3) or variably alveolar and velar ones (SP1).
  5. Such variability may be, in part, due to the lesser salience of perceptual nasal place cues word-finally (see e.g., Miller & Nicely, 1955, for English).
  6. Averaging over two time points provided a way to minimize sensitivity of the variables to vowel contexts and was found to better correspond to our qualitative classifications of closure patterns (see the following text).
  7. The formula for Alveolar Closure was lmer(Alveolar.Closure_all~C*L1+(1+C|Speaker), data_balanced, REML=FALSE)->fit1. This model was selected based on a comparison to simpler models with only random effects (1+(1+C|Speaker)), a single fixed effect, Consonant (C+(1+C|Speaker)), and two fixed effects, Consonant and L1 without an interaction (C+L1+(1+C|Speaker)). Based on an anova() comparison of the models, the selected model (C*L1) was found to be the best fit relative to the baseline model (χ2 = 31.12, p < 0.0001, compared to χ2 = 24.76, p < 0.0001 and χ2 = 12.07, p = 0.0024). For consistency, the same model was used for Velar Closure.
  8. The formula for the Alveolar Closure analysis was lmer(Alveolar.Closure_all~C*L1*Position+(1+C|Speaker)+(1|Word), data, REML=FALSE). This model was selected based on its comparison to a series of simpler models (1+(1+C|Speaker), C+(1+C|Speaker), C+L1+(1+C|Speaker), C:L1+(1+C|Speaker), C:L1+Position+(1+C|Speaker)), as well as more complex models with Utterance Type (C*L1*Position+Utterance_type and C*L1*Position*Utterance_type). Based on an anova() comparison, the selected model was found to be a better fit than the simpler models (χ2 = 41.21, p < 0.0001, compared to χ2 = 33.44, p < 0.0001, χ2 = 11.10, p = 0.0039, χ2 = 36.11, p < 0.0001, χ2 = 1.20, p = 0.2733, and χ2 = 0.00, p = 1.0000). The two more complex models did not converge. For consistency, the same model was intended to be used for Velar Closure. However, due to the lack of convergence, the model was revised by removing the random effect of Word (resulting in lmer(Velar.Closure_all~C*L1*Position+(1+C|Speaker), data, REML=FALSE)).
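
Some of the p-values reported in notes 7 and 8 can be checked by hand. The following is a minimal Python sketch and not part of the original analysis, which was run in R with lme4. It evaluates the upper tail of the chi-square distribution in closed form for 1 and 2 degrees of freedom. The notes do not report degrees of freedom, so the df values below are assumptions, chosen because they reproduce the reported p-values; the function name chi2_sf is likewise only illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Only df = 1 and df = 2 are handled, because both have exact
    closed forms that need nothing beyond the standard library.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # chi-square(1) is the square of a standard normal variable
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi-square(2) is an exponential distribution with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Statistics copied from notes 7 and 8; the df values are ASSUMED,
# since the notes do not state them.
reported = [
    (12.07, 2, 0.0024),  # note 7
    (11.10, 2, 0.0039),  # note 8
    (1.20, 1, 0.2733),   # note 8
]

for stat, df, p_reported in reported:
    p = chi2_sf(stat, df)
    print(f"chi2 = {stat:.2f}, assumed df = {df}: p = {p:.4f} (reported: {p_reported})")
```

Under these assumed degrees of freedom, the computed values agree with the reported p-values to four decimal places. This is consistent with, but does not confirm, the corresponding model comparisons differing by two parameters and one parameter, respectively.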

Additional file

The additional file for this article can be found as follows:

Supplementary Materials

Table S1 to Figure S1. DOI: https://doi.org/10.16995/labphon.6434.s1


We wish to acknowledge our participants and to thank Isabel Garriga, Ruth Martínez, Nayoung Ryu, and Katherine Sung for assistance with data annotation. The paper has benefitted from the feedback generously provided by two anonymous reviewers, the Guest Editor Barbara Gili Fivela, and the General Editor Alan Yu. All errors are our own.

Funding information

This work was partly funded by an Insight Grant from the Social Sciences and Humanities Research Council of Canada (#435-2015-2013) to Alexei Kochetov and a University of Toronto Faculty of Arts & Science Advancing Teaching and Learning in Arts and Science (ATLAS) grant to the three authors.

Competing interests

The authors have no competing interests to declare.

Author contributions

The authors contributed equally to this work.


Aoyama, K. (2003). Perception of syllable-initial and syllable-final nasals in English by Korean and Japanese speakers. Second Language Research, 19(3), 251–265. DOI:  http://doi.org/10.1191/0267658303sr222oa

Baker, W. (2010). Effects of age and experience on the production of English word-final stops by Korean speakers. Bilingualism: Language and Cognition, 13(3), 263–278. DOI:  http://doi.org/10.1017/S136672890999006X

Barry, M. C. (1991). Temporal modelling of gestures in articulatory assimilation. In Proceedings of the 12th International Congress of Phonetic Sciences, Volume 4 (pp. 14–17). Université de Provence.

Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., & Grothendieck, G. (2017). lme4 package, version 1.1-13 [computer software]. Retrieved from https://cran.r-project.org/web/packages/lme4/index.html

Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171–204). York Press.

Best, C. T., & Tyler, M. D. (2007). Nonnative and second language speech perception: Commonalities and complementarities. In M. J. Munro & O. S. Bohn (Eds.), Second language speech learning: The role of language experience in speech perception and production (pp. 13–34). John Benjamins. DOI:  http://doi.org/10.1075/lllt.17.07bes

Bongaerts, T., Mennen, S., & Slik, F. van der (2000). Authenticity of pronunciation in naturalistic second language acquisition: The case of very advanced late learners of Dutch as a second language. Studia Linguistica, 54, 298–308. DOI:  http://doi.org/10.1111/1467-9582.00069

Brannen, K. (2002). The role of perception in differential substitution. The Canadian Journal of Linguistics, 47, 1–46. DOI:  http://doi.org/10.1017/S0008413100018004

Cardoso, W. (2007). The variable development of English word-final stops by Brazilian Portuguese speakers: A stochastic optimality theoretic account. Language Variation and Change, 19(3), 219–248. DOI:  http://doi.org/10.1017/S0954394507000142

Celata, C., Calamai, S., Ricci, I., & Bertini, C. (2013). Nasal place assimilation between phonetics and phonology: An EPG study of Italian nasal to velar clusters. Journal of Phonetics, 41, 88–100. DOI:  http://doi.org/10.1016/j.wocn.2012.10.002

Chládková, K., & Podlipský, V. J. (2011). Native dialect matters: Perceptual assimilation of Dutch vowels by Czech listeners. Journal of the Acoustical Society of America, 131(5), EL186–EL192. DOI:  http://doi.org/10.1121/1.3629135

Colantoni, L., & Hualde, J. I. (2013). Variación fonológica en el español de la Argentina [Phonological variation in Argentine Spanish]. In L. Colantoni & C. Rodríguez Louro (Eds.), Perspectivas teóricas y experimentales sobre el español de la Argentina (pp. 21–35). Iberoamericana/Vervuert. DOI:  http://doi.org/10.31819/9783954871971-003

Colantoni, L., & Kochetov, A. (2012). Nasal variability and speech style: An EPG study of coda nasals in two Spanish dialects. Italian Journal of Linguistics, 24, 11–42.

Colantoni, L., & Steele, J. (2007). Acquiring /ʁ/ in context. Studies in Second Language Acquisition, 29, 381–406. DOI:  http://doi.org/10.1017/S0272263107070258

Crowther, C. S., & Mann, V. (1992). Native language factors affecting use of vocalic cues to final consonant voicing in English. Journal of the Acoustical Society of America, 92(2), 711–722. DOI:  http://doi.org/10.1121/1.403996

Cruttenden, A. (2014). Gimson’s pronunciation of English, 8th edn. Routledge. DOI:  http://doi.org/10.4324/9780203784969

Escudero, P., Simon, E., & Mitterer, H. (2012). The perception of English front vowels by North Holland and Flemish listeners: Acoustic similarity predicts and explains cross-linguistic and L2 perception. Journal of Phonetics, 40, 280–288. DOI:  http://doi.org/10.1016/j.wocn.2011.11.004

Escudero, P., & Williams, D. (2011). Perceptual assimilation of Dutch vowels by Peruvian Spanish listeners. Journal of the Acoustical Society of America, 129, EL1–EL7. DOI:  http://doi.org/10.1121/1.3525042

Fischer, J. (1958). Social influence of a linguistic variant. Word, 14, 47–56. DOI:  http://doi.org/10.1080/00437956.1958.11659655

Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65. DOI:  http://doi.org/10.1016/S0095-4470(19)30537-6

Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). York Press.

Flege, J. E., McCutcheon, M. J., & Smith, S. C. (1987). The development of skill in producing word-final English stops. Journal of the Acoustical Society of America, 82(2), 433–447. DOI:  http://doi.org/10.1121/1.395444

Flege, J. E., Munro, M., & Skelton, L. (1992). Production of the word-final English /t/-/d/ contrast by native speakers of English, Mandarin, and Spanish. Journal of the Acoustical Society of America, 92(1), 128–143. DOI:  http://doi.org/10.1121/1.404278

Flege, J. E., & Port, R. (1981). Cross-language phonetic interference: Arabic to English. Language and Speech, 24(2), 125–146. DOI:  http://doi.org/10.1177/002383098102400202

Gibbon, F., & Nicolaidis, K. (1999). Palatography. In W. Hardcastle & N. Hewlett (Eds.), Coarticulation in speech production: Theory, data, and techniques (pp. 229–245). Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511486395.011

Goldstein, L., Pouplier, M., Chen, L., Saltzman, E., & Byrd, D. (2007). Dynamic action units slip in speech production errors. Cognition, 103, 386–412. DOI:  http://doi.org/10.1016/j.cognition.2006.05.010

Goodin-Mayeda, E., Renaud, J., & Rothman, J. (2011). Optimality theoretic L2 reranking and the constraint fluctuation hypothesis: Coda nasals in the L2 English of L1 Spanish speakers. In M. Pirvulescu, M. C. Cuervo, A. T. Pérez Leroux, J. Steele & N. Strik (Eds), Selected Proceedings of the 4th Conference on Generative Approaches to Language Acquisition North America (GALANA) (pp. 66–77). Cascadilla Proceedings Project.

Gurevich, N. (2011). Lenition. In M. van Oostendorp, C. J. Ewen, E. Hume & K. Rice (Eds.), The Blackwell companion to phonology, Vol. III (Entry 66). Blackwell. DOI:  http://doi.org/10.1002/9781444335262.wbctp0066

Hansen, J. G. (2004). Developmental sequence in the acquisition of English L2 syllable codas: A preliminary study. Studies in Second Language Acquisition, 26(1), 85–124. DOI:  http://doi.org/10.1017/S0272263104261046

Hardcastle, W. J. (1995). Assimilation of alveolar stops and nasals in connected speech. In J. Windsor Lewis (Ed.), Studies in general and English phonetics in honour of Professor J. D. O’Connor (pp. 49–67). Routledge.

Heffernan, K. (2005). Phonetic similarity and phonemic contrast in loanword adaptation. Toronto Working Papers in Linguistics, 24, 117–123.

Hualde, J. I. (2015). Los sonidos del español [The sounds of Spanish]. Cambridge University Press.

Jun, J. (1995). Perceptual and articulatory factors in place assimilation: An optimality theoretic approach [Doctoral dissertation, UCLA].

Kawahara, S., & Garvey, K. (2014). Nasal place assimilation and the perceptibility of place contrasts. Open Linguistics, 1, 17–36. DOI:  http://doi.org/10.2478/opli-2014-0002

Kochetov, A. (2014). Japanese nasal place/stricture assimilation: Electropalatographic evidence. Poster presented at the 14th Conference on Laboratory Phonology, National Institute for Japanese Linguistics (NINJAL), Tokyo, Japan, July 2014.

Kochetov, A., & Colantoni, L. (2010). Spanish nasal assimilation revisited: A cross-dialect electropalatographic study. Laboratory Phonology, 2, 487–523. DOI:  http://doi.org/10.1515/labphon.2011.018

Kochetov, A., & Colantoni, L. (2011). Coronal place contrasts in Argentine and Cuban Spanish: An electropalatographic study. Journal of the International Phonetic Association, 41(3), 313–342. DOI:  http://doi.org/10.1017/S0025100311000338

Kochetov, A., Colantoni, L., & Steele, J. (2017). A comparison of Articulate and Reading EPG palates: Capturing place/manner contrasts. In Proceedings of the 11th International Seminar on Speech Production (ISSP 2017), October 16-19, 2017, Tianjin, China. Springer. 4 pp.

Kochetov, A., Colantoni, L., & Steele, J. (2017). The Cross-Language Articulatory Database (CLAD), University of Toronto, http://clad.chass.utoronto.ca/.

Kochetov, A., Colantoni, L., & Steele, J. (2021). Variable assimilation of English word-final /n/: Electropalatographic evidence. English Language & Linguistics, 25(4), 687–718. DOI:  http://doi.org/10.1017/S1360674320000222

Labov, W. (1966). The social stratification of English in New York City. Center for Applied Linguistics.

Labov, W. (2001). Principles of linguistic change: Social factors. Blackwell.

Laeufer, C. (1996). The acquisition of a complex phonological contrast: Voice timing patterns of English final stops by native French speakers. Phonetica, 53, 86–110. DOI:  http://doi.org/10.1159/000262190

Maekawa, K. (2021). Production of the utterance-final moraic nasal in Japanese: A real-time MRI study. Journal of the International Phonetic Association, 1–24. DOI:  http://doi.org/10.1017/S0025100321000050

Mennen, I., Scobbie, J. M., de Leeuw, E., Schaeffler, S., & Schaeffler, F. (2010). Measuring language-specific settings. Second Language Research, 26(1), 13–41. DOI:  http://doi.org/10.1177/0267658309337617

Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338–352. DOI:  http://doi.org/10.1121/1.1907526

Mizoguchi, A. (2019). Articulation of the Japanese moraic nasal: Place of articulation, assimilation, and L2 transfer. [Doctoral dissertation, The Graduate Center of the City University of New York].

Monteleone, M. A. (2009). Effects of first language voicing rules on the perception and production of English obstruent sequences by adult Hungarian and Polish learners of English. [Doctoral dissertation, City University of New York].

Patience, M., & Steele, J. (2022). Relative difficulty in the L2 acquisition of the phonetics of French obstruent coda voicing. Language and Speech, Article 238309221114143. DOI:  http://doi.org/10.1177/00238309221114143

Perkell, J., Guenther, F., Lane, H., Marrone, N., Matthies, M., Stockmann, E., Tiede, M., & Zandipour, M. (2006). Production and perception of phoneme contrasts covary across speakers. In J. Harrington & M. Tabain (Eds.), Speech production: Models, phonetic processes and techniques (pp. 69–84). Psychology Press.

Perkell, J., Guenther, F., Lane, H., Matthies, M., Stockmann, E., Tiede, M., & Zandipour, M. (2004). The distinctness of speakers’ production of vowel contrasts is related to their discrimination of the contrasts. Journal of the Acoustical Society of America, 116, 2338–2344. DOI:  http://doi.org/10.1121/1.1787524

Picard, M. (2002). The differential substitution of English /θ ð/ in French: The case against underspecification in L2 phonology. Lingvisticae Investigationes, 25(1), 87–96. DOI:  http://doi.org/10.1075/li.25.1.07pic

R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

Ramsammy, M. (2011). The realization of coda nasals in Spanish [Doctoral dissertation, University of Manchester].

Sekiya, Y., & Jo, T. (1997). Interlanguage syllable structure of intermediate Japanese EFL students: Interaction between universals and L1 transfer. In A. James & J. Leather (Eds.), New Sounds 97: Proceedings of the third international symposium on the acquisition of second-language speech (pp. 294–304). University of Klagenfurt.

Shockey, L. (1991). Electropalatography of conversational speech. In Proceedings of the 12th International Congress of Phonetic Sciences, 3, 10–13. Université de Provence.

Simon, E. (2010). Phonological transfer of voicing and devoicing rules: Evidence from L1 Dutch and L2 English conversational speech. Language Sciences, 32, 63–86. DOI:  http://doi.org/10.1016/j.langsci.2008.10.001

Smith, B. L., Hayes-Harb, R., Bruss, M., & Harker, A. (2009). Production and perception of voicing and devoicing in similar German and English word pairs by native speakers of German. Journal of Phonetics, 37, 257–275. DOI:  http://doi.org/10.1121/1.2942851

Steele, J. (2005). Position-sensitive licensing asymmetries and developmental paths in L2 acquisition. In L. Dekydtspotter, R. A. Sprouse & A. Liljestrand (Eds.), Proceedings of the 7th Generative Approaches to Second Language Acquisition Conference (GASLA 2004) (pp. 226–237). Cascadilla Proceedings Project.

Stephenson, L. S., & Harrington, J. (2002). Assimilation of place of articulation: Evidence from English and Japanese. In Proceedings of the 9th Australian International Conference on Speech Science and Technology (pp. 592–597).

Stevens, K. (1998). Acoustic phonetics. MIT Press.

Tabain, M. (2011). Electropalatography data from Central Arrernte: A comparison of the new Articulate palate with the standard Reading palate. Journal of the International Phonetic Association, 41, 343–367. DOI:  http://doi.org/10.1017/S0025100311000132

Thomas, E., & Bayley, G. (2011). Segmental phonology of African American English. In S. Lanehart (Ed.), The Oxford handbook of African American language (pp. 403–419). Oxford University Press.

Trofimovich, P., Gatbonton, E., & Segalowitz, N. (2007). A dynamic look at L2 phonological learning: Seeking processing explanations for implicational phenomena. Studies in Second Language Acquisition, 29, 407–448. DOI:  http://doi.org/10.1017/S027226310707026X

Vance, T. J. (1987). An introduction to Japanese phonology. SUNY Press.

Vandam, M. (2004). Word final coda typology. Journal of Universal Language, 5, 119–148. DOI:  http://doi.org/10.22425/jul.2004.5.1.119

Wang, C. (1995). The acquisition of English word-final stops by Chinese speakers [Doctoral dissertation, State University of New York, Stony Brook].

Wrench, A. (2007). Advances in EPG palate design. Advances in Speech-Language Pathology, 9, 3–12. DOI:  http://doi.org/10.1080/14417040601123676

Wrench, A. A., Gibbon, F. E., McNeill, A. M., & Wood, S. E. (2002). An EPG therapy protocol for remediation and assessment of articulation disorders. In J. H. L. Hansen & B. Pellom (Eds.), Proceedings of ICSLP-2002 (pp. 965–968). DOI:  http://doi.org/10.21437/ICSLP.2002-329

Wright, S., & Kerswill, P. (1989). Electropalatography in the analysis of connected speech processes. Clinical Linguistics and Phonetics, 3(1), 49–57. DOI:  http://doi.org/10.3109/02699208908985270

Yamane, N. (2013). ‘Placeless’ consonants in Japanese: An ultrasound investigation [Doctoral dissertation, University of British Columbia].

Yavas, M. (1997). The effects of vowel height and place of articulation in interlanguage final stop devoicing. International Review of Applied Linguistics, 35(2), 115–125.

Young-Scholten, M. (2004). Orthographic input in L2 phonological development. In P. Burmeister, T. Piske & A. Rohde (Eds.), An integrated view of language development. Papers in honor of Henning Wode (pp. 263–279). Wissenschaftlicher Verlag Trier.

Young-Scholten, M. (2004). Prosodic constraints on allophonic distribution in adult L2 acquisition. International Journal of Bilingualism, 8(1), 67–77. DOI:  http://doi.org/10.1177/13670069040080010501

Zampini, M. L. (1993). Spanish and English voiced stop phonemes and spirantization: A study in second language acquisition [Doctoral dissertation, Georgetown University].

Zampini, M. L. (1996). Voiced stop spirantization in the ESL speech of native speakers of Spanish. Applied Psycholinguistics, 17(3), 335–354. DOI:  http://doi.org/10.1017/S0142716400007979