1 Introduction

When encountering speech, listeners are faced with the non-trivial task of breaking a continuous acoustic signal into discrete (lexical) units. Previous research has indicated that L1 listeners solve this speech segmentation problem by relying on acoustic-phonetic, phonological, lexical, rhythmic, and statistical regularities in the signal (e.g., Cutler & Norris, 1988; Mattys et al., 2005; Saffran et al., 1996). However, all of these aspects of language-specific knowledge may be impoverished for an L2 listener relative to an L1 listener, making parsing of the speech signal into lexical units especially difficult for these listeners.

Moreover, certain phonological processes may further complicate the task of word segmentation, even for L1 listeners. Sandhi phenomena, which operate across morpheme and word boundaries, occur in many languages. For example, speakers of non-rhotic dialects of English, such as British English, often produce intrusive /r/ in between a word ending in certain vowels and a following vowel-initial word (e.g., saw aces [sɔ:. #r#eɪ. sɪz]). To segment speech including intrusive /r/, listeners must compensate for the epenthesized consonant in order to successfully identify the second word. Compensation failure could lead to perception mistakes (e.g., listeners may identify the second word as aces or as races). Tuinman, Mitterer, and Cutler (2011) found that listeners utilize duration differences between intrusive /r/ and word-initial /r/ to compensate for this phenomenon and identify the second word with a high level of accuracy.

In French, a number of sandhi phenomena blur the boundaries between words. One such process, liaison with enchaînement, poses potentially interesting challenges for listeners, by rendering ineffective French-specific segmentation strategies. It has long been understood that the syllable plays an important role in French speech segmentation (Content et al., 2001; Dumay et al., 2002; Mehler et al., 1981). Liaison misaligns syllable and word boundaries, and, as a result, reduces the reliability of syllable-based cues to the location of word boundaries. The current study compares L1-French and L2-French listeners’ strategies for compensating for this phonological process during speech segmentation, and finds that, while compensation for liaison is quite easily achieved by both L1- and L2-French listeners, it is significantly more difficult to constrain for L2- than for L1-French listeners. Experience with French in general and liaison in particular likely plays an important role in engendering this difficulty for L2-French listeners. The same holds for L2-(non-rhotic) English listeners and intrusive /r/. When L2-English listeners hear intrusive /r/, experience with English in general is not sufficient to locate lexical units in a continuous speech stream (Tuinman et al., 2011). Listeners must also have sufficient experience with intrusive /r/ to know to undo the English-specific phonological process.

1.1 L1-French speech segmentation

Research over the past few decades has investigated the types of information French listeners attune to in order to divide the signal into identifiable lexical units. One proposal argues that French listeners perceive French in syllable-sized chunks (compared to phoneme-by-phoneme perception for English; Content et al., 2001; Dumay et al., 2002; Mehler et al., 1981). Evidence for this proposal comes from studies where listeners were faster to recognize syllables in words (or words in sequences) with matching syllable structure (ba in ba.lance ‘balance, scale’ or lac ‘lake’ in zun.lac) than when the syllable structures mismatched (ba in bal.con ‘balcony’ or lac ‘lake’ in zu.glac). More recent theories propose that syllable onset identification (i.e., locating the onset of lac in zun.lac) triggers lexical access, thus driving speech segmentation (syllable onset segmentation heuristic, or SOSH; Dumay et al., 2002).

Although the alignment of syllable and word onsets is seemingly efficient for successful segmentation of French speech, a number of phonological processes work to misalign syllable and word boundaries in running speech. One such process is enchaînement, which resyllabifies word-final consonants when the following word begins with a vowel (e.g., chaque jour [ʃak. #ʒuʁ] ‘each/every day’ versus chaque année [ʃa. k#a. ne] ‘each/every year’). In another process, liaison, a consonant present word-finally in the orthography is produced only when the following word begins with a vowel.1 When phonetically realized, the liaison consonant typically resyllabifies (via enchaînement) to the following word (e.g., petit chou [pə. ti. #ʃu] ‘little cabbage’ versus petit ami [pə. ti. t#a. mi] ‘boyfriend’; Durand & Lyche, 2008). These processes render word segmentation heuristics that rely on alignment between syllable and word onsets, like SOSH, far less reliable. However, in addition to syllable-based cues, a number of other types of word boundary cues are available to listeners, including lexical information, acoustic-phonetic information, and liaison-specific distributional cues. In particular, a growing body of research suggests that listeners prefer to rely on knowledge-based cues, such as lexical information, over signal-based cues, such as acoustic-phonetic information, to compensate for liaison during speech perception (e.g., Spinelli et al., 2002).

Therefore, despite the potential challenge across-word French resyllabification phenomena present for syllable-based word segmentation strategies, a body of research has shown no processing costs associated with the perception of syllable-misaligned words (i.e., vowel-initial words that receive resyllabified consonants) in liaison contexts (and with enchaînement more generally). In fact, results show the opposite pattern, with facilitated recognition of vowel-initial words in liaison contexts. For example, Spinelli et al. (2002) found that participants more easily identified vowel-initial words like agneau (‘lamb’) in liaison contexts (e.g., petit agneau [pə. ti. t#a. ɲo] ‘little lamb’) than in illegal liaison contexts (e.g., liaison /t/ present for a word with which it cannot occur; demi t agneau [də. mi. #t#a. ɲo] ‘half *t lamb’). The authors argued that listeners use lexical knowledge to assign the resyllabified consonant to the first word of the sequence in the liaison case, ultimately facilitating recognition of the vowel-initial target. When lexical knowledge does not support assignment of this consonant to the first word, as in the illegal liaison context, identification suffers. Similarly, Gaskell, Spinelli, and Meunier (2002) found no evidence of inhibited identification of vowel-initial targets (e.g., italien ‘Italian’) in liaison (généreux italien [ʒe. ne. ʁø. z#i. ta. ljɛ̃] ‘generous Italian’) compared to syllable-aligned (chapeau italien [ʃa. po#i. ta. ljɛ̃] ‘Italian hat’) sequences. Together, these studies suggest that listeners use lexical knowledge to successfully associate resyllabified consonants with the liaison word, which in turn facilitates recognition of vowel-initial words.

In addition to lexical knowledge, studies have considered whether listeners rely on other types of word boundary cues, such as acoustic-phonetic information, to compensate for liaison during speech segmentation. A number of studies have reported small but reliable acoustic-phonetic differences between resyllabified liaison consonants and underlyingly word-initial consonants (Gaskell et al., 2002; Shoemaker, 2014; Spinelli et al., 2002; Spinelli et al., 2003; Tremblay & Spinelli, 2013, 2014a; however, see Nguyen et al., 2007). Across these studies, liaison consonants (e.g., /t/ in petit abri [pə. ti. t#a. bʁi] ‘little shelter’) were on average 15–17% shorter than the corresponding non-liaison consonant in word-initial position (e.g., /t/ in petit tableau [pə. ti. #ta. blo] ‘little painting’). Some results suggest that such acoustic-phonetic information plays a secondary role in modulating lexical competition during speech segmentation (Spinelli et al., 2002), with lexical knowledge providing primary influence. Tremblay and Spinelli (2014b) presented listeners with natural liaison and consonant-initial productions in cross-spliced and identity conditions. They found that, while listeners do use acoustic cues to liaison, certain lexical biases (i.e., bias for consonant-initial vs. vowel-initial words; see also Spinelli et al., 2003) overshadow the influence of acoustic cues. However, when lexical knowledge is uninformative (i.e., when listeners are provided with phonemically ambiguous sequences such as les ailes ‘the wings’ versus les zèles ‘the zeals’, both [le. zɛl]), listeners show sensitivity to acoustic cues to liaison and can exploit them during speech segmentation (Shoemaker, 2014; Spinelli et al., 2003). Note that Shoemaker (2014) manipulated the duration of the pivotal consonant (i.e., /z/ in [le. zɛl]) to create reliable and robust acoustic cues as to whether it was a liaison consonant or word-initial consonant. 
As a whole, results suggest that in cases where lexical information is actually informative, it likely plays a critical role over and above any role for acoustic-phonetic factors in compensating for liaison during speech segmentation. Furthermore, acoustic-phonetic cues may need to be more robust than those in natural speech (as in Shoemaker, 2014) for listeners to rely on this information during speech segmentation.

Looking beyond acoustic-phonetic and lexical information, a series of studies have investigated the influence of a liaison-specific distributional cue to word boundaries on online speech recognition (Tremblay, 2011a; Tremblay & Spinelli, 2013, 2014a). Of the six consonants that participate in liaison (/z, n, t, ʁ, p, g/), three consonants, /z/, /n/, and /t/, make up approximately 99% of all cases of liaison, according to the Phonologie du Français Contemporain corpus of spoken French (Durand & Lyche, 2008). However, these three consonants differ in their distribution as a liaison versus underlyingly word-initial consonant: /z/ is more likely to be heard in liaison than word-initially, while /t/ occurs more frequently word-initially than in liaison. In a number of eye-tracking studies by Tremblay and Spinelli, listeners were shown four pictures in a visual display (e.g., tableau ‘painting’, abri ‘shelter’, and two distractors) while they listened to temporarily ambiguous sequences (i.e., [pə. ti. ta], which was part of either the full sequence petit tableau ‘little painting’ or petit abri ‘little shelter’). They found that listeners’ fixations to words in the display were influenced by the distribution of these consonants in liaison versus word-initial position: Listeners’ early fixations were biased toward consonant-initial items when listening to sequences containing /t/ (e.g., they fixated more on tableau ‘painting’ than abri ‘shelter’ when hearing [pə. ti. ta]), and fixations showed the opposite bias when listeners heard sequences containing /z/ (e.g., they fixated more on érable ‘maple tree’ than zéro ‘zero’ when hearing [ky. ʁjø. ze]). These fixation patterns are consistent with the more frequent occurrence of /t/ in consonant-initial position than as a liaison consonant, and vice versa for /z/; it is easier for listeners to compensate for liaison in sequences with /z/ than in sequences with /t/ due to this distributional cue.
Together, this body of work shows that a series of potentially conflicting word boundary cues, including lexical, acoustic-phonetic, and segment-specific distributional information, are available to and utilized by L1-French listeners to compensate for liaison during the segmentation of continuous speech. Moreover, a large body of empirical results, including those reported above (e.g., Spinelli et al., 2002), supports the claim that high-level cues (i.e., lexical knowledge) rather than low-level cues (i.e., acoustic-phonetic information) predominate for L1 segmentation when both are available and especially when they may not converge on the same lexical parse (Cutler, 2001; see also Mattys, Brooks, & Cooke, 2009; Mattys, Melhorn, & White, 2007; Mattys, White, & Melhorn, 2005).

1.2 L2-French liaison segmentation

If L1 speech segmentation is primarily knowledge-driven, then L2 users of a language will likely not parse the speech signal in the same manner as L1 listeners due to their impoverished knowledge of the language. Furthermore, as a language-specific phonological process that misaligns syllable and word boundaries, liaison is potentially especially challenging for L2 listeners to control in both French production and perception. Liaison is a complex phenomenon that French infants fully acquire relatively late in language development. Typically, children make systematic errors in the use of liaison (in production) until age 6 (Chevrot et al., 2013). Like French infants, L2-French learners must acquire the rules dictating when liaison must apply, when it applies variably, and when it can never apply. However, L2-French learners face additional challenges when acquiring liaison, due to potential influence from their L1 phonology and L1 segmentation strategies, as well as their knowledge of French orthography. Mastromonaco’s (1999) study of liaison production by L2-French speakers demonstrated that while these speakers learn these rules quite well (e.g., produce liaison close to 100% of the time in obligatory contexts, and less than 4% of the time in prohibited ones), they tend to make errors in the use of liaison that are not seen in adult L1-French speakers. For example, Mastromonaco found many instances of liaison produced without enchaînement (e.g., petit abri [pə. tit. #a. bʁi] ‘little shelter’). One possibility is that this lack of resyllabification arises because liaison consonants appear word-finally orthographically. Despite potential negative influences of orthography in L2 production, in perception orthography may reinforce the position of the word boundary, even when a liaison consonant is resyllabified (e.g., ⟨t⟩ occurs word-finally orthographically in petit despite [t] having moved into onset position acoustically [pə. ti. t#a. bʁi]). Knowledge of orthography, therefore, may bolster L2 acquisition of liaison and, as a result, help L2-French listeners compensate for liaison during speech segmentation.

A limited set of studies has considered how L2-French listeners approach speech segmentation in liaison contexts. In a phoneme monitoring experiment, Dejean de la Batie and Bradley (1995) found that L1-French and L2-French listeners differed in their ability to correctly detect word-initial /t/ in non-liaison sequences like grand théâtre ([gʁã. #te. atʁ] ‘big theater’) and correctly reject liaison /t/ (as word-initial) in grand éléphant ([gʁã. t#e. le. fã] ‘big elephant’). The authors argue that L1-French listeners make use of lexical information to accurately perform this task, but that L2-French listeners, who make many errors in both identifying and rejecting /t/, either focus too heavily on phonetic cues or have impoverished lexical knowledge to draw on for this task. When provided with predictive sentence contexts, L2-French listeners made little use of this information (i.e., their phoneme detection patterns were unaffected by the presence versus absence of a predictive sentence context), while predictive sentence contexts facilitated phoneme detection for L1-French listeners (i.e., significantly faster response times when contexts were provided versus absent). Together, these results suggest that L1-French and L2-French listeners make use of different types of information when segmenting speech with potential liaison. For L2-French listeners, it appears that impoverished knowledge of the French lexicon, syntax, and semantics impedes the use of high-level word boundary cues, such as lexical information and contextual (semantic) information, for the purposes of lexical segmentation in liaison contexts.

While L2-French listeners may not be able to rely on lexical information to compensate for liaison during speech segmentation to the same extent as L1-French listeners, other studies suggest they successfully exploit other word boundary cues. Shoemaker (2010) demonstrated that L2-French listeners are (1) sensitive to acoustic-phonetic, namely allophonic or subphonemic, differences between liaison and underlyingly word-initial consonants (manipulated to provide fully reliable cues, as in Shoemaker, 2014), and (2) can exploit this difference to segment lexically ambiguous sequences of words (such as les ailes/les zèles [le. zɛl] ‘the wings/the zealous ones’) where lexical knowledge is uninformative. In this study, L1-French and L2-French listeners showed no differences in discrimination or identification performance, suggesting L2-French listeners are able to acquire subphonemic detail for their L2 and utilize this information during speech segmentation in the same way as L1-French listeners. However, Tremblay and Spinelli (2014a) found that the same acoustic-phonetic cue (among others) consistently modulated online speech processing for L2-French listeners, but variably so for L1-French listeners, when other word boundary information (namely, distributional cues) was available. These results, combined with the phoneme monitoring results of Dejean de la Batie and Bradley (1995) discussed above, suggest that L1-French and L2-French listeners utilize distinct strategies to compensate for liaison during speech segmentation depending on the information provided to listeners. Specifically, L1 listeners draw on high-level, knowledge-driven cues wherever possible. In contrast, due to their relatively impoverished language-specific knowledge, L2 listeners are necessarily more influenced by low-level, signal-driven cues.

Consistent with this difference across L1 and L2 listeners in the balance of knowledge- and signal-driven cues for lexical segmentation in liaison contexts, a study by Tremblay (2011a) also found differences between L1-French and L2-French processing of liaison in the same eye-tracking paradigm as Tremblay and Spinelli (2013, 2014a). Tremblay presented both L1-French and L2-French listeners with sequences including /z/, the most common liaison consonant, either in liaison (e.g., curieux érable [ky. ʁjø. z#e. ʁabl] ‘curious maple’) or word-initially (e.g., curieux zéro [ky. ʁjø. #ze. ʁo] ‘curious zero’). Contrary to the previously demonstrated vowel-initial bias for the frequent liaison consonant /z/ (Tremblay & Spinelli, 2013, 2014a), L1-French listeners fixated more on consonant-initial targets and competitors, suggesting early expectations that the second word should be underlyingly /z/-initial (i.e., these L1-French listeners demonstrated a bias against liaison-based expectation in this experiment). Tremblay attributed this consonant-initial bias to an over-representation in the stimuli of /z/-initial words, which are quite uncommon in French. Importantly for the critical comparison of the L1- and L2-French listeners in this study, lower proficiency L2-French listeners fixated more on vowel-initial targets and competitors, consistent with liaison-based expectations. However, higher proficiency L2-French listeners showed no differences in fixations for C-initial versus V-initial targets or competitors, perhaps indicating a transition to the L1 strategy, which did not exhibit an overwhelming liaison bias. This suggests that sufficient experience with French will allow L2-French listeners to achieve L1-like segmentation behavior, which in this case appears to involve overcoming an initial strong liaison bias (also shown for L1-Swedish/L2-French listeners in an offline transcription task; Stridfeldt, 2003).

1.3 Insights from English: A hierarchy of cues

1.3.1 L1-English speech segmentation

The body of work reviewed above highlights the need for a framework that integrates a series of word boundary cues, and accounts for differences in weight given to these cues within and across listeners. Such a framework has been proposed to account for empirical work with English listeners under various conditions (Mattys et al., 2005). Specifically, Mattys et al. (2005) proposed a dynamic hierarchy of word boundary cues based on their finding that under ideal listening conditions, listeners relied most on high-level cues, including sentential context (e.g., syntax, pragmatics) and lexical-semantic knowledge. Mattys et al. (2005) demonstrated that this bias toward top-down, knowledge-driven segmentation in the absence of adverse listening conditions prevailed even when acoustic-phonetic cues (e.g., word-level stress) provided conflicting information regarding word boundary placement. However, listeners shifted reliance to bottom-up, signal-driven cues under degraded listening conditions (i.e., speech embedded in noise). The explanation for these results (and therefore the rationale behind the proposed dynamic hierarchy of word boundary cues) is that entire lexical items are difficult to extract from a degraded signal, but listeners are able to glimpse subtle acoustic-phonetic cues through the noise, and use this information to piece lexical items together in a bottom-up fashion (Cooke, 2006; Mattys et al., 2009).

Under this account, acoustic-phonetic cues play a constraining rather than deterministic role in lexical selection and, therefore, in word segmentation under ideal listening conditions. Consistent with this constraining role for signal-driven cues, Mattys, Melhorn, and White (2007) found that allophonic variation (i.e., aspirated vs. unaspirated /p/) influenced speech segmentation behavior in the face of conflicting syntactic information (i.e., verb agreement) when the subphonemic information temporally preceded the syntactic information (e.g., that woman *take spins/takes pins). Therefore, a strict interpretation of the hierarchy of cues, where listeners ignore signal-driven cues altogether when knowledge-driven cues are intact and reliable, cannot account for all empirical results. Instead, the data call for a dynamic hierarchy, which can capture the gradient trade-off between segmentation cues at different levels of the hierarchy.

To better probe this gradient trade-off, Mattys, Brooks, and Cooke (2009) implemented a paradigm eliminating categorical segmentation responses in favor of ratings on an 11-point scale. L1-English listeners heard sequences of words with matching lexical-semantic and acoustic-phonetic information (e.g., mild option) or mismatching information (where the acoustic-phonetic information cues a non-lexical parse; e.g., mile *doption), and rated whether what they heard was more like mild (the lexical parse) or mile (the non-lexical parse) using the 11-point scale. With the highest ratings corresponding to a lexical parse, listeners gave higher ratings for sequences with lexical-acoustic match and relatively low ratings for mismatching sequences. These results suggest that listeners do in fact utilize signal-driven speech segmentation even in ideal conditions, where the strict hierarchy would predict total domination of the knowledge-driven segmentation strategies. Consistent with previous results, listeners showed an acoustic drift under severe energetic masking (–8 dB SNR), with a shift in reliance to acoustic-phonetic cues both for lexical-acoustic match and mismatch sequences under this adverse listening condition. Together, studies by Mattys and colleagues support a dynamic hierarchy of word boundary cues, where listeners show an overall lexical drift (i.e., rely primarily on lexical-semantic cues) under ideal conditions and an acoustic drift when lexical-semantic information cannot be fully extracted from a degraded signal.
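
To make the notion of drift on a rating scale concrete, the following minimal sketch shows one hypothetical way to summarize such ratings: the signed distance of the mean rating from the scale midpoint. The scale endpoints (1 to 11), the function name `drift`, and the example ratings are all illustrative assumptions; they are not the analysis or the data reported by Mattys et al. (2009).

```python
from statistics import mean

# Assumption: the 11-point scale runs from 1 to 11, with higher
# ratings corresponding to the lexical parse (e.g., "mild").
SCALE_MIN, SCALE_MAX = 1, 11
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2  # 6.0

def drift(ratings):
    """Signed distance of the mean rating from the scale midpoint.

    Positive values would indicate a lexical drift, negative values
    an acoustic drift (hypothetical summary measure).
    """
    return mean(ratings) - MIDPOINT

# Made-up ratings for mismatching sequences, for illustration only.
clear_speech = [8, 7, 9, 8, 7]
masked_speech = [5, 4, 6, 5, 4]

print(drift(clear_speech) > 0)   # ratings lean toward the lexical parse
print(drift(masked_speech) < 0)  # ratings lean toward the acoustic parse
```

A summary of this kind keeps the gradient character of the responses: the magnitude of the value, and not only its sign, reflects how strongly one cue type outweighs the other.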

1.3.2 L2-English speech segmentation

As highlighted above, L2 listeners are predicted to parse the speech signal in a non-L1-like manner given a lack of L1-like knowledge of their L2. Consistent with this idea, and with data reviewed above regarding segmentation by L2-French listeners, studies with L2-English listeners have shown that L1-English and L2-English listeners exploit different types of word boundary cues. Sanders, Neville, and Woldorff (2002) found attenuated use of some high-level cues (such as syntactic information) during word segmentation by L2-English listeners in comparison to L1-English listeners, although other high-level cues (such as lexical-semantic information) were used similarly across groups (see also White et al., 2010).

In a direct test of cue weighting across L1 and L2 listeners, Mattys, Carroll, Li, and Chan (2010) considered whether L1- and L2-English (L1-Cantonese) listeners rely on lexical-semantic and acoustic-phonetic cues to word boundaries to a similar degree. Using the same paradigm and stimuli as Mattys et al. (2009), this study found that L1- and L2-English listeners utilized distinct segmentation strategies in ideal conditions. Replicating Mattys et al. (2009), L1-English listeners relied more on lexical-semantic than acoustic-phonetic information, showing a lexical drift. In contrast, L2-English listeners used acoustic-phonetic cues to parse the speech signal, showing an acoustic drift, indicating signal-driven segmentation. Unlike in Mattys et al. (2009), neither group significantly shifted their segmentation strategies in degraded conditions; the L1-English listeners still showed a lexical drift, and the L2-English listeners did not show an exaggerated acoustic drift. This was likely due to insufficient levels of degradation in the L1-English case and to a floor effect in the L2-English case. Mattys and colleagues argued that the L2-English listeners possessed impoverished lexical knowledge for their second language, and thus turned to information in the signal to drive speech segmentation. Under this interpretation, impoverished lexical knowledge has the same effect on L2 segmentation as a degraded signal has on L1 segmentation; in both cases, lexical knowledge is relatively difficult to access (either due to impoverished knowledge or due to severe energetic masking) so signal-driven processing is more efficient and/or reliable.

1.4 The current study

Using a word segmentation testing paradigm that has been applied to investigate word boundary placement in English by L1- and L2-English listeners (Mattys, Brooks, & Cooke, 2009; Mattys, Carroll, Li, & Chan, 2010), the current study asks how L1-French and L2-French listeners compensate for liaison during speech segmentation. In particular, we extend previous research in this domain by directly testing the extent to which compensation for liaison is driven by lexical-semantic vs. acoustic-phonetic information. Furthermore, we ask whether L1-French and L2-French listeners rely on different sources of information due to their differing levels of knowledge and experience with French and the French-specific phonological process of liaison, and how adverse conditions influence listeners’ use of these different types of information. These results shed light on how listeners accomplish speech segmentation in French, a language whose sound structure is characterized by a phonological process that misaligns word and syllable boundaries and therefore that presents a particularly interesting test case for the proposed dynamic hierarchy of word boundary cues. As such, this study aims to incorporate empirical results from a language other than English into existing theories of speech segmentation (e.g., Mattys et al., 2005).

As outlined above, this paradigm from Mattys and colleagues (2009, 2010) is particularly well-suited to investigate how L1 and L2 listeners differ in reliance on knowledge-based versus signal-based cues to word boundaries. We were specifically interested in the degree of lexical drift exhibited by L1-French and L2-French listeners, indicating the extent to which they relied on a knowledge-based segmentation strategy in cases of two-word sequences that do or do not exhibit the French-specific phonological phenomenon of liaison with enchaînement. Given previously reported differences between L1-French and L2-French listeners in segmentation behavior in liaison contexts (e.g., Dejean de la Batie & Bradley, 1995; Tremblay, 2011a; Tremblay & Spinelli, 2014a), we predicted that L1-French listeners would show a greater degree of lexical drift than L2-French listeners. This would also be consistent with previous findings that L1-English listeners rely more on knowledge-driven processes than L2-English listeners (Mattys et al., 2010). With respect to compensation for liaison, based on the prior demonstrations of a bias towards over- rather than under-compensation for liaison by L2-French listeners using eye-tracking (e.g., Tremblay, 2011a) and orthographic transcription (Stridfeldt, 2003) techniques, we expected to find a similar bias with this word segmentation paradigm. Finally, following the prior work on English segmentation under adverse conditions (Mattys et al., 2009) that showed decreasing reliance on knowledge-driven cues under adverse conditions (i.e., an attenuated lexical drift), we included testing conditions with and without additive noise and expected to observe an attenuated lexical drift for both L1-French and L2-French listeners in adverse compared to favorable listening conditions.
Overall, with this direct test of word segmentation in liaison and non-liaison contexts, we sought evidence for the general claim that even though L1 and L2 speech perception differ in the balance between knowledge- and signal-driven cues, the French-specific phenomenon of misaligned syllable and word boundaries due to liaison is easily compensated for (but perhaps not so easily constrained) by both L1-French and L2-French listeners.

2 Methods

2.1 Participants

Two main groups of interest participated in this study. One group consisted of 18 listeners whose L1 was English and L2 was French (L2-French; age: 18–21, 15 females). Eighteen L1-French listeners (age: 19–37, 8 females) also participated. Participants in both groups were compensated $10 per hour for their participation, with the exception of five L2-French participants who received partial course credit. Informed consent was obtained from all participants, and all procedures were in accordance with the standards enforced by the Northwestern University Institutional Review Board.

Participants completed an externally validated cloze test (Tremblay, 2011b) following the main experiment to assess proficiency in French. Most L2-French participants were recruited from intermediate-level French courses (others were found via the Northwestern Linguistics Department subject pool), and the intermediate proficiency of this group was confirmed by the cloze test: Our participants showed a mean percent correct and range (mean = 44.3%, range = 28.9–57.8%) similar to those of Tremblay’s high-intermediate level participants (mean = 40.1%, range = 31.1–48.9%). L2-French listener cloze scores fell within the same range as those of L2-French participants in other studies of liaison segmentation (Tremblay, 2011a; Tremblay & Spinelli, 2014a). The L1-French participants were recruited from the Northwestern University community, employees at Alliance Française de Chicago (a French cultural and learning center; www.af-chicago.org), and social media. For these participants, French was their first acquired (or second, but simultaneously acquired) language, and participants reported high levels of speaking, listening, reading, and writing proficiency in French (see Table 1). All L1-French participants acquired French in France or another French-speaking country.2

Table 1

Language background information.

| Group | French cloze test scoreᵃ | AFEᵇ | %Useᶜ | SRProfᵈ |
|---|---|---|---|---|
| L1-French (n = 18) | 39.17 (3.87) | 0.22 (0.73) | 35.28 (17.19) | 9.28 (2.37) |
| L2-French (n = 18) | 19.94 (5.07) | 13.33 (3.51) | 4.11 (3.68) | 5.06 (2.15) |

Note. Mean (standard deviation).

ᵃ Number correct out of 45.

ᵇ Age of first exposure to French.

ᶜ Percent weekly usage of French.

ᵈ Self-rated listening proficiency (0 = low; 10 = perfect).

All participants completed a language background questionnaire prior to testing. Participants in the L1-French group were living in the United States at the time of the study, and thus had mid-to-high levels of proficiency in English. Using a scale from 0 (none) to 10 (perfect), participants rated their speaking ability (mean = 7, SD = 2.2) and listening ability (mean = 7, SD = 2.1). On average, L1-French participants acquired English at age 8 (SD = 4.6). Information of interest for all participants included age of first exposure to French, percent weekly usage of French, and self-rated (listening) proficiency on the same scale as above. Both this biographical information and mean cloze test scores are reported for L1-French and L2-French listeners in Table 1.

2.2 Materials

2.2.1 Stimuli selection

The stimuli consisted of a series of 144 adjective-noun target sequences and 72 adjective-noun filler sequences. Items were selected to vary along two factors: lexical status of the second word in the sequence3 and final consonant of the adjective. Sequences where the word-word (i.e., lexically-acceptable) parse results in a vowel-initial second word are cases where liaison applies (e.g., petit abri); sequences with a consonant-initial second word are cases where liaison does not apply (e.g., curieux zappeur; see Table 2 for a full paradigm of examples and Tables 3 and 4 for the full set of target and filler stimuli).

Table 2

Example stimuli.

| Noun type | Non-liaison (C-initial) | Liaison (V-initial) |
| --- | --- | --- |
| Real noun (lexically-acceptable parse) | curieux zappeur; petit tableau | curieux arbre; petit abri |
| Nonword noun (lexically-unacceptable parse) | curieux *appeur; petit *ableau | curieux *zarbre; petit *tabri |

Table 3

Target sequences.

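/t/-final adjectives: différent ‘different’, maudit ‘wretched’, méchant ‘mean’, parfait ‘perfect’, petit ‘little’, récent ‘new’

| Condition | Vowel-initial item | Consonant-initial item |
| --- | --- | --- |
| Liaison | abri ‘shelter’ | *tabri |
| Liaison | arrêt ‘stop’ | *tarrêt |
| Liaison | âne ‘donkey’ | *tâne |
| Non-liaison | *aba | tabac ‘tobacco’ |
| Non-liaison | *ableau | tableau ‘painting’ |
| Non-liaison | *ariffe | tarif ‘price’ |

/z/-final adjectives: coûteux ‘expensive’, curieux ‘curious’, douteux ‘doubtful’, fameux ‘famous’, mauvais ‘wrong, bad’, précieux ‘precious’

| Condition | Vowel-initial item | Consonant-initial item |
| --- | --- | --- |
| Liaison | élu ‘elected one’ | *zélu |
| Liaison | agneau ‘lamb’ | *zagneau |
| Liaison | arbre ‘tree’ | *zarbre |
| Non-liaison | *appeur | zappeur ‘zapper’ |
| Non-liaison | *éro | zéro ‘zero’ |
| Non-liaison | *igotte | zygote ‘zygote’ |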

Note. Each adjective combines with both items in each word-nonword pair for each condition. For example, différent abri/différent *tabri and différent *aba/différent tabac, or coûteux élu/coûteux *zélu and coûteux *appeur/coûteux zappeur.

Table 4

Filler sequences.

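/l/-final adjectives: drôle ‘funny’, seul ‘only’, sale ‘dirty’

| Condition | Vowel-initial item | Consonant-initial item |
| --- | --- | --- |
| Liaison | alizé ‘trade wind’ | *lalizé |
| Liaison | échange ‘exchange’ | *léchange |
| Liaison | écart ‘distance’ | *lécart |
| Non-liaison | *avabo | lavabo ‘sink’ |
| Non-liaison | *avage | lavage ‘washing’ |
| Non-liaison | *égume | légume ‘vegetable’ |

/ʁ/-final adjectives: cher ‘dear, expensive’, pur ‘pure’, rare ‘rare’

| Condition | Vowel-initial item | Consonant-initial item |
| --- | --- | --- |
| Liaison | artichaut ‘artichoke’ | *rartichaut |
| Liaison | escroc ‘crook’ | *rescroc |
| Liaison | esquif ‘skiff’ | *resquif |
| Non-liaison | *oman | roman ‘novel’ |
| Non-liaison | *ideau | rideau ‘curtain’ |
| Non-liaison | *ecueil | recueil ‘collection’ |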

Note. Each adjective combines with both items in each word-nonword pair for each condition. For example, drôle alizé/drôle *lalizé and drôle *avabo/drôle lavabo, or cher artichaut/cher *rartichaut and cher *oman/cher roman.

Twelve different adjectives ending in two of the most frequent liaison consonants (six each of /z/ and /t/; e.g., curieux and petit) were included in these 144 adjective-noun sequences. For adjectives ending in the same consonant, each appeared with the same three nouns and three nonwords, which resulted in 72 pairs of sequences (e.g., petit abri/*tabri, maudit abri/*tabri, différent abri/*tabri, etc.). Therefore, each of the 12 target adjectives was heard 12 times over the course of the experiment, and nouns and nonwords were heard 6 times each. As much as possible, adjective-noun sequences previously used in liaison segmentation work (e.g., Tremblay & Spinelli, 2013, 2014a, 2014b) were also utilized for the current study.
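The stimulus counts given above can be checked by crossing the adjectives with the word-nonword pairs. The following is a minimal sketch, assuming the item lists in Table 3 (three liaison and three non-liaison pairs per consonant); the function and variable names are illustrative only and are not part of the original materials.

```python
from itertools import product

# Target adjectives and (word, nonword) pairs as listed in Table 3.
ADJECTIVES = {
    "t": ["différent", "maudit", "méchant", "parfait", "petit", "récent"],
    "z": ["coûteux", "curieux", "douteux", "fameux", "mauvais", "précieux"],
}
NOUN_PAIRS = {
    "t": [("abri", "tabri"), ("arrêt", "tarrêt"), ("âne", "tâne"),
          ("tabac", "aba"), ("tableau", "ableau"), ("tarif", "ariffe")],
    "z": [("élu", "zélu"), ("agneau", "zagneau"), ("arbre", "zarbre"),
          ("zappeur", "appeur"), ("zéro", "éro"), ("zygote", "igotte")],
}


def build_pairs():
    """Cross every adjective with every word/nonword pair sharing its consonant."""
    pairs = []
    for consonant, adjectives in ADJECTIVES.items():
        for adjective, (word, nonword) in product(adjectives, NOUN_PAIRS[consonant]):
            pairs.append((adjective, word, nonword))
    return pairs


pairs = build_pairs()
n_pairs = len(pairs)                # 12 adjectives x 6 pairs each
n_sequences = 2 * n_pairs           # each pair contributes two sequences
per_adjective = sum(2 for adj, _, _ in pairs if adj == "petit")
per_noun = sum(1 for _, word, _ in pairs if word == "abri")
```

Under these assumptions the cross yields 72 pairs and 144 sequences, with each adjective occurring 12 times and each noun 6 times, matching the counts reported in the text.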

Target adjectives and nouns were controlled for lemma frequency,4 co-occurrence frequency, and phonological neighborhood density, all obtained from the Lexique 3 online database (New et al., 2001). A simple linear regression with frequency as a dependent variable and a contrast-coded effect for adjective (/z/ versus /t/) revealed no significant difference in lemma frequency for /z/ and /t/ adjectives. Similarly, a simple linear regression for noun frequency with effects for consonant (/z/ or /t/) and liaison condition (i.e., V-initial or C-initial noun) showed no main effects or interactions, confirming target nouns did not differ in frequency across conditions. Co-occurrence frequencies between all adjectives and nouns are practically 0 (max = 4, mean = 0.11). In comparison, a collocation like petit ami (‘boyfriend’) has a co-occurrence frequency of 508. Finally, a simple linear regression for noun density with effects for consonant (/z/ or /t/) and liaison condition (V-initial or C-initial) revealed no main effects or interactions. Thus, there were no inherent differences between the adjectives, nouns, or adjective-noun sequences in any of these lexical or co-occurrence characteristics.

Participants were also presented with 72 filler sequences. These sequences comprised six different adjectives ending in non-liaison consonants (3 /ʁ/ and 3 /l/; while /ʁ/ can be a liaison consonant, it is always produced in the filler adjectives, regardless of the following phonological context, and is thus not a liaison consonant in these cases; see Table 4). For each consonant, all three adjectives appeared with three nouns and three nonwords (different from the target nouns), making 36 pairs of sequences (e.g., drôle légume/*égume, sale légume/*égume, seul légume/*égume, etc.). Each filler adjective was heard 12 times over the course of the experiment, and each noun and nonword was heard 3 times.

2.2.2 Recording

An L1 speaker of standard French produced all target and filler sequences. The recording took place in a sound-attenuated booth using a Shure SM841 Condenser handheld microphone at a sampling rate of 44,100 Hz. The speaker read four randomized lists of sequences. Each list contained target or filler sequences from a single condition of the experiment (i.e., C-initial target, V-initial target, C-initial filler, V-initial filler), as well as other sequences that were not used in the current experiment. The speaker was instructed to read each sequence naturally with the same speech rate, rhythm, and prosody. Each sequence was read once unless mistakes were made, in which case those sequences were reproduced once. Recorded lists were segmented into two-word sequences for acoustic manipulation.

2.2.3 Acoustic manipulation

Previous work has shown that liaison consonants are on average 15% shorter than the same consonants in word-initial position (e.g., Gaskell et al., 2002). Productions by the native speaker in the current study are consistent with this finding: Word-initial consonants were on average 86 ms (/t/: 86.9 ms, /z/: 84.9 ms) in duration, while liaison consonants were on average 73 ms (/t/: 75.4 ms, /z/: 71 ms) in duration (a difference of about 13 ms, or roughly 15% shorter). To maximize the reliability of this acoustic-phonetic cue to word boundaries, the duration of the pivotal consonant (e.g., the liaison /t/ in petit abri or word-initial /t/ in petit *tabri) was manipulated, adapting the procedure from Shoemaker (2014). For each pair of sequences (e.g., petit abri and petit *tabri), the consonant-initial production of the pair was used as the base for manipulation.5 Therefore, each pair consisted of two phonetically identical sequences differing only in the duration of this pivotal consonant. This ensures that only this duration cue, and no other acoustic-phonetic information, could influence perception; all else being equal, a short pivotal consonant should cue a liaison (vowel-initial noun) parse, while a long consonant is expected to signal the presence of a consonant-initial noun.

In order to determine the appropriate duration for both types of pivotal consonant (resyllabified liaison consonant and underlyingly word-initial consonant), the duration of all pivotal consonants (in both petit abri and petit *tabri type productions) was measured. For /z/, initial boundaries were marked when the high amplitude and strong formant structure of the preceding vowel decreased and frication increased and final boundaries were marked at first evidence of vowel periodicity and frication decrease. For /t/, initial boundaries were marked at earliest evidence of closure, with some low amplitude voicing allowed during closure, and final boundaries were marked at the beginning of the release burst. All boundaries were marked at zero crossings.

Relative durations of the pivotal consonants (compared to the duration of the entire sequence) were calculated, collapsing across consonant. Means and standard deviations for relative durations of liaison and word-initial consonants were obtained (Table 5). To test that liaison consonants have shorter relative duration than word-initial consonants, a mixed effect regression with a contrast-coded fixed effect for Pivotal Consonant Type (liaison versus word-initial) and a random intercept for Pair was run. Results indicated a main effect of Pivotal Consonant Type (β = 0.01, SE = 0.002, χ2(1)= 25.34, p < .001), where liaison consonants had significantly shorter relative duration than word-initial consonants.

Table 5

Relative and absolute durations for acoustic manipulation.

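| Stage | Measure | Liaison variant | Word-initial variant |
| --- | --- | --- | --- |
| Pre-manipulation (raw production) | Relative | 8.57% (2.51%) | 9.68% (2.57%) |
| Pre-manipulation (raw production) | Absolute | 73.2 ms (17.4 ms) | 85.9 ms (18.6 ms) |
| Manipulation target | Relative | 6.06% | 12.25% |
| Post-manipulation | Relative | 5.89% (0.47%) | 12.08% (0.47%) |
| Post-manipulation | Absolute | 53.1 ms (7.1 ms) | 109.3 ms (13.7 ms) |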

Note. Mean (standard deviation). To obtain relative durations, the duration of a consonant (in ms) was divided by the duration of the adjective-noun sequence (in ms).

Stimuli were manipulated to obtain a liaison-initial variant (or, liaison variant) and consonant-initial variant (or, word-initial variant) for each sequence. For the liaison variant, the standard deviation of the relative duration of liaison consonant productions was subtracted from the mean relative duration. This creates durations within the normal range of variation for liaison productions, but provides a somewhat extreme duration (strong acoustic-phonetic cue to liaison). For the word-initial variant, the standard deviation of the relative duration of word-initial consonant productions was added to the mean relative duration of these productions, again providing an extreme production still within the normal range of variation for word-initial consonants (strong acoustic-phonetic cue to consonant-initial word). Table 5 shows target relative durations for each variant type.
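The target values follow directly from the pre-manipulation means and standard deviations in Table 5. As a brief worked illustration (the constant and function names are ours, chosen for the example):

```python
# Pre-manipulation relative durations from Table 5, in percent of sequence duration.
LIAISON_MEAN, LIAISON_SD = 8.57, 2.51
INITIAL_MEAN, INITIAL_SD = 9.68, 2.57


def manipulation_targets():
    """Return (liaison target, word-initial target) in percent.

    The liaison target sits one SD below the liaison mean; the word-initial
    target sits one SD above the word-initial mean.
    """
    liaison_target = round(LIAISON_MEAN - LIAISON_SD, 2)
    initial_target = round(INITIAL_MEAN + INITIAL_SD, 2)
    return liaison_target, initial_target


liaison_target, initial_target = manipulation_targets()
```

This reproduces the manipulation targets of 6.06% and 12.25% listed in Table 5.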

Variants were created by applying each of the target relative durations reported in Table 5 to the duration of the entire sequence, resulting in the goal duration of the pivotal consonant after manipulation. For example, to create the liaison variant in a pair, the duration of the pivotal consonant in the consonant-initial production of that pair (e.g., /z/ in curieux zappeur) was shortened to approximately 6.06% of the duration of the entire sequence. To create the word-initial variant, the same pivotal consonant was modified to approximately 12.25% of the duration of the entire sequence. For this manipulation, a point was chosen on the acoustic waveform that would result in the most seamless manipulation (no noticeable abnormalities or large changes in amplitude). To decrease the duration of a consonant, a portion of the original consonant was highlighted and excised at zero crossings. To increase duration, a portion of the original consonant was copied at zero crossings and pasted at zero crossings until the desired duration was reached. Across stimuli, the absolute duration of pivotal consonants differed within variant type but the relative duration of the consonant to the sequence duration remained as constant as possible. Mean relative durations following manipulation are reported in Table 5.
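One way to compute a goal duration is sketched below. The text does not state whether the percentage was taken relative to the sequence before or after editing; this sketch assumes the latter, so that the edited consonant makes up exactly the target share of the edited sequence. The function name and the example durations are hypothetical and are not drawn from the study.

```python
def goal_duration_ms(sequence_ms, consonant_ms, target_share):
    """Consonant duration that makes up target_share of the manipulated sequence.

    Assumption: everything except the pivotal consonant is left untouched, so the
    manipulated sequence lasts (sequence_ms - consonant_ms) + goal.
    """
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must lie strictly between 0 and 1")
    rest_ms = sequence_ms - consonant_ms
    return target_share * rest_ms / (1.0 - target_share)


# Hypothetical 900 ms sequence with a 90 ms pivotal consonant.
sequence_ms, consonant_ms = 900.0, 90.0
short = goal_duration_ms(sequence_ms, consonant_ms, 0.0606)   # liaison variant
long = goal_duration_ms(sequence_ms, consonant_ms, 0.1225)    # word-initial variant
```

For this hypothetical item the liaison variant requires shortening the consonant and the word-initial variant requires lengthening it, as in the procedure described above.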

In seven cases, manipulation required large increases in duration that created artificial sounding speech for consonant-initial variants for /z/ items (naturalness of the speech was assessed by the first author, who has training in French phonetics). In these cases, an alternative manipulation procedure was adopted that preserved the relative percentage difference between the variants, but had overall shorter absolute (and relative) durations. For example, the /z/ in douteux zappeur (consonant-initial variant) sounded unnatural when increased to a duration 12.25% of the total sequence duration. Instead, the duration of /z/ was increased as much as possible while still sounding natural and a new relative duration was calculated (10.7%). Then, the /z/ for the liaison variant was manipulated to a duration 4.5% of the total sequence duration to preserve the 6.2% difference between variants. All pivotal consonants for consonant-initial variants manipulated under this procedure have at least 10% relative duration.
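The arithmetic behind this alternative procedure can be illustrated as follows. This is a sketch only; the function name is ours, and the input of 10.7% is the value reported for douteux zappeur.

```python
STANDARD_LIAISON, STANDARD_INITIAL = 6.06, 12.25   # targets from Table 5 (%)


def fallback_targets(max_natural_initial):
    """Shift both targets down while keeping their difference constant.

    max_natural_initial is the largest word-initial relative duration (in %)
    that still sounded natural for the item.
    """
    difference = STANDARD_INITIAL - STANDARD_LIAISON   # about 6.2 percentage points
    return max_natural_initial - difference, max_natural_initial


liaison, initial = fallback_targets(10.7)
```

With a word-initial value of 10.7%, the liaison value comes out at about 4.5%, in line with the figures given above.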

Filler items underwent a manipulation procedure similar to that used for the target items, although the strict manipulation criteria used for the target items were abandoned due to features of the fillers. For example, many filler sequences included a geminate-like pivotal consonant due to the consonant appearing both at the end of the adjective and the beginning of the following noun (e.g., rare rideau), a characteristic not encountered for the target items due to liaison. As a result of this issue, filler items were manipulated on an item-by-item basis. As for the target items, consonant-initial productions were used as the base of manipulation to ensure constancy of all other acoustic-phonetic information.6 Consonant-initial variants were created by either slightly lengthening the duration of the existing consonant or leaving the duration as is. To create vowel-initial variants, the duration of the existing consonant was decreased until the production sounded vowel-initial to the experimenter’s ear.

Following manipulation of the pivotal consonant duration (as described above), each adjective-noun sequence could be categorized as having either matching or mismatching acoustic-phonetic and lexical-semantic information. For example, [pə. ti. ta. bʁi] with a short pivotal consonant /t/ represented a case of lexical-acoustic match, because both the lexical-semantics and acoustic-phonetics favored placement of the word boundary to yield the word-word parse petit abri. In contrast, the same sequence of phonemes with a long pivotal /t/ represented a lexical-acoustic mismatch because the long consonant favored the placement of the word boundary that yielded the lexically-unacceptable word-nonword parse petit *tabri. Thus, for vowel-initial nouns where liaison applies, short pivotal consonant durations represent lexical-acoustic match cases; whereas, for consonant-initial nouns where liaison does not apply, long pivotal consonants represent lexical-acoustic match cases.

2.2.4 Noise mixing

Following acoustic manipulation, the overall amplitude of all files was normalized to the same level. Each file was then digitally mixed with speech-shaped noise to create a noise condition with –8 dB SNR (i.e., the speech was set at a level that was 8 dB softer than the noise). Mattys et al. (2009) found that native speakers of English changed segmentation strategies when presented with a –8 dB SNR noise condition relative to a no-noise condition. In an attempt to replicate this finding with native speakers of French, the same SNR was implemented here for the noise condition. The noise began 500 ms before the onset of the speech signal and continued 500 ms after the speech ended. The speech and noise signals were combined to a single channel so that both signals would be presented binaurally.
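The mixing step can be sketched as follows. This is not the procedure or software used in the study: it assumes SNR is defined on RMS amplitude, uses white Gaussian noise in place of speech-shaped noise, uses a sine tone in place of a recorded phrase, and runs at a reduced sampling rate to keep the example small. All names are illustrative.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, sample_rate, pad_ms=500):
    """Scale the noise to the requested SNR and add the speech with padding.

    The noise starts pad_ms before the speech and ends pad_ms after it.
    Returns the mixed signal and the scaled noise.
    """
    pad = int(sample_rate * pad_ms / 1000)
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise is too short for the padded speech")
    segment = noise[:total]
    # Gain chosen so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db.
    gain = rms(speech) / (rms(segment) * 10 ** (snr_db / 20))
    scaled = [gain * n for n in segment]
    mixed = list(scaled)
    for i, s in enumerate(speech):
        mixed[pad + i] += s
    return mixed, scaled


random.seed(0)
rate = 8000                                                    # reduced rate for the example
speech = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]   # 1 s stand-in tone
noise = [random.gauss(0, 1) for _ in range(2 * rate)]          # white noise, not speech-shaped
mixed, scaled = mix_at_snr(speech, noise, -8, rate)
achieved_snr = 20 * math.log10(rms(speech) / rms(scaled))
```

A negative SNR means the scaled noise has a higher RMS level than the speech, which corresponds to the –8 dB condition described above.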

To counterbalance which sequences were heard in no noise versus in noise, the 144 target sequences were divided into two sets of 72. Within each set, 36 phrases were from the V-initial condition and the other 36 from the C-initial condition. Within each of these subsets of 36 items, 18 were /z/ items and 18 were /t/ items, and each group of 18 comprised 9 lexical-acoustic match and 9 lexical-acoustic mismatch phrases. For a given pair (e.g., petit abri and petit *tabri), the lexical-acoustic match and mismatch phrases were never in the same set of 72. Assignment of noise condition to the two sets of 72 phrases was counterbalanced across participants. Therefore, each participant was exposed to both presentation conditions (no-noise and –8 dB SNR) but never on the same phrases. The same counterbalancing procedure was implemented for the filler items.

2.3 Procedure

Participants heard 216 phrases (144 targets and 72 fillers), which were randomized across participants in a single block. Both phrases in no noise and those in noise were presented in this single block (following Mattys et al., 2010). Test trials were preceded by eight practice trials. Practice trials were adjective-noun sequences including liaison with /n/ (e.g., prochain nabot/prochain *abo and prochain otage/prochain *notage), which underwent a manipulation procedure similar to that used for the filler items. These trials included both no-noise and noise conditions. Phrases were played at a comfortable volume in either a sound-attenuated booth or a quiet room over Sony MDRV700 headphones.

The experimental procedure mirrored that in Mattys et al. (2010) as closely as possible. After hearing a phrase, participants indicated which of the two words presented orthographically on the screen (e.g., abri or tabri; both unambiguously singular) they heard at the end of the phrase. They were told that sometimes the phrases would be played in a quiet background, and other times they would be embedded in noise. Participants were instructed to respond based on what they heard, not based on what they thought the speaker should have said. They were told that some of the options on the screen would be fake words, and that it was acceptable to report hearing those fake words if that is what they thought they heard. The two word options were presented at opposite ends of the screen, separated by dots and numbers ranging from 1 to 11. The position of the vowel-initial versus consonant-initial word options was fixed within participant, and counterbalanced across participants. Participants were told they could make use of the entire 11-point scale, and response keys were 11 adjacent keys on the keyboard (keys 1–9 and labels on 0 and ‘-’ for 10 and 11). Participants were presented with each stimulus only once, with no option to replay any stimulus. They were given 10 seconds to respond.

Following the main test, participants were asked to rate their familiarity with the real and nonword nouns they had just encountered using a 4-point scale (1 = “I have never seen/heard this word”; 2 = “I have seen/heard this word, but I don’t know what it means”; 3 = “I have seen/heard this word and I know what it means in context, but I could not provide a definition for it”; 4 = “I have seen/heard this word, I know what it means, and I can provide a definition for it”). The real and nonword nouns were presented orthographically in the center of the screen. Participants were given eight practice trials (the real and nonword nouns from the previous practice sequences), and then rated all real and nonce words that had appeared in the experiment. These ratings will be considered in the analysis and interpretation of the results. The final task was the cloze test designed to assess proficiency in French (Tremblay, 2011b). In total, the experimental session lasted 40 to 50 minutes on average.

2.4 Data analysis

2.4.1 Response coding

Prior to analysis, raw ratings were converted to a response code that indicated whether the participant gave a lexically-acceptable (10) or lexically-unacceptable parse (0). For example, a rating of 11 corresponding to ‘tabri’ when hearing petit abri was coded as 0 (‘tabri’ is a nonword), and a rating of 1 corresponding to ‘abri’ was coded as 10; similarly, a rating of 11 corresponding to ‘zappeur’ after hearing curieux zappeur was coded 10, and a rating of 1 corresponding to ‘appeur’ was coded 0 (‘appeur’ is a nonword).

Although participants were instructed to use the entire 11-point scale during the experiment, most participants overwhelmingly chose ratings at the endpoints. As a result, the data were highly skewed toward these endpoints, which violates the normality assumption of linear regression. Therefore, the response codes were divided into two bins, coercing responses into binary form (lexical versus non-lexical response): Responses coded 0–4 were called ‘non-lexical responses’ and those coded 6–10 were called ‘lexical responses’. So that each bin received responses from an equal number of codes (5 each), responses coded 5 were excluded from analysis (N = 334; 4% of responses).
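The coding and binning steps can be summarized in a short sketch. It assumes that raw ratings run from 1 to 11, as on the response scale described in the Procedure, and that the side of the scale holding the real word is known for each trial; the function names and the "high"/"low" labels are ours.

```python
def response_code(rating, lexical_side):
    """Map a raw 1-11 rating onto a 0-10 code where 10 is the lexical parse.

    lexical_side is "high" when the real word sat at the 11 end of the scale
    and "low" when it sat at the 1 end.
    """
    if not 1 <= rating <= 11:
        raise ValueError("rating must be between 1 and 11")
    return rating - 1 if lexical_side == "high" else 11 - rating


def bin_response(code):
    """Collapse a 0-10 code into a binary response; the midpoint (5) is dropped."""
    if code == 5:
        return None
    return "lexical" if code > 5 else "non-lexical"
```

For instance, with ‘abri’ at the 1 end of the scale, a rating of 11 (‘tabri’) receives code 0 and falls in the non-lexical bin, whereas a rating of 1 receives code 10 and falls in the lexical bin.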

2.4.2 Statistical analysis

Target responses from participants in the main groups of interest were analyzed using a series of logistic mixed effects regressions using the glmer function in version 1.1-7 of the lme4 package for R. The dependent variable for all analyses was lexical versus non-lexical responses. (Note: All figures feature a proportional representation of this binary variable; i.e., proportion of lexical responses out of all responses). Fixed effects in these models included contrast-coded effects for Group (L1-French versus L2-French), Noise Condition (no-noise versus –8 dB SNR), Liaison Condition (non-liaison (C-initial nouns) versus liaison (V-initial nouns)), and Lexical-Acoustic (Mis)match (match versus mismatch). All possible two- and three-way interactions, and the four-way interaction were included. The maximal random effects structure supported by the data was utilized, with random intercepts for Participant and Item, and uncorrelated random slopes for each fixed effect except Group by Participant as well as uncorrelated random slopes for Group and Noise Condition by Item. Significance of fixed effects and interactions was assessed via nested model comparison.
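For readers unfamiliar with nested model comparison, the p-value for a single added term can be obtained from the likelihood-ratio statistic, which is referred to a chi-squared distribution with 1 degree of freedom. The sketch below is a generic illustration using only the Python standard library; it is not the code used for the analyses, which were run in R.

```python
import math


def likelihood_ratio_statistic(loglik_full, loglik_reduced):
    """Twice the log-likelihood gain of the full model over the nested model."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value(chi_squared):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    For 1 df, P(X > x) equals P(|Z| > sqrt(x)) for a standard normal Z,
    which is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_squared / 2.0))


p_noise_by_liaison = lrt_p_value(3.74)   # Noise x Liaison statistic from Table 7
p_noise = lrt_p_value(6.69)              # Noise Condition statistic from Table 7
```

Applied to the statistics in Table 7, this gives a p-value of about .053 for the Noise × Liaison interaction and a value below .01 for the main effect of Noise Condition.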

3 Results

3.1 Overall strategies for L1-French and L2-French listeners

A significant main effect of Group indicates that L1-French and L2-French listeners had distinct overall segmentation biases. L1-French listeners gave a significantly higher proportion of lexical responses compared to L2-French listeners (L1-French: mean = 77.8%, SE = 4.6%; L2-French: mean = 62.3%, SE = 2.5%). Although both groups of listeners showed an overall lexical drift (i.e., more than 50% lexical responses for both groups), the L2-French listeners exhibited an attenuated lexical drift when compared to the L1-French listeners. Table 6 shows descriptive statistics for each experimental manipulation separated by group, and Table 7 shows a summary of the output for the main model (i.e., showing all main effects and interactions tested, presented in Sections 3.1–3.4).

Table 6

Descriptive statistics for main experiment conditions.

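| Group | Lexical-Acoustic (Mis)match: Match | Lexical-Acoustic (Mis)match: Mismatch | Liaison Condition: Liaison | Liaison Condition: No liaison | Noise Condition: No noise | Noise Condition: –8 dB SNR |
| --- | --- | --- | --- | --- | --- | --- |
| L1-French | 80.2% (4.6%) | 75.3% (4.9%) | 65.7% (7.7%) | 89.6% (3.0%) | 78.7% (4.6%) | 76.8% (4.9%) |
| L2-French | 69.6% (2.7%) | 55.1% (2.9%) | 73.9% (4.6%) | 50.5% (3.4%) | 64.1% (2.3%) | 60.4% (2.9%) |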

Note. Mean proportion of lexical responses. Standard error in parentheses.

Table 7

Main model output.

| Factors | Estimate | Std. Error | Chi-squared | p |
| --- | --- | --- | --- | --- |
| Intercept | 1.51 | 0.23 | | |
| Group | 1.50 | 0.45 | 9.97 | < .01* |
| Lexical-Acoustic (Mis)match | 0.70 | 0.13 | 23.21 | < .001* |
| Liaison Condition | 0.25 | 0.36 | 0.50 | .48 |
| Noise Condition | 0.29 | 0.11 | 6.69 | < .01* |
| Group × Lexical-Acoustic (Mis)match | –0.23 | 0.21 | 1.19 | .28 |
| Group × Liaison | 3.39 | 0.70 | 17.66 | < .001* |
| Group × Noise | 0.12 | 0.19 | 0.40 | .52 |
| Lexical-Acoustic (Mis)match × Noise | 0.33 | 0.20 | 2.82 | .09 |
| Lexical-Acoustic (Mis)match × Liaison | 0.11 | 0.24 | 0.21 | .64 |
| Noise × Liaison | 0.39 | 0.20 | 3.74 | .053~ |
| Group × Lexical-Acoustic (Mis)match × Noise | 0.33 | 0.34 | 0.95 | .33 |
| Group × Lexical-Acoustic (Mis)match × Liaison | 0.07 | 0.35 | 0.04 | .85 |
| Group × Noise × Liaison | 1.01 | 0.34 | 8.46 | < .01* |
| Lexical-Acoustic (Mis)match × Noise × Liaison | 0.34 | 0.40 | 0.73 | .39 |
| Four-way interaction (all factors) | 0.49 | 0.67 | 0.50 | .48 |

Note. All chi-squared statistics have 1 degree of freedom. Significant effects denoted with *, marginal effects are marked with ~.

This significant difference between the L1-French and L2-French groups indicates a successful replication of Mattys and colleagues (2010). Our results confirm that L1 listeners (of a language other than English) showed a lexical drift when segmenting speech, whereby they relied on knowledge-based cues to word boundaries, replicating Mattys et al. (2009). Further, we confirm that L2-French listeners relied on knowledge-based cues to word boundaries to a lesser extent than native listeners, consistent with Mattys et al. (2010).

3.2 Effect of acoustic manipulation

Both groups of listeners gave a significantly higher proportion of lexically-acceptable responses when lexical-semantic and acoustic-phonetic information matched than when they were in conflict (match: mean = 74.9%, SE = 2.3%; mismatch: mean = 65.2%, SE = 2.4%; means averaged over the group values in Table 6). This finding provides evidence that listeners were sensitive to our acoustic-phonetic manipulation such that, as expected, shorter pivotal consonant durations were generally associated with liaison. However, Lexical-Acoustic (Mis)match did not interact significantly with any other factors, which suggests that the influence of lexical knowledge on speech segmentation overrides the acoustic cue to word boundaries in the stimuli to the same extent for both listener groups, in both liaison conditions, and in both noise conditions.

3.3 Compensation for liaison

While there was no main effect of Liaison Condition, there was a significant interaction between Liaison Condition and Group (shown in Figure 1). Follow-up regressions (shown in Table 8) revealed that, when analyzed separately, both groups showed a main effect of Liaison Condition, indicating that both L1-French and L2-French listeners gave a significantly different proportion of lexical parses to sequences when liaison occurred (with V-initial nouns; e.g., petit abri) and did not occur (C-initial nouns; e.g., curieux zappeur). However, the liaison effect differed markedly across groups (i.e., the data show a cross-over interaction), reflecting group-specific strategies for liaison compensation.

Figure 1 

Group by Liaison Condition interaction. Mean proportion of lexical responses by each group in each liaison condition. Error bars show standard error.

Table 8

Model output for follow-up regressions separated by group (L1-French and L2-French).

| Model | Factor | Estimate | Std. Error | Chi-squared | p |
| --- | --- | --- | --- | --- | --- |
| L1-French | Liaison Condition | –1.38 | 0.38 | 10.43 | < .01* |
| L1-French | Noise × Liaison Condition | 0.92 | 0.30 | 9.18 | < .01* |
| L2-French | Liaison Condition | 2.02 | 0.67 | 7.04 | < .01* |
| L2-French | Noise × Liaison Condition | –0.10 | 0.21 | 0.23 | .63 |

Note. Only interaction of interest reported. Chi-squared statistics have 1 degree of freedom. Significant effects denoted with *.

In the figures presented below, the relationship between the proportion of lexical responses and listeners’ overall preference for parses with a V-initial vs. C-initial word differs across liaison conditions. In the liaison condition, a lexical response corresponds to hearing a V-initial word (e.g., abri) in the sequence. Therefore, a relatively low proportion of lexical responses in this condition reflects a preference for the C-initial, nonword parse (e.g., *tabri). In contrast, a lexical response in the no liaison condition corresponds to hearing a C-initial word (e.g., zappeur) in the sequence. Therefore, a relatively low proportion of lexical responses reflects a preference for the V-initial, nonword parse (e.g., *appeur). The group-specific strategies for liaison compensation are evidenced by the difference across groups in which liaison condition elicited the highest proportion of lexical responses.

Specifically, Figure 1 illustrates that L2-French listeners tend to respond that the noun they heard was V-initial, indicating a liaison parse; they gave a high proportion of lexical responses in the liaison condition (i.e., mostly V-initial nouns), and a relatively low proportion of lexical responses in the no liaison condition (i.e., many V-initial nouns). In sharp contrast to the L2-French listeners, L1-French listeners overwhelmingly responded that they heard C-initial nouns.

These results indicate that when presented with V-initial nouns in liaison-inducing sequences (e.g., petit abri), both L1-French and L2-French listeners generally knew to compensate for liaison. Each group gave lexically-acceptable, V-initial parses about 70% of the time, despite a consonant surfacing in the initial position of the second word. This pattern of results suggests that the L2-French listeners, like the L1-French listeners, had successfully learned the rule for liaison, and applied it appropriately in this condition. However, when hearing a sequence with an underlyingly C-initial noun (e.g., curieux zappeur), L2-French listeners showed uncertainty about the appropriate parse (i.e., responded near chance) and attributed the consonant to the adjective half the time (i.e., giving a liaison parse). In contrast, the L1-French listeners appropriately attributed the consonant to the onset of the C-initial word and gave lexical responses approximately 90% of the time. Thus, it appears that the L2-French listeners attempted to “undo” liaison inappropriately (i.e., where it had not, in fact, applied) half of the time, which suggests they over-applied the rule for liaison and have yet to learn to constrain it appropriately. In contrast, L1-French listeners assigned the appropriate C-initial parse almost all of the time, indicating they have learned to constrain the liaison rule.

3.4 Strategy shift in noise

The significant main effect of Noise demonstrates that both groups shifted segmentation strategies under degraded conditions. When the signal was presented in noise, both groups gave a significantly lower proportion of lexically-acceptable parses (no-noise: mean = 71.4%, SE = 2.8%; –8 dB SNR: mean = 68.6%, SE = 3.1%), indicating an attenuated lexical bias for both groups when the signal was degraded. However, there was a significant higher-order three-way interaction of Group by Liaison Condition by Noise Condition, which reflects a significant Liaison Condition by Noise Condition interaction for the L1-French but not the L2-French listeners (Figure 2; see Table 8 for model results). For the L1-French listeners, the introduction of noise affected responses to non-liaison and liaison sequences differently. The L1-French listeners gave a marginally lower proportion of lexical responses to non-liaison sequences in noise (mean = 86.9%; SE = 4.2%) versus in quiet (mean = 92.1%; SE = 4.3%), showing slight attenuation of the lexical bias, as expected, for the noise condition. However, these listeners showed no difference in proportion of lexical responses to liaison sequences in noise (mean = 66.6%; SE = 4.9%) versus in quiet (mean = 65.1%; SE = 4.3%; see Table 9 for model results). This difference across liaison conditions suggests that there is a limit to the amount that native listeners are willing to deviate from lexical responses; when listeners gave a relatively low proportion of lexical responses in quiet (i.e., 65.1% for the V-initial nouns in liaison sequences), they were unwilling to sacrifice the lexical bias any further in noise. 
In contrast, for the L2-French listeners, noise did not affect compensation for liaison versus non-liaison differently; for both C-initial nouns in non-liaison sequences and V-initial nouns in liaison, the rate of lexical responses decreased in noise versus in quiet, consistent with the overall main effect of Noise (Mattys et al., 2005).

Figure 2 

Group by Liaison Condition by Noise interaction. Mean proportion of lexical responses by both groups in both liaison conditions across noise conditions. Error bars show standard error.

Table 9

Model output for follow-up regression separated by liaison condition (L1-French only).

Non-liaison model    Estimate   Std. Error   Chi-squared   p
Noise Condition      0.74       0.28         3.25          .07~

Liaison model        Estimate   Std. Error   Chi-squared   p
Noise Condition      –0.11      0.21         0.28          .60

Note. Only main effect of interest reported. Chi-squared statistics have 1 degree of freedom. Marginal significance marked with ~.

3.5 Additional analyses

3.5.1 Influence of liaison statistics

Recent work has shown that L1-French and L2-French listeners exploit a segment-specific cue to word boundaries in liaison conditions (Tremblay, 2011a; Tremblay & Spinelli, 2013, 2014a). This analysis considers whether L1-French and L2-French listeners show sensitivity to the relative probabilities of liaison for /z/ versus /t/ (liaison more likely for /z/ than for /t/) and how use of this cue interacts with the degree to which listeners rely on lexical cues to word boundaries.

As in the main analysis, the dependent variable for this mixed effect logistic regression was lexical versus non-lexical response. This model was identical to the model used for the main analysis with the exception of the fixed effect for Lexical-Acoustic (Mis)match, which was excluded from the present analysis. This factor was excluded for a number of reasons. First, this effect did not interact with any other fixed effects in the main analysis, and we had no hypotheses predicting it should interact meaningfully with the consonant effect. Second, the factor was excluded to ease interpretation and facilitate convergence of the model, which already included a four-way interaction. With this factor excluded and effects for Consonant added to the model, all significant main effects and interactions reported in the main analysis hold (one interaction became marginal). Therefore, only significant main effects and interactions that are unique to this analysis (those involving effects of Consonant) will be discussed below, although all main effects and interactions from statistical tests are reported in Table 10.

Table 10

Model output for liaison consonant statistics analysis.

Factors                                   Estimate   Std. Error   Chi-squared   p

Intercept                                 1.52       0.23
Group                                     1.51       0.45         9.87          < .01*
Consonant                                 –0.22      0.14         2.33          .13
Liaison Condition                         0.23       0.37         0.41          .52
Noise Condition                           0.28       0.11         6.69          < .01*

Group × Consonant                         –0.66      0.20         9.13          < .01*
Group × Liaison Condition                 3.39       0.70         17.38         < .001*
Group × Noise                             0.07       0.19         0.15          .70
Consonant × Noise                         0.11       0.19         0.31          .58
Consonant × Liaison Condition             –0.53      0.27         3.72          .053~
Noise × Liaison Condition                 0.36       0.20         3.35          .07~

Group × Consonant × Liaison Condition     1.27       0.34         13.25         < .001*
Group × Consonant × Noise                 0.26       0.33         0.56          .45
Group × Noise × Liaison Condition         0.98       0.34         8.04          < .01*
Consonant × Noise × Liaison Condition     –0.86      0.39         4.70          < .05*

Four-way interaction (all factors)        –0.05      0.67         0.006         .94

Note. Chi-squared statistics have 1 degree of freedom. Significant effects denoted with *, marginal significance marked with ~.
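
The correspondence between the chi-squared statistics and the p-values in Table 10 can be checked directly. The following is a minimal sketch in Python, assuming only what the table note states, namely that each statistic is referred to a chi-squared distribution with 1 degree of freedom. The function name `chi2_p_value_df1` is our own illustrative choice and is not part of the original analysis.

```python
import math


def chi2_p_value_df1(chi2_stat: float) -> float:
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    With 1 df, the chi-squared variable is the square of a standard normal
    variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if chi2_stat < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.erfc(math.sqrt(chi2_stat / 2.0))


# Statistics copied from Table 10; the labels are for display only.
reported = [
    ("Group", 9.87),
    ("Consonant x Liaison Condition", 3.72),
    ("Four-way interaction", 0.006),
]
for label, stat in reported:
    print(f"{label}: chi2(1) = {stat}, p = {chi2_p_value_df1(stat):.3f}")
```

Under this assumption, the Group effect falls below .01, the Consonant by Liaison Condition interaction falls just above .05 (hence its marginal status), and the four-way interaction is far from significance.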

While the main effect of Consonant failed to reach significance (descriptive statistics in Table 11), there was a series of significant interactions involving Consonant. The significant Group by Consonant interaction revealed that L1-French and L2-French listeners responded differently to /z/ and /t/ items. Follow-up regressions indicated that L1-French listeners gave a higher proportion of lexical responses to /t/ items than to /z/ items, but L2-French responses showed no difference across consonants (results summarized in Table 12). In addition, a marginally significant two-way interaction between Consonant and Liaison Condition showed that listeners’ proportion of lexical responses to /z/ items did not differ between sequences with and without liaison, whereas listeners gave a marginally higher proportion of lexical responses to /t/ items in non-liaison sequences (e.g., petit tableau) than in liaison sequences (e.g., petit abri), as confirmed by follow-up regressions (results summarized in Table 13).

Table 11

Descriptive statistics for consonant condition in additional analysis.

             Consonant
             /z/             /t/

L1-French    75.1% (4.9%)    80.4% (4.5%)
L2-French    62.4% (2.6%)    62.3% (2.9%)

Note. Mean proportion of lexical responses. Standard error in parentheses.

Table 12

Model output for follow-up regressions separated by group (L1-French and L2-French).

L1-French model                  Estimate   Std. Error   Chi-squared   p
Consonant                        –0.53      0.15         9.82          <.01*
Consonant × Liaison Condition    0.13       0.30         0.17          .68

L2-French model                  Estimate   Std. Error   Chi-squared   p
Consonant                        0.09       0.17         0.28          .60
Consonant × Liaison Condition    –1.13      0.29         14.05         <.001*

Note. Only effects of interest reported. Chi-squared statistics have 1 degree of freedom. Significant effects denoted with *.

Table 13

Model output for follow-up regressions separated by consonant.

/z/ model                              Estimate   Std. Error   Chi-squared   p
Liaison Condition                      –0.15      0.41         0.13          .72
Noise Condition × Liaison Condition    –0.08      0.26         0.08          .77

/t/ model                              Estimate   Std. Error   Chi-squared   p
Liaison Condition                      0.76       0.41         3.44          .06~
Noise Condition × Liaison Condition    0.87       0.29         8.97          <.01*

Note. Only effects of interest reported. Chi-squared statistics have 1 degree of freedom. Significant effects denoted with *, marginal significance marked with ~.

These two-way interactions were modulated by a higher-order three-way interaction between Group, Consonant, and Liaison Condition, shown in Figure 3. Follow-up regressions (summarized in Table 12) support the general pattern shown in Figure 3. L1-French listeners show no difference in compensation for liaison across consonants; for both /z/ and /t/ items they show a higher rate of lexical parses for non-liaison sequences (i.e., C-initial nouns) than for liaison sequences (i.e., V-initial nouns). L2-French listeners, in contrast, show a significant difference between liaison and non-liaison conditions for /z/ items (though in the opposite direction from that of the L1-French listeners, giving rise to the cross-over interaction), and this difference is attenuated for /t/ items relative to /z/ items (results summarized in Table 14).

Figure 3 

Group by Consonant by Liaison interaction. Mean proportion of lexical responses for both groups and both consonants across liaison conditions. Error bars show standard error.

Table 14

Model output for follow-up regressions separated by consonant for L2-French only.

/z/ model (L2-French only)    Estimate   Std. Error   Chi-squared   p
Liaison Condition             –1.94      0.46         12.69         <.001*

/t/ model (L2-French only)    Estimate   Std. Error   Chi-squared   p
Liaison Condition             –0.86      0.43         3.72          .053~

Note. Only effects of interest reported. All chi-squared statistics have 1 degree of freedom. Significant effects denoted with *, marginal significance marked with ~.

Finally, the significant three-way interaction between Consonant, Liaison Condition, and Noise Condition reflects changes in how listeners compensate for liaison depending on the noise/consonant combination. Figure 4 shows a significant crossover interaction for /t/ items but no significant difference in response to liaison for /z/ items (confirmed by a follow-up regression, results summarized in Table 13). The significant interaction for /t/ items was driven by a significant attenuation of the lexical bias for sequences without liaison in noise, while there was no significant difference across noise conditions for liaison sequences (results summarized in Table 15).

Figure 4 

Consonant by Liaison Condition by Noise Condition interaction. Mean proportion of lexical responses for both consonants in both liaison conditions across noise conditions. Error bars show standard error.

Table 15

Model output for follow-up regression separated by liaison condition (/t/ items only).

Liaison /t/ model       Estimate   Std. Error   Chi-squared   p
Noise Condition         –0.19      0.17         1.22          .27

No liaison /t/ model    Estimate   Std. Error   Chi-squared   p
Noise Condition         0.65       0.27         5.63          <.05*

Note. Only effects of interest reported. All chi-squared statistics have 1 degree of freedom. Significant effects denoted with *.

Results from the main analysis indicate that L2-French speech segmentation was driven by over-applied (i.e., under-constrained) knowledge of liaison. This conclusion was supported by the significant interaction between Consonant, Liaison Condition, and Group in this analysis. L2-French listeners showed attenuation of the large asymmetry across liaison conditions for /t/ items, where a higher proportion of lexical responses was given in the V-initial condition. This finding suggests that L2-French listeners utilized knowledge about liaison statistics, namely the knowledge that /t/ is more likely to occur word-initially than in liaison, to appropriately compensate for liaison. L1-French listeners showed no change across liaison conditions as a function of liaison consonant, which suggests they did not make use of the liaison consonant cue in this word segmentation paradigm.

While Tremblay and colleagues (Tremblay, 2011a; Tremblay & Spinelli, 2013, 2014a) found that both groups show sensitivity to liaison statistics during speech processing, it is possible that differences in tasks between those studies and the current study can account for this discrepancy. Tremblay and colleagues used an eye-tracking paradigm, which shows how listeners process the speech signal online, whereas the current study utilized a more offline task. Therefore, one possibility is that the current task lacks the sensitivity needed to detect these effects in L1-French listeners, who may use these statistics to bias processing prior to successful speech segmentation. On the other hand, L2-French listeners may use this consonant cue more directly to drive speech segmentation. Future research should investigate this idea more carefully.

Based on the other significant three-way interaction between Consonant, Liaison Condition, and Noise Condition, we might conclude that listeners generally move toward more liaison-appropriate parses when the signal is degraded, even if this conflicts with both lexical-semantic cues and liaison consonant statistics. Consistent with this interpretation, /z/ items showed an across-the-board lowering in proportion of lexical responses with a signal in noise, preserving the liaison bias already occurring in ideal listening conditions. In contrast, the significant crossover interaction for /t/ items shows that the difference between C-initial parses and V-initial (liaison) parses was neutralized in noise, with fewer lexical responses (i.e., more liaison parses) in the no-liaison condition. This neutralization suggests that listeners may default to liaison parses under degraded conditions when lexical-semantic cues may be impoverished (i.e., give fewer lexically-acceptable, but more liaison, parses for sequences with C-initial nouns). Surprisingly, this neutralization for /t/ items occurred despite the distributional tendency for /t/ to occur word-initially. This discrepancy could be resolved by appealing to a hierarchy of cues for French similar to Mattys and colleagues’ proposal for English (e.g., Mattys et al., 2005). We have evidence that French listeners, like English listeners, prefer to rely on high-level information, such as lexical knowledge, over lower-level cues, such as acoustic-phonetic information, when such cues are in conflict. Our results also support the ranking of distributional cues below lexical knowledge, because lexical biases persist despite conflicting distributional cues.
The low ranking of distributional information is further supported by the finding that L2-French, but not L1-French, listeners utilized distributional knowledge as a segmentation cue in this paradigm, consistent with Mattys et al.’s (2010) findings that L2 listeners default to lower-level segmentation cues. The surprising results for /t/ items in degraded conditions provide some evidence that phonological knowledge (i.e., knowledge about the liaison process) should occupy an intermediate ranking between lexical knowledge and distributional knowledge. Future work should directly test the relative ranking of these types of word-boundary cues (i.e., cues that are outranked by lexical knowledge).

3.5.2 Word familiarity

Given the assumption that responses should have been driven by lexical knowledge of the target language (i.e., listeners should give more lexically-acceptable than lexically-unacceptable parses; Mattys et al., 2010), the influence of word familiarity on the overall pattern of responses should be considered. Analyses above revealed that L1-French and L2-French participants gave different proportions of lexical responses (1) overall (significant main effect of Group), and (2) depending on phonological factors (Group by Liaison Condition interaction; Figure 1). Here we investigate whether differences in listener familiarity with target words across groups and liaison conditions could explain these significant effects.

Mean familiarity ratings for each group broken down by liaison condition are shown in Table 16. To consider whether differences in word familiarity across groups and liaison conditions can account for our results, data were analyzed using a mixed effects regression predicting word familiarity with contrast-coded fixed effects for Group and Liaison Condition and their interaction, with random intercepts for Item and Participant, as well as by-item slopes for Group and by-participant slopes for Liaison Condition. There was a significant main effect of Group (β = 0.75, SE = 0.24, χ2(1) = 8.21, p < 0.01), where L1-French listeners were more familiar with the words than the L2-French listeners. The main effect of Liaison Condition and its interaction with Group failed to reach significance (both χ2(1) < 2, p > 0.05).

Table 16

Mean word familiarity.

             C-initial words   V-initial words   Overall

L1-French    3.60 (0.03)       3.95 (0.02)       3.79 (0.01)
L2-French    3.19 (0.02)       2.96 (0.02)       3.04 (0.02)

Note. Mean (standard error). Familiarity rated on scale from 1 (“I have never seen/heard this word”) to 4 (“I have seen/heard this word, I know what it means, and I can provide a definition for it”).

These results show that the L1-French listeners were, in fact, more familiar with the target words than the L2-French listeners. As a test that this significant difference did not drive our main findings reported above, L1-French and L2-French responses were matched for word familiarity. That is, all L2-French responses with word familiarity lower than the lowest L1-French rating were excluded (e.g., lowest L1-French rating for tabac was 3, so any L2-French ratings for tabac below 3 were excluded). With data matched for word familiarity, the pattern of results remains unchanged. Results of the mixed effects regression above also verified that there were no significant differences in word familiarity across liaison conditions, which shows the differences in word familiarity cannot account for the important Group by Liaison Condition interaction in the main analysis. Together, results from this mixed effects regression for word familiarity show that the findings reported in the main analysis above could not be driven by differences in word familiarity across groups or across liaison conditions.
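
The familiarity-matching step described above can be sketched as a simple filter. This is a minimal illustration in Python rather than the procedure actually run for the study: the function name, the record layout, and all ratings below are hypothetical, except that the lowest L1-French rating for tabac is set to 3 to mirror the example in the text. Dropping L2-French responses to words with no L1-French rating is an additional assumption of the sketch.

```python
def match_on_familiarity(l1_ratings, l2_responses):
    """Keep L2 responses whose familiarity is at least the lowest L1 rating for that word.

    l1_ratings: iterable of (word, rating) pairs from L1 participants.
    l2_responses: iterable of dicts with at least "word" and "familiarity" keys.
    Responses to words without any L1 rating are dropped (assumption).
    """
    lowest_l1 = {}
    for word, rating in l1_ratings:
        lowest_l1[word] = min(rating, lowest_l1.get(word, rating))
    return [
        response
        for response in l2_responses
        if response["word"] in lowest_l1
        and response["familiarity"] >= lowest_l1[response["word"]]
    ]


# Hypothetical ratings on the 1-4 familiarity scale.
l1_ratings = [("tabac", 4), ("tabac", 3), ("abri", 4)]
l2_responses = [
    {"participant": "P01", "word": "tabac", "familiarity": 2, "lexical": 0},
    {"participant": "P02", "word": "tabac", "familiarity": 3, "lexical": 1},
    {"participant": "P01", "word": "abri", "familiarity": 4, "lexical": 1},
    {"participant": "P02", "word": "abri", "familiarity": 3, "lexical": 0},
]
matched = match_on_familiarity(l1_ratings, l2_responses)
print([(r["participant"], r["word"]) for r in matched])
```

In this toy data set, the response from P01 to tabac is excluded because its rating (2) falls below the lowest L1-French rating (3), and the response from P02 to abri is excluded for the same reason.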

3.5.3 L2-French proficiency

As L2-French listeners approach L1-like proficiency, we might expect that their pattern of responses may shift and begin to mirror L1 behavior (i.e., higher proportion of lexical parses and more lexical responses to non-liaison versus liaison sequences). To explore this possibility, two mixed effect logistic regression models were built to test for (1) a main effect of proficiency, and (2) an interaction of proficiency and liaison condition. One model incorporated a continuous fixed effect for proficiency using cloze test scores (proportion correct answers) with otherwise the same fixed and random effects structure as models reported previously for main analyses. In the other model, L2-French participants were separated into high and low proficiency bins (9 participants in each) and proficiency was entered into the model as a contrast-coded fixed effect, again with the same fixed and random effects structure reported above. Participants in the high proficiency bin had a mean score of 54% (SE = 4.4%), while the lower proficiency participants had a mean score of 34.7% (SE = 5.5%).
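
The binned proficiency predictor can be illustrated with a short sketch in Python. It assumes a rank-based split into two equal-sized bins and a ±0.5 contrast coding; the text specifies only that the bins were equal in size and that the factor was contrast-coded, so the coding values, the tie-breaking rule, the function name, and the cloze scores below are all hypothetical.

```python
def bin_and_code_proficiency(cloze_scores):
    """Split participants into equal-sized low/high bins and contrast-code them.

    cloze_scores: dict mapping participant ID -> proportion correct on the cloze test.
    Returns a dict mapping participant ID -> -0.5 (low bin) or +0.5 (high bin).
    The +/-0.5 values and tie-breaking by participant ID are assumptions.
    """
    if len(cloze_scores) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal bins")
    # Rank participants from lowest to highest score, breaking ties by ID.
    ranked = sorted(cloze_scores, key=lambda pid: (cloze_scores[pid], pid))
    half = len(ranked) // 2
    return {pid: (-0.5 if rank < half else 0.5) for rank, pid in enumerate(ranked)}


# Hypothetical cloze scores for six participants (the study had nine per bin).
scores = {"P01": 0.30, "P02": 0.62, "P03": 0.45, "P04": 0.51, "P05": 0.28, "P06": 0.70}
coded = bin_and_code_proficiency(scores)
print(coded)
```

Because the two codes sum to zero across equal-sized bins, the coefficient for the binned factor in such a model would be interpretable as the difference between the high and low bins around the grand mean, which parallels the contrast coding used for the other factors.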

Neither model revealed a significant main effect of proficiency, indicating no change in the proportion of lexical responses given as proficiency improved; L2-French listeners did not reach L1-like proportions of lexical responses as they gained French proficiency. Furthermore, neither model showed a significant Proficiency by Liaison Condition interaction, which indicates that the asymmetry between responses to liaison versus non-liaison sequences does not change as a function of proficiency.7 These results provide evidence that our L2-French participants have not achieved L1-like speech segmentation patterns, and that despite the range of cloze test scores, the L2-French group was functionally homogeneous.

4 Discussion

The current study considered how L1-French and L2-French listeners segment French speech containing liaison, a phonological phenomenon that should make speech segmentation more challenging due to a misalignment of syllable and word boundaries. Sandhi phenomena, such as French liaison, occur in many languages. This French-specific sandhi phenomenon provides a particularly interesting case through which to study speech segmentation in general. Previous work has established that French listeners exploit the regular syllable structure of French to segment speech by proposing word boundaries at the onsets of syllables (e.g., Dumay et al., 2002). However, liaison misaligns word and syllable boundaries, rendering this dominant strategy unreliable. This creates an interesting situation in which to consider what other types of information listeners rely upon during speech segmentation.

First, our results revealed differences in the extent to which L1-French and L2-French listeners utilized knowledge-based cues (specifically, lexical knowledge) to locate word boundaries in adjective-noun sequences with and without liaison. Specifically, L1-French listeners utilized a strong lexically-based segmentation strategy, where listeners consistently assigned the sequence-medial consonant either to the coda of the first word or to the onset of the second word to yield relatively more word-word than word-nonword responses (i.e., petit abri versus petit *tabri, and curieux zappeur versus curieux *appeur). This result is consistent with the lexical drift observed for English listeners in studies by Mattys and colleagues (Mattys et al., 2009; Mattys et al., 2010). Furthermore, while L2-French listeners gave significantly fewer word-word responses than the L1-French listeners, they did still utilize a lexically-based strategy (greater than 50% lexical responses overall). These results provide evidence that L2-French listeners can rely on knowledge-based segmentation strategies in a manner similar to L1 listeners. In this respect, our results contrast with findings by Mattys et al. (2010), where L2-English listeners never exhibited a lexical drift, even under the most favorable listening conditions. This divergence may stem from methodological differences between the studies, which were necessary due to differences between English and French. In the current study, listeners were presented with a word and nonword option for the second word in the sequence (e.g., abri or *tabri). Listeners in Mattys et al. (2010) identified the first word in the sequence, and both of the presented options were words (e.g., mild or mile). Therefore, our results likely reflect an overall lexical bias effect, due to listeners’ reluctance to report hearing nonwords. 
However, critically, we still observed differences in the proportion of lexical responses given across conditions, indicating different segmentation strategies. Our results did not vary according to the level of proficiency of the L2-French listeners; the highest proficiency L2-French listeners did not show an L1-like pattern of results. Furthermore, these results held even when controlling for differences in word familiarity across groups.

Second, when listening to a degraded signal (signal embedded in noise), both groups relied less on knowledge-based cues. That is, while overall lexical biases persisted, both groups exhibited attenuated use of knowledge-based cues when the signal was in noise compared to the no-noise condition. This attenuation could reflect an increased reliance on signal-based cues to word boundaries, consistent with previous work in English (Mattys et al., 2005; Mattys et al., 2009). However, given the persistence of the lexical bias, an alternative interpretation is that the lower proportion of lexical responses in the noise compared to no-noise condition is indicative of listener uncertainty about the reliability of their lexical knowledge. Under either interpretation, these results indicate that listening conditions, like native listener status, influence the degree to which listeners rely on knowledge-based cues to word boundaries, replicating results for English listeners (e.g., Mattys et al., 2005; Mattys et al., 2009).

Finally, L1-French and L2-French listeners differed quite dramatically in how they compensated for liaison. Specifically, if the second word in a given two-word sequence was V-initial, thereby triggering application of liaison with enchaînement, then both L1-French and L2-French listeners were equally likely to compensate for the misalignment of word and syllable boundaries. In such cases, both listener groups had average lexical parse rates of approximately 70%. This is largely consistent with previous work, which has shown that liaison minimally disrupts speech segmentation (Gaskell et al., 2002; Spinelli et al., 2002; Spinelli et al., 2003). However, if the second word in a given two-word sequence was C-initial (i.e., not a possible liaison sequence), then the L2-French listeners were unable to appropriately constrain compensation for liaison; in fact, they over-compensated with excessively liaison-based responses. That is, the L2-French listeners experienced confusion about whether the liaison rule applied for sequences with C-initial nouns. As a result, they were just as likely to parse such sequences as if liaison had applied at the expense of giving word-nonword parses (e.g., report hearing *appeur in curieux zappeur) as they were to correctly parse such sequences without assuming liaison had applied (average lexical parse rate of approximately 50%). In contrast, L1-French listeners had a lexical parse rate of approximately 90% for sequences with C-initial nouns (i.e., in cases where liaison did not apply), demonstrating proficiency in constraining the liaison rule. Together with differences in reliance on lexical cues, these results provide compelling evidence that listener segmentation strategies vary depending on native listener status, with L2-French listeners over-applying liaison in comparison to L1-French listeners. 
These results support previous findings that L2-French listeners do not respond to liaison in the same way as L1-French listeners (e.g., Dejean de la Batie & Bradley, 1995; Tremblay & Spinelli, 2014a), likely reflecting impoverished knowledge of both the phonological processes and lexical items of French (Cutler, 2001). In fact, the over-application of liaison shown by L2-French listeners resembles the over-generalization typical of an intermediate stage of acquisition of morphophonological rules by L1 infants (e.g., acquisition of irregular past tense morphology in English; Ervin & Miller, 1963). Although the higher proficiency participants in the current study did not exhibit L1-like constraint of liaison during speech segmentation, we predict that a longitudinal or cross-sectional study of L2-French learners would reveal that L2-French listeners with high enough levels of proficiency eventually behave in a qualitatively similar manner to L1-French listeners (consistent with Tremblay, 2011a).

Although we have clear evidence that L1- but not L2-French listeners have acquired and constrained the rule for liaison, it remains unclear precisely what this liaison rule entails. Many analyses of the representational status of liaison consonants have been proposed, and have found varying degrees of support from a number of studies (e.g., Durand & Lyche, 2008; Soum-Favaro et al., 2014). Liaison consonants could be represented word-finally (e.g., Encrevé, 1988; Selkirk, 1974), word-initially (Ternes, 1977), or as part of an abstract, multi-word construction (Bybee, 2001). Liaison consonants could also be inserted epenthetically (Côté, 2005) or as part of a morphological process (e.g., Morin, 2003). Full discussion of the merits and shortcomings of each of these analyses is beyond the scope of this paper, but see Soum-Favaro et al. (2014) for a series of in-depth discussions and Côté (2011) for a more detailed review of these various theories.

This debate is ongoing and, as a result, it is unclear what about liaison must be acquired (i.e., whether the liaison consonant belongs to the first or second word, to both, or to neither). The current task asked listeners to make an offline decision about the identity of a noun, and, for the critical stimuli, listeners were required to potentially “undo” liaison prior to making this decision. This task is not sufficiently sensitive to determine the nature of the “undoing” process implemented by listeners (i.e., whether they attributed the initial consonant of the noun to resyllabification of a word-final liaison consonant, undid epenthesis of the liaison consonant, assigned the consonant underlying word-initial status, etc.). That is, we assume a great deal of processing has occurred prior to the point at which listeners make a decision about the noun they heard, but the current data cannot speak to what form this processing takes, and, therefore, cannot contribute to the debate on the representational status of liaison consonants. The same can be said for the majority of existing literature on speech segmentation in liaison contexts (e.g., Gaskell et al., 2002; Shoemaker, 2014; Spinelli et al., 2002; Spinelli et al., 2003; Tremblay, 2011a). Regardless of the precise nature of the representation of liaison consonants (i.e., the form of the liaison rule), the critical point is that our listeners’ processing of liaison was modulated by the experimental manipulations. Furthermore, the striking differences observed in response to these manipulations provide important insights into the speech segmentation problem, in particular how listeners may compensate for liaison during speech segmentation, which is the primary focus of the current study.

Future research should continue to investigate which representational analysis best accounts for data in liaison production and perception. While most speech segmentation studies do not address this question, the eye-tracking studies by Tremblay and Spinelli (2013, 2014a) provide some indirect evidence supporting the word-final representation of liaison consonants. Recall that these studies investigated the influence of distributional information for liaison vs. word-initial consonants (e.g., /z/ occurs more frequently in liaison than word-initially). The fact that Tremblay and Spinelli found that listeners were sensitive to these position-specific statistics supports an analysis that liaison consonants are at worst not underlyingly represented as word-initial consonants, and at best represented word-finally. A follow-up study could further test this claim by considering distributional differences for consonants in stable (i.e., non-liaison) word-final position (e.g., /z/ in seize ‘sixteen’ [sɛz]) compared to in liaison and word-initial position. A cursory query of the Lexique database (New et al., 2001) revealed striking differences between the proportion of words ending in /z/ and the proportion of words beginning in /z/ reported by Tremblay and Spinelli (0.2% vs. 2.4%, respectively), but a far smaller difference for words ending vs. beginning with /t/ (5.5% vs. 5%, respectively). Future research could use an eye-tracking paradigm to consider how the distributional statistics for consonants in all positions (word-finally, word-medially, word-initially, in liaison) influence perception to shed light on the representational status of liaison consonants, similar to the line of questioning pursued by Nguyen et al. (2007).
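
The kind of positional count involved in such a lexicon query can be sketched as follows. This is a minimal illustration in Python under several assumptions: counts are over word types rather than tokens, each word is represented as a tuple of phoneme symbols, and the seven-word lexicon is a toy example that does not reproduce the Lexique figures cited above. The function name `edge_proportions` is hypothetical.

```python
def edge_proportions(transcriptions, phoneme):
    """Return (proportion of words ending in phoneme, proportion beginning with it).

    transcriptions: list of non-empty tuples of phoneme symbols, one per word type.
    Proportions are type-based (each word counts once), which is an assumption.
    """
    if not transcriptions:
        raise ValueError("the lexicon must contain at least one word")
    total = len(transcriptions)
    n_final = sum(1 for word in transcriptions if word[-1] == phoneme)
    n_initial = sum(1 for word in transcriptions if word[0] == phoneme)
    return n_final / total, n_initial / total


# Toy lexicon for illustration only; not drawn from Lexique.
toy_lexicon = [
    ("s", "ɛ", "z"),            # seize
    ("z", "ɛ", "b", "ʁ"),       # zèbre
    ("t", "a", "b", "a"),       # tabac
    ("a", "b", "ʁ", "i"),       # abri
    ("t", "a", "m", "i"),       # tamis
    ("a", "m", "i"),            # ami
    ("p", "ə", "t", "i", "t"),  # petite
]
for consonant in ("z", "t"):
    final, initial = edge_proportions(toy_lexicon, consonant)
    print(f"/{consonant}/: word-final = {final:.3f}, word-initial = {initial:.3f}")
```

A full follow-up would add liaison and word-medial positions to the same tally and would need to decide whether type or token frequencies are the relevant statistic.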

While L2-French listeners over-apply liaison in perception, there is evidence that they under-apply it in production, even in obligatory contexts (Chevrot et al., 2013). In a longitudinal study, Chevrot et al. found that the overwhelming majority of errors made by L2-French speakers (L1-Korean) were omission errors (i.e., speakers did not produce liaison when they should have). While these errors gradually decreased over the course of the 18-month study, the L2-French speakers still made a substantial proportion of these errors (22.4%) after almost 4 years of French study. However, as noted in the introduction, Mastromonaco (1999) found that L2-French speakers whose L1 was English did not under-apply liaison in obligatory contexts. This study did find, though, that L2-French speakers under-produced liaison in variable contexts (when liaison could apply but is not required), suggesting that L2-French speakers may avoid producing liaison altogether in these contexts (Mastromonaco, 1999; Thomas, 2004). When L2-French speakers do produce liaison, they tend not to resyllabify the liaison consonant (Mastromonaco, 1999; Thomas, 2004). Mastromonaco notes that the L2-specific pattern of errors found with her L2-French speakers suggests not that speakers fail to acquire the liaison rule, but that they have yet to constrain its application, similar to the current findings with L2-French listeners.

These results converge nicely with the existing literature on speech segmentation (as detailed above in relation to each of our empirical results). Specifically, we found that French listeners utilize different types of speech segmentation information in ways similar to English listeners (e.g., Mattys et al., 2005), despite differences between these languages across a range of levels of linguistic structure. The results of the current study, as well as results from previous research on speech segmentation in liaison contexts, provide avenues for future research on speech segmentation more generally. For example, our first additional analysis considered the influence of distributional information (namely, liaison statistics for particular consonants) on segmentation behavior. While these results diverged from previous research examining this type of information (Tremblay, 2011a; Tremblay & Spinelli, 2013, 2014a), we did find some evidence that L2-French listeners show sensitivity to these statistics and utilize them even in a relatively offline speech segmentation task. Results from Tremblay and colleagues suggest that the use of liaison consonant statistics comes relatively early in the time-course of processing, namely, prior to the recognition of lexical units. Sensitivity to this cue by our L2-French listeners in an offline task is consistent with this possibility if we assume delays and disruptions in L2 speech processing. Future research on speech segmentation (in all languages) would benefit from use of the eye-tracking methodologies championed by Tremblay and Spinelli, which would allow researchers to ask questions about the stages of processing at which different types of word boundary information come into play. For example, the current results suggest lexical information plays an important role in speech segmentation in liaison contexts, but it is unclear if this influence comes relatively early or late in the time-course of processing.
Furthermore, evidence suggests that lexical frequency biases processing early, prior to word recognition (Dahan et al., 2001), but it is unclear whether this information influences processing before or after segmental-level distributional statistics come into play. The interplay of these probabilistic factors during speech segmentation has yet to be explored. This type of research would provide a richer picture of how listeners use cues to word boundaries from different sources during online speech segmentation by considering not only which cues are useful, but also when in the time-course of processing they become useful.

In the current study, participants were presented with sequences of French words containing the strongest possible lexical cues to word boundaries. Specifically, each sequence had only one lexically acceptable parse (e.g., lexical cues dictate that listeners should parse [pə.ti.ta.bʁi] as petit abri but not petit *tabri). With stimuli of this nature, we were able to determine that listeners rely heavily on lexical information to locate word boundaries when the signal provides unambiguous lexical-semantic cues. However, listeners may shift reliance to other segmentation cues when this lexical-semantic cue is neutralized in phonemically and lexically ambiguous sequences such as [pə.ti.ta.mi] (either petit ami ‘boyfriend’ or petit tamis ‘little sieve’). Follow-up studies could include ambiguous levels for other segmentation cues as well, such as duration of pivotal consonants. Such manipulations would reveal more about the dynamic nature of speech segmentation, and provide more ecological validity, by showing how listeners shift cue reliance depending on which available cues provide the most reliable information.

In conclusion, together with previous results (e.g., Spinelli et al., 2003), the current study suggests that misalignment of syllable and word boundaries minimally disrupts French speech segmentation. Despite the common assumption that liaison may pose additional challenges for the already difficult speech segmentation process, the current study shows that both L1-French and L2-French listeners can largely overcome this challenge by utilizing easily acquired, but less easily constrained, language-specific knowledge of this process. The complete acquisition and constraint of liaison by L2-French listeners is likely impeded by the high variability in L1 usage of liaison. Even in cases where prescriptive rules of French grammar dictate that liaison must occur, L1-French speakers do not always produce it (Durand & Lyche, 2008). Therefore, even if L2-French listeners formally learn these grammatical rules, the rules are inconsistently reinforced by spoken evidence from L1-French speakers. Thus, the real challenge presented to listeners by the French connected speech process of liaison is the problem of its absence rather than its presence. In a more general sense, the challenge of liaison for word segmentation is not the misalignment of syllable and word boundaries that it introduces, but rather the variable word onset phonotactics that it implies. L2-French listeners have no difficulty finding a vowel-initial word onset in a syllable with a liaison consonant onset, but their knowledge of liaison seems to have introduced an excessive bias against consonant-initial words (at least in this liaison-focused experimental setting). The present results suggest that extensive experience with the language is required for full mastery of the application and non-application of this aspect of the French sound system. The same is likely true for cases of sandhi in languages other than French (e.g., Tuinman et al., 2011).
Speech segmentation in these contexts requires not only experience with the target language, but also experience with the process that misaligns word boundaries with reliable cues to those boundaries. L2 listeners receive relatively impoverished levels of this general and particular experience, which leads to demonstrable differences in speech segmentation behavior by L1 and L2 listeners in these contexts.