A special class of English words with tense vowel/diphthong nuclei and liquid codas receives variable syllable count judgments (one and over-one syllables). Tilsen and Cohn (
Speakers have consistent intuitions about the number of syllables in a word. Even children as young as 4–5 years old can agree on the number of syllables they count in syllable clapping tasks (English:
In a subsequent study, Tilsen and Cohn (
Our goal in the present study is twofold: to test Tilsen and Cohn’s interpretation by submitting it to a cross-language comparison, and to propose a gestural account of sesquisyllables cross-linguistically. More specifically, we propose that language-specific differences in speaker intuitions about syllable count judgments are related to the language-specific differences in the gestural organization of coda liquids. To test this hypothesis, we compare two languages with similar phonological word structure, but different gestural specifications of their coda liquids: American English and German. Both languages have CVC words contrasting in vowel length (tense versus lax), thus allowing a direct, one-parameter difference comparison: English has dark [ɫ] in coda position (
An earlier study limited to syllable count judgments by British English and German native speakers (
We begin by presenting the gestural specifications of coda liquids in the next section, Section 1.1. The hypotheses and predictions for English and for German are presented in Section 1.2. The remainder of the paper is structured as follows: Section 2 describes the experimental method, Sections 3 and 4 present the results of the English and the German experiments, respectively, and the last two sections focus on the model proposed to account for them (Section 5) and the discussion of its implications (Section 6).
Traditionally two varieties of laterals have been described across languages: clear and dark /l/. The two varieties mainly differ in their tongue body configurations and gestural coordination patterns. American English has both varieties, depending on syllable position: clear /l/ in onset and dark /l/ in coda position. German has clear /l/ in both syllable positions. This difference in coda position is crucial to the present study. Clear and dark /l/ have been extensively described for American English (
The two types of laterals can be compared acoustically. The main acoustic correlate of lateral darkness is F2, associated with the half-wavelength resonance of the back cavity. In dark [ɫ], F2 values are low, corresponding to either a lowered predorsum, a TD retraction, or both—all configurations resulting in the lengthening of the back cavity (
Smoothed F2 trajectories in similar English (light blue) and German (dark blue) words: feel [fiːɫ] versus viel [fiːl] ‘much’ (left); pool [puːɫ] versus Stuhl [ʃtuːl] ‘chair’ (center); tile [taɪɫ] versus Teil [taɪl] ‘part’ (right). Formant values extracted at 20 equally spaced steps across the vowel-lateral sequence in the acoustic data recorded for the present study.
The dark and clear laterals also differ in coarticulation degree. Proctor (
The liquids we are concerned with also include the rhotics. The American English rhotic is an approximant, and, like the lateral, is a complex segment composed of a double lingual gesture (a tongue tip/body and a tongue root gesture) and a labial gesture (
Finally, and relevant for our hypothesis, the tongue dorsum gestures involved in the production of the English liquids are considered to be vocalic, similar to vowel gestures (
To summarize, the articulation of coda liquid consonants differs between English and German. English coda laterals and rhotics are produced with two sequential gestures, the vocalic tongue dorsum gesture preceding the consonantal tongue tip gesture. The German coda lateral is possibly also produced with a double gesture, but the tongue dorsum gesture is less actively involved, and does not precede the tongue tip gesture. The hypotheses and predictions presented next crucially take into account the gestural composition of the liquids.
Assuming gestural representations, we claim that the
Our main theoretical hypothesis is that speakers’ intuitions about syllable counts are linked to the gestural composition of coda liquids (in languages with a vowel length distinction). Thus, we postulate that sesquisyllables involving liquid consonants are only present in English and not in German. Furthermore, we postulate that, in a syllable count task, even when participants attribute just one syllable to such words, reaction times should reflect the presence of sesquisyllables in English but not in German. For tokens involving liquid codas, English speakers should take longer to decide, resulting in longer reaction times than for non-liquid codas. Finally, we hypothesize that duration is not the only factor influencing speakers’ syllable count judgments (henceforth ‘SCJ’). Our theoretical and experimental hypotheses are summarized in
Main SCJ and duration hypotheses.
Speakers’ intuitions about syllable counts are linked to coda liquid gestural composition | |
Experimental hypothesis | Alternative hypothesis |
Rimes with coda laterals (and rhotics) count as sesquisyllables in English | Rimes with coda laterals count as sesquisyllables in both English and German |
Experimental hypothesis | Alternative hypothesis |
Reaction times are longer for rimes involving liquid codas in English | Reaction times do not differ based on rime composition in either English or German |
Experimental hypothesis | Alternative hypothesis |
Duration does not correlate with SCJ | Duration correlates with SCJ |
Based on our hypotheses, we make the following predictions about syllable count judgments, rime duration, and their correlation.
First, the predictions pertaining to SCJs are as follows. For English, we expect over-one SCJs to be attributed predominantly for liquid codas. Furthermore, based on Tilsen and Cohn’s (
Second, rime duration predictions can be made in relation to our SCJ prediction. If our hypothesis is correct, and sesquisyllables can be predicted by the gestural composition of coda consonants, we do not expect duration to be correlated with SCJs over all rime classes. Rather, in English, we expect rime duration to correlate with SCJs only within the subset of words with coda liquids: Words that receive over-one syllable count judgments among this type of words should exhibit longer rimes than words judged to be monosyllabic.
The cross-language comparison will be presented as two independent experiments. Experiment 1 is an extension of Tilsen and Cohn’s (
Twenty American-English native speakers, undergraduate students at the University of Chicago, and 16 German native speakers, students at the University of Potsdam, participated in the study. The American participants were recruited as volunteers in Chicago; the German participants received either payment or course credit in Potsdam. None of the participants reported any history of hearing or language impairment. All participants gave written informed consent for their participation in the experiment and for the subsequent use of their data for scientific purposes.
The same experimental design was used at both locations, in the US and in Germany. Following the experimental protocol of Tilsen and Cohn (
For the recordings, participants were asked to read target, control, and filler words embedded in carrier phrases: (English) “I say … now”/(German) “Ich sage … drei mal.” Stimuli, described in the next section, were identical across tasks.
The English stimuli were based on Tilsen and Cohn (
English target (T) and control (C) stimuli per nucleus and coda type.
Nucleus | Open | Stop | Nasal | Lateral | Rhotic |
iː | bee | feed | keen | feel | beer |
uː | Pooh | spook | zoom | pool | |
aɪ | tie | | | vile | fire |
eɪ | may | page | claim | male | |
Total | 8 | 4 | 8 | 8 | 8 |
As far as the lexicon allowed it, the orthography was controlled for: Most stimuli with a tense vowel + C are spelled CVVC, most stimuli with a diphthong + C are spelled CVCV, and all those with nasal codas are spelled CVVC. Only four stimuli—
The German stimuli were similar in structure to the ones presented in Experiment 1. Nuclei were more varied, including all six long monophthongs of German (/aː, eː, iː, oː, uː, yː/) and two diphthongs /aɪ/ and /aʊ/. All diphthong and most tense vowel tokens are spelled CVVC; a few tokens are spelled CVhC. As for English, complex onsets were avoided as much as possible, because vowel compression effects have been found for German complex onsets, as well (
German target and control stimuli per nucleus and coda type.
Nucleus | Open | Stop | Nasal | Lateral | Rhotic |
aː | sah [zaː] | Staat [ʃtaːt] | Wahn | Saal | Haar |
aʊ | Sau | | Baum | faul | |
 | | | Raum | | |
aɪ | sei [zaɪ] | Teig | Heim | Teil | |
eː | Fee [feː] | | Fehn | fehl | sehr |
iː | die [diː] | Lied | Wien | viel | vier |
oː | | | | wohl | Chor |
uː | | | | Stuhl | fuhr |
yː | | blüht | | kühl | |
Total | 6 | 6 | 9 | 15 | 8 |
For both English and German, fillers consisted of unambiguously monosyllabic and disyllabic words. The full list of fillers is given in Appendix A.
Two repetitions of each phrase were elicited in two randomized blocks. Participants were asked to read the sentences clearly, at their normal speech rate, without over-emphasizing the variable word in the carrier phrase. The productions were monitored by the experimenter, and participants were asked to repeat any sentences that did not follow the instructions or seemed problematic. Recordings took place in the soundproof booths of the Linguistics Departments of the Universities of Chicago and Potsdam, respectively. The same Zoom H4NPRO recorder was used for all recordings.
For the SCJ task, participants were asked to decide how many syllables there are in a word by choosing one of three answer options: 1, 1.5, or 2 syllables. Each word was presented in written form on a computer screen, and SCJ responses were recorded through a C# application, created specifically for this task to manage the pseudo-randomization, the timing of stimuli presentation, and the recording of the response time per answer. The same application was used to record speaker information such as age, gender, and language background.
While the instructions for the SCJ task were the same as in Tilsen and Cohn (
Each word appeared on the screen for 1.5 seconds, enough time for participants to read the word silently and to subvocalize it. Participants were specifically asked to subvocalize each word and think about how they would produce it before giving their answer. This step in the task was added so that speakers would rely less on the orthographic representations of the words, and more on their own proprioceptive articulatory and auditory feedback. The word disappeared after 1.5 seconds, and three buttons, one for each answer option, appeared on a horizontal line. At the start of each trial, the mouse cursor appeared in the middle of the screen, below the 1.5 answer option, at equal distance from the 1 and 2 answer options, as shown in
Schematic representation of the experimental progress of one trial of the SCJ task.
In order to justify the option of 1.5 syllables, our written instructions followed Tilsen and Cohn (
Participants who gave more than 15% non-standard SCJs for unambiguous disyllabic words were excluded. This includes, for example, participants who chose options 1 or 1.5 syllables for a word such as
For the production study, the response variable considered is acoustic rime duration, corresponding to vowel duration in open syllables, and to vowel-consonant duration in closed syllables. All acoustic files were hand segmented in Praat (
Labeled waveforms and spectrograms for the words
In the case of postvocalic nasals, to separate a word-final [n] from the onset [n] of the following word ‘now,’ we relied either on changes in waveform and/or spectrogram, or on the short pause between the two nasals if it was present.
Labeled waveforms and spectrograms for the words
Two different metrics for rime duration were considered. The first corresponds to the relativized rime duration measure used in Tilsen and Cohn (
Rime duration data was analyzed using linear mixed effects models (
The correlation between syllable count judgments and duration was analyzed using cumulative link mixed effects models (
For the reaction time data, a whole distribution analysis of reaction times was preferred (
For all models, the significance of the main effects was tested using chi-square likelihood ratio tests. Model diagnostics plots for the final models were analyzed to test for deviations from homoscedasticity or normality. The
We first present the SCJ measure and response times for SCJs per coda type in Section 3.1. Rime duration measures are presented separately, in Section 3.2. For both SCJ and duration measures, an individual rime analysis will be presented. The last section (Section 3.3) presents the correlation between SCJs and duration.
As predicted, ambiguous over-one SCJs were attributed exclusively to liquid coda rimes. Open syllables as well as closed syllables with post-vocalic nasals or stops received only monosyllabic judgments. Furthermore, and consistent with Tilsen and Cohn’s (
Three of the 18 participants consistently attributed over-one SCJs to all rimes with liquid codas (
Percentages of attributed SCJs per coda (open, stop, nasal, lateral, rhotic) and nucleus type (diphthong, tense vowel).
Percentages of attributed SCJs per rime identity and nucleus type (/aɪ, eɪ, i, u/). CVC tokens only.
Number of speakers (out of 18) having chosen 1.5 and 2 SCJs per target token, as a function of nucleus type and rime identity.
2 | 0 | 2 | ||||
heal | 2 | 0 | 2 | |||
wheel | 1 | 0 | 1 | |||
beer | 3 | 0 | 3 | |||
fear | 3 | 0 | 3 | |||
gear | 2 | 0 | 2 | |||
pier | 4 | 0 | 4 | |||
pool | 2 | 0 | 2 | |||
vile | 3 | 4 | 7 | |||
tile | 5 | 1 | 6 | |||
fire | 4 | 4 | 8 | |||
tire | 4 | 4 | 8 | |||
wire | 3 | 3 | 6 | |||
liar | 1 | 11 | 13 | |||
male | 2 | 0 | 2 | |||
pale | 2 | 0 | 2 | |||
bee | 0 | 0 | 0 | |||
Pooh | 0 | 0 | 0 | |||
tie | 0 | 0 | 0 | |||
may | 0 | 0 | 0 | |||
Response times (RT) ranged from 0.14 to 12.04 seconds, with a mean response time of 1.2 seconds (all coda types included).
Mean, median, and standard deviation for reaction times distributions per coda type.
Mean | 0.96 | 1.04 | 1.11 | 1.42 | 1.43 |
Median | 0.76 | 0.83 | 0.78 | 1.00 | 0.96 |
Standard deviation | 0.91 | 0.59 | 1.41 | 1.35 | 1.48 |
Raw distributions of response times per coda type—outliers included. Vertical solid lines represent the means for each distribution. Vertical dashed lines represent median values for each distribution.
In addition to the central tendency approach, we ran a whole RT distribution analysis (
GLMM results: Significance levels for RT (log values) as a function of coda type.
lateral | Estimate | –0.32 | –0.27 | –0.21 |
–4.80 | –2.64 | –3.09 | ||
<0.001 | <0.01 | <0.01 | ||
rhotic | Estimate | –0.39 | –0.34 | –0.28 |
–6.08 | –3.40 | –4.21 | ||
<0.001 | <0.001 | <0.001 |
The results show no difference in reaction times between laterals and rhotics (Estimate ~ 0.71,
In the next section we present rime duration results.
As explained in the Methods section, two measures were considered for the rime duration: The normalized rime duration indicates to what degree the coda consonant contributes to the rime. Higher values indicate higher contributions of the coda consonant. The averaged rime duration, measured in ms as the average over repetitions, gives a directly interpretable metric of rime duration. Recall our prediction: If acoustic rime duration is the determining factor in participants’ SCJs, we would expect (1) that targets containing liquid codas would have the longest rime duration overall, and (2) that rimes containing rhotics would be longer than those containing laterals.
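As a minimal sketch of the averaged measure only, the snippet below averages rime durations over repetitions for each participant and word. The record layout, the function name `averaged_rime_duration`, and the example values are hypothetical and are not taken from the study's data; the normalized measure is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def averaged_rime_duration(measurements):
    """Average rime duration (ms) over repetitions per (participant, word).

    `measurements` is an iterable of
    (participant, word, repetition, duration_ms) tuples.
    This layout is an assumption made for illustration.
    """
    grouped = defaultdict(list)
    for participant, word, _repetition, duration_ms in measurements:
        grouped[(participant, word)].append(duration_ms)
    # One averaged value per participant and word.
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical values, not data from the study.
example = [
    ("P01", "feel", 1, 300.0),
    ("P01", "feel", 2, 320.0),
    ("P01", "feed", 1, 280.0),
    ("P01", "feed", 2, 290.0),
]
averaged = averaged_rime_duration(example)
```

Averaging before modeling leaves one observation per participant and word, which is consistent with Participant being the only random factor in the averaged duration model.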
Linear mixed models with rime_duration as a response variable and rime_identity and lexical frequency as predictors were considered with each lateral coda token (/aɪl/, /eɪl/, /iːl/, /uːl/) as reference levels. Participant was used as a random factor for averaged rime duration, and Participant and Repetition were used for the normalized rime duration model. Both models had random intercepts. A likelihood ratio test of the model with rime identity as a predictor against the null model revealed a significant effect of rime identity (average duration model: 𝜒2(15) = 326.68,
Normalized (upper panel) and averaged (lower panel) rime duration per rime identity, coda type (stop, nasal, lateral, rhotic), and nucleus type (/aɪ, eɪ, iː, uː/). Closed syllable tokens only.
A full list of pair-wise comparisons of rime durations per rime identity is shown in Appendix B. Below, we present crucial results that suggest duration is not the only contributing factor for SCJs:
Words with liquid rimes do not exhibit the longest rime duration. The /eɪdʒ/ rimes are significantly longer than all liquid rimes (/aɪl/: Est ~ –20 ms,
Next, diphthong rimes /aɪl/ and /aɪr/ are significantly longer than all remaining rimes (with averaged rime duration differences ranging from 23 to 56 ms; <0.001 significance levels). For either of the two measures, the /aɪl/ and /aɪr/ rimes do not significantly differ from each other: average duration –
Furthermore, /id/ rimes are longer than /il/ rimes (average duration: ~26 ms longer,
Finally, for stimuli involving the tense back vowel /u/, rimes with a nasal coda are the longest (e.g., average duration: /un/ is 33 ms longer than /ul/ rimes,
Results show that liquid rimes are not consistently longer than rimes involving stops or nasals. In addition, rhotic rimes are not longer than lateral rimes. None of these duration patterns could predict the syllable counts attributed. Recall that the patterns observed for the SCJ task showed over-one SCJs being attributed exclusively to words with liquid codas. Furthermore, sesquisyllables involving rhotics receive more over-one SCJs than those involving laterals. The two results taken together thus indicate that rime duration alone is not a good predictor of participants’ decisions about syllable counts. In the next section we further probe the role of duration by testing the correlation between SCJ responses and rime duration.
Normalized and averaged rime duration per SCJ answer option (1, 1.5, 2 syllables) and coda type (stop, nasal, lateral, rhotic).
Duration is not a good predictor of SCJs overall, when including the controls. However, within the subset of sesquisyllables, the analysis correlating duration and SCJs shows that averaged rime duration is a good predictor for over-one SCJs. Over-one SCJs are associated with longer rimes: The longer the liquid rime, the higher SCJs are likely to be (Estimate 27.16;
In summary, we found that a third of our participants awarded ambiguous, over-one SCJs exclusively to monosyllabic words involving liquid codas, thus confirming the presence of sesquisyllables in American English. Duration was found to be a good predictor for over-one SCJs within sesquisyllables only: Higher SCJs were associated with longer rime durations for liquid rimes, confirming Tilsen & Cohn’s (
The next section will describe the production-SCJ experiment for German, designed to further test this hypothesis. Recall that in German, coda laterals do not have sequentially timed gestures. We therefore predict no differences in SCJs between lateral and non-lateral rimes. In other words, a class of sesquisyllables is not expected to emerge in German.
Results will be presented following the same structure as those in Experiment 1. For both SCJs and rime duration, a detailed rime identity analysis will be presented followed by correlation results between the two measures.
Several significant differences can be observed between German and English native speakers’ intuitions about syllable counts. In accordance with our predictions, German native speakers did not attribute over-one SCJs predominantly to lateral coda rimes. Over-one SCJs were attributed instead to all types of coda consonants, as well as to open syllables. The absence of an emerging pattern confirms our prediction. Also, in contrast to English, two-syllable judgments were attributed only to the unambiguously disyllabic fillers.
Percentages of attributed SCJs per coda (open, stop, nasal, lateral, rhotic) and nucleus type (diphthong, tense vowel).
When we separate diphthong and tense vowel rimes, we see that diphthong-lateral rimes received the most over-one SCJs, as did tense vowel-stop and tense vowel-nasal rimes.
Percentages of attributed SCJs per rime identity and nucleus type. Only CVC tokens included.
Furthermore, all native German participants attributed over-one SCJs, but none were consistent in their choices.
Number of speakers (out of 16) having chosen 1.5 and 2 SCJs per target token, as a function of nucleus type and rime identity.
Nucleus | Rime | Token | 1.5 | 2 | Total |
/aː/ | /aːn/ | Wahn | 2 | 0 | 2 |
 | /aːl/ | Saal | 2 | 0 | 2 |
 | /aːr/ | Haar | 3 | 0 | 3 |
 | /aːt/ | Staat | 4 | 0 | 3 |
/eː/ | /eːn/ | Fehn | 6 | 0 | 6 |
 | /eːl/ | fehl | 0 | 0 | 0 |
 | /eːr/ | sehr | 1 | 0 | 1 |
/iː/ | /iːd/ | Lied | 2 | 0 | 2 |
 | /iːg/ | stieg | 2 | 0 | 2 |
 | /iːn/ | Wien | 2 | 0 | 2 |
 | /iːl/ | viel | 1 | 0 | 1 |
 | /iːr/ | vier | 0 | 0 | 0 |
 | | Tier | 1 | 0 | 1 |
 | | Stier | 2 | 0 | 2 |
 | | Bier | 2 | 0 | 2 |
/oː/ | /oːl/ | wohl | 2 | 0 | 2 |
 | /oːr/ | Chor | 2 | 0 | 2 |
/uː/ | /uːl/ | Stuhl | 4 | 0 | 4 |
 | /uːr/ | fuhr | 1 | 0 | 1 |
/yː/ | /yːt/ | blüht | 4 | 0 | 4 |
/aɪ/ | /aɪt/ | seit | 1 | 0 | 1 |
 | /aɪg/ | Teig | 1 | 0 | 1 |
 | /aɪm/ | Heim | 2 | 0 | 2 |
 | /aɪn/ | sein | 1 | 0 | 1 |
 | /aɪl/ | Teil | 4 | 0 | 4 |
/aʊ/ | /aʊm/ | Baum | 1 | 0 | 1 |
 | /aʊn/ | Faun | 1 | 0 | 1 |
 | /aʊl/ | faul | 3 | 0 | 3 |
The recorded response times varied between 0.3 seconds and 7.88 seconds, with an average response time of 1.02 seconds.
Mean, median, and standard deviation for reaction times distributions per coda type.
Mean | 0.77 | 0.96 | 1.09 | 1.02 | 1.07 |
Median | 0.6 | 0.76 | 0.82 | 0.77 | 0.83 |
Standard deviation | 0.39 | 0.64 | 1.10 | 0.82 | 0.78 |
Raw distributions of response times per coda type. Vertical solid lines indicate mean values; vertical dashed lines indicate median values.
GLMM results: Significance levels for RT (log values) as a function of coda type.
lateral | Estimate | –0.20 | –0.09 | 0.06 | 0.067 |
–2.58 | –0.14 | 1.01 | 0.97 | ||
<0.001 | 0.88 | 0.30 | 0.33 |
Contrary to the English data, there are no patterns emerging in the German participants’ SCJs. Three possibilities could explain the lack of consistency: (1) speakers attributed over-one SCJs independently of either coda type or rime duration, only because the option was available to them, (2) duration might play a bigger role in German than in English, which we will investigate further, or (3) frequency effects might be at play. The reaction time analysis ruled out an effect of frequency, leaving two options: Either participants attributed SCJs randomly, independently of coda type or rime duration, or rime duration played a more significant role in German than in English. If the latter is true, we would expect diphthong – lateral rimes to be longer compared to other diphthong rimes, and nasal and stop codas to contribute the most to tense vowel rimes.
Duration results will be presented as in Experiment 1. Differences over different classes of rimes in normalized and averaged rime duration, based on rime identity analysis, will be presented. Likelihood ratio tests showed an effect of rime_identity compared to the null model (averaged duration: 𝜒2(22) = 350.42,
Normalized rime duration per rime identity and nucleus type (aː, aɪ, aʊ, eː, iː, oː, uː) and coda type (open, stop, nasal, lateral, rhotic).
Average rime duration per rime identity and nucleus type (aː, aɪ, aʊ, eː, iː, oː, uː) and coda type (open, stop, nasal, lateral, rhotic).
The results of the full pair-wise comparison (based on
In summary, in German, duration does not play a role in participants’ SCJs in the case of diphthong rimes, and only partially correlates with SCJs in the case of tense vowel rimes. While in some cases (e.g., /eːn/), the highest count of over-one SCJs corresponds to the longest rimes, in others it does not. Rimes in /aːr/, for example, receive more over-one SCJs than rimes in /aːl/, but their duration is significantly shorter. There appears to be no straightforward relation between SCJs and rime duration in German, which is confirmed by the analysis in the next section.
Average rime duration per SCJ answer option (1 and 1.5 syllables) and coda type (stop, nasal, lateral, rhotic).
In summary, we found that there is no straightforward relation between SCJs and rime duration in German. In contrast to English native speakers, German native speakers attributed the 1.5 syllable count apparently randomly, independently of coda type, duration, or lexical frequency, to words involving all coda types, and they never chose the 2-syllable option. Duration was found to be only partially correlated with SCJs. No duration difference was found between tokens that received 1 versus 1.5 syllable counts. The lack of correlation between SCJs and duration, as well as the absence of a specific pattern, confirms the absence of sesquisyllables in German, supporting our hypothesis, and justifying the search for an alternative, gestural account of sesquisyllables.
Before discussing the implications of the results for phonological representation, we will summarize the main findings of our experiments. The results of Experiments 1 and 2 are threefold, and they support the hypothesis and predictions presented in Section 1.2.
First, sesquisyllables are confirmed as a special class of words in English, but not in German. In English, over-one SCJs are attributed exclusively to rimes involving diphthongs or tense vowels followed by a liquid consonant, thus differentiating liquid from non-liquid codas. In German, participants attributed over-one SCJs randomly across both targets and controls, independently of coda type. English and German native speakers also differ in the consistency of their attribution of ambiguous over-one SCJs. In English, three participants attributed over-one SCJs consistently to targets involving most liquid coda tokens. Three more participants extended over-one SCJs to diphthongs followed by a liquid coda, and the remaining 12 participants all attributed over-one SCJs at least once, always to words containing a diphthong plus a liquid coda. In German, participants were inconsistent—all 16 participants attributed over-one SCJs at one point during the task, but to a random variety of rimes.
Response times also seem to support a differentiation between liquid and non-liquid codas in English, but not in German. Native English speakers took significantly longer to attribute SCJs to stimuli involving liquid codas than to those involving non-liquid codas. Therefore, even when participants ended up attributing just one syllable to stimuli with liquid codas, they took longer to choose their answer, which could be interpreted as an indication of sesquisyllables. In German, no difference based on coda type was found for closed syllable stimuli, indicating that monosyllabic words with liquid codas are not treated differently from their non-liquid counterparts.
Second, acoustic rime duration across all rime types does not correlate with participants’ intuitions about syllable counts, in either English or German. In English, rimes with liquid codas are not longer than those with non-liquid codas, but they are the only ones that receive over-one SCJs. In German, duration only partially accounts for the attributed SCJs – the longest rimes do not always receive the highest syllable counts. This result is also confirmed by the model correlating SCJs and duration.
Third, in English only, rime duration within the class of sesquisyllables does correlate with participants’ SCJs. For words with liquid codas only, non-monosyllabic judgments correspond to longer rime durations. Furthermore, within the subset of sesquisyllables, over-one syllable counts were more likely to be attributed to tokens involving rhotics and to words that were less frequent. In German, where no effect of rime duration, coda type, or lexical frequency was found for either duration metric, both monosyllabic and over-one syllable counts were attributed to rimes of similar duration, independent of coda type.
In summary, results are consistent with the presence of sesquisyllables in English, and with their absence in German. Because acoustic duration is a by-product of articulatory synergies within the rime, the next section discusses articulatory accounts of sesquisyllables and develops our proposal.
The main hypothesis of the present study is that syllable count judgments are linked to the phonological representation of articulatory timing patterns. We focus on the representation of gestures as abstract subphonemic units, and on their temporal and spatial coordination, following Articulatory Phonology (
We hypothesized that the presence of a special category of sesquisyllabic words in English is related to the inherent gestural complexity of liquid coda consonants in this language. More specifically, we argue that the temporal coordination and the quality of the dual gestures involved in the production of laterals and rhotics in English give rise to ambiguous syllable counts among native speakers. By comparing English to German, a language with similar word structure and vowel length contrasts, but different articulation for the lateral coda, we isolated the gestural parameters that allow us to test our hypothesis.
The temporal coordination of coda liquids in English has been amply studied (
Hence our proposed model entails that in sesquisyllables, the vocalic TD gesture of the liquid coda follows two vocalic gestures, which are those of a diphthong or of a tense vowel nucleus. The resulting structure thus contains three sequentially coordinated vocalic gestures. We argue that this particular structure (i.e., three sequentially coordinated vocalic gestures) receives ambiguous over-one SCJs. Beyond this particular gestural configuration, in words involving (1) lax vowels followed by any coda consonant, or (2) tense vowels/diphthongs followed by a non-liquid consonant, the third sequential vocalic gesture does not exist, and only classic monosyllabic judgments are obtained.
Recall that in German, the lateral coda consonant does not involve a TD retraction preceding the TT gesture. While conclusive articulatory evidence is still missing on whether the German lateral has a target specification for its TD component or not, we assume for now that the German /l/ does not have a TD retraction preceding the TT gesture (
In the rest of this section, we first introduce existing models of liquid consonants in American English rimes, closely related to the model we propose. We then develop our model within the Articulatory Phonology framework. In Section 6 we discuss the implications of our model and we suggest future research avenues that can further test it.
The articulatory specificities of American English coda liquids have been addressed within the framework of Articulatory Phonology (
Tilsen and Cohn (
Walker and Proctor (
The purpose of the present study was to test sesquisyllabicity cross-linguistically and propose a model within the Articulatory Phonology framework. The model we present retains from the previous two models the relevance of gestural overlap, and considers an additional factor—the vocalic quality of the gestures involved. We propose a purely gestural model, based on the degree of overlap of the vocalic subphonemic units present in the rime, allowing for gradient representations that can account for speakers’ gradient intuitions.
The model we are proposing introduces a novel factor—the quality of the gestures involved (vocalic versus consonantal). This additional factor allows us to account for the cross-linguistic distribution of sesquisyllables, as well as for the generalizations captured by moraic representations.
Our analysis assumes an interpretation of gestural quality and overlap as they are defined within the Task Dynamics Model of Saltzman and Munhall (
The count of vocalic gestures in the rime is directly linked to the gestural composition of the nucleus and the coda consonant, to the degree of overlap (controlled by bonding strength, activation windows of gestures, and/or phasing relations) between the vocalic gestures in the rime, and to timing relations between the gestural units of the coda consonant. As described in Section 1.1, coda liquids involve two gestures, one of which is vocalic: a tongue dorsum (TD) retraction gesture in the uvular region for the coda lateral, and a pharyngeal constriction of the TD for the coda rhotic. The second gesture, which is consonantal, is an alveolar constriction for the lateral and a palatal constriction for the rhotic.
Diphthong nuclei are composed of two distinct vocalic gestures, with different constriction locations (
To summarize, our model makes the following assumptions:
Diphthongs are composed of two vocalic gestures of different constriction location.
Diphthongized tense vowels are composed of two vocalic gestures of similar constriction location.
Gestures with similar constriction locations have a higher degree of overlap.
TD gestures (the gestures of diphthongs and tense vowels, and the TD gestures of the liquids) are vocalic in nature.
The TD gesture of coda liquids (rhotics
Given these assumptions we can formulate the premise of the model:
Model premise: SCJs of more than one syllable are attributed to any rime which has more than two sequentially coordinated vocalic gestures.
The leftmost panel illustrates the case of a lax vowel rime (e.g.
The middle panel illustrates the case of tense vowel – liquid rime (e.g.,
Finally, for the case of diphthong – liquid rimes (e.g.,
So far, we have considered only the case of coda liquids for which the vocalic TD gesture occurs before the consonantal tongue tip gesture. In German, the lateral coda differs in the timing and the constriction location of the vocalic gesture: The TD gesture, if present at all, is lowered, not retracted, and is synchronous with the tongue tip gesture. Because of the synchronicity of the two gestures (or the lack of a TDL gesture), we assume that the TDL gesture is never counted as a vocalic gesture in SCJs, and a tri-vocalic sequence cannot occur. This is illustrated in
Given the hypothesized gestural configuration of German clear /l/ and our model premise—that over-one SCJs are attributed to any rime which has more than two sequentially coordinated vocalic gestures—sesquisyllables are not predicted to occur in German.
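The model premise and the rime types discussed above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the implementation used in this study: the class `Gesture`, the flag `sequential`, and the rime inventories are hypothetical names introduced here, and the sketch deliberately ignores gradient overlap by treating every tense vowel or diphthong nucleus as two sequential vocalic gestures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gesture:
    """One gestural unit of the rime (hypothetical representation)."""
    label: str
    vocalic: bool            # True for vocalic (TD) gestures
    sequential: bool = True  # False if synchronous with another gesture


def count_sequential_vocalic(rime: List[Gesture]) -> int:
    """Count vocalic gestures that are sequentially coordinated in the rime."""
    return sum(1 for g in rime if g.vocalic and g.sequential)


def predicts_over_one_scj(rime: List[Gesture]) -> bool:
    """Model premise: more than two sequential vocalic gestures."""
    return count_sequential_vocalic(rime) > 2


# Assumed building blocks, following the assumptions listed in the text.
LAX = [Gesture("V", True)]
TENSE_OR_DIPHTHONG = [Gesture("V1", True), Gesture("V2", True)]
# English coda liquid: vocalic TD gesture precedes the consonantal gesture.
DARK_LIQUID = [Gesture("TD", True), Gesture("consonantal", False)]
# German clear /l/: TD gesture (if present) is synchronous, so not counted.
CLEAR_L = [Gesture("TD", True, sequential=False), Gesture("TT", False)]
NON_LIQUID = [Gesture("C", False)]

if __name__ == "__main__":
    rimes = {
        "lax + English coda liquid": LAX + DARK_LIQUID,
        "tense/diphthong + English coda liquid": TENSE_OR_DIPHTHONG + DARK_LIQUID,
        "tense/diphthong + non-liquid coda": TENSE_OR_DIPHTHONG + NON_LIQUID,
        "tense/diphthong + German clear /l/": TENSE_OR_DIPHTHONG + CLEAR_L,
    }
    for name, rime in rimes.items():
        print(name, count_sequential_vocalic(rime), predicts_over_one_scj(rime))
```

Under these assumptions, only the rime combining a tense vowel or diphthong with an English coda liquid reaches three sequentially coordinated vocalic gestures.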
The model, as described above, can predict the presence of sesquisyllables in English and their absence in German. The next section discusses the implications of the model for the observed SCJ patterns studied here, and for other languages, as well as the limitations of our present study. We will end with a general conclusion.
The proposed model accounts for the presence of sesquisyllables in English and their absence in German by relying on the quality (vocalic versus consonantal), the timing, and the degree of gestural overlap of subphonemic units composing the syllable rime. The interaction between the degree of overlap and the quality of the gestures involved accounts for speaker-specific and word-specific variation.
Two main patterns were observed for participants’ SCJs in English: (1) sesquisyllables containing diphthongs received more over-one SCJs than those containing tense vowels, and (2) sesquisyllables with rhotic codas received more over-one SCJs than those with lateral codas. The first pattern is accounted for by the assumption that similar sequential vowel gestures, as in tense vowels, have a higher degree of overlap. While diphthongs are composed of two different vowel gestures, exhibiting less overlap, tense vowels involve two similar vocalic gestures which can vary in overlap. They can have more overlap than diphthongs, or as little overlap as the different gestures of diphthongs. The model predicts that words with more overlapped vowel gestures are unlikely to receive over-one SCJs. Tense vowels with two less overlapped vocalic gestures are more likely to receive over-one SCJs, provided that the degree of overlap between the two gestures is sufficiently reduced, similar to the one in diphthongs.
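The role of overlap in the first pattern can be sketched as follows. This is only an illustration: the overlap scale from 0 to 1, the cut-off `threshold`, and all numeric values are hypothetical and are not estimated from our data. The sketch assumes that two nucleus gestures count as two sequential vocalic gestures only when their overlap falls below the cut-off.

```python
def effective_vocalic_count(nucleus_overlap: float,
                            coda_td_sequential: bool,
                            threshold: float = 0.5) -> int:
    """Count sequential vocalic gestures in a rime with a two-gesture nucleus.

    nucleus_overlap: hypothetical overlap between the two nucleus gestures,
        from 0.0 (fully sequential) to 1.0 (fully overlapped).
    coda_td_sequential: True if the coda has a vocalic TD gesture that
        precedes the consonantal gesture (as assumed for English coda liquids).
    threshold: hypothetical cut-off at or above which the two nucleus
        gestures are treated as a single vocalic gesture.
    """
    if not 0.0 <= nucleus_overlap <= 1.0:
        raise ValueError("nucleus_overlap must lie between 0 and 1")
    nucleus_count = 1 if nucleus_overlap >= threshold else 2
    return nucleus_count + (1 if coda_td_sequential else 0)


def predicts_over_one_scj(nucleus_overlap: float,
                          coda_td_sequential: bool,
                          threshold: float = 0.5) -> bool:
    """Model premise: more than two sequential vocalic gestures."""
    return effective_vocalic_count(
        nucleus_overlap, coda_td_sequential, threshold) > 2


if __name__ == "__main__":
    # Illustrative overlap values only.
    print(predicts_over_one_scj(0.2, True))   # diphthong-like, little overlap
    print(predicts_over_one_scj(0.8, True))   # tense vowel, strong overlap
    print(predicts_over_one_scj(0.3, True))   # tense vowel, reduced overlap
```

A single cut-off is a simplification chosen for readability; a gradient account would more plausibly map overlap onto a probability of an over-one judgment.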
The second SCJ pattern observed, opposing rhotic to lateral codas in sesquisyllables, can also be accounted for by different degrees of overlap, more specifically by the difference in patterns of coarticulation between laterals and rhotics. In an extensive real-time MRI study, Proctor et al. (
Interestingly, the same patterns were confirmed for speakers of non-rhotic British English (
In the next section we discuss the limitations of our study and of our proposed model, which will lead to a more general discussion about phonological, and in particular, gestural representation.
We acknowledge several limitations to our study and model. A first methodological limitation concerns the reliability of syllable count judgments and their relationship with phonological representation. It is known that factors such as language, spelling, lexical frequency, or the choice of syllabification task can all contribute to speakers’ metalinguistic judgments (
A second limitation of our study is that it is based solely on acoustic data, while the predictions are rooted in articulatory specifications. To fully test the proposed model, we need to correlate English and German native speakers’ actual articulation of coda liquids with their corresponding SCJs. Articulatory data will allow more detailed preliminary observations. As far as we know, the timing of the TD and TT gestures of the German coda lateral has not yet been the main focus of an articulatory investigation. Therefore, an articulatory study looking specifically at the intra-segmental timing and inter-speaker variation of this timing is needed to make better predictions. Furthermore, research on the gestural composition of vowels is scarce. In making assumptions for our model we rely exclusively on the representation of American English tense vowels proposed in TaDA. Without empirical verification, relying on these representations is very limiting.
Kinematic data can also shed light on our finding in Experiment 1, that even though duration does not correlate with SCJ patterns across all rime types, within the subset of sesquisyllables, over-one SCJs are attributed to longer rimes involving liquids. Yuan and Lieberman (
A theoretical limitation of our model stems from its restriction to sesquisyllables. Ideally, the model should account for other phonotactic patterns. For example, in its current state, it cannot predict the illegality of monosyllabic words with lax vowels and no coda (i.e., */pɪ/, */tʊ/), or of words with a tense vowel or diphthong followed by a coda cluster consisting of a liquid and a non-coronal consonant (*/miːɫk/, */faɪrg/). Moraic theory handles this problem by imposing lower and upper bounds on moraic counts for words, but requires adjustments pertaining to mora attribution to coda consonants, such as coerced weight or the extra-syllabicity of coronal consonants. These questions need to be further explored, as they bear on the nature of phonological representations, particularly with respect to the role of the mora.
In the same context, as a further challenge, it is worth taking the theoretical question and the empirical investigation back to the Southeast Asian languages for which the term ‘sesquisyllable’ was originally proposed (
Beyond the empirical observation of the syllable count, a very important issue is that of the variability of judgments. In light of our hypothesis, explaining inter-speaker variability in syllable count judgments by variability in representation is the biggest challenge. This is where the English sesquisyllables crucially differ from the Southeast Asian sesquisyllabic word shapes. The latter are ‘canonical’ (
According to our model, sesquisyllables in English are a special structure that emerged from the combination of specific types of vocalic nuclei and specific types of dual-gesture codas. Both conditions are necessary for the sequential coordination that we propose to affect syllable count judgments. Do other languages meet these conditions? Such languages would have a vowel length contrast or a tense/lax contrast like English, and a sequential coordination of gestures in coda liquids. A possible language to consider is Dutch. Dutch dialects that are particularly interesting are those that have a vowel length contrast and dark /l/ in coda position, but do not have a syllable position allophony for laterals. If the lateral is dark in both positions, should we expect to find variable syllable count judgments and sesquisyllabic responses among native speakers? Not necessarily. We believe that in addition to the sequential coordination of more than two vocalic gestures, the asymmetry of positional allophony may be necessary to destabilize categorical judgments. If the coupled oscillator hypothesis is correct, namely that what defines the syllable is sequential out-of-phase coordination on the right, coexisting with in-phase coordination on the left, then syllable count judgments may shift if the asymmetry is enhanced. Thus, a clear lateral in the onset and a dark one in the coda, accumulating sequentially coordinated vocalic gestures on the right, exaggerates the asymmetry, and may prompt a shift in SCJs. But a dark /l/ in both onset and coda positions is a balanced structure, which would presumably be harder to destabilize enough to lead to a shift in intuitions. Testing for variable sesquisyllabic judgments in relevant dialects of Dutch would thus be an informative test for our gestural account, regardless of whether their presence can be confirmed or not.
The goal of our study was to propose a gestural account of sesquisyllables within the Articulatory Phonology framework and test it in a cross-language comparison. Correlations between acoustic rime duration and syllable count judgments in English and German, two languages with a vowel length contrast but different gestural specifications for coda liquids, confirmed that sesquisyllables exist in English but not in German. We hypothesized that this discrepancy is primarily related to the presence of an earlier occurring vocalic (TD) gesture inherent to English coda liquids, and its synchronicity in German. Results of the acoustic analysis of rime duration and its correlation with syllable count judgments in the two languages support our hypothesis. Acoustic rime duration was found to partially correlate with syllable count judgments: Higher syllable count judgments were attributed to longer rimes only within the subclass of sesquisyllables, suggesting that duration, as a result of intra- and inter-segmental coordination patterns, is correlated with syllable count judgments. A model accounting for the observed syllable count judgments, based on gestural representations, was proposed. While articulatory data are needed to reliably test the proposed model, studies matching acoustic data with SCJs are crucial in providing precise hypotheses and predictions. Articulatory data that capture the exact relationship between gestural parameters, intergestural timing, and SCJs will be a step toward a unifying account of different phonological phenomena.
The additional files for this article can be found as follows:
Filler items for Experiments 1 and 2. DOI:
Results of the linear mixed model – Experiment 1 (English). DOI:
Results of the linear mixed model – averaged rime duration (Experiment 2 – German). DOI:
Results of the linear mixed model – normalized rime duration (Experiment 2 – German). DOI:
Research supported by the DAAD (DAAD P.R.I.M.E.), the Labex EFL (ANR-10-LABX-0083-LabEx EFL), the Région Île-de-France, and the Bureau des Relations Internationales, Université Paris Diderot. We thank Adamantios Gafos, Louis Goldstein, Doris Mücke, Sam Tilsen, Abby Cohn, and audiences at the University of Potsdam, the Institute of Phonetics, LMU, Munich, and the Chicago Linguistics Society (CLS 54) for insightful comments and feedback. We also thank the anonymous reviewers, whose comments have greatly improved our paper, the student participants, and the Linguistics Departments at the University of Potsdam and University of Chicago, particularly Alan Yu, for generously extending the use of their recording facilities.
The authors have no competing interests to declare.