This study tests whether native speakers of American English exhibit a glide-vowel distinction ([j]-[i]) in a speech elicitation experiment. When reading sentences out loud, participants’ pronunciations of 4 near-minimal pairs of pre-existing lexical items (e.g.,
Pre-existing lexical items suggest that a glide-vowel distinction exists in near-minimally paired environments in American English:
(1) | [CVV]: | [CGV]: |
| [ɛstóniə] | [nʊmónjə] |
| [mɪlɛ́niə] | [kɛ́njə] |
| [duɛ́t] | [dwɛl] |
However, the precise nature and representation of this distinction have not yet been established. Phonetic documentation, which could help in deciding between the representations proposed so far, is also lacking. While the examples in (1) suggest that a glide-vowel distinction may be apparent in both the [j]-[i] and [w]-[u] paradigms, this study focuses on the [j]-[i] distinction. Using a speech elicitation experiment, it tests for the [j]-[i] glide-vowel distinction in American English and collects phonetic data along a variety of characteristics in an effort to determine the proper representation.
This study tests whether the [j]-[i] distinction can be elicited in pre-existing lexical item pairs like those above. It also tests whether this distinction can be productively extended to newly encountered words and elicited via a <y> vs. <i> orthographic distinction. Acoustic analysis is used to capture the most consistent characteristics of the distinction. This provides documentation of the distinction, as well as guidance for how future research may best examine it. Furthermore, three competing broad classes of previously proposed phonological representations are evaluated in light of the current analysis. This study therefore not only tests whether such a distinction is available to American English speakers; it also compares these representations, considers the acoustic predictions they generate, and applies these predictions to the data at hand. This may help to decide between the competing representations, either by identifying one optimal approach or at least by ruling one out.
Competing accounts debate whether glide-vowel distinctions are phonologically possible and attested. One argument (e.g.,
One kind of representation account proposes that glide-vowel distinctions are attributable to a distinction in the segment’s primary articulator: A
Another kind of account, henceforth referred to as a
Another kind of account is what will be referred to as a
The next section will discuss where glide-vowel distinctions appear available or constrained in the American English phonology. However, these observations regarding the distribution do not strongly speak to which kind of representation might be best applicable to the case at hand, therefore motivating analysis of production and how it may speak between the competing accounts.
The [j] glide in American English can occur as a simplex word-initial onset, followed by a host of following vowels (e.g.,
This section discusses the apparent distribution of glide-vowel distinctions in American English, narrowing in on the [C_V] environment where a glide-vowel distinction does seem to be available. There are further constraints apparent regarding the distribution of such a distinction, which may speak to our consideration of competing phonological representations. However, it is important to note that this distinction does not appear to be robustly prevalent across the English lexicon—an aspect which (according to Levi,
One apparent constraint is that of the place features of neighboring segments. First regarding the [w]-[u] paradigm, there seem to be no cases of [CwV] in which the following vowel or diphthong involves a high back vocoid (e.g., *[Cwaʊ], *[Cwu] [
The sonority of the preceding segment also appears to constrain the distribution of glides. Take, for example, the loanword adaptation of French
Turning attention to word-medial position, it appears that both homorganicity and sonority constraints may be circumvented, with [j] appearing after [n], which is both coronal and a sonorant (e.g.,
While we can further understand the constraints on this distinction by analyzing its phonological distribution, this does not present a clear choice between the competing place and constriction/height representations proposed, as both homorganicity and sonority appear to play a role. While not speaking between these two representations, the observation of both of these effects could therefore lend support to a hybrid account, like that proposed by Nevins and Chitoran (
The competing phonological representations considered above generate different predictions regarding the acoustics of the [j]-[i] glide-vowel distinction of interest in this study. There are acoustic aspects widely considered to correlate with lingual frontness, height, and constriction. There are also aspects related to timing that may play a crucial role in conveying such a distinction no matter how it is represented, though they would arguably play the only role in the syllabic pre-linking account. Therefore, analyzing the acoustics of such a distinction may speak to which representation appears more directly borne out in production, or at least rule one out.
Recall that a
On the other hand, a
There is acoustic documentation suggesting that we may find such acoustic aspects to characterize the distinction at hand. In studies of intervocalic glides (i.e., [VGV] environments), a dip in intensity between the first and second vowel is observed when an intervening glide is present, as compared to [VV] hiatus (e.g.,
In a
Another timing factor that could play a role is the speed of transition. We might expect a [jV] sequence to have a faster transition than a [iV] sequence. Liberman et al. (
A final timing factor to consider is the earliness of transition. That is, the distinction may not necessarily be about how fast the transition is, but how early it starts. The phonological interpretation that [j] is in the syllable margin with [C_], while [i] is further structurally separate as a member of the syllable nucleus, would suggest a tighter gestural coordination between a glide and the preceding consonant, whether directly formalized (à la
It is important to note that all of the phonological accounts should predict the distinction to result in some kinds of timing differences like those presented above. No matter the featural representation, the glide will be part of the syllable margin, instead of the nucleus. Therefore, while factors related to timing may be found to convey this distinction, they do not necessarily rule out a place- or constriction-based representation. If we find
To summarize the acoustic predictions generated by these competing accounts: All accounts should predict some kind of timing difference due to syllabification, with a [jV] sequence showing a shorter overall duration, an earlier transition to [_V], and/or a faster transition. The place-based representation also predicts acoustic evidence that [j] is more front than [i], and that it therefore has a higher F2. The constriction-based representation, instead, predicts that [j] has a lower acoustic intensity and/or a higher lingual articulation and therefore a lower F1. A finding of
Nine speakers participated in this study. All were remunerated for their time. The study took about 40 minutes, including informed consent and a questionnaire eliciting demographic information. All were identified as native speakers of American English. All were white, non-Hispanic or Hispanic. Most were female (8/9). Their ages ranged from 19 to 37 years. The US region that a participant identified with was not tightly controlled, as the variable of interest has not so far been shown to differ significantly across regional varieties of American English. No speakers reported having ever been diagnosed with any speech- or hearing-related disorder.
The experiment elicited utterances of both pre-existing lexical items and nonce names. All were designed with the purpose of eliciting the [j]-[i] distinction in a [C_V] environment. This environment was chosen because it is one where the distinction does seem to be available, as discussed above. Each stimulus was embedded within a unique sentence that participants read aloud. This was done in two blocks, with lexical items in the first and nonce names in the second. There were 8 lexical items considered to carry this distinction in near-minimally paired environments. These are listed in Table
Stimuli: Lexical items of interest.
expected pronunciation | ||||
---|---|---|---|---|
[iV]: | Estonia | hernia | millennia | Armenia |
[jV]: | pneumonia | California | Kenya | gardenia |
The 17 lexical items were each assigned to a unique sentence. These assignments were kept constant. Attention was paid to placing the stimulus in a prosodically prominent position in the early half of the sentence. The following word in every sentence began with a voiceless labial obstruent, simply as a means of controlling the place and sonority of the consonant immediately following the word-final sequence of interest. Two examples are provided in (2).
(2) | a) | The state of California passed a new bill. |
b) | The gardenia flower has a strong scent. |
There were 40 nonce names designed to test whether the distinction could be elicited productively. They were also designed to elicit the distinction in a much more diverse array of phonological environments, in order to examine which acoustic aspects convey it consistently. To elicit the distinction itself, pairs differing in orthography were made with the aim of eliciting [j] via the <y> grapheme and [i] via <i>. The preceding consonant and the position in the word were also manipulated. Three places of articulation of the preceding consonant were used: labial, coronal, and dorsal. Within each place of articulation, four manners of articulation were used: voiceless stop, voiced stop, voiceless fricative, and nasal (the latter two manners being unavailable for the dorsal place in English). Word position, defined in terms of the placement of the [Cj]/[Ci] sequence, was either initial or medial. Paired counterparts across the main condition of orthography were matched along the other three factors (place, manner, and position) to balance for potential phonological effects on the distribution of this distinction, as discussed above (Section 2.2). There were also 40 filler nonce names, none incorporating the variable of interest.
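The crossing of factors described above can be checked with a short enumeration. The following is only an illustrative sketch in Python: the variable and function names are hypothetical, and the consonant inventory is read off the nonce stimulus table rather than taken from any script used in the study.

```python
from itertools import product

# Preceding consonants by place of articulation. Labial and coronal have all
# four manners; dorsal has only the two stops, as noted in the text.
CONSONANTS = {
    "labial": ["p", "b", "f", "m"],
    "coronal": ["t", "d", "s", "n"],
    "dorsal": ["k", "ɡ"],
}
POSITIONS = ["initial", "medial"]
# Orthographic condition and the pronunciation it is expected to elicit.
ORTHOGRAPHY = {"i": "[i]", "y": "[j]"}


def build_design():
    """Return one record per target nonce name in the crossed design."""
    cells = []
    for place, consonants in CONSONANTS.items():
        for consonant, position, grapheme in product(consonants, POSITIONS, ORTHOGRAPHY):
            cells.append({
                "place": place,
                "consonant": consonant,
                "position": position,
                "grapheme": grapheme,
                "expected": ORTHOGRAPHY[grapheme],
            })
    return cells


design = build_design()
# A near-minimal pair shares consonant and position and differs only in grapheme.
pairs = {(cell["consonant"], cell["position"]) for cell in design}
print(len(design), "targets in", len(pairs), "pairs")  # 40 targets in 20 pairs
```

Ten consonants crossed with two positions and two graphemes yield the 40 targets, grouped into 20 near-minimal pairs.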
Additional environmental factors within the nonce names were controlled. The following vowel (elicited via the <a> grapheme) was kept constant within each position: [ɑ́] for initial position and [ə] for medial position. For the initial-position stimulus pairs, the place of the following consonant was kept identical. For the non-target syllable in each stimulus, the inventory of nuclear vowels was [ɑ, i, u, o]. This provided some diversity while keeping nucleus weight constant. A final aspect of the stimuli was the use of acute accent < ́> marks to represent stress placement. This was incorporated to keep participants from placing stress on the high front vocoid of interest (e.g., pronouncing
Stimuli: Nonce names (w/honorifics).
C_ | initial: <i> | initial: <y> | medial: <i> | medial: <y> | |
---|---|---|---|---|---|
/p_/ | Dr. Piácho | Governor Pyásha | Coach Nópia | Officer Dápya | |
/b_/ | Mr. Biási | Mr. Byásu | Miss Shábia | Mrs. Chóbya | |
/f_/ | Mr. Fiáki | Officer Fyága | Mr. Gófia | Dr. Zúfya | |
/m_/ | Sister Miáshu | Professor Myáchi | Professor Súmia | Dr. Fímya | |
/t_/ | Dr. Tiágu | Sister Tyáko | Governor Bítia | Mr. Pótya | |
/d_/ | Dr. Diáfa | Mr. Dyápu | Sister Módia | Sister Vádya | |
/s_/ | Officer Siáko | Professor Syági | Officer Kúsia | Officer Gísya | |
/n_/ | Sister Niáfi | Dr. Nyápa | Miss Vónia | Judge Búnya | |
/k_/ | Professor Kiása | Mr. Kyáso | Mrs. Dókia | Mr. Púkya | |
/ɡ_/ | Pastor Giáfu | Dr. Gyápi | Judge Nágia | Professor Tígya |
Like the lexical items, each nonce stimulus was embedded in a unique sentence, with that sentence assignment remaining constant. Carrier sentences presented the nonce stimuli as surnames in sentence-initial position. The honorifics kept the stimuli away from completely phrase-initial position while still early in the sentence in a prosodically prominent position. All sentences were of the formula presented in (3) with some examples.
(3) | + | + | + | |||||
a) | Mr. | Byásu | started | a band. | ||||
b) | Judge | Búnya | paints | beautifully. |
Environmental factors within the sentences were also controlled. For medial-position stimuli, the onset segment of the following word in the sentence was always a voiceless labial obstruent (the same method for controlling the following segment as that used for the real word stimuli described above). For initial-position stimuli, the preceding honorific was always [ɹ]-final.
The study took place in the Phonetics and Experimental Phonology laboratory at New York University. Participants were seated in a sound-attenuated booth at a desk with a computer screen in front of them. Their speech was recorded with a Shure SM35-XLR head-mounted microphone connected to a Marantz PMD 660 audio recorder (44.1 kHz sampling). Sentences were presented on the computer screen one at a time. The participant would read the sentence aloud and advance to the next by pressing the down arrow on a standard keyboard. This method expressly avoided auditory repetition, so that no such distinction nor its implementation could be auditorily primed. Previous research suggests that speakers’ productions can be phonetically influenced by previous exposure (e.g.,
The first block consisted of the 17 sentences containing pre-existing lexical items. Sentences were randomized, and then near-minimal pairs were moved to allow substantial space between the counterparts. This was repeated to result in four cycles through the stimuli, with the spacing of near-minimal pair counterparts across cycle boundaries also manually adjusted.
The second block consisted of the 40 sentences containing nonce names (and 40 with filler nonce names). First, the nonce names of interest (targets) were randomized. Then, ordering was adjusted to put maximum distance between near-minimal pair counterparts—those matching in environmental factors and differing in <y> vs. <i> orthography. Then, the filler stimuli were randomly ordered and added, one after each target stimulus, so that the cycle would alternate between target and filler stimuli. This was repeated to result in four cycles through the stimuli. After this, spacing of near-minimal pair counterparts was given similar attention across cycle boundaries.
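The ordering procedure for the second block can be approximated programmatically. The sketch below is an assumption-laden illustration, not the procedure actually used (the spacing in this study was adjusted by hand): the function names, the `min_gap` threshold, and the filler labels are all hypothetical, and reshuffling until a spacing constraint holds is swapped in for manual adjustment.

```python
import random


def well_spaced(order, pair_of, min_gap):
    """True if no two counterparts of a pair are fewer than min_gap positions apart."""
    for i, item in enumerate(order):
        for other in order[i + 1 : i + min_gap]:
            if other != item and pair_of[other] == pair_of[item]:
                return False
    return True


def build_block(targets, fillers, pair_of, cycles=4, min_gap=3, seed=0, max_tries=10000):
    """Order targets per cycle, then place one filler after each target."""
    rng = random.Random(seed)
    target_order, block = [], []
    for _ in range(cycles):
        # Tail of the previous cycle, so spacing also holds across the boundary.
        tail = target_order[-(min_gap - 1):] if min_gap > 1 else []
        for _ in range(max_tries):
            cycle = rng.sample(targets, len(targets))
            if well_spaced(tail + cycle, pair_of, min_gap):
                break
        else:
            raise RuntimeError("no ordering found; try a smaller min_gap")
        target_order.extend(cycle)
        cycle_fillers = rng.sample(fillers, len(fillers))
        for target, filler in zip(cycle, cycle_fillers):
            block.extend([target, filler])
    return block


# Four pairs from the nonce stimulus table; filler labels are placeholders.
targets = ["Biási", "Byásu", "Piácho", "Pyásha", "Fiáki", "Fyága", "Miáshu", "Myáchi"]
pair_of = {"Biási": "b", "Byásu": "b", "Piácho": "p", "Pyásha": "p",
           "Fiáki": "f", "Fyága": "f", "Miáshu": "m", "Myáchi": "m"}
fillers = [f"filler{n}" for n in range(len(targets))]
block = build_block(targets, fillers, pair_of)
print(block[:6])
```

Note that `min_gap` counts positions among targets only; once fillers are interleaved, counterparts end up roughly twice as far apart in the presented sequence.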
Between the two blocks, there was a short training session regarding the nonce stimuli. The researcher told participants that they would be encountering unfamiliar last names. They were told that the names use only four vowels—[ɑ], [i], [u], and [o]. They were instructed to be consistent in pronunciation, thinking one letter equals one sound (e.g., the letter <g> should always be pronounced as [ɡ], and never as [d͡ʒ]). They were then instructed that the vowel marked with an acute accent < ́> was stressed.
After this instruction, three cycles of training stimuli were presented. Training stimuli were all made according to the filler stimulus formula. None included <y> or any <iV> sequence; some included simplex <w> onsets. The first cycle involved auditory and orthographic repetition. There were ten training stimuli consisting of just an honorific + name. A pre-recording of the stimulus uttered by another English speaker played automatically with each slide showing the orthography, and the participant would repeat it. In these pre-recorded utterances, when an <a> was final and not stressed, it was reduced to a schwa, which participants followed naturally. The second cycle removed the auditory component. There were five training stimuli consisting of just an honorific + name. In this cycle, a pre-recorded utterance was no longer played and the participant would read the orthographically presented stimulus aloud. The researcher provided feedback after any errors (which usually concerned stress placement). The last training cycle consisted of four stimuli in full sentence form. Participants were told that they would now be reading complete sentences with these names. They were told that it was important not to pause within a sentence and that they might be asked to repeat a sentence if they paused within it. However, they were informed that there was no time limit and that they could say the sentence in their head before saying it out loud.
Feedback was provided during the second block. However, no feedback was ever given regarding the variable of interest. The researcher did nothing if, on a <y> stimulus, the participant’s utterance was perceived as a [iV] sequence or, for a <i> stimulus, the participant’s utterance was perceived as a [jV] sequence. Both of these behaviors were perceived to occur, though, suggesting that phonological effects on the distribution did sometimes override the orthographic elicitation. No participant was perceived to categorically produce only [jV] or [iV] across the orthographic presentations. One phenomenon that did elicit feedback regarding <y> and <i> was the pronunciation of either as the [aɪ] diphthong. This was not common, but did occur a few times with more than one participant. In such cases, the feedback was framed along the following lines, “Don’t pronounce the letter <i> or <y> as [aɪ]. The only vowels are [ɑ], [i], [u], and [o].” Feedback never included an utterance by the researcher of a [jV] or [iV] sequence. Common errors eliciting feedback were misplacement of stress, pausing, and segmental errors not within but sometimes neighboring the sequence of interest.
There were 1672 utterances examined, after excluding tokens that were produced in an unexpected way (e.g., the vocoid of interest pronounced as [aɪ], a relevant neighboring segment mispronounced, stress misplaced, the sequence held out as a speech delay). Praat software (
The following reviews what this distinction might look like acoustically. In line with a place-based representation, we would expect that [j] has more anterior raising of tongue mass than [i], and therefore a higher F2. In line with a constriction-based representation, we would expect that [j] has a higher lingual articulation and tighter constriction than [i], and therefore a lower F1 and lower acoustic intensity. As predicted by all accounts (now including that of syllabic pre-linking), timing may also play a role, with a [jV] sequence being shorter overall, having an earlier transition, and/or having a faster transition than a [iV] sequence.
Figure
Example utterance spectrograms. Spectrograms of utterances by the same speaker of a near-minimal pair expecting and appearing to exhibit the distinction of interest. The vertical red line shows the vocalic onset and the end of the spectrogram is where the vocalic offset was segmented, with the duration of the entire vocalic sequence noted. The yellow line is Praat’s intensity tracker. The F2 max and F2 min of each vocalic sequence of interest are also noted.
Table
Measurements and competing acoustic predictions.
Measurement | Prediction | Interpretation | Representation |
---|---|---|---|
F2 max | [iV] < [jV] | [j] more front than [i] | place |
F1 min | [iV] > [jV] | [j] has a higher lingual articulation than [i] | constriction |
intensity range | [iV] < [jV] | [j] more constricted than [i] | constriction |
F2 max time | [iV] > [jV] | [jV] has earlier transition than [iV] | all accounts |
F2 slope | [iV] < [jV] | [jV] has faster transition than [iV] | all accounts |
duration | [iV] > [jV] | [jV] = 1 syllable; [iV] = 2 syllables | all accounts |
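To make the six measurements in the table concrete, the sketch below derives them from sampled tracks of a single vocalic sequence. It is a hedged illustration only: the function name and the toy values are hypothetical, and the definition of F2 slope used here (the fall from F2 max to the following F2 min, divided by the time between them) is an assumption rather than the definition necessarily used in this study.

```python
def summarize_sequence(times_ms, f1_hz, f2_hz, intensity_db):
    """Summarize one vocalic sequence from tracks sampled at the same time points.

    The first sample is taken as the vocalic onset and the last as the offset.
    """
    peak = max(range(len(f2_hz)), key=lambda i: f2_hz[i])
    # Assumption: the F2 minimum is searched only after the F2 maximum,
    # that is, in the transition into the following vowel.
    trough = min(range(peak, len(f2_hz)), key=lambda i: f2_hz[i])
    if trough == peak:
        slope = 0.0  # F2 never falls after its maximum
    else:
        slope = (f2_hz[peak] - f2_hz[trough]) / (times_ms[trough] - times_ms[peak])
    return {
        "duration_ms": times_ms[-1] - times_ms[0],
        "f2_max_hz": f2_hz[peak],
        "f2_max_time_ms": times_ms[peak] - times_ms[0],
        "f2_slope_hz_per_ms": slope,
        "f1_min_hz": min(f1_hz),
        "intensity_range_db": max(intensity_db) - min(intensity_db),
    }


# Toy tracks with made-up values, for illustration only.
times = [0, 10, 20, 30, 40, 50, 60]
f1 = [350, 330, 340, 450, 560, 600, 610]
f2 = [2400, 2550, 2600, 2300, 1900, 1600, 1500]
intensity = [62, 60, 61, 66, 69, 70, 68]
summary = summarize_sequence(times, f1, f2, intensity)
print(summary)
```

For these toy tracks, F2 peaks at 2600 Hz, 20 ms after onset, and falls by 1100 Hz over the next 40 ms, giving a slope of 27.5 Hz/ms.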
Of course, one possibility is that there is no significant difference along any measurement, which would not support the hypothesis that a distinction was elicited (at least as detectable by the measurements taken here). An observation in the
In this section, all of the acoustic measurements (previously summarized in Table
A linear mixed-effects model was fitted for each of the acoustic measurements using the lme() function from the nlme package (
The results of this analysis are presented in Table
Results: Pre-existing lexical items. Descriptive statistics and results of linear mixed-effects models per measurement across the factor of expected pronunciation. Measurements are ordered by their consistency of conveying the distinction—how far, in either direction, Percent [i] > [j] is from 50%.
Measurement | Percent [i] > [j] | Mean: [i]-expectant | Mean: [j]-expectant | Coefficient: [j]-expectant | p-value | Significance |
---|---|---|---|---|---|---|
F2 max time | 94% | 35.67 ms | 19.43 ms | –16.49 ms | 9.99 | *** |
duration | 83% | 167.03 ms | 130.23 ms | –37.09 ms | 2.49 | *** |
F2 max | 75% | 2609 Hz | 2543 Hz | –67 Hz | 0.00066 | *** |
F2 slope | 25% | 10.078 Hz/ms | 10.791 Hz/ms | +0.769 Hz/ms | 0.04965 | * |
intensity range | 31% | 5.57 dB | 6.57 dB | +0.992 dB | 0.03172 | * |
F1 min | 56% | 431 Hz | 423 Hz | –6 Hz | 0.44118 | |
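The consistency statistic used to order the table, Percent [i] > [j], can be sketched as follows. This is a minimal illustration under assumptions: the function names and toy tokens are hypothetical, and each speaker-by-pair cell is assumed to be averaged across repetitions before the comparison, as in the accompanying plots.

```python
from collections import defaultdict
from statistics import mean


def cell_means(tokens):
    """Average repetitions per (speaker, pair, condition) cell."""
    grouped = defaultdict(list)
    for speaker, pair, condition, value in tokens:
        grouped[(speaker, pair, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def percent_i_greater(means):
    """Percent of speaker-by-pair cells whose [i] mean exceeds the [j] mean."""
    cells = {(speaker, pair) for (speaker, pair, _) in means}
    complete = [c for c in cells if (*c, "i") in means and (*c, "j") in means]
    wins = sum(means[(*c, "i")] > means[(*c, "j")] for c in complete)
    return 100 * wins / len(complete)


def order_by_consistency(percents):
    """Order measurements by how far Percent [i] > [j] lies from 50%."""
    return sorted(percents, key=lambda name: abs(percents[name] - 50), reverse=True)


# Toy F2 max time tokens (ms): speaker, pair, expected pronunciation, value.
tokens = [
    ("S1", "A", "i", 36.0), ("S1", "A", "i", 34.0), ("S1", "A", "j", 20.0),
    ("S1", "B", "i", 18.0), ("S1", "B", "j", 22.0),
    ("S2", "A", "i", 30.0), ("S2", "A", "j", 25.0),
    ("S2", "B", "i", 28.0), ("S2", "B", "j", 19.0),
]
print(percent_i_greater(cell_means(tokens)))  # 75.0

# Percentages as reported for the pre-existing lexical items.
reported = {"duration": 83, "F1 min": 56, "F2 max": 75,
            "F2 max time": 94, "F2 slope": 25, "intensity range": 31}
print(order_by_consistency(reported))
```

Applied to the reported percentages, the ordering reproduces the row order of the table, with F2 max (75%) and F2 slope (25%) tied at 25 points from 50%.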
The expected glide-vowel distinction across the near-minimal word pairs appears borne out in the data along multiple acoustic dimensions. The [j]-expectant counterparts have significantly earlier transitions into the following vowel, as represented by the earliness of F2 max, and significantly shorter durations of the entire vocalic sequence. They also have significantly wider intensity ranges, suggesting that [j] has a lower intensity relative to that of the following vowel. A difference in frontness, as represented by F2 max, is also significant but in the
Pre-existing lexical items. Visualizations provide box plots across the two expected pronunciation conditions. Points represent the measurement for each speaker’s utterance of each word (averaged across repetitions and grouped by expected pronunciation alongside the respective box plot). Lines connect each pair’s counterparts, with a green line representing that [i] > [j] and a red line representing that [i] < [j].
These results suggest that a [j]-[i] distinction is present between near-minimal pre-existing word pairs and that timing may be the most reliable distinguisher between [j] and [i]: [jV] sequences are shorter than [iV] sequences, seemingly brought about by an earlier and faster transition into [_V]. The results also challenge applying a place-based representation, with a difference in frontness found to be significant but in the reverse direction of that predicted by this representation: [j] is
This section extends the same analysis to the nonce stimulus data, with analogous linear mixed-effects models of each measurement across the condition of stimulus orthography, where a <i> orthography expects a [i] output and a <y> orthography expects a [j] output. Again, a random effect was specified for each combination of speaker and near-minimal pair. The results are presented in Table
Results: Nonce stimuli. Descriptive statistics and results of linear mixed-effects models per measurement across the factor of expected pronunciation (in this case, <i> vs. <y> stimulus orthography). Measurements are ordered by their consistency of conveying the distinction—how far, in either direction, Percent [i] > [j] is from 50%.
Measurement | Percent [i] > [j] | Mean: [i]-expectant | Mean: [j]-expectant | Coefficient: [j]-expectant | p-value | Significance |
---|---|---|---|---|---|---|
duration | 66% | 199.89 ms | 186.39 ms | –13.21 ms | 6.44 | *** |
F2 max time | 66% | 23.93 ms | 19.59 ms | –4.38 ms | 3.76 | *** |
intensity range | 65% | 10.06 dB | 9.28 dB | –0.763 dB | 0.00014 | *** |
F2 max | 59% | 2632 Hz | 2619 Hz | –12 Hz | 0.08380 | • |
F2 slope | 45% | 9.249 Hz/ms | 9.563 Hz/ms | –0.278 Hz/ms | 0.11019 | |
F1 min | 51% | 371 Hz | 386 Hz | –3 Hz | 0.36107 | |
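As a rough intuition for what the [j]-expectant coefficient captures, the within-pair comparison can be sketched as a mean paired difference. This is explicitly not the lme() model reported here: it is a simplified stand-in that assumes one averaged value per condition in each speaker-by-pair cell, and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_effect(i_values, j_values):
    """Mean [j] minus [i] difference across matched cells, with a paired t statistic."""
    if len(i_values) != len(j_values):
        raise ValueError("each [i] cell needs a matching [j] cell")
    diffs = [j - i for i, j in zip(i_values, j_values)]
    estimate = mean(diffs)
    # Standard error of the mean difference across matched cells.
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return estimate, estimate / standard_error


# Toy F2 max time values (ms) for four matched speaker-by-pair cells.
i_cells = [30.0, 34.0, 28.0, 40.0]
j_cells = [20.0, 25.0, 27.0, 22.0]
estimate, t_value = paired_effect(i_cells, j_cells)
print(estimate, round(t_value, 2))  # -9.5 -2.73
```

A negative estimate corresponds to a negative [j]-expectant coefficient, that is, a smaller value for the [j]-expectant counterpart. With unequal numbers of repetitions per cell, the mixed-effects estimates will generally differ from this simple average.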
Again, the expected glide-vowel distinction across the near-minimal nonce stimulus pairs appears borne out in the data along multiple acoustic dimensions. The [j]-expectant counterparts have significantly earlier transitions into the following vowel and significantly shorter durations of the entire vocalic sequence. There is again a significant effect on the intensity range; however, this is in the reverse direction ([i]-expectant stimulus utterances show a greater intensity range across the vocalic sequence than [j]-expectant stimulus utterances). It is possible that this is a task effect. Recall that stress placement was explicitly marked in the nonce stimuli and used as a distractor variable during the experiment. Participants may have been hyperarticulating stress by using a wider than normal intensity range to distinguish stressed syllables from unstressed syllables. This hyper-differentiation of intensity between syllables may be overriding any observably lower intensity of [j]. However, this potential for reversal does suggest that intensity may not be the most reliable characteristic of this distinction. The remaining measurements pattern in parallel with the results of the pre-existing lexical items discussed above. F2 max again patterns counter to what would be predicted by a place-based representation, with [j] having a lower F2 max (this time approaching, while not reaching, significance) and therefore a less anterior articulation. Figure
Nonce stimuli. Visualizations provide box plots across the two orthography conditions. Points represent the measurement for each speaker’s utterance of each nonce stimulus (averaged across repetitions and grouped by orthography alongside the respective box plot). Lines connect each pair’s counterparts, with a green line representing that [i] > [j] and a red line representing that [i] < [j].
These results suggest that the [j]-[i] distinction observed between pre-existing near-minimally paired lexical items (Section 4.1) is also productively extended to new words, as elicited via the <y> vs. <i> orthographic distinction, and across a wider variety of surrounding environments. They further suggest that transition earliness and overall vocalic sequence duration are the more consistent acoustic dimensions that convey this distinction ([jV] sequences are shorter than [iV] sequences, with the transition into [_V] coming earlier after [j] than after [i]). The nonce stimulus results continue to challenge applying a place-based representation to this case, with [j] again found to have a lower F2 max and therefore a less anterior articulation—the reverse of that predicted by this representation. However, these results speak less strongly in favor of a constriction-based representation, with [j] now appearing to have a greater intensity with respect to that of the following vowel. How to conclude or proceed based on these observations will be further discussed below (Section 5).
While the central pursuit of the nonce stimulus part of this study is to examine what acoustic characteristics consistently convey this distinction across a more diversified array of surrounding environments, the data may also speak to how those environmental factors constrain the distinction’s availability. Table
Descriptive statistics of F2 max time (representing transition earliness) across manipulated conditions of the surrounding environment. Within each factor, conditions are ordered by how consistently the measurement of F2 max time exhibits the distinction—how far, in either direction, Percent [i] > [j] is from 50%.
Factor | Condition | Percent [i] > [j] | Mean: [i]-expectant | Mean: [j]-expectant | Mean of Differences |
---|---|---|---|---|---|
position | medial | 71% | 22.36 ms | 16.62 ms | 5.75 ms |
position | initial | 62% | 25.47 ms | 22.53 ms | 2.94 ms |
C_ place | coronal | 71% | 26.75 ms | 21.18 ms | 5.58 ms |
C_ place | labial | 69% | 27.37 ms | 22.76 ms | 4.61 ms |
C_ place | dorsal | 53% | 11.47 ms | 10.17 ms | 1.30 ms |
Given that transition earliness appears to be the most consistent differentiating characteristic of this distinction, it is briefly given some more nuanced attention here. In the above analyses, transition earliness is treated as an absolute measurement: How many milliseconds after the onset of a [jV]/[iV] sequence is the maximum F2 reached before its descending transition into the following vowel begins? However, we know that the duration of a segment can be influenced by segment-extrinsic factors like speech rate (
On the other hand, effects on segmental duration are not entirely absolute or consistent. Studies examining the effects of speech rate have observed that pauses (
The data below provide a fuller description of transition earliness. In Table
Results: Transition earliness (absolute vs. relative). Descriptive statistics and results of linear mixed-effects models per measurement across the factor of expected pronunciation.
Stimuli | Measurement | Percent [i] > [j] | Mean: [i]-expectant | Mean: [j]-expectant | Coefficient: [j]-expectant | p-value | Significance |
---|---|---|---|---|---|---|---|
Pre-existing | absolute | 94% | 35.67 ms | 19.43 ms | –16.49 ms | 9.99 | *** |
Pre-existing | relative | 89% | 20.9% | 14.6% | –6.3% | 4.69 | *** |
Nonce | absolute | 66% | 23.93 ms | 19.59 ms | –4.38 ms | 3.76 | *** |
Nonce | relative | 61% | 11.4% | 9.8% | –1.6% | 0.00073 | *** |
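The difference between the two approaches can be illustrated with a small calculation. The sketch assumes that relative transition earliness is F2 max time expressed as a percentage of the entire vocalic sequence duration, which is inferred from the units in the table; the function name and the example values are hypothetical.

```python
def relative_earliness(f2_max_time_ms, duration_ms):
    """Express F2 max time as a percentage of the vocalic sequence duration."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return 100.0 * f2_max_time_ms / duration_ms


# Two made-up tokens with the same absolute earliness but different durations.
short_token = relative_earliness(20.0, 160.0)
long_token = relative_earliness(20.0, 200.0)
print(short_token, long_token)  # 12.5 10.0
```

Two tokens that reach F2 max at the same absolute time thus receive different relative values whenever the following vowel differs in length, which is one way the relative measure can blur a distinction that is stable in absolute terms.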
Smoothing Spline ANOVA plots. Plots on the lefthand side represent the x-axis in Relative terms, with 50 timepoints (evenly spaced across the entire vocalic sequence duration) expressed ordinally. Plots on the righthand side are in Absolute terms, with the x-axis converted to the amount of time (ms) between each point and the onset of the vocalic sequence. In each plot, the earlier half is that containing the high front vocoid of primary interest; the latter half represents the following vowel and also some indication of the transition into the initial segment of the following word.
These results demonstrate that both the absolute and relative approaches to the measurement of transition earliness significantly reveal the distinction. However, they also suggest that an absolute approach to measuring transition earliness may be more consistent at capturing it. The results in Table
Furthermore, when examining the Absolute plots, it is apparent that the confidence intervals become wider toward the end of the contour (the right side). This is due to variation in the duration of the entire sequence: When some utterances have shorter durations, fewer measurement points remain available for calculating the confidence interval. So the entire sequence duration seems to vary, but the [j]-[i] distinction is still apparent when examined in absolute terms. The combination of these observations suggests that this variation of the entire sequence’s duration may be more attributable to varying duration of the following vowel (corroborating findings mentioned above [e.g.,
The results of this study suggest that there is a distinction between the [j] glide and [i] vowel available to native speakers of American English. This is elicited in utterances of near-minimally paired lexical items. It is also extended productively to nonce stimuli, elicited solely by orthography. Analysis identifies which acoustic aspects play a consistent role in the production of this distinction. The glide appears to be most consistently characterized by an earlier transition to the following vowel and, likely as a result, a shorter overall duration of the vocalic sequence. Results from the pre-existing word pairs also suggest that the glide has a lower acoustic intensity, though this effect was reversed in the nonce stimulus production task (possibly as a task effect due to increased focus on stress placement). And while [j] is not shown to have a significantly higher and tighter lingual articulation (i.e., the difference in F1 min is not significant), neither is it shown to have a significantly lower and more open articulation. On the other hand, acoustic measurements do show a significant difference in articulatory frontness (measured by F2 max) for the pre-existing lexical items, with the same trend in the nonce stimuli, suggesting that [j] is less anterior (with a lower F2 max) than [i].
This further understanding of the acoustic character of this distinction serves us in multiple ways. It documents the distinction and aids in future approaches to identifying and segmenting it, which may help improve and increase its future documentation and allow for more robust analysis of its distribution and variability. The acoustic characterization can also contribute to the choice between different phonological representations considered. The finding that [j] has a significantly less anterior articulation supports ruling out a place-based representation (such as that proposed by Levi [
Furthermore, characterizing the acoustics of this distinction may further our understanding of its phonological distribution. As discussed at the beginning of this paper (Section 2.2), and suggested by the results (Table
There are multiple further directions of inquiry that this study motivates. One is to examine the perception of this distinction, both in terms of cueing and contrast. The analysis above examines acoustic measurements as characteristics of this distinction: What details of the acoustic signal exhibit significant differences across production of these apparently distinct categories? Some characteristics (e.g., transition earliness and duration) appear more consistent and reliable than others (e.g., intensity). It would be helpful to know if this characterization of the distinction’s
Another extension of this study would be to analyze the acoustic character of glide-vowel distinctions in other languages, such as those documented by the many studies cited throughout this paper. This study’s results are only intended to shine light on what representation may be most plausible (or at least rule any candidates out) for the distinction apparent in the American English phonological system under consideration. It is possible that languages previously argued on more phonological grounds to be best represented with the other approaches considered do actually cue it differently, with acoustic characterizations in line with those predicted by the respective representations. This approach of acoustic characterization is further applicable to the analysis of any distinction for which there is a diverse suite of potential acoustic cues. And, as employed here, that acoustic characterization may be useful in comparing the acoustic predictions generated by competing phonological representations of such a distinction and therefore speaking between them. Further such analysis will contribute to the ongoing broader question of how interwoven or disconnected phonological representation and phonetic realization can be (e.g.,
While the word-initial patterning of [jV] vs. [iV] appears to conflate with stress, one exception of [iV] hiatus where the following vowel, instead of the initial vowel, is stressed might be the name
The pattern of [w] being dispreferred after labial consonants is not exceptionless. Some Spanish loanwords such as
Smoothing Spline ANOVA analysis was first used in linguistics by Davidson (
I would like to thank Lisa Davidson, Maria Gouskova, and Frans Adriaans for their valuable feedback at many stages of this research. Many additional thanks go to Susannah Levi, Suzy Ahn, Sean Martin, Becky Laturnus, members of the NYU Phonetics and Experimental Phonology Lab, and audiences at the 170th meeting of the Acoustical Society of America and the 2017 annual meeting of the Linguistic Society of America for their feedback and discussion. I am grateful to the anonymous reviewers, whose comments regarding this paper’s earlier manuscript led to significant improvement. I would also like to thank the participants who provided their time and their voices for analysis.
The author has no competing interests to declare.