1. Introduction

It is well known that the phonetic realization of a phonological element typically varies according to a range of factors, including position, speech rate, and speech style. Among these, research on positional variation has brought a number of key insights linking phonological and prosodic categories with phonetic variation. Interestingly, despite the range and depth of work on the topic, extant research on positional variation has not considered the potential interaction between phonetic forces and phonological harmony.

In phonological harmony, e.g., vowel or consonant harmony, the realization of some element is determined by some other element elsewhere in the word. Among harmony patterns, the most widely studied example is vowel harmony, which is a phonological restriction on vowel co-occurrence (see Rose & Walker, 2011 for an overview). In languages with vowel harmony, only certain classes of vowels may co-occur within a given domain, often the word. Most often, this is manifested in morphophonological alternations. As an example, the backness of suffixes in Turkish is generally determined by the backness of the initial-syllable vowel. More specifically, consider the realization of the plural suffix in [bel-ler] ‘waist-PL’ and [bɑl-lɑr] ‘honey-PL.’ If the preceding vowel is [+back], [-lɑr] is the appropriate allomorph of the plural suffix, but if the preceding vowel is [–back], [-ler] is the appropriate allomorph. Phonetic studies of vowel harmony have typically examined the acoustic or articulatory properties of vowels in a fixed position to determine what similarities exist within a particular harmonic class, and conversely, what differences distinguish each class of vowels (e.g., Fulop, Kari, & Ladefoged, 1998; Guion, Post, & Payne, 2004; Svantesson, 1985; Washington, 2016). These studies have compared the properties of vowels across classes, but not across positions. While a particular set of acoustic or articulatory features may characterize one class of vowels in some position, it does not necessarily follow that those same properties will be identical across all positions for the relevant class of vowels. For instance, after coarticulation from flanking consonants is taken into account, how similar or distinct are the [ɑ] vowels in a Turkish word like [bɑl-lɑr] ‘honey-PL’?

In a language with vowel harmony, the mandate that some set of elements be identical along a given phonological dimension may conflict with phonetic trends that favor reduction in certain positions. How do these two forces interact? This interaction between phonologically-dictated sameness and phonetically-determined variation is the central topic of interest in this paper. Specifically, this paper examines positional variation in vowel harmony in Kyrgyz, an understudied Turkic language of Central Asia. The realization of words of up to four syllables in length is examined to determine the extent and nature of acoustic variation among the Kyrgyz vowels. In addition, the paper examines the potential sources of this variation, and in turn, the relationship between phonology and phonetics in the language.

2. Background

2.1. Phonetic reduction

Reduction of vowels toward a more central acoustic or articulatory value is widely reported in the literature (e.g., Tiffany, 1959; Delattre, 1969; Nord, 1986; Bradlow, Torretta, & Pisoni, 1996; Mooshammer & Geng, 2008 among many others); phonetic reduction is derivable from a number of different mechanisms. Some explanations involve close connections to grammatical knowledge while others invoke only non-linguistic articulatory constraints. Grammatical knowledge is a driving force in patterns of variation that reference domain edges. For instance, a large body of work has demonstrated that phonetic variation is linked to prosodic boundaries. Domain-initial segments are produced with more extreme articulations (Fougeron & Keating, 1997; Cho & Keating, 2001; Cho, 2002, 2006; Cho & Keating, 2009). In addition to initial edges, final edges of prosodic domains are often accompanied by phonetic lengthening (Klatt, 1975; Wightman, Shattuck-Hufnagel, Ostendorf, & Price, 1992; Edwards, Beckman, & Fletcher, 1991; Byrd, 2000). Furthermore, prominent positions, e.g., stressed syllables, often exhibit distinct articulatory and acoustic characteristics, which de Jong (1995) calls “localized hyperarticulation” (Lindblom, 1990; Harris, 1978; Öhman, 1967; Beckman, Edwards, & Fletcher, 1992).

In addition to domain edges, much work has found that phonetic variation is intricately tied to predictability; more predictable elements are subject to reduction (Pisoni, Nusbaum, Luce, & Slowiaczek, 1985; Luce, 1986; Boersma, 1998; Van Son & Pols, 1999, 2003; Aylett & Turk, 2004; Cohen Priva, 2008; Scarborough & Zellou, 2013; Seyfarth, 2014; Turnbull, 2015). For most research on predictability-based reduction, either the domain of reduction is the entire word, or predictability is considered at the word and not at the segmental level (Gahl, Yao, & Johnson, 2012; Scarborough & Zellou, 2013; Seyfarth, 2014). However, research on vowels has consistently found that contextual predictability accounts for differences in pronunciation. For instance, Jurafsky, Bell, Gregory, and Raymond (2001) find that context is highly predictive of reduced (schwa-like) vowel qualities, and Van Son and Pols (2003) argue that contextual probability motivates vowel reduction in Dutch, suggesting that reduction is not only tied to duration, but also to segmental quality.

It is important to note the relationship between vowel harmony and predictability. Vowel harmony dramatically increases the predictability of a given phonological feature. This increase in predictability creates the context for phonetic reduction. In most Turkic languages, backness harmony applies from left to right, forcing non-initial vowels to agree in backness with the initial-syllable vowel. Thus, the probability of hearing a [+back] vowel in the initial syllable is considerably lower than the conditional probability of hearing one in the second syllable when the preceding vowel is [+back]. Given that the phonetic realization of phonological elements is related to predictability, this predicts that targets of harmony may be targets of reduction.

While variation conditioned by prosodic domains or statistical learning depends largely on grammatical knowledge, a different line of work has found that phonetic variation may also depend on mechanical forces. Among these, there is a cross-linguistic pattern for articulatory effort to diminish across a prosodic unit, sometimes called supralaryngeal declination. In these cases, phonetic distinctions are incrementally reduced throughout a given domain (Nord, 1986; Vayra & Fowler, 1992; Johnson & Martin, 2001; Tabain, 2003). In Nord (1986) and Johnson and Martin (2001), this domain is the word, although Vayra and Fowler (1992), as well as Tabain (2003), present evidence that larger prosodic domains may also condition reduction. Vayra and Fowler (1992) link incremental reduction to pitch declination, the quasi-universal pattern of f0 lowering during a phrase. They suggest that general effort is reduced later in a domain, giving rise to both lowering of f0 and contraction of the vowel space.

A second mechanical force, often called undershoot, has also received support in the literature. In his seminal paper, Lindblom (1963) finds that vowel acoustics are partially conditioned upon flanking consonants and vowel duration. In contexts where vowel duration is decreased, contextual influence from consonants increases, and as a byproduct, vowel targets are undershot. Nowak’s (2006) study on Polish vowels further supports Lindblom’s analysis, indicating that consonant features play a determinative role in the realization of Polish vowels. Of the range of factors in positional variation, undershoot is the most mechanical. Segmental undershoot derives not from any reference to grammatical categories, like words, phrases, or the distribution of phonological elements, but rather from the simple inability to attain acoustic or articulatory targets due to insufficient duration and/or competing contextual phonetic targets (see also Flemming, 2001).

One common, often tacit, assumption within the phonological literature is that harmonic vowels should exhibit little to no subphonemic variation by position. Pearce (2008, 2012) contends that undershoot is blocked in harmony in order to preserve the phonetic manifestation of the spreading feature. She argues that reduction toward a central schwa-like vowel would obscure the phonological pattern, and so phonology effectively precludes undershoot, even in very short vowels, in order to enhance the effects of harmony. Thus, in the Turkish word [bɑl-lɑr] ‘honey-PL,’ once effects of stress and coarticulation from flanking consonants are accounted for, such an analysis would predict that the realization of second-syllable [ɑ] should be indistinguishable from initial-syllable [ɑ]. Under Pearce’s analysis, phonetic reduction is controlled by phonological harmony. A similar view is articulated in Zsiga (1997), which contrasts the putatively categorical effects of harmony with gradient effects that result from phonetic assimilation. Regarding the alternation between /a̙/ and /e/ in Igbo, Zsiga (1997, p. 235) contends “[t]here is no evidence that a derived [a̙] … is phonetically different from an underlying /a̙/.” Although it is unclear to what degree Zsiga (1997) allows for positional variation, one interpretation of her claim is that harmony precludes any sort of phonetic effect that might render an alternating target of harmony different from a non-alternating trigger. If, as Pearce (2008, 2012) claims, phonetic variation is suppressed under the effects of harmony, this predicts more generally that phonology dictates the nature of phonetic implementation, at least for vowel harmony. This same line of reasoning is present in work on ‘strict locality’ in phonology (Gafos, 1999; Ní Chiosáin & Padgett, 2001), which argues that phonological assimilation produces phonetic alternations for every element within the domain of application.

2.2. Kyrgyz vowel harmony

Kyrgyz (ISO 639 kir) is a Turkic language spoken by approximately five million people, primarily in the Republic of Kyrgyzstan. Kyrgyz has an inventory of twenty-five consonants and fourteen vowels, shown in Table 1 and Table 2. Of the Kyrgyz vowels, only the eight short vowels are considered in this study. See Hebert and Poppe (1963), Kara (2003), and Toktonaliev (2015) for descriptions of the language.

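Table 1

Kyrgyz consonant inventory.

| | Bilabial | Labiodental | Dental | Post-alveolar/Palatal | Velar | Uvular |
|---|---|---|---|---|---|---|
| Plosive | p b | | t d | | k ɡ | q |
| Fricative | | f v | s z | ʃ ʒ | x | ʁ |
| Affricate | | | ʦ | ʧ ʤ | | |
| Nasal | m | | n | | ŋ | |
| Liquid | | | l r | | | |
| Glide | w | | | j | w | |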

Table 2

Kyrgyz vowel inventory.

| | [–back], [–round] | [–back], [+round] | [+back], [–round] | [+back], [+round] |
|---|---|---|---|---|
| [+high] | i | y yː | ɯ | u uː |
| [–high] | e eː | ø øː | ɑ ɑː | o oː |

Kyrgyz exhibits two types of vowel harmony—backness and rounding harmony (Hebert & Poppe, 1963; Kaun, 1995; Washington, 2016). Both harmonies operate from left to right, with the initial syllable determining the backness and rounding of subsequent vowels. As an example, consider the realization of the locative and accusative suffixes in Table 3. Observe that both suffixes exhibit a four-way alternation for backness and rounding. Backness harmony is obeyed in all forms in Table 3, and rounding harmony is obeyed in all except one, [qul-dɑ] ‘slave-LOC.’ The domain of each harmony is the word, resulting in long words with consistent phonological backness and rounding, e.g., [bɑltɑ-lɑr-ɯbɯz-dɯ] ‘axe-PL-POSS.1P-ACC’ and [ølkø-lør-ybyz-dy] ‘nation-PL-POSS.1P-ACC.’

Table 3

Backness and rounding harmony in Kyrgyz.

| Initial-syllable vowel | Locative | Accusative | Gloss |
|---|---|---|---|
| i | til-de | til-di | ‘language’ |
| e | bel-de | bel-di | ‘lower back’ |
| y | ɡyl-dø | ɡyl-dy | ‘flower’ |
| ø | køl-dø | køl-dy | ‘lake’ |
| ɯ | ʤɯl-dɑ | ʤɯl-dɯ | ‘year’ |
| ɑ | bɑl-dɑ | bɑl-dɯ | ‘honey’ |
| u | qul-dɑ | qul-du | ‘slave’ |
| o | ʤol-do | ʤol-du | ‘road’ |
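
The alternations in Table 3 can be restated as a small lookup over vowel features. The following Python sketch is illustrative only: the function names are hypothetical, the suffix consonant is fixed as [d] because every root in Table 3 ends in [l], and the failure of the non-high suffix vowel to round after /u/ is hard-coded from the single form [qul-dɑ] rather than derived from a general analysis.

```python
# Feature values for the eight short vowels, following Table 2.
# Each vowel maps to (high, back, round).
FEATURES = {
    "i": (True, False, False), "y": (True, False, True),
    "ɯ": (True, True, False), "u": (True, True, True),
    "e": (False, False, False), "ø": (False, False, True),
    "ɑ": (False, True, False), "o": (False, True, True),
}
# Reverse lookup: feature triple -> vowel symbol.
VOWEL_BY_FEATURES = {feats: v for v, feats in FEATURES.items()}


def suffix_vowel(trigger: str, high: bool) -> str:
    """Return the suffix vowel that harmonizes with the root vowel `trigger`."""
    _, back, rnd = FEATURES[trigger]
    # Assumption taken from Table 3 only: the non-high suffix vowel stays
    # unrounded after /u/ (qul-dɑ), the one exception to rounding harmony.
    if not high and trigger == "u":
        rnd = False
    return VOWEL_BY_FEATURES[(high, back, rnd)]


def inflect(root: str, case: str) -> str:
    """Attach the locative (non-high vowel) or accusative (high vowel) suffix."""
    # Monosyllabic roots only: the first vowel found is the harmony trigger.
    trigger = next(ch for ch in root if ch in FEATURES)
    return f"{root}-d{suffix_vowel(trigger, high=(case == 'ACC'))}"
```

Under these assumptions, `inflect("køl", "LOC")` yields `køl-dø` and `inflect("qul", "LOC")` yields `qul-dɑ`, matching the corresponding rows of Table 3.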

Significantly, existing work on Turkic phonetics suggests that vowel qualities may differ by position. In Kyrgyz, Washington (2006, 2008) finds that affix vowels are less distinct than root-internal vowels. Elsewhere in the family, McCollum (2015, 2019b, 2019c) argues that positional reduction is asymmetric in Kazakh and Uyghur. McCollum reports that back vowels undergo non-initial fronting without any comparable effects among the front vowels. However, Lanfranca (2012) reports that root and affix vowels in Turkish do not generally differ in F1 or F2.

2.3. Predictions

The goals of the paper are two-fold: to determine if and to what extent non-initial vowels differ acoustically from initial-syllable vowels, and how to analyze those potential differences. Throughout the paper, reduction is defined as centralization. Thus, in terms of raw formant values, this predicts high vowels will show increased F1 in reduced contexts, and low vowels will show decreased F1 in these contexts. Further, since centralization is equated with reduction throughout the paper, F2 of back vowels should increase in reduced contexts, while F2 of front vowels should decrease when reduced.

If Kyrgyz vowels are subject to positional reduction, some general possibilities are schematized in Table 4. First, if alternating vowels are more centralized than non-alternating vowels, and vowels in each successive syllable are progressively more centralized, then this suggests an incremental pattern of reduction consistent with supralaryngeal declination. In other words, if centralization in syllable x is greater than centralization in syllable x-1 for all positions, this would support supralaryngeal declination. Second, if non-initial vowels are more centralized than initial vowels, but no differences in centralization are found across successive non-initial syllables, e.g., syllable two versus syllable three, this would support a binary distinction between non-initial (target) and initial (trigger) positions. Such a binary distinction is compatible with either an initial strengthening or predictability-based account.

Table 4

A schema of possible types of positional variation.

| | Supralaryngeal declination | Initial strengthening or predictability-based reduction | Undershoot |
|---|---|---|---|
| Are non-initial vowels more centralized than initial-syllable vowels? | | | |
| Is reduction incremental? i.e., are vowels in syllable x more centralized than vowels in syllable x-1? | | | |
| Are vowels produced with shorter duration more centralized? | | | |

To be clear, despite predicting similar patterns, the mechanisms that induce reduction (or alternatively, enhancement) under these two possible analyses are distinct. The initial strengthening account relies on prosodic boundaries while the predictability account relies on harmony to drive reduction. As a simple demonstration of the relationship between harmony and predictability, a corpus of online Kyrgyz was examined (kir_newscrawl_2016_1M from wortschatz.uni-leipzig.de; Goldhahn, Eckart, & Quasthoff, 2012; see Goldsmith & Riggle, 2012 for more on information theory and vowel harmony). To evaluate intersyllabic vowel sequences only, all consonants, punctuation, markup, and other text formatting were removed. The remaining corpus consisted of vowels, which were supplemented by word-beginning and word-end boundary symbols. The corpus contained 13,070,669 words and 39,275,334 vowels, with an average of 3.00 vowels per word. Overall, [+back] vowels were much more frequent in the corpus ([+back] count = 24,648,993, [–back] count = 14,626,341; 62.8 and 37.2%, respectively). The probability of an initial-syllable [+back] vowel, defined as the bigram of the word-beginning symbol followed by a [+back] vowel, was 0.63; the probability of an initial-syllable [–back] vowel was 0.37. However, the conditional probability of a [+back] vowel immediately following a [+back] vowel was 0.91, and the probability of a [–back] vowel following a [+back] vowel was 0.09. The same generalization holds for the front vowels. Given a [–back] vowel, the probability of another [–back] vowel was 0.86, much larger than the probability of encountering a [–back] vowel in the initial syllable (0.37). Thus, the increase in predictability of a given phonological feature, here [back], makes reduction of F2 an eminently plausible ancillary result of harmony.
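
The bigram calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run on the corpus: it assumes words are already in the IPA-style transcription used in this paper (the Leipzig corpus itself is orthographic), the function names are hypothetical, and the toy word list is built from forms cited in this paper plus one invented disharmonic string.

```python
from collections import Counter

BACK = set("ɑoɯu")   # [+back] short vowels (Table 2)
FRONT = set("eøiy")  # [-back] short vowels (Table 2)


def backness_string(word: str) -> str:
    """Reduce a transcribed word to its vowel backness sequence, e.g. 'BB'."""
    return "".join(
        "B" if ch in BACK else "F" for ch in word if ch in BACK | FRONT
    )


def backness_probabilities(words):
    """Estimate P(back | word start), P(back | back) and P(front | front)."""
    initial, bigrams = Counter(), Counter()
    for word in words:
        seq = backness_string(word)
        if not seq:
            continue  # nothing left once non-vowels are stripped
        initial[seq[0]] += 1               # boundary symbol -> first vowel
        bigrams.update(zip(seq, seq[1:]))  # adjacent vowel pairs
    after_back = bigrams["B", "B"] + bigrams["B", "F"]
    after_front = bigrams["F", "F"] + bigrams["F", "B"]
    return {
        "P(B|#)": initial["B"] / sum(initial.values()),
        "P(B|B)": bigrams["B", "B"] / after_back,
        "P(F|F)": bigrams["F", "F"] / after_front,
    }


# Toy input: four harmonic forms from this paper and one invented
# disharmonic string ("bɑle"), included only to produce a B-F transition.
toy_words = ["bɑldɑr", "tilde", "køldø", "quldɑ", "bɑle"]
probabilities = backness_probabilities(toy_words)
```

The sketch divides by the number of observed transitions, so it presupposes input containing at least one vowel pair beginning with each backness value.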

Third and finally, if centralization is best accounted for in terms of vowel duration rather than position, then variation might best be described in terms of vowel undershoot. Under this account, differences in vowel acoustics should be derivable from duration only, and not position.

3. Methods

3.1. Participants

Thirteen Kyrgyz speakers (11 female; mean age: 35.0 years; range: 18–57 years) living in Bishkek, Kyrgyzstan, participated in the study. All participants reported native fluency in the target language. Most participants also reported fluency in Russian, and some speakers reported additional fluency in Uzbek. Speaker participation and informed consent were obtained in accordance with University of California San Diego Linguistic Fieldwork IRB protocol #141520.

3.2. Stimuli

During the recording phase, participants were presented with a controlled set of target words containing all short vowel contrasts in the language. Target words were derived from monosyllabic and disyllabic roots, exemplifying all eight short vowels in the language. Two monosyllabic stimuli for each category were elicited. One ended in a lateral, e.g., /bɑl/ ‘honey,’ and the other ended in a sibilant, e.g., /bɑʃ/ ‘head.’ Among the disyllabic roots, root-internal vowels were identical, e.g., /moldo/ ‘mullah’ and /ilim/ ‘science.’ The attested lexical root /qurum/ ‘soot’ was prompted, but no speakers accepted this word. As a result, no disyllabic roots with root-internal /u/ were recorded. The full list of stimuli is presented in the Appendix.

Each lexical item was prompted in the nominative, locative, ablative, and accusative cases for both singular and plural numbers. Case-marking suffixes were also elicited in conjunction with the first- and third-person singular possessive suffixes. Example forms for the lexical root /bɑl/ ‘honey’ are shown in Table 5. With monosyllabic roots, target words were up to three syllables in length, and with disyllabic roots, target words were up to four syllables in length.

Table 5

Elicited words for the lexical root /bɑl/ ‘honey’.

| Singular: word | Singular: suffix(es) | Plural: word | Plural: suffix(es) |
|---|---|---|---|
| bɑl | NOM | bɑl-dɑr | PL |
| bɑl-dɑ | LOC | bɑl-dɑr-dɑ | PL-LOC |
| bɑl-dɑn | ABL | bɑl-dɑr-dɑn | PL-ABL |
| bɑl-dɯ | ACC | bɑl-dɑr-dɯ | PL-ACC |
| bɑl-ɯ | POSS.3S | | |
| bɑl-ɯm | POSS.1S | | |
| bɑl-ɯ-n | POSS.3S-ACC | | |
| bɑl-ɯm-dɑ | POSS.1S-LOC | | |
| bɑl-ɯm-dɯ | POSS.1S-ACC | | |

3.3. Procedure

Each session was divided into training and recording phases. During the training phase, participants were taught to identify a small set of lexical roots with pictorial prompts. After learning the set of roots, participants learned a set of pictorial-grammatical correspondences involving number, case, and possession. As an example, during training participants learned to treat two downward-facing arrows as a cue for the locative case and two outward-facing arrows as a cue for the ablative case. Thus, when presented with a picture of honey without any additional arrows (or other cues), the target word was /bɑl/ ‘honey.’ When the picture of honey was accompanied by two downward-facing arrows, the target word was /bɑl-dɑ/ ‘honey-LOC,’ and when it was accompanied by two outward-facing arrows, the target word was /bɑl-dɑn/ ‘honey-ABL.’ The training phase typically lasted around five minutes.

After participants completed training, the recording phase began. Throughout each session, participants were presented images on a laptop computer screen that showed both a picture representing a lexical item and a pictorial prompt indicating number, case, and possession. When speakers were unable to guess the target word from the prompt, they were given either the equivalent Russian word or a paraphrase in the target language. Sessions were conducted in a quiet room. Participants wore a Shure-SM10A unidirectional head-mounted microphone, and all data were recorded to a Marantz PMD 661 MKII digital recorder at a sampling rate of 44.1 kHz. Each session lasted between 45 and 90 minutes.

3.4. Segmentation and statistical analysis

All sound files were segmented in Praat (Boersma & Weenink, 2015). The beginning and end of each vowel were aligned to the onset and offset of the second formant. In cases where the second formant persisted across flanking consonants (i.e., sonorants), abrupt shifts in the amount and distribution of spectral energy were used to indicate vowel onset and offset.

After segmentation, vowel duration and the first two formants (F1 and F2) at vowel midpoint were measured. Outliers were inspected for measurement errors. In particular, a number of errors were found with /u/, where the formant tracker in Praat failed to distinguish the first two formants. In these cases, formant frequencies were hand measured at the approximate vowel midpoint.

F1 and F2 were z-score normalized (Lobanov, 1971) to facilitate more meaningful between-speaker comparisons. The data for normalization consisted of four tokens of each vowel taken from monosyllabic words. If four tokens of a given vowel were not present in monosyllabic words, then the remaining tokens were taken from the initial syllable of disyllabic words. One benefit of z-score normalization is that it provides an estimate of the acoustic center of each speaker’s vowel space. This, in turn, allows for a straightforward analysis of potential positional differences in vowel quality. In the analysis, the absolute values of F1 and F2, |F1| and |F2|, are used to assess distance from the center of the acoustic vowel space, and as a consequence, the extent and degree of hypo- or hyperarticulation (see Bradlow et al., 1996 for a different metric). In addition, vowel durations were centered to across-speaker means for each phonemic vowel quality.
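
The normalization step and the distance measure can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the reference values in the usage example are invented round numbers rather than measured data, and the sample standard deviation is used because the text does not specify whether the sample or population formula was applied.

```python
from statistics import mean, stdev


def lobanov_params(reference_values):
    """Mean and standard deviation of one speaker's reference formant values."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return mean(reference_values), stdev(reference_values)


def normalize(value_hz, params):
    """z-score a raw formant value (Hz) against a speaker's reference tokens."""
    mu, sigma = params
    return (value_hz - mu) / sigma


def distance_from_center(value_hz, params):
    """|z|: distance from the estimated center of the speaker's vowel space."""
    return abs(normalize(value_hz, params))


# Invented F1 reference values (Hz) for one hypothetical speaker.
f1_params = lobanov_params([300, 500, 700])
```

With these invented values, a token with F1 of 400 Hz and a token with F1 of 600 Hz both lie 0.5 z from the center, which is the sense in which |F1| abstracts away from the direction of displacement.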

The data were divided into two groups for analysis, words derived from monosyllabic roots and words derived from disyllabic roots. Results from the monosyllabic roots should display the general pattern of phonetic realization for each vowel in words up to three syllables in length. Vowels produced in words deriving from disyllables can further inform the analysis through the comparison of initial- and second-syllable vowels. For instance, if affix vowels are reduced (Washington, 2008), but not root vowels, second-syllable vowels should be indistinguishable from initial-syllable vowels in disyllabic roots. On the other hand, if second-syllable vowels in disyllabic roots behave like suffix vowels after monosyllabic roots, this would suggest a distinction between the initial syllable and all non-initial syllables instead of a root-affix distinction. Data collected from disyllabic roots also provide information on variation in words of up to four syllables in length.

For both monosyllables and disyllables, |F1| and |F2| were modeled as a function of vowel quality, position (syllable number), duration, and preceding and following consonant place of articulation. All two- and three-way interaction terms for vowel quality, position, and duration were also included in the model. Position, preceding consonant place, and following consonant place were all treatment coded. The models were fit using the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015; R Core Team, 2017) and incorporated a random intercept for speaker. The significance of each predictor was assessed using the anova function in the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017; with Satterthwaite’s method for degrees of freedom estimation). Post-hoc comparisons across syllables for each contrastive vowel quality were conducted using the emmeans package (Lenth, Singmann, & Love, 2018). Because pairwise comparisons are the main focus of the analysis, the anova function was used: it simplifies initial statistical reporting, and its F-tests align more closely with the aims of the paper than the output of the summary function in lme4. Significance is reported in the main body of the text throughout, but ANOVA tables for each analysis are included in the Appendix.

4. Results

4.1. Monosyllables

Preliminarily, observe the mean F1 and F2 vowel qualities by syllable in Figure 1 (n = 5,395 vowels). In terms of F1, most non-low vowels are characterized by increasing F1 in non-initial syllables while /ɑ/ and /u/ show no obvious differences in F1 by position. As for F2, note that vowels tend to exhibit less peripheral F2 in second and third syllables. For instance, F2 of front vowels /i/ and /e/ diminishes in second and third syllables. However, F2 of back vowels /u/ and /o/ increases in non-initial syllables, suggesting contraction of the F2 dimension of the vowel space in non-initial syllables.

Figure 1

Mean F1 and F2 (z) by syllable number for words derived from monosyllabic roots. Syllable number is marked by numerical subscripts.

In addition to vowel quality, duration also differs by position. In Figure 2, observe that vowels tend to be longer in second and third syllables. This may be due to stress; descriptive work reports that primary stress falls on the final syllable (Yunusaliev, 1966; Kirchner, 1998; Johanson, 1998). Interestingly, a comparison of Figure 1 and Figure 2 shows that more central vowel qualities are produced in syllables that tend to be longer. This generalization immediately suggests that variation in Kyrgyz is not primarily derivable from undershoot, since vowels are most centralized in positions where vowel length is greatest. Furthermore, the centralization of stressed (i.e., final-syllable) vowels suggests that stress does not prevent reduction in Kyrgyz (cf. Delattre, 1969; de Jong, 1995; Cho, 2005).

Figure 2

Duration (ms) by vowel and position in words derived from monosyllabic roots.

4.1.1. |F1|

Vowel quality was a significant predictor of |F1| [F(7, 5332.7) = 1610.2, p < .001], which is not surprising, since the language exhibits a phonological height distinction. More importantly, both syllable number and duration exerted a significant effect on |F1| [Syllable: F(2, 5334.2) = 33.51, p < .001; Duration: F(1, 5111.0) = 4.40, p = .04]. Let us first examine more closely the effect of position on |F1|. In Figure 3, |F1| is plotted by position for each vowel. Reduced vowels should exhibit diminished |F1|, which is consistent with data for /i y ɯ/. However, the non-high vowels and /u/ do not show the same trend. The mid vowels /e ø o/ exhibit increasing |F1| by syllable, while /u/ and /ɑ/ vary little in terms of |F1|. These vowel-specific differences are reflected in the model by the significant interaction between vowel quality and syllable number [F(14, 5331.2) = 34.56, p < .001].

Figure 3

|F1| (z) by position for each vowel quality in words derived from monosyllabic roots.

Pairwise comparisons of each vowel quality in syllables one, two, and three were then conducted. Results from these are shown in Table 6 with Bonferroni-corrected alpha criterion (α = .05/24 = .002). The data in Table 6 allow for a detailed comparison of the predictions of both incremental and binary analyses of reduction. Reduced |F1| is indicated by a positive estimate and t-value. If reduction is incremental, all pairwise comparisons are expected to be significant and in the right direction. If reduction is binary (or alternatively, if initial syllables are strengthened), no significant differences are expected between second- and third-syllable means.

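Table 6

Pairwise comparisons for |F1| (z) of vowels by position in words elicited from monosyllabic roots.

| Vowel | Syllables | Estimate | t | p | Significance |
|---|---|---|---|---|---|
| ɑ | 1–2 | 0.21 | 5.18 | <.001 | * |
| | 1–3 | 0.14 | 2.96 | .07 | |
| | 2–3 | –0.07 | –2.30 | .51 | |
| e | 1–2 | –0.09 | –2.35 | .45 | |
| | 1–3 | –0.04 | –0.80 | 1 | |
| | 2–3 | 0.05 | 1.28 | 1 | |
| i | 1–2 | 0.49 | 10.44 | <.001 | * |
| | 1–3 | 0.47 | 7.81 | <.001 | * |
| | 2–3 | –0.02 | –0.42 | 1 | |
| ɯ | 1–2 | 0.17 | 4.19 | <.001 | * |
| | 1–3 | 0.29 | 5.41 | <.001 | * |
| | 2–3 | 0.11 | 2.44 | .36 | |
| o | 1–2 | –0.26 | –8.08 | <.001 | * |
| | 1–3 | –0.41 | –8.06 | <.001 | * |
| | 2–3 | –0.14 | –2.77 | .13 | |
| ø | 1–2 | 0.15 | 4.38 | <.001 | * |
| | 1–3 | –0.03 | –0.74 | 1 | |
| | 2–3 | –0.18 | –4.37 | <.001 | * |
| u | 1–2 | 0.27 | 6.49 | <.001 | * |
| | 1–3 | 0.35 | 6.22 | <.001 | * |
| | 2–3 | 0.08 | 1.68 | 1 | |
| y | 1–2 | 0.53 | 12.16 | <.001 | * |
| | 1–3 | 0.57 | 10.25 | <.001 | * |
| | 2–3 | 0.04 | 0.87 | 1 | |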

Of the 24 comparisons, 13 were significant; 10 were significant and in the right direction. Additionally, no vowels showed significant pairwise differences among all three comparisons. Thus, these data for |F1| do not lend immediate support to incremental reduction. Instead, the four high vowels /i y ɯ u/ exhibited significant positive differences between |F1| in initial and non-initial syllables, but no differences between second- and third-syllable means. Among the non-high vowels /ɑ o e ø/, /ɑ/ and /ø/ exhibited significant positive differences between first- and second-syllable |F1| while all other comparisons were either non-significant or in the wrong direction. Observe also that only /ø/ displayed a significant difference between second- and third-syllable means, albeit in the wrong direction; all other second- and third-syllable differences failed to reach significance.
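
The tallies above can be rechecked mechanically from the values in Table 6. The Python sketch below is only a bookkeeping aid: the names are hypothetical, ASCII hyphens replace the en dashes in the syllable labels, and entries reported as "<.001" are encoded by their upper bound of 0.001, which is an encoding assumption since exact p-values are not reported.

```python
ALPHA = 0.05 / 24  # Bonferroni-corrected criterion, about .002

# (vowel, syllable pair, estimate, p) copied from Table 6.
# Assumption: "<.001" is encoded as its upper bound, 0.001.
TABLE_6 = [
    ("ɑ", "1-2", 0.21, 0.001), ("ɑ", "1-3", 0.14, 0.07), ("ɑ", "2-3", -0.07, 0.51),
    ("e", "1-2", -0.09, 0.45), ("e", "1-3", -0.04, 1.0), ("e", "2-3", 0.05, 1.0),
    ("i", "1-2", 0.49, 0.001), ("i", "1-3", 0.47, 0.001), ("i", "2-3", -0.02, 1.0),
    ("ɯ", "1-2", 0.17, 0.001), ("ɯ", "1-3", 0.29, 0.001), ("ɯ", "2-3", 0.11, 0.36),
    ("o", "1-2", -0.26, 0.001), ("o", "1-3", -0.41, 0.001), ("o", "2-3", -0.14, 0.13),
    ("ø", "1-2", 0.15, 0.001), ("ø", "1-3", -0.03, 1.0), ("ø", "2-3", -0.18, 0.001),
    ("u", "1-2", 0.27, 0.001), ("u", "1-3", 0.35, 0.001), ("u", "2-3", 0.08, 1.0),
    ("y", "1-2", 0.53, 0.001), ("y", "1-3", 0.57, 0.001), ("y", "2-3", 0.04, 1.0),
]


def is_reduction(row):
    """Significant after correction and in the predicted (positive) direction."""
    _, _, estimate, p = row
    return p < ALPHA and estimate > 0


def pattern(vowel):
    """Classify one vowel as 'incremental', 'binary', or 'neither'."""
    rows = {row[1]: row for row in TABLE_6 if row[0] == vowel}
    early = is_reduction(rows["1-2"]) and is_reduction(rows["1-3"])
    if early and is_reduction(rows["2-3"]):
        return "incremental"
    if early and rows["2-3"][3] >= ALPHA:
        return "binary"
    return "neither"


significant = [row for row in TABLE_6 if row[3] < ALPHA]
reduced = [row for row in TABLE_6 if is_reduction(row)]
binary_vowels = {v for v in "ɑeiɯoøuy" if pattern(v) == "binary"}
```

Under this encoding, the binary pattern (reduction in both non-initial syllables relative to the first, with no difference between the second and third) is met by exactly the four high vowels, and no vowel meets the incremental pattern.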

In Figure 4, we can see the effect of duration for |F1|. Generally, most vowels in most positions do not show a significant connection between increased duration and increased |F1|. Observe, however, that a positive correlation between duration and |F1| is more common among the non-high vowels. In fact, the low vowel is the only vowel that exhibits a substantial positive correlation between duration and |F1| across all positions. These patterns are consistent with the significant interaction between vowel quality and duration [F(7, 5333.9) = 6.94, p < .001]. Further, the interaction between syllable number and duration indicates that duration-related effects are modulated by position in the word [Syllable:Duration: F(2, 5332.0) = 7.47, p < .001]. The significance of the three-way interaction between vowel, syllable number, and duration further demonstrates that variation for |F1| is a complex relation between individual vowels, position, and duration [Vowel:Syllable:Duration: F(14, 5332.3) = 2.08, p = .01]. Finally, both preceding and following consonant place significantly affect |F1| [Preceding place: F(3, 5331.9) = 15.80, p < .001; Following place: F(3, 5338.4) = 30.25, p < .001].

Figure 4

Correlation between duration (centered, ms) and |F1| (z) by vowel quality and syllable number in words derived from monosyllabic roots.

Thus, there is some evidence in favor of a binary rather than incremental conception of reduction in Kyrgyz. To further address this question, two models were constructed to predict |F1| that were minimally different from the model above. In the first, position was treated as a continuous predictor, and in the second, position was treated binarily, distinguishing initial from non-initial syllables. Differences in Akaike Information Criterion (AIC; Akaike, 1974) were used to compare the two models. AIC is a metric for model selection based on Kullback-Leibler divergence, and in terms of AIC, lower values indicate better model fit. Following Burnham and Anderson (2004, p. 271), it is assumed that differences in AIC (ΔAIC) less than two indicate a non-significant difference, differences between four and seven indicate a moderately significant difference, and models with ΔAIC greater than 10 represent a highly significant difference in model fit. When the two models were compared, the model with binary position provided better fit than the model with continuous position [ΔAIC = 98.1]. This is taken as evidence that reduction of |F1| is binary rather than incremental.
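
For concreteness, the comparison logic can be written out as a short Python sketch. The function names are hypothetical, and the labels encode only the thresholds stated above; differences between two and four, or between seven and ten, are left unclassified because the text assigns them no label. The only reported quantity used here is ΔAIC = 98.1; no log-likelihoods or parameter counts are given in the text.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def interpret_delta_aic(delta: float) -> str:
    """Label an AIC difference using only the thresholds stated in the text."""
    delta = abs(delta)
    if delta < 2:
        return "non-significant"
    if 4 <= delta <= 7:
        return "moderately significant"
    if delta > 10:
        return "highly significant"
    # The text gives no label for differences of 2-4 or 7-10.
    return "not classified"


# The difference reported for binary versus continuous position.
reported = interpret_delta_aic(98.1)
```

The reported difference falls well beyond the highest threshold, which is why the binary coding of position is preferred.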

The positional lowering of mid vowels in Figure 1 is an issue worthy of some discussion. If positional variation is related to centralization, then lowering is expected for the high vowels, but somewhat unexpected for the mid vowels, as F1 of /e ø o/ in Figure 1 shifts toward a value notably higher than 0, around 0.7z. Is this centralization, even though these vowels do not shift toward 0z? Recall that the zero value derived via z-score normalization approximates the center of each speaker’s acoustic vowel space. It is conceivable that the acoustic center of the vowel space derived during normalization is not identical to the acoustic quality that reduced vowels shift toward. There are two points worth noting here. First, z-score normalization predicts a central acoustic value based crucially on the number and quality of contrastive vowels in the language. For instance, a language with an inventory of /ɪ ɛ a ɔ ʊ/ would likely have a lower normalized center than a language with /i e a o u/. Therefore, the normalization process can only provide a rough estimate of what traits a centralized vowel might have in a given language. Second, vowel reduction patterns in a number of languages show trends toward distinct central vowel qualities (e.g., Delattre, 1969; Gendrot & Adda-Decker, 2006; Mooshammer & Geng, 2008; Pearce, 2008). If the exact quality of a centralized vowel may differ across languages, then it is possible to interpret the data in Figure 1 as evidence that reduction in Kyrgyz yields a more open central vowel that is lower than the predicted center of the vowel space.

4.1.2. |F2|

As expected, vowel quality was a significant predictor of |F2| [F(7, 5332.7) = 1206.2, p < .001], since Kyrgyz possesses acoustically front, central, and back vowels. Syllable number was also predictive of variation in the backness dimension [Syllable: F(2, 5334.3) = 33.70, p < .001]. By-syllable plots of |F2| for each vowel are shown in Figure 5. Observe that, in general, non-initial vowels show smaller |F2| than initial-syllable vowels. The only obvious exception to this is /ɯ/, which, as shown in Figure 1, is realized with higher |F2| in final syllables. Differences in positional variation across vowels in Figure 5 are consistent with the significant interaction between vowel and syllable number [F(14, 5331.2) = 30.35, p < .001].

Figure 5

|F2| (z) by position for each vowel quality in words derived from monosyllabic roots.

Since |F2| differs across syllables, the question is whether positional variation is incremental or binary. First, the pairwise comparisons in Table 7 support a binary distinction between initial and non-initial vowels. Recall from above that a binary distinction predicts differences between syllables one and two, and between syllables one and three, but no difference between syllables two and three. In contrast, an incremental model of reduction predicts that all pairwise comparisons should be significant. Of the eight syllable one versus syllable two comparisons, six were significant, with decreased |F2| in second syllables. Similarly, six of eight syllable one versus syllable three comparisons were also significant and in the right direction. In contrast, only one of eight syllable two versus syllable three comparisons was significant. This supports the conclusion that initial-syllable vowels are realized with more peripheral |F2|, while non-initial vowels, regardless of syllable number, are realized with similar |F2|.

Table 7

Pairwise comparisons for |F2| (z) of vowels by position in words elicited from monosyllabic roots.

Vowel  Syllables  Estimate  t       p
ɑ      1–2        0.22      7.29    <.001
ɑ      1–3        0.26      7.44    <.001
ɑ      2–3        0.04      1.67    .09
e      1–2        0.36      12.57   <.001
e      1–3        0.33      9.05    <.001
e      2–3        –0.03     –1.02   .31
i      1–2        0.23      6.81    <.001
i      1–3        0.33      7.48    <.001
i      2–3        0.09      2.55    .01
ɯ      1–2        –0.22     –7.36   <.001
ɯ      1–3        –0.27     –7.27   <.001
ɯ      2–3        –0.05     –1.66   .10
o      1–2        –0.02     –0.73   .46
o      1–3        0.02      0.61    .54
o      2–3        0.04      1.06    .29
ø      1–2        0.11      4.63    <.001
ø      1–3        0.12      3.68    <.001
ø      2–3        0.01      0.28    .78
u      1–2        0.15      5.01    <.001
u      1–3        0.32      7.65    <.001
u      2–3        0.17      4.53    <.001
y      1–2        0.17      5.16    <.001
y      1–3        0.16      3.79    <.001
y      2–3        –0.01     –0.28   .78
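
The counts cited for Table 7 can be reproduced with a brief tally. The sketch below rests on two assumptions that are not stated for this table: that the significance threshold is a Bonferroni-adjusted α of .05/24 (by analogy with the adjustment reported for the disyllabic data), and that a positive estimate means smaller |F2| in the later syllable. Values printed as "<.001" are stored as .001, an upper bound rather than an exact p-value.

```python
# (vowel, syllable pair, estimate, p); "<.001" is stored as 0.001 (upper bound).
TABLE_7 = [
    ("ɑ", "1-2", 0.22, 0.001), ("ɑ", "1-3", 0.26, 0.001), ("ɑ", "2-3", 0.04, 0.09),
    ("e", "1-2", 0.36, 0.001), ("e", "1-3", 0.33, 0.001), ("e", "2-3", -0.03, 0.31),
    ("i", "1-2", 0.23, 0.001), ("i", "1-3", 0.33, 0.001), ("i", "2-3", 0.09, 0.01),
    ("ɯ", "1-2", -0.22, 0.001), ("ɯ", "1-3", -0.27, 0.001), ("ɯ", "2-3", -0.05, 0.10),
    ("o", "1-2", -0.02, 0.46), ("o", "1-3", 0.02, 0.54), ("o", "2-3", 0.04, 0.29),
    ("ø", "1-2", 0.11, 0.001), ("ø", "1-3", 0.12, 0.001), ("ø", "2-3", 0.01, 0.78),
    ("u", "1-2", 0.15, 0.001), ("u", "1-3", 0.32, 0.001), ("u", "2-3", 0.17, 0.001),
    ("y", "1-2", 0.17, 0.001), ("y", "1-3", 0.16, 0.001), ("y", "2-3", -0.01, 0.78),
]

# ASSUMPTION: Bonferroni correction over the 24 comparisons in the table.
ALPHA = 0.05 / len(TABLE_7)

def count_reduced(rows, pair, alpha=ALPHA):
    """Count comparisons for `pair` that are significant AND show a positive
    estimate (read here as smaller |F2| in the later syllable)."""
    return sum(
        1 for _, row_pair, estimate, p in rows
        if row_pair == pair and p < alpha and estimate > 0
    )

counts = {pair: count_reduced(TABLE_7, pair) for pair in ("1-2", "1-3", "2-3")}
print(counts)
```

Under these assumptions the tally shows many reductions in both comparisons involving the initial syllable and almost none between syllables two and three, which is the profile the binary account predicts.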

In addition to the effects of vowel quality and syllable number, duration exerted a significant effect on |F2| [Duration: F(1) = 101.63, p < .001]. The relationship between duration and |F2| is shown in Figure 6. Note that all vowels except /ø y/ generally show a positive correlation between duration and |F2|. As one might expect based on the vowel-specific differences in Figure 6, |F2| was significantly affected by the interaction between vowel quality and duration [F(7, 5334.0) = 16.23, p < .001]. However, the interaction of syllable number and duration was not significant [Syllable:Duration: F(2, 5330.7) = 0.88, p = .41]. The three-way interaction between vowel quality, syllable number, and duration significantly contributed to the prediction of |F2| [F(14, 5332.3) = 4.17, p < .001]. Finally, both preceding and following consonant place were significant predictors of |F2| [Preceding place: F(3, 5331.9) = 5.05, p = .002; Following place: F(3, 5338.6) = 59.48, p < .001].

Figure 6

Correlation between duration (centered, ms) and |F2| (z) by vowel quality and syllable number.

When an alternate model with position encoded binarily (initial versus non-initial) was compared with a model encoding position as a continuous variable, the binary model fit the data significantly better than the continuous model [ΔAIC = 108.0]. This is consistent with a binary rather than incremental distinction in position. This interpretation is further supported by the data in Figure 1 and Figure 5, where the F2/|F2| differences between second- and third-syllable realizations of a given vowel phoneme are typically very small, in contrast to the larger differences between initial- and second-syllable variants.

4.1.3. Summary

Results from Sections 4.1.1 and 4.1.2 suggest that |F1| and |F2| vary according to position and duration. Further, results point toward a binary distinction between more peripheral initial-syllable vowels and reduced non-initial vowels. Also, the fact that increased duration variably correlates with increased |F1| and |F2| supports undershoot as a low-level factor in Kyrgyz, in addition to a more pervasive effect of position.

4.2. Disyllabic roots

Results from Section 4.1 predict that |F1| and |F2| of suffix vowels should be distinct from initial-syllable vowels in words derived from disyllabic roots. However, these words also allow for the comparison of root-internal vowels. If position is truly binary, this predicts that second-syllable vowels should be realized like suffix vowels, and not like initial-syllable vowels. If the effect is morphologically conditioned, then second-syllable vowels should be realized more like first-syllable vowels, since they are both root-internal. General results from this set of words are shown in Figure 7 (n = 3,979 vowels). Note that as above, F1 tends to increase in non-initial syllables. For F2, front vowels are characterized by decreasing F2 in non-initial syllables while back vowels are characterized by increasing F2 in non-initial syllables. For most vowels, observe that second-syllable means are more similar to third- and fourth-syllable means than to first-syllable means. The only exception to this generalization is /e/. Recall also that no disyllabic roots with /u/ were produced, so there are no first- or second-syllable tokens of /u/ in Figure 7. Tentatively, Figure 7 supports the conclusion that surface vowel realization is not principally conditioned by morphology.

Figure 7

Mean F1 and F2 (z) by syllable number for words derived from disyllabic roots. Syllable number is marked by numerical subscripts.

Figure 8 presents duration results from the disyllabic root dataset; vowels are longer in second, third, and fourth syllables. The same generalization made concerning the monosyllabic roots holds when the disyllabic root data is considered—vowels are reduced in non-initial syllables even though they are longer. This further lends support to the conclusion that vowel reduction is not primarily due to undershoot in the language.

Figure 8

Duration (ms) by vowel and position in words derived from disyllabic roots.

4.2.1. |F1|

In addition to vowel quality [Vowel: F(7, 3903.6) = 834.2, p < .001], syllable number exerts a significant effect on |F1| in words derived from disyllabic roots [Syllable: F(3, 3904.2) = 36.82, p < .001]. The effect of syllable number is observable in Figure 9. Note that, as above, |F1| of high vowels tends to diminish in non-initial syllables, |F1| of mid vowels tends to increase, while |F1| of the low vowel does not show any immediately obvious trends. The significant interaction between vowel and syllable number reflects these specific differences [F(19, 3902.2) = 17.76, p < .001].

Figure 9

|F1| (z) by position for each vowel quality in words derived from disyllabic roots.

By-syllable comparisons for each vowel quality are shown in Table 8. Since words were one syllable longer in the disyllabic root dataset, an additional three pairwise comparisons were possible for each vowel quality (modulo the absence of root-internal /u/; Bonferroni-adjusted α = .05/43 = .001). Pairwise comparisons suggest the superiority of a binary analysis of reduction. Fourteen of twenty-one comparisons involving the initial syllable (i.e., the left half of the table) were significant and in the right direction, while only one comparison not involving the initial syllable (i.e., the right half of the table) was significant and in the right direction.
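
The figure of 43 comparisons follows from simple counting, sketched below. The code assumes only what is stated above: eight vowel qualities, four syllable positions, and no first- or second-syllable tokens of /u/, which leaves the 3–4 comparison as the only one available for that vowel.

```python
from itertools import combinations

vowels = ["ɑ", "e", "i", "ɯ", "o", "ø", "u", "y"]
all_pairs = list(combinations([1, 2, 3, 4], 2))  # six syllable pairs

def available_pairs(vowel):
    """Syllable pairs that can be compared for a given vowel."""
    if vowel == "u":
        # No root-internal /u/: only suffix syllables (3 and 4) have tokens.
        return [(a, b) for a, b in all_pairs if a >= 3]
    return all_pairs

n_comparisons = sum(len(available_pairs(v)) for v in vowels)
alpha = 0.05 / n_comparisons  # Bonferroni-adjusted threshold
print(n_comparisons, round(alpha, 3))
```

Seven vowels contribute six comparisons each and /u/ contributes one, so the adjusted threshold is .05 divided by that total.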

Table 8

Pairwise comparisons for |F1| (z) of vowels by position in words elicited from disyllabic roots.

Vowel  Syll.  Est.    t       p       Syll.  Est.    t       p
ɑ      1–2    0.28    6.15    <.001   2–3    0.02    0.55    .58
ɑ      1–3    0.30    6.23    <.001   2–4    –0.12   –1.90   .06
ɑ      1–4    0.16    2.27    .02     3–4    –0.14   –2.16   .03
e      1–2    –0.07   –1.78   .08     2–3    –0.10   –2.78   .005
e      1–3    –0.17   –4.23   <.001   2–4    –0.05   –0.87   .38
e      1–4    –0.12   –1.91   .06     3–4    0.05    0.81    .42
i      1–2    0.36    4.40    <.001   2–3    0.13    2.02    .04
i      1–3    0.49    6.77    <.001   2–4    0.14    1.48    .14
i      1–4    0.50    5.03    <.001   3–4    0.01    0.12    .90
ɯ      1–2    0.23    3.84    <.001   2–3    –0.21   –3.59   .003
ɯ      1–3    0.02    0.33    .74     2–4    0.15    1.59    .11
ɯ      1–4    0.38    4.12    <.001   3–4    0.36    4.00    .001
o      1–2    0.48    9.11    <.001   2–3    –0.07   –1.34   .18
o      1–3    0.41    6.57    <.001   2–4    –0.14   –1.59   .11
o      1–4    0.34    3.53    <.001   3–4    –0.07   –0.74   .46
ø      1–2    –0.13   –2.07   .04     2–3    0.13    2.36    .02
ø      1–3    –0.00   –0.07   .94     2–4    –0.03   –0.47   .64
ø      1–4    –0.17   –2.16   .03     3–4    –0.16   –2.38   .02
u                                     3–4    0.14    0.87    .39
y      1–2    0.57    7.55    <.001   2–3    0.18    2.62    .009
y      1–3    0.75    12.59   <.001   2–4    0.12    1.17    .24
y      1–4    0.70    7.13    <.001   3–4    –0.05   –0.61   .54

Duration was a significant predictor of |F1| [Duration: F(1, 3833.7) = 22.25, p < .001]. However, the interaction between duration and syllable number was not significant [F(3, 3908.4) = 1.12, p = .34]. There was a significant interaction between vowel and duration, indicating that duration-based differences in |F1| are modulated by vowel quality [Vowel:Duration: F(7, 3903.3) = 5.67, p < .001]. The three-way interaction between vowel, syllable number, and duration was also significant [F(19, 3902.7) = 2.00, p < .01]. Lastly, following consonant place significantly contributed to model fit, while preceding consonant place did not [Following place: F(3, 3905.6) = 60.73, p < .001; Preceding place: F(3, 3903.2) = 1.39, p = .24].

To test whether positional variation for |F1| is best analyzed as incremental or binary among these words, two models were compared. These models were identical to the model above, except that in one, position was encoded as a continuous predictor while in the other, position was encoded binarily, as a distinction between initial and non-initial syllables. When variation for |F1| is compared in these two models, the model with binary position provided significantly better model fit [ΔAIC = 88.2].

4.2.2. |F2|

In the disyllabic dataset, |F2| was significantly affected by vowel quality [F(7, 3905.0) = 680.3, p < .001]. Syllable number also exerted a significant effect on |F2| [F(3, 3905.8) = 190.7, p < .001]. Figure 10 provides plots of each vowel across all positions. Observe that for all vowels except /ɯ/ non-initial positions exhibited smaller |F2| than their initial-syllable counterparts. Despite this general trend, there is notable variation between vowels. For instance, observe that second-syllable /e/ is more similar to initial-syllable /e/ while second-syllable /o/ is more similar to third- and fourth-syllable /o/. These vowel-specific effects are reflected by the significant interaction between vowel and syllable number [F(19, 3902.8) = 17.75, p < .001].

Figure 10

|F2| (z) by position for each vowel quality in words derived from disyllabic roots.

Duration was also a significant predictor of |F2| in this dataset [F(1, 3622.7) = 33.90, p < .001]. There was also a significant two-way interaction between vowel and duration [F(7, 3904.7) = 4.90, p < .001]. The interaction between syllable number and duration was, however, not significant [F(3, 3909.9) = 1.19, p = .31]. Moreover, the three-way interaction between vowel, syllable number, and duration was significant [F(19, 3903.7) = 2.65, p < .001].

Recall from Section 4.1.2 that positional variation for |F2| was best analyzed as a binary distinction between initial and non-initial syllables rather than an incremental difference between each syllable. For the data from disyllabic roots, the pairwise comparisons in Table 9 likewise support equivalent reduction of all non-initial syllables. This binary account of positional variation predicts that all comparisons involving the initial syllable (left side of the table) will be significant; all 21 are significant and in the right direction in Table 9. Moreover, this analysis predicts that all comparisons that do not involve the initial syllable should be non-significant. Of the 22 comparisons on the right side of the table, only five are significant, and of these, only one is in the right direction.

Table 9

Pairwise comparisons for |F2| (z) of vowels by position in words elicited from disyllabic roots.

Vowel  Syll.  Est.    t       p       Syll.  Est.    t       p
ɑ      1–2    0.29    9.22    <.001   2–3    0.06    2.20    .03
ɑ      1–3    0.35    10.49   <.001   2–4    0.05    1.04    .34
ɑ      1–4    0.33    6.81    <.001   3–4    –0.02   –0.44   .66
e      1–2    0.15    5.91    <.001   2–3    0.10    4.09    <.001
e      1–3    0.26    9.33    <.001   2–4    0.07    1.61    .11
e      1–4    0.22    5.12    <.001   3–4    –0.04   –0.87   .38
i      1–2    0.69    12.20   <.001   2–3    0.06    1.34    .18
i      1–3    0.75    14.91   <.001   2–4    0.20    2.99    .003
i      1–4    0.89    12.89   <.001   3–4    0.14    2.33    .02
ɯ      1–2    0.57    13.45   <.001   2–3    –0.37   –9.02   <.001
ɯ      1–3    0.20    4.56    <.001   2–4    –0.31   –4.87   <.001
ɯ      1–4    0.25    3.96    <.001   3–4    0.06    0.94    .35
o      1–2    0.28    7.82    <.001   2–3    0.06    1.73    .08
o      1–3    0.35    8.09    <.001   2–4    –0.01   –0.19   .85
o      1–4    0.27    4.11    <.001   3–4    –0.07   –1.15   .25
ø      1–2    0.48    10.76   <.001   2–3    –0.21   –5.43   <.001
ø      1–3    0.28    6.82    <.001   2–4    –0.19   –3.73   <.001
ø      1–4    0.29    5.35    <.001   3–4    0.01    0.27    .78
u                                     3–4    0.03    0.27    .79
y      1–2    0.46    8.72    <.001   2–3    0.02    0.39    .69
y      1–3    0.48    11.48   <.001   2–4    0.02    0.26    .79
y      1–4    0.48    7.05    <.001   3–4    0.00    0.01    1

Furthermore, I tested the binary and incremental analyses of reduction via model comparison. Again, the model that treats position as binary provided significantly better model fit than the model with position encoded continuously [ΔAIC = 255.3]. When the data in Figure 10 is considered, this result is unsurprising. In Figure 10, there is a clear difference between mean |F2| of initial- and non-initial positions, but there is no clear difference between different non-initial means for each vowel quality.

4.2.3. Summary

Data from words deriving from disyllabic roots confirm the observations from Section 4.1, indicating that |F1| is reduced in non-initial syllables. As above, this positional shift produces a more open vowel quality than the zero value derived from normalization. Results from Section 4.2 have further demonstrated a significant reduction of |F2| in non-initial syllables. Beyond positional effects, duration exerts a variable effect on vowel realization: shorter vowels are produced with less peripheral vowel qualities. Moreover, the specific pattern of reduction reported in Sections 4.1 and 4.2 supports a binary distinction between initial syllables, which are immune to reduction, and non-initial syllables, which are targets for reduction. Treating all non-initial vowels as equally reduced (or, alternatively, initial vowels as strengthened) is supported by both pairwise and model comparisons.

5. Discussion

5.1. General discussion

The acoustic properties of Kyrgyz vowels demonstrate significant positional variation. Two different patterns emerged in the data. For F1, non-low vowels exhibit higher F1 in non-initial positions. I have argued that this is centralization, with both high and mid vowels lowering toward a more open central vowel quality. However, this centralization appears to have little effect on the low vowel. While F1 and |F1| of /ɑ/ do vary somewhat, the evidence in Figure 4 is more consistent with low vowel raising as undershoot, since the correlation between duration and |F1| was most substantial for /ɑ/.

At this point, it is worthwhile to consider the role of stress. One might wonder if F1 increases are due to final stress in the language. Under this interpretation, stress induces vowel lowering, rather than centralization. Increased F1 typically correlates with increased intensity, a common manifestation of stress (Ladefoged, 2003). Crucially, however, lowering also occurs in non-final positions, as seen in Figure 3 and Figure 9. The fact that lowering appears to occur in all non-initial syllables suggests that this is not stress-induced. More generally, the effect of stress on vowel F1 and F2 appears to be minimal. In some languages, stressed positions host a larger range of contrasts (e.g., in Italian, /ɛ/ and /ɔ/ are found in stressed syllables only). No such patterns are found in Kyrgyz, although this study has not investigated other potential acoustic correlates of stress, including intensity, f0 and its relation to intonation, or phonation differences. The fact that words were elicited in isolation precludes some of these, particularly the relationship between stress and intonation. Further work will likely have more specifics to contribute here; at present, I claim only that stress has no obvious effect on the first two formants.

The relative density of the vowel space, in particular the existence of several acoustically central vowels, /y ø ɯ/, may influence the extent of reduction in Kyrgyz. A densely packed central region of the vowel space could be dispreferred for functional reasons. At the same time, harmony may influence the extent of reduction, preventing more substantial reduction in non-initial syllables. The relationship between harmony and reduction is thus potentially unstable, as noted in Binnick (1991). If vowel harmony renders certain vowels more predictable, in turn these vowels may be more reduced. However, if vowel reduction is large enough, the effect of harmony can be obscured, the very point made in Pearce (2008, 2012), and even lost over time. Binnick (1991) argues that the result of harmony, i.e., predictability, sows the seeds for the eventual decay of the pattern. In many dialects of the related language Uzbek, vowel harmony has almost been entirely lost, and interestingly, non-initial vowels are generally limited to [ə] and [ɑ] (McCollum 2019b, ch. 4). Data from more languages would be necessary to really evaluate the connection between harmony and reduction, but the contrast between Kyrgyz and Uzbek is suggestive—harmony in Kyrgyz is robust, and reduction is smaller; harmony is generally absent in contemporary Uzbek, and reduction is larger.

5.2. The nature of reduction in Kyrgyz

The results above support a binary distinction between initial and non-initial vowels. Patterns of F1 and F2 variation clearly support a two-way distinction in Kyrgyz. F1 and F2 reduction, when taken together, also suggest that variation is not due to incremental reduction. As one further comparison, Johnson and Martin (2001) identify a reduction in the size of the vowel space in Creek from initial to final positions by comparing the size of vowel polygons across positions. Thus, reduction is evident when the area of the vowel polygon in some position is smaller than in a known non-reduced position. If supralaryngeal declination results in a gradual diminution of the vowel space, then the polygons of the vowel space in Kyrgyz should contract in each successive syllable. Vowel polygons for all syllables are shown in Figure 11, and their areas are compared in Table 10. Since no root-internal tokens of /u/ were produced in the disyllabic root data, all /u/ tokens were excluded from the right-hand polygons in Figure 11. In Figure 11, the size of the vowel space stays relatively constant in all non-initial syllables, suggesting that the global pattern of variation does not result from supralaryngeal declination. In contrast, there is a notable difference between the initial-syllable polygon and the non-initial polygons. On average, the vowel space contracts by 21% in non-initial syllables of words derived from monosyllabic roots, and by 24% in non-initial syllables of words derived from disyllabic roots.

Figure 11

Vowel polygons from mean F1 and F2 (z) by position for monosyllabic (left) and disyllabic (right) roots. Due to the lack of root-internal /u/ in the disyllabic root data, all /u/ tokens are excluded from the right-hand plot. The acoustically central /ø ɯ/ are removed from both plots.

Table 10

Vowel polygon areas by position in monosyllabic and disyllabic roots for the polygons in Figure 11.

      Monosyllabic roots    Disyllabic roots
      area (z²)             area (z²)
σ1    4.23                  3.51
σ2    3.27                  2.57
σ3    3.40                  2.77
σ4                          2.70
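
The contraction percentages reported above can be recovered directly from the areas in Table 10, as in the following sketch. The shoelace formula is included as one standard way to obtain a polygon area from ordered vertices; it is an assumption that the areas in Table 10 were computed in an equivalent way, and the function names are illustrative.

```python
def polygon_area(points):
    """Shoelace formula. `points` are (F2, F1) vertices in z-units,
    listed in order around the polygon."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

def mean_contraction(initial_area, non_initial_areas):
    """Percent reduction of the mean non-initial area relative to syllable 1."""
    mean_non_initial = sum(non_initial_areas) / len(non_initial_areas)
    return 100 * (1 - mean_non_initial / initial_area)

# Areas (z²) copied from Table 10.
mono = mean_contraction(4.23, [3.27, 3.40])
di = mean_contraction(3.51, [2.57, 2.77, 2.70])
print(round(mono), round(di))
```

Averaging the non-initial areas before comparing them with the initial-syllable area gives the same result as averaging the per-syllable contractions, so the choice between the two does not affect the percentages.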

If the reduction of vowel quality distinctions is binary rather than incremental, this supports either the initial strengthening or the predictability-based accounts. One point to be made here is that although variation has been discussed in terms of reduction, it is eminently plausible that the binary phonetic distinction between initial and non-initial syllables is more appropriately construed as enhanced versus non-enhanced positions. Although the distinction between initial and non-initial is generally consistent with an initial strengthening account, there are reasons to suspect other factors are at work. First, extant research has found relatively small effects of initial strengthening on initial-syllable vowels, especially when a consonant is in domain-initial position (Fougeron & Keating, 1997; Cho, 2005; Cho & Keating, 2009). Since most words collected contained an initial consonant, the effect of initial strengthening on initial-syllable vowels is likely small. In contrast, the acoustic differences reported above are quite large. In order to more definitively assess whether variation is due to initial strengthening, it would be necessary to examine the realization of domain-initial consonants. Since initial strengthening is often localized to the initial segment in a prosodic domain, it should trigger hyperarticulation of domain-initial consonants, as well.

A reviewer points out that some Bantu languages exhibit relatively large strengthening effects on both stem-initial consonants and immediately following vowels (Idiatov & Van de Velde, 2016). In other languages, stem- or word-initial syllables exhibit distributional asymmetries that suggest the phonologization of a pre-existing strengthening effect (Lionnet, 2017; Lionnet & Hyman, 2018). For instance, in Esimbi, the entire range of contrastive vowel qualities is licensed on the word-initial syllable; only high vowels occur elsewhere (Hyman, 1988; see also Walker, 2011). However, despite the existence of strengthening effects in both the phonetics and phonology of these languages, in most if not all cases, these effects are linked to word-initial accent (Lionnet & Hyman, 2018, pp. 651–655). Since stress is final in Kyrgyz, such an analysis is less tenable.

While the size of the effects reported here is potentially inconsistent with some previous results on domain-initial strengthening, the Kyrgyz results fit quite well with a predictability-based account. Since the predictability of a given element, e.g., a word or phoneme, may partially condition its phonetic realization, the presence of harmony in the language may enable reduction of non-initial vowels. For instance, if the initial-syllable vowel is /ɑ/, then the next vowel is most likely to be /ɑ/ or /ɯ/, since the front vowels are generally banned after back vowels, and the round vowels are generally banned after unrounded vowels. In general, only two vowels readily occur after a given vowel, x—that same vowel x, or the vowel that differs from x only in its [high] feature. In a language with eight vowel qualities, like Kyrgyz, one might expect a given vowel to occur roughly 12% of the time, but once the quality of vowel x in syllable n is known, the likelihood of x in syllable n+1 increases dramatically. To illustrate this, the unigram probability of /ø/ in the 2016 one-million sentence corpus of Kyrgyz newspapers in the Leipzig Corpora Collection is 0.05 (Goldhahn et al., 2012). However, after /ø/, the probability of another /ø/ increases almost tenfold, to 0.50. This general trend holds not just for identical vowels, but for vowels that agree in backness and rounding with the preceding vowel. The unigram probability of /i/ is 0.11, but the probability of /i/ given a previous /e/ is 0.34, a threefold increase. This increase in predictability emerging from harmony renders vowels in non-initial positions susceptible to reduction. If cast in terms of Lindblom’s (1990) H&H theory, the listener, based on their history with the language, expects sequences of harmonic vowels. The speaker is able to exploit this expectation and reduce effort without compromising the efficient communication of the message.
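
The contrast between unigram and conditional vowel probabilities can be made concrete with a small sketch. The word list below is an invented set of vowel skeletons, not the Leipzig corpus, and it is an assumption that the conditional probabilities were computed over adjacent vowels within a word, as done here.

```python
from collections import Counter

def vowel_probabilities(words):
    """`words` is an iterable of vowel sequences, one per word
    (consonants already removed). Returns (unigram, conditional) tables."""
    unigrams = Counter()
    bigrams = Counter()
    contexts = Counter()
    for vowels in words:
        unigrams.update(vowels)
        for prev, nxt in zip(vowels, vowels[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    total = sum(unigrams.values())
    p_uni = {v: c / total for v, c in unigrams.items()}
    p_cond = {(prev, nxt): c / contexts[prev] for (prev, nxt), c in bigrams.items()}
    return p_uni, p_cond

# Invented toy vowel skeletons (NOT corpus data).
toy_words = ["øø", "øy", "ɑɑ", "ɑɯ", "ei", "ee", "ɑ", "e"]
p_uni, p_cond = vowel_probabilities(toy_words)

# Ratios implied by the corpus figures quoted in the text.
ratio_oe = 0.50 / 0.05  # P(ø | ø) relative to P(ø)
ratio_i = 0.34 / 0.11   # P(i | e) relative to P(i)
print(p_uni["ø"], p_cond[("ø", "ø")], ratio_oe, round(ratio_i, 1))
```

Applied to a harmonizing language, a table of this kind concentrates most of the conditional probability mass on the one or two vowels that agree with the preceding vowel, which is the sense in which harmony makes non-initial vowels predictable.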

5.3. The relationship between phonology and phonetics

Returning to Pearce (2008, 2012) and the claim that harmony produces triggers and targets that are phonetically indistinguishable, it is clear that F2 varies significantly by position in Kyrgyz. Underlying Pearce’s (2008, 2012) claim that harmony blocks reduction is an assumption that the substitution of phonological symbols exerts a limiting force on phonetic variation. Under this view though, the results reported in the previous section are unexpected.

It is not controversial that phonological knowledge plays a crucial role in phonetic realization. Backness and rounding harmony in Kyrgyz render backness and rounding predictable in non-initial syllables. If predictability motivates reduction, this offers a relatively direct link between phonetic variation and phonological representations, in particular, theories of underspecification (e.g., Archangeli, 1988; Steriade, 1995). Typically, underspecified (non-initial) vowels are representationally distinguished in the phonology. As noted in previous work, this phonological distinction has potential phonetic ramifications, too (Zsiga, 1997; Lanfranca, 2012; see also Washington, 2006, 2008). Since targets of harmony are more predictable, and triggers are less predictable, this predicts that, all else being equal, triggers should not be phonetically reduced relative to targets. In other words, acoustic distinctions in the vowel space should be maximally preserved among triggers, but potentially reduced among targets. To state it differently, regardless of the direction of harmony, all else being equal, targets of harmony should never exploit a larger phonetic vowel space than triggers of harmony.

One related prediction of the proposed account is that disharmonic vowels should be realized with more peripheral vowel qualities than alternating vowels because they are not predictable based on context. This prediction critically differs from Zsiga’s (1997) prediction. Under Zsiga’s analysis, all else being equal, both alternating and non-alternating vowels should be indistinguishable. The predictability-based prediction is borne out in the language most closely related to Kyrgyz, Kazakh. In Kazakh, the comitative suffix has an invariantly front vowel, while the question enclitic in colloquial speech has an invariantly back vowel. McCollum (2019a) reports that F2 is more peripheral for each of these morphemes than that of alternating vowels in the language. This result from Kazakh further supports a predictability-based analysis of vowel reduction in Turkic.

6. Conclusion

This paper has investigated positional variation of Kyrgyz vowels, finding centralization of vowels in non-initial syllables. The nature and extent of this phonetic reduction suggest that prosodic effects are not the primary source of explanation. Rather, these results support the claim that reduction in languages with vowel harmony may depend on predictability. This proposal links reduction at the featural level with other forms of predictability-based phonetic reduction. The findings here are suggestive, offering harmony patterns as a new testing ground for examining theories of positional variation, as well as the relationship between phonology and phonetics.

Additional file

The additional file for this article can be found as follows:


A PDF file containing the full list of stimuli. DOI: https://doi.org/10.5334/labphon.247.s1


Acknowledgements

First of all, I would like to thank the Kyrgyz speakers who were kind enough to spend their time with me in Bishkek. I would also like to thank Marc Garellek, Sharon Rose, two anonymous reviewers, and Associate Editor Lisa Davidson for their feedback. Any errors are my own.

Funding Information

This work was supported by a University of California President’s Dissertation Year Fellowship.

Competing Interests

The author has no competing interests to declare.


References

Akaike, H. (1974). A new look at the statistical model identification. In Selected Papers of Hirotugu Akaike (pp. 215–222). Springer. DOI:  http://doi.org/10.1007/978-1-4612-1694-0_16

Archangeli, D. (1988). Aspects of underspecification theory. Phonology, 5(2), 183–207. DOI:  http://doi.org/10.1017/S0952675700002268

Aylett, M., & Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31–56. DOI:  http://doi.org/10.1177/00238309040470010201

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. DOI:  http://doi.org/10.18637/jss.v067.i01

Beckman, M. E., Edwards, J., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. J. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology 2: Gesture, Segment, Prosody (pp. 68–86). Cambridge, UK: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511519918.004

Binnick, R. (1991). Vowel harmony loss in Uralic and Altaic. In W. G. Boltz & M. C. Shapiro (Eds.), Studies in the historical phonology of Asian languages (pp. 35–52). Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/cilt.77.03bin

Boersma, P. (1998). Functional phonology: Formalizing the interactions between articulatory and perceptual drives. Den Haag: Holland Academic Graphics/IFOTT.

Boersma, P., & Weenink, D. (2015). Praat: Doing phonetics by computer (version 5.4.18).

Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996). Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication, 20(3–4), 255–272. DOI:  http://doi.org/10.1016/S0167-6393(96)00063-5

Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304. DOI:  http://doi.org/10.1177/0049124104268644

Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica, 57(1), 3–16. DOI:  http://doi.org/10.1159/000028456

Cho, T. (2002). The effects of prosody on articulation in English. New York: Routledge.

Cho, T. (2005). Prosodic strengthening and featural enhancement: Evidence from acoustic and articulatory realizations of /ɑ, i/ in English. Journal of the Acoustical Society of America, 117(6), 3867–78. DOI:  http://doi.org/10.1121/1.1861893

Cho, T. (2006). Manifestation of prosodic structure in articulatory variation: Evidence from lip kinematics in English. In L. Goldstein, D. H. Whalen & C. T. Best (Eds.), Laboratory Phonology (Vol. 8, pp. 519–548). Berlin: Mouton de Gruyter.

Cho, T., & Keating, P. (2001). Articulatory and acoustic studies on domain-initial strengthening in Korean. Journal of Phonetics, 29(2), 155–190. DOI:  http://doi.org/10.1006/jpho.2001.0131

Cho, T., & Keating, P. (2009). Effects of initial position versus prominence in English. Journal of Phonetics, 37(4), 466–85. DOI:  http://doi.org/10.1016/j.wocn.2009.08.001

Cohen Priva, U. (2008). Using information content to predict phone deletion. In N. Abner & J. Bishop (Eds.), Proceedings of the 27th West Coast Conference on Formal Linguistics (pp. 90–98). Somerville, MA: Cascadilla Proceedings Project. Retrieved from http://www.lingref.com/cpp/wccfl/27/paper1820.pdf

De Jong, K. J. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. The Journal of the Acoustical Society of America, 97(1), 491–504. DOI:  http://doi.org/10.1121/1.412275

Delattre, P. (1969). An acoustic and articulatory study of vowel reduction in four languages. IRAL-International Review of Applied Linguistics in Language Teaching, 7(4), 295–326. DOI:  http://doi.org/10.1515/iral.1969.7.4.295

Edwards, J., Beckman, M. E., & Fletcher, J. (1991). The articulatory kinematics of final lengthening. Journal of the Acoustical Society of America, 89(1), 369–82. DOI:  http://doi.org/10.1121/1.400674

Flemming, E. (2001). Scalar and categorical phenomena in a unified model of phonetics and phonology. Phonology, 18(1), 7–44. DOI:  http://doi.org/10.1017/S0952675701004006

Fougeron, C., & Keating, P. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101(6), 3728–3740. DOI:  http://doi.org/10.1121/1.418332

Fulop, S. A., Kari, E., & Ladefoged, P. (1998). An acoustic study of the tongue root contrast in Degema vowels. Phonetica, 55(1–2), 80–98. DOI:  http://doi.org/10.1159/000028425

Gafos, A. I. (1999). The Articulatory Basis of Locality in Phonology. New York: Garland.

Gahl, S., Yao, Y., & Johnson, K. (2012). Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language, 66(4), 789–806. DOI:  http://doi.org/10.1016/j.jml.2011.11.006

Gendrot, C., & Adda-Decker, M. (2006). Is there a universal impact of duration on formant frequency values of oral vowels? An automated analysis of speech from eight languages. Laboratory Phonology, X, 53–56.

Goldhahn, D., Eckart, T., & Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (pp. 31–43). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/327_Paper.pdf

Goldsmith, J. A., & Riggle, J. (2012). Information theoretic approaches to phonological structure: The case of Finnish vowel harmony. Natural Language & Linguistic Theory, 30, 859–96. DOI:  http://doi.org/10.1007/s11049-012-9169-1

Guion, S. G., Post, M. W., & Payne, D. L. (2004). Phonetic correlates of tongue root vowel contrasts in Maa. Journal of Phonetics, 32(4), 517–42. DOI:  http://doi.org/10.1016/j.wocn.2004.04.002

Harris, K. S. (1978). Vowel duration change and its underlying physiological mechanisms. Language and Speech, 21(4), 354–361. DOI:  http://doi.org/10.1177/002383097802100410

Hebert, R., & Poppe, N. (1963). Kirghiz Manual. Uralic and Altaic Series 33. Indiana University Press.

Hyman, L. M. (1988). Underspecification and vowel height transfer in Esimbi. Phonology, 5(2), 255–273. DOI:  http://doi.org/10.1017/S0952675700002293

Idiatov, D., & Van de Velde, M. (2016). Stem-initial accent and C-emphasis prosody in North-Western Bantu. Paper presented at The 6th International Conference on Bantu Languages.

Johanson, L. (1998). The structure of Turkic. In L. Johanson & E. A. Csato (Eds.), The Turkic Languages (pp. 30–66). New York: Routledge.

Johnson, K., & Martin, J. (2001). Acoustic vowel reduction in Creek: Effects of distinctive length and position in the word. Phonetica, 58(1–2), 81–102. DOI:  http://doi.org/10.1159/000028489

Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between words: Evidence from reduction in lexical production. In J. L. Bybee & P. L. Hopper (Eds.), Typological Studies in Language (pp. 229–254). Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/tsl.45.13jur

Kara, D. S. (2003). Kyrgyz. Munich: Lincom Europa.

Kaun, A. R. (1995). The typology of rounding harmony: An Optimality Theoretic approach (Doctoral dissertation, University of California, Los Angeles, CA). Retrieved from http://roa.rutgers.edu/article/view/238

Kirchner, M. (1998). Kirghiz. In L. Johanson & E. A. Csato (Eds.), The Turkic Languages (pp. 344–356). New York: Routledge.

Klatt, D. H. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics, 3(3), 129–140. DOI:  http://doi.org/10.1016/S0095-4470(19)31360-9

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). LmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13). DOI:  http://doi.org/10.18637/jss.v082.i13

Ladefoged, P. (2003). Phonetic data analysis: An introduction to fieldwork and instrumental techniques. Malden, MA: Blackwell.

Lanfranca, M. (2012). An acoustic study of underspecified vowels in Turkish (Master’s thesis, University of Kansas, Lawrence, KS). Retrieved from https://kuscholarworks.ku.edu/handle/1808/10635

Lenth, R., Singmann, H., & Love, J. (2018). Emmeans: Estimated marginal means, aka least-squares means. R package version 1.1. Retrieved from https://cran.r-project.org/web/packages/emmeans/emmeans.pdf

Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35(11), 1773–1781. DOI:  http://doi.org/10.1121/1.1918816

Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403–439). Dordrecht: Kluwer. DOI:  http://doi.org/10.1007/978-94-009-2037-8_16

Lionnet, F. (2017). Stem-initial prominence in West and Central Africa: Niger-Congo, Areal, or Both? Paper presented at The 48th Annual Conference on African Linguistics.

Lionnet, F., & Hyman, L. M. (2018). Current issues in African phonology. In T. Güldemann (Ed.), The Languages and Linguistics of Africa (Vol. 11, pp. 602–708). Berlin: Mouton de Gruyter. DOI:  http://doi.org/10.1515/9783110421668-006

Lobanov, B. M. (1971). Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America, 49(4B), 606–8. DOI:  http://doi.org/10.1121/1.1912396

Luce, P. A. (1986). Neighborhoods of words in the mental lexicon. Research on Speech Perception, Technical Report, 6, 1–91. Retrieved from https://eric.ed.gov/?id=ED353610

McCollum, A. G. (2015). Labial harmonic shift in Kazakh: Mapping the pathways and motivations for decay. In A. E. Jurgensen, H. Sande, S. Lamoureux, K. Baclawski & A. Zerbe (Eds.), The Proceedings of the 41st Annual Meeting of the Berkeley Linguistics Society (Vol. 41, pp. 329–52). DOI:  http://doi.org/10.20354/B4414110012

McCollum, A. G. (2019a). The empirical consequences of data collection methods: A case study from Kazakh vowel harmony. Linguistic Discovery, 16(2), 72–110. DOI:  http://doi.org/10.1349/PS1.1537-0852.A.495

McCollum, A. G. (2019b). Gradience and locality in phonology: Case studies from Turkic vowel harmony (Doctoral dissertation, University of California San Diego, San Diego, CA). Retrieved from https://escholarship.org/uc/item/7sx31303

McCollum, A. G. (2019c). Gradient morphophonology: Evidence from Uyghur vowel harmony. In Proceedings of the Annual Meeting on Phonology. San Diego, CA. DOI:  http://doi.org/10.3765/amp.v7i0.4565

Mooshammer, C., & Geng, C. (2008). Acoustic and articulatory manifestations of vowel reduction in German. Journal of the International Phonetic Association, 38(2), 117–136. DOI:  http://doi.org/10.1017/S0025100308003435

Ní Chiosáin, M., & Padgett, J. (2001). Markedness, segmental realization, and locality in spreading. In L. Lombardi (Ed.), Segmental Phonology in Optimality Theory: Constraints and Representations (pp. 118–156). Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511570582.005

Nord, L. (1986). Acoustic studies of vowel reduction in Swedish. In Quarterly Progress and Status Report, 4, 19–36. KTH Royal Institute of Technology in Stockholm. Retrieved from https://api.semanticscholar.org/CorpusID:52201781

Nowak, P. M. (2006). Vowel reduction in Polish (Doctoral dissertation, University of California, Berkeley, CA). Retrieved from https://escholarship.org/uc/item/1vh204j4

Öhman, S. (1967). Word and sentence intonation: A quantitative model. In Quarterly Progress and Status Report, 8, 20–54. KTH Royal Institute of Technology in Stockholm. Retrieved from https://api.semanticscholar.org/CorpusID:59753878

Pearce, M. (2008). Vowel harmony domains and vowel undershoot. In UCL Working Papers in Linguistics, 20, 115–140.

Pearce, M. (2012). Effects of harmony on reduction in Kera. Linguistic Variation, 12(2), 292–320. DOI:  http://doi.org/10.1075/lv.12.2.05pea

Pisoni, D. B., Nusbaum, H. C., Luce, P. A., & Slowiaczek, L. M. (1985). Speech perception, word recognition and the structure of the lexicon. Speech Communication, 4(1–3), 75–95. DOI:  http://doi.org/10.1016/0167-6393(85)90037-8

R Core Team. (2017). R: A language and environment for statistical computing (version 3.4.2). R Foundation for Statistical Computing.

Rose, S., & Walker, R. (2011). Harmony systems. In J. A. Goldsmith, J. Riggle & A. C. L. Yu (Eds.), The Handbook of Phonological Theory, Second Edition (pp. 240–290). Malden, MA: Blackwell. DOI:  http://doi.org/10.1002/9781444343069.ch8

Scarborough, R., & Zellou, G. (2013). Clarity in communication: ‘Clear’ speech authenticity and lexical neighborhood density effects in speech production and perception. The Journal of the Acoustical Society of America, 134(5), 3793–3807. DOI:  http://doi.org/10.1121/1.4824120

Seyfarth, S. (2014). Word informativity influences acoustic duration: Effects of contextual predictability on lexical representation. Cognition, 133(1), 140–155. DOI:  http://doi.org/10.1016/j.cognition.2014.06.013

Steriade, D. (1995). Underspecification and markedness. In J. A. Goldsmith (Ed.), The Handbook of Phonological Theory (pp. 114–174). Malden, MA: Blackwell. DOI:  http://doi.org/10.1111/b.9780631201267.1996.00006.x

Svantesson, J. (1985). Vowel harmony shift in Mongolian. Lingua, 67(4), 283–327. DOI:  http://doi.org/10.1016/0024-3841(85)90002-6

Tabain, M. (2003). Effects of prosodic boundary on /aC/ sequences: Acoustic results. Journal of the Acoustical Society of America, 113(1), 516–31. DOI:  http://doi.org/10.1121/1.1523390

Tiffany, W. R. (1959). Nonrandom sources of variation in vowel quality. Journal of Speech and Hearing Research, 2(4), 305–317. DOI:  http://doi.org/10.1044/jshr.0204.305

Toktonaliev, K. T. (Ed.). (2015). Kyrgyz tilinin zhazma grammatikasy: Azyrky Kyrgyz Adabij Tili. Bishkek: Avrasia Press.

Turnbull, R. (2015). Assessing the listener-oriented account of predictability-based phonetic reduction (Doctoral dissertation, The Ohio State University, Columbus, OH). Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=osu1429796768

Van Son, R. J. J. H., & Pols, L. C. W. (1999). An acoustic description of consonant reduction. Speech Communication, 28(2), 125–140. DOI:  http://doi.org/10.1016/S0167-6393(99)00009-6

Van Son, R. J. J. H., & Pols, L. C. W. (2003). Information structure and efficiency in speech production. In Eighth European Conference on Speech Communication and Technology. Retrieved from https://www.isca-speech.org/archive/archive_papers/eurospeech_2003/e03_0769.pdf

Vayra, M., & Fowler, C. A. (1992). Declination of supralaryngeal gestures in spoken Italian. Phonetica, 49(1), 48–60. DOI:  http://doi.org/10.1159/000261902

Walker, R. (2011). Vowel Patterns in Language. Cambridge: Cambridge University Press. DOI:  http://doi.org/10.1017/CBO9780511973710

Washington, J. N. (2006). Root vowels and affix vowels: Height effects in Kyrgyz vowel harmony. Manuscript, Western Washington University. Retrieved from http://jnw.name/dl/papers/2006wi-kgvowels.pdf

Washington, J. N. (2008). Positional effects on the vowel space in Kyrgyz. Manuscript, Indiana University.

Washington, J. N. (2016). An investigation of vowel anteriority in three Turkic languages using ultrasound tongue imaging (Doctoral dissertation, Indiana University, Bloomington, IN). Retrieved from https://www.proquest.com/docview/1824673636?accountid=13626. ProQuest document ID 1824673636

Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. J. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91(3), 1707–17. DOI:  http://doi.org/10.1121/1.402450

Yunusaliev, B. M. (1966). Kirgizskij yazyk. In V. V. Vinogradov, B. A. Serebrenikov, N. A. Baskakov, Yu. D. Deshiriev & V. F. Belyaev (Eds.), Yazyki Narodov SSSR, Vol 2: Tjurkskie yazyki (pp. 482–505). Leningrad: Nauka.

Zsiga, E. (1997). Features, gestures, and Igbo vowels. Language, 73(2), 227–74. DOI:  http://doi.org/10.2307/416019