In this paper we consider the effects that lexical stress can have on the same consonants in different languages. In particular, we are concerned with whether the effect of lexical stress on individual consonants is moderated by the overall make-up of the phoneme inventory. To this end, we focus on the oral stop burst, in 3-, 4-, 5-, and 6-place systems. We compare the well-studied 3-place stop system of English /p t k/ with the 4-place system of the Indonesian language Makasar /p t c k/, and with the 5- and 6-place systems of three Central Australian languages: Pitjantjatjara and Warlpiri /p t ʈ c k/ and Arrernte /p t̪ t ʈ c k/. Relative to English, Makasar has an extra alveo-palatal place of articulation (henceforth simply palatal); in their turn, Pitjantjatjara and Warlpiri also have an extra retroflex place of articulation; and Arrernte in its turn has an extra dental place of articulation. Table 1 shows the relevant places of articulation for each language. We provide a little more detail on the individual languages further below. We begin with a brief consideration of the issues of articulatory prosody, and of how they might interact with the size and make-up of the phoneme inventory.
1.1 Articulatory prosody and the role of contrast
It is well established that suprasegmental prosodic structure has an effect on segmental articulations, both at the laryngeal level and at the supra-laryngeal level, and both in spectral terms and in temporal terms (for recent reviews see Fletcher, 2010, and Cho, 2011). Much work has focused on the articulatory consequences of prosodic structure, both on consonants and on vowels (e.g., Beckman & Edwards, 1994; Cho, 2005; Cho & Keating, 2009; de Jong, 1995; Fougeron & Keating, 1997; Tabain, 2003b; see Cho, 2011, for further references), although some work has looked at the acoustic, and subsequently perceptual, consequences of these segmental prosodic effects (e.g., Cho et al., 2007; Georgeton & Fougeron, 2014; Kuzla & Ernestus, 2011; Tabain, 2003a; Tabain & Perrier, 2005, 2007). These prosodic effects have also been studied in a variety of languages, including English, French, Korean, and Taiwanese (in addition to the papers already cited, see Keating et al., 2003).
In a recent study, we examined the effect of lexical stress on the spectral and temporal properties of stop bursts in Pitjantjatjara (Tabain & Butcher, 2015a). We found various effects of stress on stop burst duration, and on spectral properties of the stop burst (i.e., on spectral moments and on spectral tilt). Perhaps the most interesting effects were found for the apical contrast /t/ vs. /ʈ/ (alveolar versus retroflex). As in many languages with such a contrast, the phonemic contrast is neutralized in initial position (Evans, 1995; Hamann, 2003; Hamilton, 1996; Steriade, 2001). However, in Pitjantjatjara, word-initial position is the location for lexical stress (Tabain et al., 2014), which means that the prosodically prominent position in the word is the location of neutralization for the apical contrast. In general, the stop burst for apicals is characterized by a broad spectral peak from about 1–2 kHz to about 4 kHz, with a very sharp drop in energy at about 4 kHz. This sharp drop in energy is at a slightly lower frequency for the retroflex /ʈ/ than for the alveolar /t/.1 However, in the stressed but neutralized context, which we write /T/, the sharp drop in energy occurs at a slightly higher frequency than for either the retroflex or the alveolar (as seen in Figure 1 of Tabain & Butcher, 2015a). These subtle but consistent effects are reflected in spectral tilt, spectral centre of gravity, and spectral skewness measures, such that the retroflex has a lower spectral tilt and centre of gravity (and higher skewness) than the alveolar, while the neutralized but stressed /T/ has a higher spectral tilt and centre of gravity (and lower skewness) than both the unstressed retroflex and the unstressed alveolar.
In contrast to the apicals, the velar /k/ showed a shift in spectral balance to lower frequencies under lexical stress. In general, there was less spectral energy in the stressed context, particularly in the frequency range above about 2 kHz (though exact details vary greatly according to vowel context and speaker group). This was reflected in lower spectral tilt and centre of gravity, and higher spectral skewness. Consequently, the contrast between a “light” alveolar sound and a “dark” velar sound was enhanced in stressed position, with the spectral tilt, centre of gravity, and skewness measures showing effects in opposite directions for these sounds (i.e., higher spectral tilt and centre of gravity for /T/, and lower for /k/ under stress; lower spectral skewness for /T/ and higher for /k/ under stress). These results were interpreted as an enhancement of the feature [grave], where the velar is [+grave] and the alveolar is [–grave].
As regards the remaining stops of Pitjantjatjara, the palatal stop burst /c/ was relatively invariant across stressed and unstressed position, presumably due to its high degree of coarticulatory resistance (it is a laminal articulation; cf. Recasens, 1990).2 Similarly, the bilabial /p/ also showed relatively little effect of stress context, presumably due to considerations of articulatory-to-acoustic mapping (i.e., the entire cavity behind the constriction, rather than the anterior cavity, is excited at the moment of stop release).
Thus far we have only considered spectral properties of the stop burst. However, additional effects according to place of articulation were observed for duration of the stop burst. For /c/ and /k/, and also to a lesser extent for /p/, the stop burst was significantly longer in stressed position than in unstressed position. This, however, was not the case for the apicals, where no lengthening was observed under stress. The stop burst duration for all of the apicals is very short—around 15–20 ms. We interpreted this result to mean that in initial, stressed position, the contrast between a long burst-duration stop such as /c/ or /k/, and a short burst-duration stop such as /t/ or /ʈ/, is enhanced by making the long stop bursts longer, and keeping the short stop bursts about the same (assuming that it is difficult to produce a stop burst much shorter than the values just cited when a full stop closure has been achieved). As a result, it would seem that stop burst duration is an important feature of stop place of articulation in multi-place systems such as Pitjantjatjara’s.
The question then arises, to what extent are these observations typical of other languages, and not just Pitjantjatjara? It has been observed that velar consonants in a non-front vowel context are articulated farther back in Australian languages than in European languages such as English or German, with consequences for formant transitions (Butcher & Tabain, 2004). It is not clear whether this is due to the presence of multiple coronals in the phoneme inventories of Australian languages, or whether it is due more precisely to the presence of the palatal phoneme. The observed “darkening” of the velar stop burst in stressed position could therefore be a contrastiveness strategy that only applies in languages that have a more posterior place of articulation, as Australian languages do. The question of why the alveolar stop burst is “lightened” in stressed position, however, does not follow this line of reasoning, since a higher spectral centre of gravity brings the alveolar centre of gravity value closer to that for the palatal phoneme. It could, however, be argued that the strong affrication typical of lamino-palatal articulations precludes this from being a problem.
We therefore choose to investigate the realization of the various stop burst phonemes under stress in languages with different phoneme inventories. It is well established that the exact make-up of the phoneme inventory places restrictions on the degree of coarticulation observed—for instance, vowel-to-vowel coarticulation may be limited in a part of the vowel space that is particularly crowded (Manuel, 1990). Relatedly, Butcher (2006) has argued that, for Australian languages, the place-of-articulation imperative places limitations on common coarticulatory processes, such as anticipatory vowel nasalization, which would compromise formant transition cues in languages with many places of articulation.
A similar principle can be applied in the domain of articulatory prosody. For instance, in an articulatory prosody study of the vowel /i/ in French, Tabain & Perrier (2005) observed tongue body strategies at prosodically strong boundaries that were reflected in significant changes to F3. They argued that this served as a feature enhancement of the unrounded aspect of this vowel, in a language where front vowels can be either rounded or unrounded. Similarly, in a follow-up study of /u/ in French, Tabain & Perrier (2007) found that speakers used a trade-off of lip- and tongue-movement strategies in order to keep F2 low at prosodically strong boundaries—this was argued to prevent confusion between back rounded /u/ and front rounded /y/. French is a language with a particularly rich set of monophthong vowel contrasts, and speakers therefore seek to enhance the contrast in prosodically strong positions.
We begin by providing an overview of the various languages studied here, including a brief description of their vowel and stress systems.
Arrernte (Breen & Dobson, 2005; Breen & Pensalfini, 1999; Henderson, 2013; Henderson & Dobson, 1994; Wilkins, 1989), Pitjantjatjara (Douglas, 1957, 1964; Goddard, 1986, 1993, 1996; Tabain & Butcher, 2014), and Warlpiri (Hale, 1995; Hoogenraad & Laughren, 2012; Nash, 1986) are all languages of Central Australia. Each language has about 2,000 speakers, and all are still being learned by infants. Arrernte is spoken in and around the administrative township of Alice Springs, while Pitjantjatjara is spoken at least 200 km southwest of Alice Springs (towards the Ayers Rock region), and Warlpiri at least 200 km northwest of Alice Springs (towards Yuendumu community). Pitjantjatjara is more accurately described as a dialect of the greater Western Desert Language, which occupies about one-sixth of the main Australian continent.
Australian Aboriginal languages are known for their relatively large number of coronal place contrasts (Dixon, 1980, 2002; Evans, 1995): most languages have either three or four coronal places of articulation. In any given language, the multiple coronal places of articulation typically extend across the oral stop, nasal, and lateral series. A 4-place system includes two apical consonants (i.e., produced with the tongue tip), which are alveolar and “retroflex” (i.e., post-alveolar) in place of articulation; and two laminal consonants (i.e., produced with the tongue blade, which in practice often involves both the tip and part of the tongue body), which are dental and (alveo-)palatal in place of articulation. Arrernte is a language with a 4-place contrast, namely lamino-dental, apico-alveolar, apico-postalveolar (retroflex), and lamino-(alveo)palatal. The other two languages studied here, Pitjantjatjara and Warlpiri, have three coronal places of articulation. Relative to Arrernte, they lack the lamino-dental place of articulation. Hence, the full inventory for Arrernte is /p t̪ t ʈ c k/, and the inventory for Pitjantjatjara and Warlpiri is /p t ʈ c k/. It is important to note, however, that for the three languages with an apical contrast /t/~/ʈ/, this contrast is neutralized in initial position, to /T/, as discussed above for Pitjantjatjara. The consequences of this neutralization in relation to stress nevertheless differ for Arrernte, as compared to Pitjantjatjara and Warlpiri; this will be discussed a little further below.
The Australian Aboriginal languages studied here do not have a laryngeal contrast on the stops—the stops are generally described as voiceless unaspirated, although burst/aspiration can be quite long for the palatal and the velar, and voicing may occur intervocalically.
Makasar is an Indonesian language, spoken in and around the city of Makassar (note relative spellings) on the southern part of the island of Sulawesi (Jukes, 2006). It has around 2 million speakers, but it is not a dominant language in socio-political terms. The particular dialect of English in the present study is Australian English. It has around 20 million speakers. Cox and Palethorpe (2007) offer a concise overview of this language variety.
It should be noted that both Makasar and English have a laryngeal contrast on the stops (broadly, voiceless aspirated vs. unaspirated for English, and voiced vs. voiceless for Makasar). As mentioned above, the Australian Aboriginal languages do not have such a contrast. It is also worth noting that for all four languages with a palatal consonant in this study, this consonant is lamino-alveo-palatal, with a noticeable affrication. We should also note in this respect that the English affricate /ʧ/ has many phonetic similarities to the alveo-palatal phonemes of the other languages—i.e., a period of closure followed by a strong palatal-like affrication. We have not included it here because it does not always pattern with the stops phonologically (e.g., in terms of syllable structure), but its existence in the phoneme inventory is a potential influence on the stop realizations in English.
1.2.1 Vowel systems
Although Arrernte may be described as having a three-vowel system /i a ə/, the high vowel /i/ has low lexical frequency and low functional load—see Henderson (2013) and Tabain and Breen (2011) for discussion of the complexities of the Arrernte vowel system, which more recently may also include /u/. Pitjantjatjara and Warlpiri have a clear three-vowel system /i, a, u/. They have an additional length contrast on the three vowels, but this length contrast is likewise of very low lexical frequency/functional load. Makasar has a five-vowel system /i e a o u/. Australian English has 18 vowels, including 12 monophthongs (usually in short-long pairs) /ɪ i: e e: æ ɐ ɐ: ɔ o: ʊ ʉ: ɜ:/; a schwa /ə/; and five diphthongs /æɪ ɑe oɪ æɔ əʉ/. Note that the short-long pairs for the low central vowel and the mid front vowel have the same quality.
Given the very different vowel spaces in these five languages, we choose here to focus only on the central vowel context following the stop burst (however, we did not control for the preceding context—see below for further information). For Arrernte, the following context includes the vowels /a ə/—it should be noted that both of these vowels can occur in both stressed and unstressed position in Arrernte. For Makasar, Pitjantjatjara, and Warlpiri, we use the /a a:/ vowel contexts. For English, we use /a a:/ and /ə/—however, note that in English, the low central vowels can only occur in stressed position, while the schwa vowel can only occur in unstressed position (contra Arrernte).
The realization of stress also varies between the languages studied here. The outline here is based on the references cited above for each language.
Stress in Arrernte is described as occurring on the second underlying VC syllable of the word (Breen & Pensalfini, 1999). All Arrernte words are deemed, underlyingly, to begin with a vowel, but the first vowel may or may not be realized on the surface. Stress thus occurs, on the surface, either on the second vowel, if the word begins with a vowel; or on the first vowel, if the word begins with a consonant. Recent work shows that stress in Arrernte is encoded phonetically by extra duration on the vowel, and on the preceding (i.e., CV) consonant, and also by a pitch peak on the vowel (Tabain, under revision; see also Tabain & Breen, 2011). However, the vowels behave differently under stress, with /a/ being relatively more affected than /ə/ (termed “elastic” versus “non-elastic” vowels). Finally, although Wilkins (1989, Section 188.8.131.52 and 2.2) refers to secondary stress, he does not elaborate on it, and no other texts on Arrernte make reference to secondary stress.
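The surface placement rule just described can be written out as a minimal sketch. The function name and the boolean segment encoding are hypothetical conveniences, not part of the cited analyses, and the case of a vowel-initial word with only one surface vowel is not covered by the description above, so the sketch simply refuses it.

```python
def surface_stressed_vowel(is_vowel):
    """Return the index of the stressed vowel in a surface segment string.

    is_vowel: list of bools, one per surface segment (True = vowel).
    Rule as described in the text: stress falls on the second vowel if
    the word begins with a vowel on the surface, and on the first vowel
    if the word begins with a consonant.
    """
    vowel_positions = [i for i, v in enumerate(is_vowel) if v]
    if not vowel_positions:
        raise ValueError("word contains no vowel")
    if is_vowel[0]:
        # Vowel-initial on the surface: stress the second vowel.
        if len(vowel_positions) < 2:
            # Assumption: this case is not addressed in the text.
            raise ValueError("vowel-initial word with a single vowel")
        return vowel_positions[1]
    # Consonant-initial on the surface: stress the first vowel.
    return vowel_positions[0]


# Made-up segment skeletons (not real Arrernte words): VCVCV and CVCV.
print(surface_stressed_vowel([True, False, True, False, True]))
print(surface_stressed_vowel([False, True, False, True]))
```

Both skeletons are stressed on the vowel of the second underlying VC syllable; only the surface realization of the initial vowel differs.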
Pitjantjatjara and Warlpiri both have stress on the first syllable. Tabain, Fletcher, and Butcher (2014) have shown that in Pitjantjatjara, stress is encoded by duration, a word-tone on the first syllable, and possibly by loudness. There is little evidence for vowel reduction, and there is no evidence for secondary stress in the language. Pentland (2004) presents a study of stress and prosody in Warlpiri, likewise highlighting the importance of duration, and focusing in particular on the role of consonantal lengthening.
Stress in Makasar is usually on the penultimate syllable, though a significant subset of words have stress on the antepenultimate syllable (these are words that have undergone a process called Echo-VC; see Jukes, 2006, for details). Tabain & Jukes (2016) show that stress is encoded by duration, pitch, and loudness, with no evidence for vowel reduction. There is no secondary stress (except in reduplications).
English stress has been very well studied; see Fletcher (2010) for a recent overview. In summary, English stress is encoded by duration, pitch, and loudness, but unlike the other languages studied here, it is accompanied by a categorical vowel reduction in unstressed syllables. It is also the only language studied here that has a clear secondary stress. In the present study, we do not make a distinction between the primary stressed and the secondary stressed syllables of English—all full low central vowels are coded as stressed, and all schwa vowels are coded as unstressed.
An important caveat is needed regarding the interaction of stress with apical consonants in Warlpiri and Pitjantjatjara. As mentioned above, for Arrernte, Warlpiri, and Pitjantjatjara, the apical contrast /t/~/ʈ/ is neutralized in initial position, to /T/. However, as just mentioned, in Pitjantjatjara and Warlpiri, the first syllable is the stressed syllable. This means that for both of these languages, the stressed apical is also the neutralized apical /T/, and the non-neutralized /t/ and /ʈ/ only occur in unstressed position.
In the case of Arrernte, however, it is possible for both /t/ and /ʈ/ to occur in both stressed and unstressed position, since many words begin with a vowel, even on the surface, and stress is on the second underlying vowel. (Neutralized /T/ does occur in our database for Arrernte, but it is not included in this study since it is relatively infrequent.)
For further details on possible word structures in the languages studied here, including some of the words used in the present study, the reader is referred to the Illustrations of the IPA published for Arrernte (Breen & Dobson, 2005), Australian English (Cox & Palethorpe, 2007), Makasar (Tabain & Jukes, 2016), and Pitjantjatjara (Tabain & Butcher, 2014). Indeed, the Makasar and Pitjantjatjara Illustrations form part of the data presented here. Examples of the English words used in the present study, as well as additional examples of Arrernte words, can be found in Tabain (2011), and examples of Warlpiri words can be found in Tabain, Fletcher, and Butcher (2011)—both of these are articulatory studies and so only present a subset of the words used in the current acoustic study.
Before we present our data and results, a few words are needed regarding enhancement of spectral contrasts. We have referred to the apicals /t/ and /ʈ/ of Pitjantjatjara becoming “lighter” under stress, and the velar /k/ becoming “darker” (in terms of spectral centre of gravity, which is the first spectral moment). We suggested that this might represent enhancement of the feature [grave] under stress, the apicals being [–grave] and the velar being [+grave] (here we see [grave] as being the counterpart of [acute]). We also pointed out that the palatal /c/ does not appear to show any spectral enhancement under stress. One other feature that deserves mention here is [diffuse] (whose counterpart is [compact]), which Tabain (2012) has suggested characterizes the lamino-dental /t̪/ spectrum of Arrernte. In that study, diffuseness was measured using the second spectral moment, with the dental having a higher value on this measure than the other three coronals in this language. We therefore explore this second spectral measure in the current study as well, in order to determine if it too plays a role in spectral enhancement under stress.
2.1 Speakers and recordings
Data are presented for 33 speakers from the five languages: seven speakers of Arrernte, five speakers of English,3 seven speakers of Makasar, nine speakers of Pitjantjatjara, and five speakers of Warlpiri. Four of the Arrernte speakers, six of the Pitjantjatjara speakers, and all of the Warlpiri speakers were recorded in their home communities in about 1990 by author AB. The six Pitjantjatjara speakers were recorded to cassette tape in their home community of Ernabella, South Australia. The Arrernte and Warlpiri speakers recorded by author AB were recorded to reel-to-reel tape at the Institute for Aboriginal Development in Alice Springs.4 In addition, one Arrernte speaker was recorded to reel-to-reel tape by author GB at the Central Australian Aboriginal Media Association (CAAMA) radio studio in Alice Springs in the early 1980s (this recording was for Arrernte teaching purposes); and two Arrernte speakers and three Pitjantjatjara speakers were recorded by author MT in professional-grade recording studios, direct to computer under the supervision of a professional recording technician (at Macquarie University in Sydney, in 2004 for Arrernte, using a Bruel & Kjaer microphone; and at La Trobe University in Melbourne, in 2010 for Pitjantjatjara, using a Neumann U87 microphone). The English speakers were recorded at the same time as the Arrernte speakers at the Macquarie University recording studio. Six of the Makasar speakers were recorded in a quiet office at the Department of Fisheries on the outskirts of Makassar in 2013 by author AJ, using a Sony PCM-M10 solid state recorder. One speaker of Makasar was also recorded at the La Trobe University studios in 2013. The sample rate for all digital recordings was 44.1 kHz with 24 bits per sample in WAV format.
Most of the speakers of the Australian Aboriginal languages were female, with one male for each language. For English and Makasar, the split between male and female was more even (three females for each language, with two males for English and four males for Makasar). No speaker-normalization was carried out for these data.
We are aware that the differences in recording medium (analog vs. digital) have consequences for the calculation of spectral moments (see below for details of these); visual inspection of spectra showed a noticeable drop in energy at around 5–6 kHz in the analog recordings that was not present in the digital recordings. However, the patterns according to stop place were examined for each speaker separately, and there was no reason to believe that this recording difference affected the relative pattern of results (e.g., alveolar vs. retroflex, or stressed vs. unstressed). It should be pointed out here that the extremely remote desert locations where the Australian languages are spoken, as well as their endangered status, make the collection of any sort of language data problematic. We therefore chose to keep what recordings were available to us, especially in light of the fact that many speakers who had participated were senior figures in the maintenance of language in their communities, and their contribution was valued by the community.5
2.2 Stimuli and measures
The data for this study come from recordings conducted for general phonetic descriptive purposes for each of the languages studied here. Stimuli consisted of single words which were repeated by the speaker three times in a row, without carrier phrase (or twice in a row in the case of the Arrernte speaker from the 1980s). The list of words was designed to illustrate the sounds of each language in different positions in the word (i.e., word-initial, -medial, and, where permitted, -final), and in different vowel contexts. Disfluent tokens were discarded. The English wordlist was specifically designed to mimic the wordlists in use for the study of Aboriginal languages at the time. However, for Makasar, speakers read a list of the Swadesh words used in historical linguistic research, supplemented by a list of words designed to illustrate particular phonemic contrasts. As a result, the Makasar wordlist featured much more common words than either the English or Aboriginal languages lists.
As mentioned above, we only analyze stop bursts where the following vowel is a central vowel. Given the real-word stimuli for all of these languages, it would not have been possible to make meaningful comparisons with both preceding and following contexts controlled for. As a result, the preceding context in the present study may be the left edge of the word, any vowel of the language, or any consonant of the language (homorganic or hetero-organic). For this reason, we take preceding context into account in our statistical analyses, outlined further below. For further details on possible word structures in the languages, including some of the words used in the present study, the reader is referred to the Illustrations of the IPA published for Arrernte (Breen & Dobson, 2005), Australian English (Cox & Palethorpe, 2007), Makasar (Tabain & Jukes, 2016), and Pitjantjatjara (Tabain & Butcher, 2014). Examples of the English words used in the present study, as well as additional examples of Arrernte words, can be found in Tabain (2011).
The offset of the stop burst in the present study was located at the onset of voicing for the following vowel. As a result, the stop burst total duration includes any aspiration or affrication, in addition to the initial transient.
Spectral analyses of the data were based on a 10 ms Hamming windowed Fast Fourier Transform (FFT),6 centred at stop release. Due to this very short window, no portion of the vowel was included in the FFT spectrum, even for the shortest stop bursts. Spectral tilt (a regression on the amplitude values as a function of frequency) and the four spectral moments (centre of gravity, standard deviation or variance, skewness, and kurtosis) were calculated in the frequency range 1–6 kHz. Spectral moments capture gross characteristics of the spectrum as may be encoded by the human auditory system (Forrest et al., 1988). However, as should be clear from the discussion of the Tabain and Butcher (2015a) Pitjantjatjara study in the Introduction section, spectral tilt, centre of gravity, and skewness simply patterned together as regards stress effects on burst spectra, and kurtosis did not prove especially useful. For this reason, and for the sake of economy given the larger number of languages here, we choose to present only the first two spectral moments in the present study. The first spectral moment, more commonly called spectral centre of gravity (CoG), is the average frequency in a given range (here 1–6 kHz), as weighted by the intensity values in each frequency bin. The second spectral moment gives the variance around the mean; in the present study we convert this to standard deviation, so that the values are again expressed in Hertz. It should be noted that standard deviation results were not presented in the study of stop burst spectra under stress in Pitjantjatjara, since it was not found to be useful for stress in that language. However, standard deviation has proven useful in previous work on jaw movement and stop bursts in Arrernte, where it is used to separate the lamino-dental stop /t̪/ from the three other coronal stops /t ʈ c/ (Tabain, 2012).
We therefore chose to include the second spectral moment in the present study, given its established usefulness in separating out the dental stop from the other coronals in Arrernte. However, although we present plots of standard deviation for all of the languages, we only table statistical results for Arrernte, which was the only language to have more than one significant result for this measure.
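To make the two reported measures concrete, the following is a minimal sketch of how a centre of gravity and a standard deviation can be obtained from a short Hamming-windowed frame, restricted to 1–6 kHz. It is not the EMU/R moments() implementation used in this study. The function names are hypothetical, the weighting by linear power is an assumption (the text says only that bins are weighted by intensity), and a plain discrete Fourier transform stands in for the FFT so that the example needs only the Python standard library.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, power) for the non-negative DFT bins
    of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def cog_and_sd(freqs, power, lo_hz=1000.0, hi_hz=6000.0):
    """First spectral moment (centre of gravity) and the square root of
    the second central moment (standard deviation), both in Hz, computed
    over the bins between lo_hz and hi_hz."""
    band = [(f, p) for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz]
    total = sum(p for _, p in band)
    cog = sum(f * p for f, p in band) / total
    variance = sum(p * (f - cog) ** 2 for f, p in band) / total
    return cog, math.sqrt(variance)


# Synthetic 10 ms frame at 44.1 kHz (441 samples): a 3 kHz sinusoid,
# used here only as a stand-in for a real stop burst.
rate = 44100
frame = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(441)]
cog, sd = cog_and_sd(*power_spectrum(frame, rate))
print(round(cog), round(sd))
```

Taking the square root of the variance is what returns the second moment to Hertz, which is why standard deviation rather than variance is plotted alongside CoG.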
For the sake of comparison of these derived measures with the raw data, we also present averaged FFT spectra for the recordings of Arrernte, English, and Makasar (the reader is referred to Tabain & Butcher, 2015a, for spectral plots of Pitjantjatjara). The English and Makasar data were all recorded digitally, but as mentioned above, only two of the Arrernte speakers were recorded digitally. For this reason, we chose to present the analog data recorded by author AB for Arrernte, since this represents recordings of the most speakers for this language (four). We thought it unwise to combine spectra from different recording media.
Finally, duration values were also calculated for the stop burst, given the different behaviours of the apical consonants vs. the bilabial, palatal, and velar consonants of Pitjantjatjara (i.e., no lengthening for stop bursts under stress for the apicals).
All labelling for this study was carried out using the Emu speech labelling tool (http://emu.sourceforge.net/). The spectral and durational analyses were conducted using EMU/R version 4.2 or higher, interfaced with the R statistical package version 2.15 or higher (R Core Team, 2014). The moments() function of EMU/R was used for the spectral moments analyses.
2.3 Statistical analysis
Linear mixed effects models were used to examine the duration, centre of gravity (CoG), and standard deviation results using the lme() function of the nlme package in R (R Core Team, 2014). This function also estimates t- and p-values. LME models were calculated for each language and each consonant separately, since the interest here is in how each language behaves in its own right. The fixed factor was Prosodic Context (either Strong or Weak—meaning stressed or unstressed). The random factors were Speaker and Preceding Context.7 Preceding Context was coded as one of eight categories: Word-Initial Position, Central Vowel, Front Vowel, Back Vowel, Labial Consonant, Apical Consonant, Laminal Consonant, and Velar Consonant. Note that the Word-Initial Position included the glottal stop for Makasar (since both silence and a glottal consonant were considered to have minimal effect on supralaryngeal articulation). The Apical Consonants were alveolar or retroflex, and the Laminal Consonants were dental or palatal. Note that the preceding consonants may have had different manners of articulation (stop, nasal, lateral, or rhotic). Some of the preceding vowels in English were diphthongs, and in this case the second vowel target determined the frontness/backness categorization for the random factor.
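The eight-way coding of Preceding Context can be sketched as a simple lookup. This is only an illustration of the categories listed above: the function name, the argument encoding, and the place and backness labels used as dictionary keys are hypothetical, and the glottal stop rule applies to Makasar only.

```python
# Place of articulation of a preceding consonant -> random-factor level.
CONSONANT_PLACE_TO_CATEGORY = {
    "labial": "Labial Consonant",
    "alveolar": "Apical Consonant",
    "retroflex": "Apical Consonant",
    "dental": "Laminal Consonant",
    "palatal": "Laminal Consonant",
    "velar": "Velar Consonant",
}

# Backness of a preceding vowel -> random-factor level.
VOWEL_BACKNESS_TO_CATEGORY = {
    "front": "Front Vowel",
    "central": "Central Vowel",
    "back": "Back Vowel",
}

ALL_CATEGORIES = (
    {"Word-Initial Position"}
    | set(CONSONANT_PLACE_TO_CATEGORY.values())
    | set(VOWEL_BACKNESS_TO_CATEGORY.values())
)


def code_preceding_context(kind, detail=None):
    """Map a description of the preceding segment to one of the eight
    Preceding Context levels.

    kind: "word_edge", "glottal_stop", "consonant", "vowel" or "diphthong".
    detail: place label for consonants, backness label for vowels, or a
            (first_target, second_target) pair of backness labels for
            diphthongs.
    """
    if kind in ("word_edge", "glottal_stop"):
        # Silence and the (Makasar) glottal stop are pooled.
        return "Word-Initial Position"
    if kind == "consonant":
        # Manner (stop, nasal, lateral, rhotic) is deliberately ignored.
        return CONSONANT_PLACE_TO_CATEGORY[detail]
    if kind == "vowel":
        return VOWEL_BACKNESS_TO_CATEGORY[detail]
    if kind == "diphthong":
        # The second vowel target decides frontness/backness.
        return VOWEL_BACKNESS_TO_CATEGORY[detail[1]]
    raise ValueError(f"unknown segment kind: {kind!r}")


print(code_preceding_context("consonant", "retroflex"))
print(code_preceding_context("diphthong", ("back", "front")))
```

Pooling manners within a place keeps the number of levels small, which matters when the factor enters the model as a random effect.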
It should be noted that by presenting separate analyses for each consonant in each language, we are essentially presenting a series of pair-wise analyses (Strong vs. Weak), without first testing for an overall effect of consonant. This approach was adopted for two main reasons. Firstly, our preliminary investigations suggested that there was not enough statistical power in our study for the complex interactions between consonant and prosodic category to emerge. It will be recalled from the Introduction that different effects were expected for different consonants: for instance, in the case of spectral centre of gravity, we would expect /t/ to show a higher CoG in Strong (i.e., stressed) prosodic contexts, and /k/ to show a lower CoG in the same context. At the same time, we might expect /c/ to show no effect. Thus, an overall test looking at the effect of prosodic context on a particular measure across all consonants may miss an effect that would be found using a pair-wise approach.
Secondly, the interaction between prosodic context (Strong vs. Weak) and consonant place of articulation is not an equivalent interaction across the five languages. For two of the languages, Pitjantjatjara and Warlpiri, the apicals /t/ and /ʈ/ necessarily have the neutralized apical /T/ as their counterpart in the Strong (i.e., stressed) prosodic context, and the underlying phonemes /t/ and /ʈ/ are only realized in Weak prosodic positions. However, this is specifically not the case for English and Makasar, where there is no neutralization for any of the consonants. The final language, Arrernte, presents a third possibility, where the two apicals may have the neutralized /T/ as their Strong prosodic context counterpart in absolute initial position, but may also be realized as their underlying phonemes /t/ and /ʈ/ if a word-initial vowel is available (see Wilkins, 1989, and Henderson, 2013, for discussion of the intricacies of whether Arrernte initial vowels are realized or not). In the present analyses we ignore the neutralized /T/ in Arrernte and focus on the non-neutralized phonemes; however, neutralization is nevertheless a possibility in the Arrernte phonological system.
Given the above circumstances a Bonferroni correction of the default significance level of p < 0.05 was applied in the present study, giving a significance level of p < 0.002 (0.05 divided by 23 analyses: 3 English + 4 Makasar + 5 Pitjantjatjara + 5 Warlpiri + 6 Arrernte). A trend was defined at p < 0.05. However, the reader is alerted to the fact that effects of prosodic category are much more subtle for spectral measures than for temporal measures. As noted by Fletcher (2010), prosodic categories are often encoded via duration, pitch, and intensity, and as such the effect of prosodic category on stop burst duration can be expected to be quite robust. By contrast, as noted in Tabain and Butcher (2015a), spectral contrasts are primarily used to encode phonemic (e.g., stop place) contrasts, and any prosodic contrasts are expected to be ancillary to this. We would therefore expect that, a priori, more data would be required for a spectral effect to be recorded as significant than for a durational effect to be recorded as significant.
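The arithmetic behind the corrected threshold can be sketched as follows. The counts per language are those listed above; the function name `classify` and its labels are hypothetical, and the 0.002 cut-off is the rounded value stated in the text rather than the unrounded quotient.

```python
# Number of Strong vs. Weak analyses per language (one per stop place).
analyses_per_language = {
    "English": 3,
    "Makasar": 4,
    "Pitjantjatjara": 5,
    "Warlpiri": 5,
    "Arrernte": 6,
}

ALPHA = 0.05
n_tests = sum(analyses_per_language.values())
bonferroni_alpha = ALPHA / n_tests  # unrounded corrected level

def classify(p_value, significant_below=0.002, trend_below=ALPHA):
    """Label a p-value using the thresholds stated in the text."""
    if p_value < significant_below:
        return "significant"
    if p_value < trend_below:
        return "trend"
    return "not significant"

print(n_tests, round(bonferroni_alpha, 5))
for p in (0.0005, 0.01, 0.30):
    print(p, classify(p))
```

The unrounded quotient is slightly above 0.002, so the stated threshold is marginally more conservative than the exact Bonferroni value.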
Given this statistical approach, we necessarily conducted pair-wise analyses for languages where there were no significant results on a particular measure (e.g., it will be seen below that Makasar shows no significant effects whatsoever for any measure, and Warlpiri shows no significant effects on the spectral measures). We nevertheless chose to report the results of the analyses for each consonant in each language, for the sake of completeness, and also since the LME models provide useful information on the estimated means for each prosodic category.
Finally, it should be noted that we do not make direct statistical comparisons between languages, or even between individual consonants; our sole concern is with Strong vs. Weak (i.e., stressed vs. unstressed) contrasts. This should be borne in mind during the presentation of results.
Table 2 shows the number of tokens in the database. It can be seen that almost 10,000 tokens in total were analyzed. English has the smallest number of tokens, partly because there are only five speakers, and partly because there are only three stop places. There are more Weak tokens than Strong tokens (about 2.5 times as many). Makasar has a more even spread between Strong and Weak token numbers for /p t k/; however, in our database there were no Weak (unstressed) tokens of /c/. For this reason, we elected to use /ɟ/ instead for this language, for which we had a good number of Weak tokens. Nevertheless, there were still far more Strong tokens than Weak for /ɟ/ (187 vs. 50).
Table 2. Number of tokens in the database. Columns: Language & Speakers; p; t̪ ‘th’; t; ʈ ‘rt’; c (ɟ); k; Total.
For Pitjantjatjara the relative number of Strong vs. Weak tokens depends on place of articulation. There are more Strong /p/ than Weak /p/ tokens, while the opposite is true for the three coronal consonants /t ʈ c/—the reader is reminded that the Strong /t/ and /ʈ/ are in fact neutralized /T/. For /k/, the number of tokens is more balanced for Strong and Weak contexts. These observed patterns are broadly the same for Warlpiri, with the exception that in Warlpiri there are more Weak tokens for /p/ than Strong tokens. There are also fewer retroflex /ʈ/ tokens for Warlpiri, and the numbers of Strong vs. Weak palatal tokens are roughly the same for this language.
Finally, for Arrernte as for Pitjantjatjara, there are more Strong /p/ tokens than Weak, but many more Weak tokens than Strong for /c/ and /k/. There is an imbalance for the apicals: there are more Strong tokens for alveolar /t/, but (about three times) more Weak tokens for retroflex /ʈ/. Notably, the number of Strong vs. Weak tokens is more balanced for the dental /t̪/.
As a general observation, there are far more /k/ and /p/ tokens than coronal tokens in the Australian Aboriginal languages, and /k/ also dominates English and Makasar (with /t/ also relatively common in Makasar).
Figure 1 shows the Duration, Centre of Gravity (CoG), and Standard Deviation (SD) results as plots of means and confidence intervals, for each language separately. Table 3 gives the results from the LME model.
| (a) Burst (ms) | | /p/ | /t̪/ ‘th’ | /t/ | /ʈ/ ‘rt’ | /c/ (ɟ) | /k/ |
|---|---|---|---|---|---|---|---|
| English | df | 161 | | 174 | | | 238 |
| | SE | 2.4 | | 2.7 | | | 1.9 |
| | t | –12.10 | | –4.02 | | | –10.73 |
| | p | < 0.001 | | < 0.001 | | | < 0.001 |
| Makasar | df | 321 | | 436 | | 219 | 556 |
| | SE | 0.7 | | 0.6 | | 1.6 | 0.8 |
| | t | 1.45 | | 1.02 | | 0.67 | 0.88 |
| | p | 0.14 | | 0.30 | | 0.49 | 0.37 |
| Pitjantjatjara | df | 26 | | 25 | 26 | 26 | 26 |
| | SE | 1.9 | | 1.4 | 1.4 | 1.7 | 1.9 |
| | t | –2.60 | | 1.75 | 0.20 | –6.99 | –6.80 |
| | p | 0.01 | | 0.09 | 0.84 | < 0.001 | < 0.001 |
| Warlpiri | df | 806 | | 14 | 10 | 344 | 479 |
| | SE | 1.7 | | 3.4 | 4.5 | 2.5 | 2.9 |
| | t | –1.58 | | 0.38 | –0.12 | –2.65 | –3.70 |
| | p | 0.11 | | 0.70 | 0.90 | < 0.001 | < 0.001 |
| Arrernte | df | 556 | 371 | 299 | 327 | 555 | 777 |
| | SE | 0.6 | 0.91 | 0.78 | 0.7 | 1.35 | 1.2 |
| | t | 2.98 | 1.71 | 2.46 | 0.21 | –4.02 | –4.09 |
| | p | 0.003 | 0.08 | 0.01 | 0.83 | < 0.001 | < 0.001 |

| (b) CoG (Hz) | | /p/ | /t̪/ ‘th’ | /t/ | /ʈ/ ‘rt’ | /c/ (ɟ) | /k/ |
|---|---|---|---|---|---|---|---|
| English | df | 161 | | 174 | | | 238 |
| | SE | 28.6 | | 40.9 | | | 31.9 |
| | t | 1.56 | | 0.33 | | | 2.80 |
| | p | 0.12 | | 0.74 | | | 0.005 |
| Makasar | SE | 23.8 | | 18.3 | | 32.7 | 16.2 |
| | t | –1.25 | | 0.42 | | 0.03 | 1.34 |
| | p | 0.20 | | 0.67 | | 0.97 | 0.17 |
| Pitjantjatjara | SE | 23.6 | | 22.4 | 25.7 | 21.1 | 9.6 |
| | t | –0.40 | | –1.96 | –3.33 | –0.70 | 3.20 |
| | p | 0.69 | | 0.06 | < 0.01 | 0.48 | 0.003 |
| Warlpiri | SE | 21.6 | | 38.0 | 55.6 | 26.4 | 31.9 |
| | t | 0.18 | | –0.42 | –0.96 | –0.85 | 0.20 |
| | p | 0.85 | | 0.67 | 0.35 | 0.39 | 0.83 |
| Arrernte | SE | 19.0 | 18.7 | 18.7 | 20.2 | 16.5 | 12.5 |
| | t | –3.67 | –1.23 | –4.53 | –1.29 | –1.30 | 0.89 |
| | p | 0.003 | 0.21 | < 0.001 | 0.19 | 0.19 | 0.37 |

| (c) S.D. (Hz) | | /p/ | /t̪/ ‘th’ | /t/ | /ʈ/ ‘rt’ | /c/ | /k/ |
|---|---|---|---|---|---|---|---|
| Arrernte | df | 556 | 371 | 299 | 327 | 555 | 777 |
| | SE | 6.9 | 7.6 | 8.6 | 9.6 | 6.6 | 6.3 |
| | t | –1.56 | –2.57 | –1.48 | 1.76 | 1.07 | 2.39 |
| | p | 0.11 | 0.01 | 0.13 | 0.07 | 0.28 | 0.01 |
We first consider the burst duration results. It can be seen that while burst duration in Strong syllables is significantly longer for all three consonants of English, this is not the case for any of the consonants of Makasar. The English differences are larger for /p/ and /k/ than for /t/. The reader is reminded that both English and Makasar have a voicing contrast for the stops, with English being an aspiration language and Makasar being a voicing language. As such, it is likely that the English results reflect the tendency for voiceless (aspirated) stops to be less aspirated when not in a stressed syllable (Cox & Palethorpe, 2007). It is possible that if we had considered pre-voicing for Makasar, we might have found some significant results for stressed versus unstressed contexts.
For the Aboriginal languages, by contrast, results vary according to place of articulation. For Pitjantjatjara, as was seen in our previous study, there are large lengthening effects for /c/ and /k/, and a lesser lengthening effect (trend) for /p/. For the apicals, however, the difference is not significant. This pattern is almost exactly replicated for Warlpiri, except that the difference in durations for /p/ is also not significant. For Arrernte, the Pitjantjatjara and Warlpiri pattern is repeated for /c/ and /k/; for the remaining consonants, there are almost no differences between Strong and Weak syllables (there is a trend for /p/ and /t/, but this is not likely to be perceptible, since the differences are only about one millisecond). It should be noted that for /p t̪ t ʈ/ of Arrernte, the Strong burst duration values are, on average, actually shorter than the Weak burst duration values. It is likely that these more complex patterns in the Aboriginal languages are made possible by the fact that there is no voicing contrast in these languages.
Turning now to the spectral measures, we first consider CoG. For English, the only result that approaches significance (at p = 0.005) is for /k/, with a lower CoG in Strong syllables.8 For Makasar, once again, there are no significant results. For Pitjantjatjara, as was highlighted in the Introduction, /k/ has a lower CoG in Strong syllables (approaching significance at p = 0.003), while the neutralized apical /T/ has a higher CoG in Strong syllables (though this is only significant compared to /ʈ/). However, this Pitjantjatjara pattern is not so clearly repeated for CoG in Warlpiri and Arrernte. For Warlpiri, there are no significant effects for CoG for any of the consonants. For Arrernte, the only significant results are for /p/ and /t/, where in both cases CoG is higher in Strong syllables (recall that in Arrernte both /t/ and /ʈ/ can occur in both Strong and Weak syllables, but the result is only significant for /t/).
Finally, we consider the SD results. Although data for all five languages are plotted in Figure 1, we report the LME results only for Arrernte in Table 3 (as foreshadowed above). It can be seen that in Arrernte, the dental /t̪/ has a higher SD in Strong syllables, as was expected (a trend at p = 0.01); in addition, and perhaps unexpectedly given that the possibility was not foreshadowed in the Introduction, the velar /k/ has a lower SD in Strong syllables, suggesting a more peaky spectrum (likewise a trend at p = 0.01). In this respect, it is worth noting that the SD result for English /k/ was also significantly lower in Strong syllables (intercept = 1286, beta = 57, SE = 14.3, t = 4.02, p = 0.0001; this was the only significant SD result for the other languages).
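As a reading aid for the reported statistics, the sketch below assumes that each t value is the usual ratio of the model estimate (beta) to its standard error. That is the standard definition, but the model output itself is not reproduced here. The helper name `implied_estimate` is hypothetical. The English /k/ SD values quoted above are used as the worked case.

```python
def implied_estimate(t_value, standard_error):
    """Back out the estimate (beta) implied by t = beta / SE."""
    return t_value * standard_error

# English /k/ SD values as quoted in the text.
beta, se, t_reported = 57.0, 14.3, 4.02

t_from_rounded = beta / se                   # t recomputed from rounded values
beta_implied = implied_estimate(t_reported, se)

print(round(t_from_rounded, 2), round(beta_implied, 1))
```

The recomputed t and the reported t agree to within what rounding of beta and SE would allow.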
To summarize—for English, all stops show lengthening of burst duration in Strong syllables, but only /k/ shows an effect on CoG and on SD, suggesting a darker, but more peaky spectrum.
For the Australian Aboriginal languages, there is a clear lengthening of burst duration for /c/ and /k/, while for the apicals /t/ and /ʈ/, and the dental /t̪/, there is no lengthening effect (results for /p/ are a little more ambiguous). However, the CoG patterns observed in Pitjantjatjara were not observed at all in Warlpiri, and only partially in Arrernte. The higher CoG for /t/ was repeated in Arrernte, but not the lower CoG for /k/. By contrast, Arrernte /k/ had a lower SD in Strong syllables, suggesting a peakier, though not necessarily darker, spectrum for the velar in Strong syllables. In addition, Arrernte had a significant effect on SD for /t̪/ ‘th’, with a higher SD for the dental in Strong syllables, as expected.
There were no significant effects at all for Makasar, either durational or spectral.
Figure 2 presents averaged FFT spectra for English, Makasar, and Arrernte. It should be noted that the frequency range shown is 0–6 kHz, which is not the same range as was used for the moments analysis (1–6 kHz). The reader is also reminded that while the English and Makasar spectra contain data from all speakers, the Arrernte plot only contains data from four of the seven speakers.
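To make the band restriction concrete, here is a minimal sketch of the first two spectral moments computed over 1–6 kHz. It assumes linear (not dB) power values as weights, which may differ from the weighting used in the actual analysis. The function name `spectral_moments` and its parameters are hypothetical.

```python
import math

def spectral_moments(freqs_hz, power, lo_hz=1000.0, hi_hz=6000.0):
    """Return (CoG, SD) in Hz over the band [lo_hz, hi_hz].

    freqs_hz and power are equal-length sequences; power holds linear
    magnitudes (an assumption of this sketch).
    """
    band = [(f, p) for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz]
    total = sum(p for _, p in band)
    if total <= 0:
        raise ValueError("no spectral energy in the requested band")
    # First moment: power-weighted mean frequency.
    cog = sum(f * p for f, p in band) / total
    # Second moment: power-weighted spread around the CoG.
    variance = sum(p * (f - cog) ** 2 for f, p in band) / total
    return cog, math.sqrt(variance)

# A flat toy spectrum sampled every 500 Hz from 0 to 6 kHz.
freqs = [float(f) for f in range(0, 6001, 500)]
flat = [1.0] * len(freqs)
cog_flat, sd_flat = spectral_moments(freqs, flat)

# Extra energy below 1 kHz lies outside the band and changes nothing.
boosted_low = [100.0 if f < 1000 else 1.0 for f in freqs]
cog_low, sd_low = spectral_moments(freqs, boosted_low)

print(cog_flat, round(sd_flat, 1))
print(cog_low == cog_flat, sd_low == sd_flat)
```

The second call shows why energy visible below 1 kHz in the plotted spectra leaves the moments measures unaffected.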
Looking first at the plot for English, it can be seen that there is minimal difference according to prosodic context for /p/, as is reflected in the lack of significant results for the moments measures presented above. However, there does appear to be a little more energy overall in the frequency range above 1 kHz for the Weak prosodic context, and less energy below 1 kHz for this context (this difference was reflected in spectral tilt measures, not shown here, with the Strong prosodic context having a steeper tilt). For /t/, there is more energy in the frequency range above 1 kHz for the Strong prosodic context, although the overall shapes are very similar for the two prosodic contexts, which may explain the lack of significant moments results for this stop. (By contrast, there is less energy for /t/ in the frequency range below 1 kHz in the Strong prosodic context—this may reflect the greater probability of intervocalic voicing in the Weak prosodic context.)
The spectra for English /k/ show greater energy in the frequency range 1–2 kHz in the Strong prosodic context (and also in the frequency range below 1 kHz, which does not affect the moments measures). This peak at 1–2 kHz accounts both for the lower spectral centre of gravity for English /k/ and for its lower standard deviation.
We next consider the plot for Makasar. It can be seen that the plots are extremely similar according to prosodic context, especially for /t/ and /k/. There are some differences for /p/ and /ɟ/, with /p/ having a little more energy above 2 kHz in the Strong context, and /ɟ/ having a little less energy above 1 kHz in the Strong context (this latter effect may conceivably be a consequence of differences in the voicing source in Strong context). The spectral moment plots do show some small differences according to prosodic context for these sounds (such as higher CoG and SD for /p/ in the Strong context), but these failed to reach significance. Examination of individual Makasar results showed many differences between speakers according to prosodic context.
Finally, we consider Arrernte. It can be seen that for the speakers presented here, the spectra for /p/ are very similar in the two prosodic contexts. However, the CoG results presented above suggested a significant difference between the two contexts, and examination of the other Arrernte spectra not shown here suggests that some speakers do show effects for /p/ similar to those we saw above for Makasar (i.e., a little more energy at higher frequencies in Strong prosodic contexts). This was certainly not the case for all speakers, however.
We next consider the dental /t̪/ for Arrernte. In general the spectra appear very similar in the two prosodic contexts. Examination of spectra across speakers (including speakers not shown here) suggests that the most consistent difference between Strong and Weak contexts occurred in the frequency range 2–3 kHz (for some speakers a little higher). It is possible that the significant result for Standard Deviation is capturing some of these subtle differences in this particular frequency range, particularly given that the Centre of Gravity is calculated as being at about 3 kHz in the present study.
The spectra for Arrernte /t/ and /ʈ/ show a broad spectral peak up to about 4 kHz, as mentioned in the Introduction section. It can be seen here that the drop in energy for the broad spectral peak occurs at a higher frequency for the Strong prosodic context than for the Weak context, for both apicals. However, for other speakers not shown here, the prosodic effect is more extreme for /t/. These observations account for the differences in CoG evident for both of these apical stop bursts. By contrast, for the palatal /c/, the spectra are very similar in the two prosodic contexts, although the spectral peak between about 3 and 4 kHz is shifted slightly downward in the Strong prosodic context relative to the Weak context. However, this effect is very subtle, which may explain why there were no significant effects of prosodic context on the spectral moments for the palatal.
Finally, we consider the velar stop /k/ for Arrernte. It can be seen that, as for English, there is a spectral peak at about 1–2 kHz. This peak clearly has greater energy in the Strong prosodic context, as was the case for English. In addition, there is another spectral peak at 3–4 kHz, with greater energy in the Strong prosodic context in this frequency range. These observations likely account for the significantly lower standard deviation for Arrernte /k/ in the Strong prosodic context.
We have seen that languages can differ markedly in whether and how they choose to encode stress on the stop burst. Some languages, such as Makasar in our sample, do not encode stress on this consonantal portion at all, either temporally or spectrally. This is in contrast to very strong durational cues on the vowel for this language. This supports the idea that languages can choose a different conglomeration of cues for encoding stress (Cutler, 2012). For instance, spectral tilt on the vowel is a useful cue to stress in Dutch, but less so in English (Fletcher, 2010). Similarly, extra duration on the stop burst is a part of stress in English, but not in Makasar—however, as mentioned above, this may be due to the fact that English has a positive VOT contrast on the stops, while Makasar has a negative VOT contrast. And Warlpiri, with its 5-place system like Pitjantjatjara’s, shows no effects whatsoever on spectral qualities of the consonants—though unlike Makasar, it does show temporal effects of stress, as had been suggested by Pentland (2004).
It seems that stop burst duration has a special status in multi-place systems, at least in those of the Central Australian languages examined here. It is clear that both /c/ and /k/ have extra stop burst duration under stress, whereas the more anterior coronals /t̪ t ʈ/ do not. It could, therefore, be argued that [burst duration], with values long and short, is a contrastive feature of multi-place systems, with the anterior coronals being short, and the posterior coronal and /k/ being long. In fact, a similar proposal has been made by Gallagher (2011) in relation to laryngeal contrasts in Quechua—she proposes a feature [long VOT] to distinguish ejectives and aspirated stops from the glottals /h/ and /ʔ/. In the present paper, we have seen that any VOT contrasts in stop place of articulation are enhanced under stress. However, the behaviour of /p/ is a little more ambiguous. On raw burst duration values, it patterns with the anterior coronals in these languages. However, there is no clear pattern across the three Central Australian languages as to how this consonant behaves under stress in terms of burst duration. It could well be that, due to the strong visual cues for this consonant, burst duration need not be enhanced under stress (and, it could even be argued, remains unspecified).
In spectral terms, it seems that consonants are not equally amenable to enhancement under stress. The palatal /c/ seems to resist modification to its stop burst in all of the languages studied here, regardless of the number of places of articulation. As mentioned above, this is likely a concomitant of the palatal’s strong coarticulatory resistance, due to the laminal articulation recruiting a large part of the tongue body, and temporarily immobilizing the jaw in order to achieve the clearly shaped acoustic output of the affricated stop burst (Tabain, 2012).
Perhaps a little unexpectedly, the velar /k/ showed effects on the stop burst spectrum in three of the languages studied here, including English with its 3-place system. For English and Pitjantjatjara, /k/ had a darker quality (i.e., lower centre of gravity) under stress. For Arrernte, and English too, /k/ had a lower standard deviation, suggesting a “peakier” spectrum. Above we discussed the possibility that the feature [+grave] is enhanced for /k/ under stress; it seems, in addition, that a feature such as [+compact] or [–diffuse] may also be enhanced for the velar, depending on the language. However, exactly what “compact” and “diffuse” might mean in relation to the spectra we examined in Figure 2 is somewhat unclear, and this question may benefit from further study. Indeed, articulatory modelling would be needed in order to determine how exactly the frequency peaks at 1–2 kHz and at 3–4 kHz might be enhanced under stress conditions. Moreover, whether velars in languages with a uvular place of articulation undergo similar spectral enhancements is another topic for further study.
Another consonant which has its feature [compact] or [diffuse] enhanced under stress is the dental /t̪/, as we predicted earlier given that this is the feature (as reflected in the measure of Standard Deviation) which separates the dental from the other coronals in Arrernte. However, for this consonant, the effect is in the opposite direction to the velar, with the dental being even more [+diffuse] under stress. Our examination of the spectra suggested that this change in the second spectral moment was due to less energy in the 2–3 kHz range under stress—again, exactly what articulatory and aerodynamic conditions would allow for such changes to be made to the output spectra is another topic for further inquiry. In general, it is not at all clear what articulatory strategies might be used to make a consonant more or less “diffuse”. Predictions for raising or lowering of spectral centre of gravity are relatively straightforward for stop bursts, relating to issues of cavity length and formant affiliation. However, where diffuseness/compactness is concerned, the overall set of losses in the system needs to be considered, and this is not a trivial matter. Since losses mostly occur at radiation, one possibility is that at the moment of stop burst, the jaw is in a lower position for a more diffuse spectrum (and vice versa for a more compact spectrum). Another, more tenuous, possibility for the velar /k/, is that a more uniform resonating tube applies in the case of the slightly more diffuse spectrum (since spectral peaks may appear when cross-sectional area is not uniform along the tube). Finally, characteristics of the voice source, as affected by lexical stress, may contribute to the spectral drop-off at frequencies above 1 kHz. These are clearly questions which require further articulatory study and acoustic modelling, and it is quite likely that different solutions may be proposed for different places of articulation.9
The other consonant, in addition to /k/, which showed effects in more than one language was /t/, which had a “lighter” quality (i.e., higher spectral centre of gravity) in both Arrernte and Pitjantjatjara. It is perhaps surprising that Arrernte /t/ showed this effect as well, since the presence of the dental /t̪/ in the phoneme inventory might have prevented this feature enhancement (the dental has a higher CoG value than the alveolar and retroflex). The alveolar /t/ was not enhanced spectrally in English, which did have a strong effect on the velar. It could therefore be said that for any given consonant, if there is any sort of feature enhancement in the language, that enhancement goes in a particular direction—for the velar /k/, it is a lower spectral centre of gravity and/or a more peaky/compact spectrum, while for the alveolar /t/ it is a higher spectral centre of gravity.10
We have not commented on the spectral properties of the bilabial /p/ until now. In general, this is a consonant which does not show extensive effects of stress in spectral terms. This could be due to considerations of articulatory-to-acoustic mapping, as discussed above; or it could be that in addition, the visual cues to this consonant are so strong that any acoustic enhancement is less useful.
In sum, we have argued that the various effects of stress observed in this study are a reflection of feature enhancement, and contribute to the overall conglomeration of cues to stress. The particular features which we have suggested are at play are [grave] for the alveolar and velar, and [diffuse] for the velar and the dental, with the velar being [+grave] and [–diffuse], the alveolar [–grave], and the dental [+diffuse]. Importantly, languages can choose whether or not to enhance the features for a given consonant.11
We have also argued that in multi-place systems, [burst duration] is a feature which separates the anterior coronals /t̪ t ʈ/ from the posterior coronal /c/ and the velar /k/, resulting in extra duration under stress for the latter, with their intrinsically long burst durations, but not for the former, with their intrinsically short burst durations.