This study examines the effects of stress on the stop burst in five languages differing in number of places of articulation, as reflected in burst duration, spectral centre of gravity, and spectral standard deviation. The languages studied are English (three places of articulation /p t k/), the Indonesian language Makasar (four places /p t c k/), and the Central Australian languages Pitjantjatjara, Warlpiri (both five places /p t ʈ c k/), and Arrernte (six places /p t̪ t ʈ c k/). We find that languages differ in how they manifest stress on the consonant, with Makasar not showing any effect of stress at all, and Warlpiri showing an effect on burst duration, but not on the spectral measures. For the other languages, the velar /k/ has a “darker” quality (i.e., lower spectral centre of gravity), and/or a less diffuse spectrum (i.e., lower standard deviation) under stress; while the alveolar /t/ has a “lighter” quality under stress. In addition, the dental /t̪/ has a more diffuse spectrum under stress. We suggest that this involves enhancement of the features [grave] and [diffuse] under stress, with velars being [+grave] and [–diffuse], alveolars being [–grave], and dentals being [+diffuse]. We discuss the various possible spectral effects of enhancement of these features. Finally, in the languages with five or six places of articulation, the stop burst is longer only for the palatal /c/ and the velar /k/, which have intrinsically long burst durations, and not for the anterior coronals /t̪ t ʈ/, which have intrinsically short burst durations. We suggest that in these systems, [burst duration] is a feature that separates these two groups of consonants.
In this paper we consider the effects that lexical stress can have on the same consonants in different languages. In particular we are concerned with whether the effect of lexical stress on individual consonants is moderated by the overall make-up of the phoneme inventory. To this end, we focus on the oral stop burst, in 3-, 4-, 5-, and 6-place systems. We compare the well-studied 3-place stop system of English /p t k/ with the 4-place system of the Indonesian language Makasar /p t c k/, with the 5- and 6-place systems of three Central Australian languages—Pitjantjatjara and Warlpiri /p t ʈ c k/ and Arrernte /p t̪ t ʈ c k/. Relative to English, Makasar has an extra alveo-palatal place of articulation (henceforth simply palatal); in their turn, Pitjantjatjara and Warlpiri also have an extra retroflex place of articulation; and Arrernte in its turn has an extra dental place of articulation. Table
Places of articulation for the languages examined in this study.
Language | bilabial | apical alveolar | apical retroflex | laminal dental | laminal palatal | velar |
---|---|---|---|---|---|---|
Arrernte | p | t | ʈ | t̪ | c | k |
Pitjantjatjara | p | t | ʈ | - | c | k |
Warlpiri | p | t | ʈ | - | c | k |
Makasar | p | t | - | - | c | k |
English | p | t | - | - | - | k |
It is well established that suprasegmental prosodic structure has an effect on segmental articulations, both at the laryngeal level and at the supra-laryngeal level, and both in spectral terms and in temporal terms (for recent reviews see
In a recent study, we examined the effect of lexical stress on the spectral and temporal properties of stop bursts in Pitjantjatjara (
In contrast to the apicals, the velar /k/ showed a shift in spectral balance to lower frequencies under lexical stress. In general, there was less spectral energy in the stressed context, particularly in the frequency range above about 2 kHz (though exact details vary greatly according to vowel context and speaker group). This was reflected in lower spectral tilt and centre of gravity, and higher spectral skewness. Consequently, the contrast between a “light” alveolar sound and a “dark” velar sound was enhanced in stressed position, with the spectral tilt, centre of gravity, and skewness measures showing effects in opposite directions for these sounds (i.e., higher spectral tilt and centre of gravity for /T/, and lower for /k/ under stress; lower spectral skewness for /T/ and higher for /k/ under stress). These results were interpreted as an enhancement of the feature [grave], where the velar is [+grave] and the alveolar is [–grave].
As regards the remaining stops of Pitjantjatjara, the burst of the palatal stop /c/ was relatively invariant across stressed and unstressed positions, presumably due to its high degree of coarticulatory resistance (it is a laminal articulation; cf.
Thus far we have only considered spectral properties of the stop burst. However, additional effects according to place of articulation were observed for duration of the stop burst. For /c/ and /k/, and also to a lesser extent for /p/, the stop burst was significantly longer in stressed position than in unstressed position. This, however, was not the case for the apicals, where no lengthening was observed under stress. The stop burst duration for all of the apicals is very short—around 15–20 ms. We interpreted this result to mean that in initial, stressed position, the contrast between a long burst-duration stop such as /c/ or /k/, and a short burst-duration stop such as /t/ or /ʈ/, is enhanced by making the long stop bursts longer, and keeping the short stop bursts about the same (assuming that it is difficult to produce a stop burst much shorter than the values just cited when a full stop closure has been achieved). As a result, it would seem that stop burst duration is an important feature of stop place of articulation in multi-place systems such as Pitjantjatjara’s.
The question then arises, to what extent are these observations typical of other languages, and not just Pitjantjatjara? It has been observed that velar consonants in a non-front vowel context are articulated farther back in Australian languages than in European languages such as English or German, with consequences for formant transitions (
We therefore choose to investigate the realization of the various stop burst phonemes under stress in languages with different phoneme inventories. It is well established that the exact make-up of the phoneme inventory places restrictions on the degree of coarticulation observed—for instance, vowel-to-vowel coarticulation may be limited in a part of the vowel space that is particularly crowded (
A similar principle can be applied in the domain of articulatory prosody. For instance, in an articulatory prosody study of the vowel /i/ in French, Tabain & Perrier (
We begin by providing an overview of the various languages studied here, including a brief description of their vowel and stress systems.
Arrernte (
Australian Aboriginal languages are known for their relatively large number of coronal place contrasts (
The Australian Aboriginal languages studied here do not have a laryngeal contrast on the stops—the stops are generally described as voiceless unaspirated, although burst/aspiration can be quite long for the palatal and the velar, and voicing may occur intervocalically.
Makasar is an Indonesian language, spoken in and around the city of Makassar (note the different spellings) on the southern part of the island of Sulawesi (
It should be noted that both Makasar and English have a laryngeal contrast on the stops (broadly, voiceless aspirated vs. unaspirated for English, and voiced vs. voiceless for Makasar). As mentioned above, the Australian Aboriginal languages do not have such a contrast. It is also worth noting that for all four languages with a palatal consonant in this study, this consonant is lamino-alveo-palatal, with a noticeable affrication. We should also note in this respect that the English affricate /ʧ/ has many phonetic similarities to the alveo-palatal phonemes of the other languages—i.e., a period of closure followed by a strong palatal-like affrication. We have not included it here because it does not always pattern with the stops phonologically (e.g., in terms of syllable structure), but its existence in the phoneme inventory is a potential influence on the stop realizations in English.
Although Arrernte may be described as having a three-vowel system /i a ə/, the high vowel /i/ has low lexical frequency and low functional load—see Henderson (
Given the very different vowel spaces in these five languages, we choose here to focus only on the central vowel context following the stop burst (however, we did not control for the preceding context—see below for further information). For Arrernte, the following context includes the vowels /a ə/—it should be noted that both of these vowels can occur in both stressed and unstressed position in Arrernte. For Makasar, Pitjantjatjara, and Warlpiri, we use the /a a:/ vowel contexts. For English, we use /a a:/ and /ə/—however, note that in English, the low central vowels can only occur in stressed position, while the schwa vowel can only occur in unstressed position (contra Arrernte).
The realization of stress also varies between the languages studied here. The outline here is based on the references cited above for each language.
Stress in Arrernte is described as occurring on the second underlying VC syllable of the word (
Pitjantjatjara and Warlpiri both have stress on the first syllable. Tabain, Fletcher, and Butcher (
Stress in Makasar is usually on the penultimate syllable, though a significant subset of words have stress on the antepenultimate syllable (these are words that have undergone a process called Echo-VC; see
English stress has been very well studied; see Fletcher (
An important caveat is needed regarding the interaction of stress with apical consonants in Warlpiri and Pitjantjatjara. As mentioned above, for Arrernte, Warlpiri, and Pitjantjatjara, the apical contrast /t/~/ʈ/ is neutralized in initial position, to /T/. However, as just mentioned, in Pitjantjatjara and Warlpiri, the first syllable is the stressed syllable. This means that for both of these languages, the stressed apical is also the neutralized apical /T/, and the non-neutralized /t/ and /ʈ/ only occur in unstressed position.
In the case of Arrernte, however, it is possible for both /t/ and /ʈ/ to occur in both stressed and unstressed position, since many words begin with a vowel, even on the surface, and stress is on the second underlying vowel. (Neutralized /T/ does occur in our database for Arrernte, but it is not included in this study since it is relatively infrequent.)
For further details on possible word structures in the languages studied here, including some of the words used in the present study, the reader is referred to the Illustrations of the IPA published for Arrernte (
Before we present our data and results, a few words are needed regarding enhancement of spectral contrasts. We have referred to the apicals /t/ and /ʈ/ of Pitjantjatjara becoming “lighter” under stress, and the velar /k/ becoming “darker” (in terms of spectral centre of gravity, which is the first spectral moment). We suggested that this might represent enhancement of the feature [grave] under stress, the apicals being [–grave] and the velar being [+grave] (here we see [grave] as being the counterpart of [acute]). We also pointed out that the palatal /c/ does not appear to show any spectral enhancement under stress. One other feature that deserves mention here is [diffuse] (whose counterpart is [compact]), which Tabain (
Data are presented for 28 speakers from the five languages: seven speakers of Arrernte, five speakers of English,
Most of the speakers of the Australian Aboriginal languages were female, with one male for each language. For English and Makasar, the split between male and female was more even (three females for each language, with two males for English and four males for Makasar). No speaker-normalization was carried out for these data.
We are aware that the differences in recording medium (analog vs. digital) have consequences for the calculation of spectral moments (see below for details of these): visual inspection of spectra showed a noticeable drop in energy at around 5–6 kHz in the analog recordings that was not present in the digital recordings. However, the patterns according to stop place were examined for each speaker separately, and there was no reason to believe that this recording difference affected the relative pattern of results (e.g., alveolar vs. retroflex, or stressed vs. unstressed). It should be pointed out here that the extremely remote desert locations where the Australian languages are spoken, as well as their endangered status, make the collection of any sort of language data problematic. We therefore chose to keep what recordings were available to us, especially in light of the fact that many speakers who had participated were senior figures in the maintenance of language in their communities, and their contribution was valued by the community.
The data for this study come from recordings conducted for general phonetic descriptive purposes for each of the languages studied here. Stimuli consisted of single words which were repeated by the speaker three times in a row, without carrier phrase (or twice in a row in the case of the Arrernte speaker from the 1980s). The list of words was designed to illustrate the sounds of each language in different positions in the word (i.e., word-initial, -medial, and, where permitted, -final), and in different vowel contexts. Disfluent tokens were discarded. The English wordlist was specifically designed to mimic the wordlists in use for the study of Aboriginal languages at the time. However, for Makasar, speakers read a list of the Swadesh words used in historical linguistic research, supplemented by a list of words designed to illustrate particular phonemic contrasts. As a result, the Makasar wordlist featured much more common words than either the English or the Aboriginal language lists.
As mentioned above, we only analyze stop bursts where the following vowel is a central vowel. Given the real-word stimuli for all of these languages, it would not have been possible to make meaningful comparisons with both preceding and following contexts controlled for. As a result, the preceding context in the present study may be the left edge of the word, any vowel of the language, or any consonant of the language (homorganic or hetero-organic). For this reason, we take preceding context into account in our statistical analyses, outlined further below. For further details on possible word structures in the language, including some of the words used in the present study, the reader is referred to the
The offset of the stop burst in the present study was located at the onset of voicing for the following vowel. As a result, the stop burst total duration includes any aspiration or affrication, in addition to the initial transient.
Spectral analyses of the data were based on a 10 ms Hamming windowed Fast Fourier Transform (FFT),
For the sake of comparison of these derived measures with the raw data, we also present averaged FFT spectra for the recordings of Arrernte, English, and Makasar (the reader is referred to
Finally, duration values were also calculated for the stop burst, given the different behaviours of the apical consonants vs. the bilabial, palatal, and velar consonants of Pitjantjatjara (i.e., no lengthening for stop bursts under stress for the apicals).
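The two spectral measures used here can be illustrated with a minimal sketch. It treats the power spectrum of a single Hamming-windowed frame as a distribution over frequency, and takes its mean (centre of gravity) and standard deviation. The sampling rate, the power weighting, the frequency limits, and all function names are assumptions made for illustration; they are not taken from the analysis settings of the present study. A naive DFT is used only to keep the sketch free of dependencies.

```python
import cmath
import math


def hamming(n):
    """Return an n-point symmetric Hamming window (n must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, power) for one Hamming-windowed frame.

    A naive DFT keeps the sketch dependency-free; an FFT routine would
    give the same values much faster.
    """
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    freqs, power = [], []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spectral_moments(freqs, power, f_lo=0.0, f_hi=None):
    """Return (centre of gravity, standard deviation), both in Hz.

    Power weighting is an assumption; magnitude weighting is also common.
    """
    if f_hi is None:
        f_hi = freqs[-1]
    pairs = [(f, p) for f, p in zip(freqs, power) if f_lo <= f <= f_hi]
    total = sum(p for _, p in pairs)
    cog = sum(f * p for f, p in pairs) / total
    variance = sum((f - cog) ** 2 * p for f, p in pairs) / total
    return cog, math.sqrt(variance)


# Illustrative use: a 10 ms frame at an assumed 16 kHz sampling rate.
SAMPLE_RATE = 16000
FRAME_LEN = 160  # 10 ms at 16 kHz
tone = [math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE) for i in range(FRAME_LEN)]
tone_cog, tone_sd = spectral_moments(*power_spectrum(tone, SAMPLE_RATE))
```

In this sketch a single 3 kHz component gives a centre of gravity near 3 kHz with a small standard deviation, whereas energy split between widely separated frequencies can leave the centre of gravity unchanged while raising the standard deviation. This is why the two measures are reported separately.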
All labelling for this study was carried out using the Emu speech labelling tool (
Linear mixed effects models were used to examine the duration, centre of gravity (CoG), and standard deviation results using the lme() function of the
It should be noted that by presenting separate analyses for each consonant in each language, we are essentially presenting a series of pair-wise analyses (Strong vs. Weak), without first testing for an overall effect of consonant. This approach was adopted for two main reasons. Firstly, our preliminary investigations suggested that there was not enough statistical power in our study for the complex interactions between consonant and prosodic category to emerge. It will be recalled from the Introduction that different effects were expected for different consonants: for instance, in the case of spectral centre of gravity, we would expect /t/ to show a higher CoG in Strong (i.e., stressed) prosodic contexts, and /k/ to show a lower CoG in the same context. At the same time, we might expect /c/ to show no effect. Thus, an overall test looking at the effect of prosodic context on a particular measure across all consonants may miss an effect that would be found using a pair-wise approach.
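The reasoning behind the first point can be made concrete with a small numerical sketch. The centre of gravity values below are hypothetical, chosen only so that /t/ and /k/ shift in opposite directions and /c/ does not shift at all; they are not data from this study, and the function name is likewise illustrative.

```python
from statistics import mean

# Hypothetical CoG values in Hz, for illustration only (not data from the study).
cog = {
    "t": {"Strong": [3400, 3500, 3450], "Weak": [3100, 3200, 3150]},
    "k": {"Strong": [1900, 2000, 1950], "Weak": [2200, 2300, 2250]},
    "c": {"Strong": [3800, 3900, 3850], "Weak": [3800, 3900, 3850]},
}


def stress_effect(values):
    """Return mean(Strong) minus mean(Weak) for one set of tokens."""
    return mean(values["Strong"]) - mean(values["Weak"])


# Pair-wise view: one Strong vs. Weak difference per consonant.
per_consonant = {c: stress_effect(v) for c, v in cog.items()}

# Pooled view: all consonants combined before comparing Strong and Weak.
pooled = {
    ctx: [x for v in cog.values() for x in v[ctx]]
    for ctx in ("Strong", "Weak")
}
pooled_effect = stress_effect(pooled)

print(per_consonant)   # /t/ rises, /k/ falls, /c/ is unchanged
print(pooled_effect)   # the opposing shifts cancel in the pooled comparison
```

With these invented values the per-consonant differences are equal and opposite for /t/ and /k/, so the pooled Strong vs. Weak difference is zero even though two of the three consonants show a clear effect.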
Secondly, the interaction between prosodic context (Strong vs. Weak) and consonant place of articulation is not an equivalent interaction across the five languages. For two of the languages, Pitjantjatjara and Warlpiri, the apicals /t/ and /ʈ/ necessarily have the neutralized apical /T/ as their counterpart in the Strong (i.e., stressed) prosodic context, and the underlying phonemes /t/ and /ʈ/ are only realized in Weak prosodic positions. However, this is specifically not the case for English and Makasar, where there is no neutralization for any of the consonants. The final language, Arrernte, presents a third possibility, where the two apicals may have the neutralized /T/ as their Strong prosodic context counterpart in absolute initial position, but may also be realized as their underlying phonemes /t/ and /ʈ/ if a word-initial vowel is available (see
Given the above circumstances a Bonferroni correction of the default significance level of
Given this statistical approach, we necessarily conducted pair-wise analyses for languages where there were no significant results on a particular measure (e.g., it will be seen below that Makasar shows no significant effects whatsoever for any measure, and Warlpiri shows no significant effects on the spectral measures). We nevertheless chose to report the results of the analyses for each consonant in each language, for the sake of completeness, and also since the LME models provide useful information on the estimated means for each prosodic category.
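The general logic of a Bonferroni correction can be sketched as follows. The default level of 0.05, the number of tests, and the p-values are all assumptions for illustration; the family of tests actually used for the correction in this study is not reproduced here.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-corrected level.

    p_values maps a test label to its uncorrected p-value. The default
    alpha of 0.05 is an assumed conventional value.
    """
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for three Strong vs. Weak comparisons.
example = {"p": 0.030, "t": 0.004, "k": 0.0005}
flags = significant(example)
print(flags)
```

In this invented example the corrected threshold is 0.05 / 3, so the comparison with p = 0.030 would count as significant at the uncorrected level but not after correction.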
Finally, it should be noted that we do not make direct statistical comparisons between languages, or even between individual consonants; our sole concern is with Strong vs. Weak (i.e., stressed vs. unstressed) contrasts. This should be borne in mind during the presentation of results.
Table
Number of tokens in the database. The number of speakers is given in parentheses after the language name. Note that for Makasar, the voiceless /c/ was replaced with the voiced /ɟ/ due to a lack of Weak tokens for /c/.
Language (speakers) | Context | p | t̪ ‘th’ | t | ʈ ‘rt’ | c (ɟ) | k |
---|---|---|---|---|---|---|---|
English (5): 3 female, 2 male | Strong | 44 | | 72 | | | 60 |
 | Weak | 137 | | 118 | | | 199 |
Makasar (7): 3 female, 4 male | Strong | 161 | | 213 | | 187 | 285 |
 | Weak | 196 | | 250 | | 50 | 307 |
Pitjantjatjara (9): 8 female, 1 male | Strong | 421 | | 87 | – | 192 | 510 |
 | Weak | 189 | | 146 | 213 | 347 | 485 |
Warlpiri (5): 4 female, 1 male | Strong | 256 | | 27 | – | 191 | 264 |
 | Weak | 581 | | 123 | 47 | 173 | 240 |
Arrernte (7): 6 female, 1 male | Strong | 371 | 205 | 199 | 86 | 216 | 245 |
 | Weak | 225 | 189 | 128 | 262 | 369 | 566 |
For Pitjantjatjara the relative number of Strong vs. Weak tokens depends on place of articulation. There are more Strong /p/ than Weak /p/ tokens, while the opposite is true for the three coronal consonants /t ʈ c/—the reader is reminded that the Strong /t/ and /ʈ/ are in fact neutralized /T/. For /k/, the number of tokens is more balanced for Strong and Weak contexts. These observed patterns are broadly the same for Warlpiri, with the exception that in Warlpiri there are more Weak tokens for /p/ than Strong tokens. There are also fewer retroflex /ʈ/ tokens for Warlpiri, and the numbers of Strong vs. Weak palatal tokens are roughly the same for this language.
Finally, for Arrernte as for Pitjantjatjara, there are more Strong /p/ tokens than Weak, but many more Weak tokens than Strong for /c/ and /k/. There is an imbalance for the apicals: there are more Strong tokens for alveolar /t/, but (about three times) more Weak tokens for retroflex /ʈ/. Notably, the number of Strong vs. Weak tokens is more balanced for the dental /t̪/.
As a general observation, there are far more /k/ and /p/ tokens than coronal tokens in the Australian Aboriginal languages, and /k/ also dominates English and Makasar (with /t/ also relatively common in Makasar).
Figure
Plots of means and 95% confidence intervals for each of the three measures examined in the present study. Data for each language are presented separately. Blue circles denote Strong syllables, and red circles denote Weak syllables. Note that for Pitjantjatjara and Warlpiri, the blue circle for the Strong neutralized apical is plotted above the /t/ only, even though it contrasts with both /t/ and /ʈ/ ‘rt’. (Note also that ‘th’ = /t̪/, ‘rt’ = /ʈ/ and ‘j’ = /ɟ/.)
Results from a linear mixed effects model for each language and each consonant separately, with Speaker and Preceding Context set as random factors. (a) Burst Duration, (b) Centre of Gravity, and (c) Standard Deviation. The degrees of freedom are given in the top row for each consonant. The following values are listed in each cell: the estimated mean for the Strong syllable condition (intercept); the difference between the Strong condition mean and the Weak condition mean (beta); the standard error of the difference (
We first consider the burst duration results. It can be seen that while burst duration in Strong syllables is significantly longer for all three consonants of English, this is not the case for any of the consonants of Makasar. The English differences are larger for /p/ and /k/ than for /t/. The reader is reminded that both English and Makasar have a voicing contrast for the stops, with English being an aspiration language and Makasar being a voicing language. As such, it is likely that the English results reflect the tendency for voiceless (aspirated) stops to be less aspirated when not in a stressed syllable (
For the Aboriginal languages, by contrast, results vary according to place of articulation. For Pitjantjatjara, as was seen in our previous study, there are large lengthening effects for /c/ and /k/, and a lesser lengthening effect (trend) for /p/. For the apicals however, the difference is not significant. This pattern is almost exactly replicated for Warlpiri, except that the difference in durations for /p/ is also not significant. For Arrernte, the Pitjantjatjara and Warlpiri pattern is repeated for /c/ and /k/; for the remaining consonants, there are almost no differences between Strong and Weak syllables (there is a trend for /p/ and /t/, but this is not likely to be perceptible, since the differences are only about one millisecond). It should be noted that for /p t̪ t ʈ/ of Arrernte, the Strong burst duration values are, on average, actually shorter than the Weak burst duration values. It is likely that these more complex patterns in the Aboriginal languages are made possible by the fact that there is no voicing contrast in these languages.
Turning now to the spectral measures, we first consider CoG. For English, the only result that approaches significance (at
Finally we consider the SD results. Although data for all five languages are plotted in Figure
To summarize: for English, all stops show lengthening of burst duration in Strong syllables, but only /k/ shows an effect on CoG and on SD, suggesting a darker but peakier spectrum.
For the Australian Aboriginal languages, there is a clear lengthening of burst duration for /c/ and /k/, while for the apicals /t/ and /ʈ/, and the dental /t̪/, there is no lengthening effect (results for /p/ are a little more ambiguous). However, the CoG patterns observed in Pitjantjatjara were not observed at all in Warlpiri, and only partially in Arrernte. The higher CoG for /t/ was repeated in Arrernte, but not the lower CoG for /k/. By contrast, Arrernte /k/ had a lower SD in Strong syllables, suggesting a peakier, though not necessarily darker, spectrum for the velar in Strong syllables. In addition, Arrernte had a significant effect on SD for /t̪/ ‘th’, with a higher SD for the dental in Strong syllables, as expected.
There were no significant effects at all for Makasar, either durational or spectral.
Figure
Averaged FFT spectra for three languages. Note that the frequency range is 0–6 kHz, and that the magnitude scales differ for each plot. (top left) English (all five speakers), (top right) Makasar (all seven speakers), (bottom) Arrernte (four speakers only).
Looking first at the plot for English, it can be seen that there is minimal difference according to prosodic context for /p/, as is reflected in the lack of significant results for the moments measures presented above. However, there does appear to be a little more energy overall in the frequency range above 1 kHz for the Weak prosodic context, and less energy below 1 kHz for this context (this difference was reflected in spectral tilt measures, not shown here, with the Strong prosodic context having a steeper tilt). For /t/, there is more energy in the frequency range above 1 kHz for the Strong prosodic context, although the overall shapes are very similar for the two prosodic contexts, which may explain the lack of significant moments results for this stop. (By contrast, there is less energy for /t/ in the frequency range below 1 kHz in the Strong prosodic context—this may reflect the greater probability of intervocalic voicing in the Weak prosodic context.)
The spectra for English /k/ show greater energy in the frequency range 2–3 kHz in the Strong prosodic context (and also in the frequency range below 1 kHz, which does not affect the moments measures). This peak at 2–3 kHz accounts both for the lower spectral centre of gravity for English /k/, and also for the lower standard deviation.
We next consider the plot for Makasar. It can be seen that the plots are extremely similar according to prosodic context, especially for /t/ and /k/. There are some differences for /p/ and /ɟ/, with /p/ having a little more energy above 2 kHz in the Strong context, and /ɟ/ having a little less energy above 1 kHz in the Strong context (this latter effect may conceivably be a consequence of differences in the voicing source in Strong context). The spectral moment plots do show some small differences according to prosodic context for these sounds (such as higher CoG and SD for /p/ in the Strong context), but these failed to reach significance. Examination of individual Makasar results showed many differences between speakers according to prosodic context.
Finally we consider Arrernte. It can be seen that for the speakers presented here, the spectra for /p/ are very similar in the two prosodic contexts. However, the CoG results presented above suggested a significant difference between the two contexts, and examination of the other Arrernte spectra not shown here suggests that some speakers do show effects for /p/ similar to those we saw above for Makasar (i.e., a little more energy at higher frequencies in Strong prosodic contexts). However, this was certainly not the case for all speakers.
We next consider the dental /t̪/ for Arrernte. In general the spectra appear very similar in the two prosodic contexts. Examination of spectra across speakers (including speakers not shown here) suggests that the most consistent difference between Strong and Weak contexts occurred in the frequency range 2–3 kHz (for some speakers a little higher). It is possible that the significant result for Standard Deviation is capturing some of these subtle differences in this particular frequency range, particularly given that the Centre of Gravity is calculated as being at about 3 kHz in the present study.
The spectra for Arrernte /t/ and /ʈ/ show a broad spectral peak up to about 4 kHz, as mentioned in the Introduction section. It can be seen here that the drop in energy for the broad spectral peak occurs at a higher frequency for the Strong prosodic context than for the Weak context, for both apicals. However, for other speakers not shown here, the prosodic effect is more extreme for /t/. These observations account for the differences in CoG evident for both of these apical stop bursts. By contrast, for the palatal /c/, the spectra are very similar in the two prosodic contexts, although the spectral peak between about 3 and 4 kHz is shifted slightly downward in the Strong prosodic context relative to the Weak context. However, this effect is very subtle, which may explain why there were no significant effects of prosodic context on the spectral moments for the palatal.
Finally, we consider the velar stop /k/ for Arrernte. It can be seen that, as for English, there is a spectral peak at about 1–2 kHz. This peak clearly has greater energy in the Strong prosodic context, as was the case for English. In addition, there is another spectral peak at 3–4 kHz, with greater energy in the Strong prosodic context in this frequency range. These observations likely account for the significantly lower standard deviation for Arrernte /k/ in the Strong prosodic context.
We have seen that languages can differ markedly in whether and how they choose to encode stress on the stop burst. Some languages, such as Makasar in our sample, do not encode stress on this consonantal portion at all, either temporally or spectrally. This is in contrast to very strong durational cues on the vowel for this language. This supports the idea that languages can choose a different conglomeration of cues for encoding stress (
It seems that stop burst duration has a special status in multi-place systems, at least in those of the Central Australian languages examined here. It is clear that both /c/ and /k/ have extra stop burst duration under stress, whereas the more anterior coronals /t̪ t ʈ/ do not. It could, therefore, be argued that [burst duration], with values long and short, is a contrastive feature of multi-place systems, with the anterior coronals being short, and the posterior coronal and /k/ being long. In fact, a similar proposal has been made by Gallagher (
In spectral terms, it seems that consonants are not equally amenable to enhancement under stress. The palatal /c/ seems to resist modification to its stop burst in all of the languages studied here, regardless of the number of places of articulation. As mentioned above, this is likely a concomitant of the palatal’s strong coarticulatory resistance, due to the laminal articulation recruiting a large part of the tongue body, and temporarily immobilizing the jaw in order to achieve the clearly shaped acoustic output of the affricated stop burst (
Perhaps a little unexpectedly, the velar /k/ showed effects on the stop burst spectrum in three of the languages studied here, including English with its 3-place system. For English and Pitjantjatjara, /k/ had a darker quality (i.e., lower centre of gravity) under stress. For Arrernte, and English too, /k/ had a lower standard deviation, suggesting a “peakier” spectrum. Above we discussed the possibility that the feature [+grave] is enhanced for /k/ under stress; it seems, in addition, that a feature such as [+compact] or [–diffuse] may also be enhanced for the velar, depending on the language. However, exactly what “compact” and “diffuse” might mean in relation to the spectra we examined in Figure
Another consonant which has its feature [compact] or [diffuse] enhanced under stress is the dental /t̪/, as we predicted earlier given that this is the feature (as reflected in the measure of Standard Deviation) which separates the dental from the other coronals in Arrernte. However, for this consonant, the effect is in the opposite direction to the velar, with the dental being even more [+diffuse] under stress. Our examination of the spectra suggested that this change in the second spectral moment was due to less energy in the 2–3 kHz range under stress—again, exactly what articulatory and aerodynamic conditions would allow for such changes to be made to the output spectra is another topic for further inquiry. In general, it is not at all clear what articulatory strategies might be used to make a consonant more or less “diffuse”. Predictions for raising or lowering of spectral centre of gravity are relatively straightforward for stop bursts, relating to issues of cavity length and formant affiliation. However, where diffuseness/compactness is concerned, the overall set of losses in the system needs to be considered, and this is not a trivial matter. Since losses mostly occur at radiation, one possibility is that at the moment of stop burst, the jaw is in a lower position for a more diffuse spectrum (and vice versa for a more compact spectrum). Another, more tenuous, possibility for the velar /k/, is that a more uniform resonating tube applies in the case of the slightly more diffuse spectrum (since spectral peaks may appear when cross-sectional area is not uniform along the tube). Finally, characteristics of the voice source, as affected by lexical stress, may contribute to the spectral drop-off at frequencies above 1 kHz. These are clearly questions which require further articulatory study and acoustic modelling, and it is quite likely that different solutions may be proposed for different places of articulation.
The other consonant, in addition to /k/, which showed effects in more than one language was /t/, which had a “lighter” quality (i.e., higher spectral centre of gravity) in both Arrernte and Pitjantjatjara. It is perhaps surprising that Arrernte /t/ showed this effect as well, since the presence of the dental /t̪/ in the phoneme inventory might have prevented this feature enhancement (the dental has a higher CoG value than the alveolar and retroflex). The alveolar /t/ was not enhanced spectrally in English, which did have a strong effect on the velar. It could therefore be said that for any given consonant, if there is any sort of feature enhancement in the language, that enhancement goes in a particular direction—for the velar /k/, it is a lower spectral centre of gravity and/or a more peaky/compact spectrum, while for the alveolar /t/ it is a higher spectral centre of gravity.
We have not commented on the spectral properties of the bilabial /p/ until now. In general, this is a consonant which does not show extensive effects of stress in spectral terms. This could be due to considerations of articulatory-to-acoustic mapping, as discussed above; or it could be that in addition, the visual cues to this consonant are so strong that any acoustic enhancement is less useful.
In sum, we have argued that the various effects of stress observed in this study are a reflection of feature enhancement, and contribute to the overall conglomeration of cues to stress. The particular features which we have suggested are at play are [grave] for the alveolar and velar, and [diffuse] for the velar and the dental, with the velar being [+grave] and [–diffuse], the alveolar [–grave], and the dental [+diffuse]. Importantly, languages can choose whether or not to enhance the features for a given consonant.
We have also argued that in multi-place systems, [burst duration] is a feature which separates the anterior coronals /t̪ t ʈ/ from the posterior coronal /c/ and the velar /k/. This results in extra duration under stress for the latter group, with their intrinsically long burst durations, but not for the former, with their intrinsically short burst durations.
The exception is the /i/ vowel context, where the spectral tilt and centre of gravity are slightly higher for the retroflex than for the alveolar (
However, the exception was again the /u/ vowel context, where the /c/ stop burst had lower tilt and centre of gravity, and higher skewness values, under stress, presumably due to lengthening of the front cavity by a more extreme labial gesture.
Two of the English speakers were authors MT and RB.
Details of the field recordings were as follows: For the cassette recordings, a Sony TCM-5000EV cassette recorder was used, with a frequency response of 90–9,000 Hz according to the manufacturer. It was felt at the time that any inferiority to larger machines was offset by the fact that the recorder weighed only 1.45 kg with batteries, and could be carried around at all times and be instantly ready for use. The microphones used in conjunction with this machine were the Sony ECM-D8 model. This was a very small, flat, omnidirectional electret condenser microphone, with a frequency response of 150–15,000 Hz, a signal-to-noise ratio of over 40 dB (at 1,000 Hz) and a dynamic range of more than 76 dB (manufacturer’s figures). It used the Sony Boundary Effect system, whereby when the microphone is placed on a solid surface, such as a table or large book, the frequency response of direct sound (i.e., from the speaker) is raised about 6 dB more than in conventional microphones, whereas the frequency response of reflected sound (i.e., extraneous noise) is raised by only 3 dB. This approximately 3 dB difference between the frequency response for direct sound and reflected sound enhances the recording of speakers near to the microphone at the expense of background noises further away, thereby making the microphone particularly well-suited for fieldwork conditions. The tape used was mainly TDK Type-I D and Fuji Type-I DR-Ix, both of which had a frequency response in excess of that claimed by the manufacturer for the microphone and recorder (better than 45–13,000 Hz according to our own laboratory tests at the time of recording). For the recordings at the Institute for Aboriginal Development in Alice Springs, a Revox B-77 Mk II open reel half-track tape recorder and Sennheiser MD-427 microphone were used. All recordings were done in a quiet room.
Many of these recordings have been presented as part of previous acoustic studies on these languages. The Makasar recordings were used for Tabain and Jukes (
An anonymous reviewer asks why a regular (single-window) Fourier analysis was used, rather than a multitaper analysis, which takes several different windows of the same portion of the signal (in this case the stop burst). The answer lies in the fact that this research project has been ongoing for many years, before multitaper analysis began to be used in speech research for the analysis of obstruents. With the addition of each new language, we simply used the same analysis techniques as we had used for previous languages (cf. published results in
Gender (Male or Female) and Recording Medium (Digital or Analog) were also included as random factors in an earlier version of the model. This model was compared with the present model containing only Language, Speaker, and Preceding Contexts as random factors. In the case of every measure, there was no significant difference between the model that included Gender and Medium and the one that did not (with the ANOVA returning a
An anonymous reviewer asks whether the significant results for /k/ in English might not reflect the fact that there is a clear shift in vowel quality from /a/ to /ə/ in stressed versus unstressed syllables. We do not think this is likely, since as the vowel plots in Cox & Palethorpe (
We thank Pierre Badin and Pascal Perrier for discussion of these issues, but do not hold them accountable for our interpretation!
The associate editor asks if there is any articulatory evidence for these observations. A preliminary study of Arrernte /t/ in stressed, and /ʈ/ in stressed and unstressed positions, suggests that closure for the unstressed /ʈ/ is indeed produced further back along the palate than for either of the stressed apicals (unstressed /t/ was not studied). Results are based on electro-palatography and electro-magnetic articulography data from two female speakers of Arrernte.
An anonymous reviewer asks if these results specifically favour enhancement in prosodically strong positions, or if they could also be interpreted as consistent with enhancement of lexical distinctions or functional load. Unfortunately, our wordlists were not designed to answer such questions, which we must leave open to further study.
We are extremely grateful to the Australian Research Council for funding this research over very many years, including most recently a Future Fellowship to Marija Tabain. We would also like to thank the many research assistants who have worked on these databases; the associate editor Lisa Davidson and two anonymous reviewers for their time; and our speakers for their interest in language work (crosses mark the names of those who have passed away): for Arrernte, Veronica Dobson, Rosie Ferber†, Therese Ryder, Janet Turner, Margaret Kemarre Turner, Raphael Turner†, and Sabella Turner; for Makasar, Isna, Muna, Sanga, Sarro, Sikki, Tinggi, and Tompo; for Pitjantjatjara, Anmanari Alice, Kanytjupai Armstrong, Margaret Dagg, Manyiritjanu Lennon, Tinimai, and Mike Williams from Ernabella, as well as Charmaine Coulthard, Hilda Bert, and Kathleen Windy from Areyonga; for Warlpiri, Bess Nungarrayi Price, Rene Robinson, Connie Nungarrayi Rice/Walit†, Kay Napaljarri Ross†, and Derek Jungarrayi Wayne.
The authors declare that they have no competing interests.