1.1. Long and short tone spans
In Luganda, a Bantu language spoken in Uganda, there is a contrast between high (H) and low (L) tone (Tucker, 1962; Cole, 1967; Stevick, 1969; Hyman, 1982; Hyman, Katamba, & Walusimbi, 1987; Hyman & Katamba, 1993; Hyman & Katamba, 2010). In (1), for example, all the syllables have low tone except the boldfaced one. Here and henceforth, the high-tone span is boldfaced in the transcription, and the underlined syllables are the ones with underlying high tone. Morpheme gloss abbreviations are defined in the Appendix.
- (1) Omulenzi anona ennyama.
- “The boy is getting the meat.”
In (1), high tone is associated with a single syllable, forming a short high-tone span. Such a span in Luganda has an f0 rise that begins before the onset of the high-toned syllable and ends in an f0 maximum around the end of the vowel, followed immediately by an f0 fall over the course of the next syllable (Myers, Namyalo, & Kiriggwajjo, 2018). The timing of the f0 rise offset depends on the duration of the onset, vowel, and coda of the high-toned syllable, with longer segments associated with a later rise offset (Myers et al., 2018). The f0 rise and fall are coordinated quite precisely with the segments, as in other languages (Silverman & Pierrehumbert, 1990; Prieto, van Santen, & Hirschberg, 1995; Arvaniti, Ladd, & Mennen, 1998; Xu, 1998; Myers, 2003).
However, there are also high-tone spans in Luganda that are longer than one syllable, which we will refer to as long high-tone spans. There are two processes of unbounded leftward spread of high tones (to be described in more detail below in Section 1.3), each of which yields long high-tone spans (Tucker, 1962; Cole, 1967; Stevick, 1969; Hyman, 1982; Hyman et al., 1987; Hyman & Katamba, 1993; Hyman & Katamba, 2010), as in (2).
- (2) Omulenzi yalera nnawolovu.
- “The boy held the chameleon.”
In (2), the high-tone span extends over four syllables from the second syllable of the verb yalera to the antepenult of the object NP nnawolovu. There is no upper limit on the length of a high-tone span in Luganda.
There are various reasons that we might expect long high-tone spans to differ from short high-tone spans in f0 level or f0 timing. Xu and Wang (2001) and Xu and Sun (2002) have shown that even at normal speaking rates it takes most of a syllable’s duration for an f0 target to be attained. In a short high-tone span, the speaker must then immediately begin the transition to the next target, while in a long high-tone span that next transition is in some later syllable. Time pressure is likely to be a greater factor in the realization of a short high-tone span than in the case of a long high-tone span.
One frequent effect of such time pressure is undershoot of the target (Lindblom, 1963; Moon & Lindblom, 1994). F0 excursions were reduced in faster speech compared to normal-rate speech for one English speaker in Ladd, Faulkner, Faulkner, and Schepman (1999) and for two French speakers in Fougeron and Jun (2000). In Thai, Gandour, Tumtavitikul, and Satthamnuwong (1999) found that pitch range was reduced at faster speaking rates in unstressed syllables for tones with greater f0 excursions (high, falling, rising). Grabe (1998) and Grabe, Post, Nolan, and Farrar (2000) found that in German, and in the English of Leeds and Belfast, the f0 range was truncated when the duration of the voiced vowel was shortened by the addition of voiceless consonants.
Undershoot is also conditioned by time pressure in segmental categories. Long vowels tend to be more peripheral and so more spread out in F1 × F2 space than corresponding short vowels, e.g., in Finnish (Lehtonen, 1970: 86; Wiik, 1965), Chickasaw (Gordon, Munro, & Ladefoged, 2000), and Japanese (Hirata & Tsukada, 2009). Likewise, there are more complete closures with long stop consonants than corresponding short ones, e.g., in Japanese and Swedish (Löfqvist, 2005), Tashliyt Berber (Ridouane, 2007: 130), and Persian (Hansen & Myers, 2017). In general, shorter segments are more reduced and more subject to the coarticulatory influence of neighboring sounds than longer segments (Lindblom, 1963; Moon & Lindblom, 1994).
Another response to time pressure is to increase the velocity of speech movements. For example, for the majority of English speakers in Ladd et al. (1999) and Grabe (1998), faster speaking rate did not affect f0 excursion but was associated with faster f0 change. For these speakers, the height of high-tone targets was maintained under time pressure, but the transition to peak f0 was shortened.
Similarly, articulatory gestures in the production of short segments are significantly faster than in corresponding long segments. In German, Hertrich and Ackermann (1997) found less gestural stiffness (the slope of peak velocity against displacement) in the opening gesture for long vowels than for short vowels. Gestural velocity was found to be lower in the production of long consonants in Japanese compared to short ones (Smith, 1995; Löfqvist, 2007), and gestural stiffness was less for long labial consonants than for short ones (Smith, 1995; Löfqvist, 2005). Peak tongue position in medial lingual consonants in Japanese is attained later in long consonants than in short ones (Fujimoto, Funatsu, & Hoole, 2015). In vowel-vowel sequences in Finnish, the formant transition to the second vowel is longer if that vowel is long than if it is short, and this difference in formant transition duration has been shown to be a perceptual cue for vowel length in the language (Myers & Hansen, 2005). Löfqvist (2005) suggests that long segments are produced not by stopping at the target position and holding there, but by making the transitions to and from that target more gradual.
If the response to time pressure in Luganda tone is undershoot, then the f0 rise excursion and the subsequent f0 fall excursion will be smaller in short high-tone spans than in long ones. If the response is temporal compression, then the f0 rise will end earlier in the span-initial syllable in short high-tone spans than in long ones, and the f0 fall will begin later in the span-final syllable in short high-tone spans than in long ones. If time pressure is not a significant factor in tone realization, short and long high-tone spans will differ only in the duration of the high f0 plateau.
1.2. Lexical and intonational tones
In addition to the length of the span, another important consideration in the phonetic realization of high-tone spans in Luganda is the source of the high tone. The high tones in (1) and (2) are lexical high tones, belonging to morphemes within the constituent words. There are also intonational tones. In (3), for example, there is no lexical high tone, but the sentence is realized with a high tone span extending from the second syllable of the verb phrase to the end of the sentence.
- (3) Omulwanyi alera ennyoni.
- “The fighter is carrying a bird.”
This high-tone span always includes the last syllable of the phrase, and it doesn’t belong to any segmental word. Hyman and Katamba (1993) thus identify it as a boundary tone H%. Its meaning and distribution are unclear: Hyman (1982) describes it as characteristic of list intonation, while Hyman and Katamba (2010) describe it as “indicating ‘finality.’” The citation-form transcriptions in Snoxall (1967), Cole (1967), and Stevick (1969) all include this intonational high tone for all items ending in two or more syllables without a lexical high tone. We have found such a final high-tone span to be typical in any Luganda statement ending in such a sequence.
The intonational high tone has always been transcribed the same as the lexical high tone in the literature on Luganda tone, and no distinction between the two was described in the detailed phonetic descriptions of Luganda tones in Tucker (1962, 1967). However, Fainleib and Selkirk (2014), in a small-scale instrumental study with one Luganda speaker, found that for their participant H% differed phonetically from lexical H. They found that the f0 rise at the onset of an intonational H% span as in (3) was smaller than for a lexical high tone, and that f0 fell over the course of the high-tone plateau in the intonational high tone but not in the lexical high-tone span. In other words, the raised f0 interval was nearly level in the case of a lexical high-tone span, but dropped by about 9 Hz over the course of an intonational high-tone span.
H% can differ from other H tones in phonetic implementation. Pierrehumbert (1980: 26), in initially defining boundary tones for English, notes that “H* and H% are equally high tones,” differing primarily in where they occur in the phrase (on a stressed syllable, and on a phrase-final syllable, respectively). However, Pierrehumbert (1980: 90) argues that H% is subject to Upstep, while pitch accent H* is not. In Japanese, Pierrehumbert and Beckman (1988: 81) propose that the downtrend called catathesis is triggered only by accentual H, but can be undergone by either an accent H or boundary H%. Myers (1996) found that H% in Chichewa, unlike lexical high tones in that language, was not subject to downdrift, did not condition it, and was transparent to downdrift effects between lexical high tones on either side of it.
The findings of Fainleib and Selkirk (2014) suggest that lexical high tones in Luganda have greater f0 excursions and less declination than the intonational high tones. However, only a single speaker participated in the study, so it does not provide a firm basis for generalization. In the following study, their claims about the phonetic differences between lexical and boundary high tone are tested with a larger sample of speakers.
There is still not much information available about the phonetic implementation of boundary tones, despite an important cross-linguistic body of work focused on the distribution and interpretation of such tones, e.g., the studies in Jun (2005) and Downing and Rialland (2017). The current study is intended to add to the still small literature on how boundary tones differ in phonetic realization from other intonational and lexical tones.
1.3. High tone spread in Luganda
Since the following experimental study will deal with the difference between long and short high-tone spans, it is necessary to review what the literature has established about the conditions under which a high tone spreads or does not spread in Luganda.
One of the two sources of unbounded high-tone spans is the process of High Tone Plateauing (Hyman & Katamba, 2010), in which two separate lexical high tones are joined into a single extended high-tone span (Hyman et al., 1987; Hyman & Katamba, 1993; Pak, 2008; Hyman & Katamba, 2010). This process is given in Figure 1 in the formulation of Hyman and Katamba (2010), and it is exemplified in (4).
- (4) Omulenzi yalera nnawolovu.
- “The boy held the chameleon.”
The verb yalera in (4b) and the object nnawolovu in (4c) each have one high-toned syllable when they occur in isolation. But when the two words occur in succession, as in (4d), there is a high-tone span extending from the high-toned syllable of the verb to that of the complement.
There are some syntactic conditions for this process (Hyman et al., 1987; Pak, 2008; Hyman & Katamba, 2010). According to Hyman and Katamba (2010), a sequence of words that meets the phonological structural description in Figure 1 is only subject to the rule if the words belong to the same tone group (TG). A tone group corresponds to a head-initial maximal projection XP, in which the head is not an ‘inherently-focused’ verb type, e.g., negative, imperative, or infinitive. It is further required that the complement in XP not include an ‘initial-vowel’ augment, a morpheme involved in the marking of focus.
The second source of unbounded high-tone spans in Luganda is High Tone Anticipation. If a word with a high tone follows, within the same phrase, a word that ends in a low-toned syllable, the high tone in the later word spreads leftward up to the second syllable of the phrase or to the second syllable after a high tone (Hyman et al., 1987; Hyman & Katamba, 1993; Pak, 2008; Hyman & Katamba, 2010). This process is formalized by Hyman and Katamba (2010) as in Figure 2, where PW is a phonological word, and TP is a tone phrase. A tone phrase is a constituent superordinate to the tone group, and it consists of the VP or any preverbal maximal projection. An example of High Tone Anticipation is given in (5).
- (5) Omuntu alera omuwere omutono.
- “The person is holding a small infant.”
In the isolation forms in (5b) and (5c), the verb alera has no high tone and the complement omuwere has a high tone on the penult. But when they are juxtaposed in the sentence (5e), the high tone of the complement extends leftward to the second syllable of the verb. It is stopped from spreading onto the first syllable of the verb, in this analysis, by an initial L% boundary tone on the first syllable of the tone phrase.
If a high tone does not meet the conditions for High Tone Plateauing or High Tone Anticipation, it does not spread and so surfaces as a short high-tone span on just one syllable. The high tone on the second syllable of the verb in (1), for example, is at the beginning of a tone phrase (as defined above) and so does not meet the conditions for either tone spread process.
In this paper, we will assume for purposes of discussion that the tone-bearing unit in Luganda is the syllable, contrary to previous work that has identified the mora as the tone-bearing unit in the language (Tucker, 1962; Hyman, 1992; Hyman & Katamba, 1993; Hyman & Katamba, 2010). Myers et al. (2018) found that the timing of the f0 peak and the subsequent fall depended on the durations of the onset, the vowel, and the postvocalic consonantal interval. They did not find that this dependency was sensitive to whether or not the segment was moraic, and so concluded that the mora was not relevant to the timing of f0 events in Luganda. In the following experiment, however, this controversy is moot, since the test syllables are all simple open syllables with short vowels, each consisting of just one mora.
The experiment was designed to test whether the phonetic differences described above exist in Luganda between short and long high tone spans, and between lexical and boundary high tones.
Ten native speakers of Luganda participated in the study: three females and seven males. All had grown up in the Central region of Uganda and were living in the Kampala area at the time of the study. The speakers are described in Table 1.
The speakers ranged in age from 24 to 49 at the time of the experiment. They came from across the Central region of Uganda, the traditional homeland of the Baganda, from the Rakai district in the southwestern corner of the region (S5) to Mukono in the southeast (S2), Mubende in the northwest (S3, S4, S10), and Kayunga in the northeast (S7).
The test sentences in the study were all affirmative statements with a lexical subject, a verb in the present or past tense, and a nominal complement (either an object or a locative). Each had a medial high-tone span, within which all consonants were sonorant, and outside of which no syllable had a lexical high tone. The initial and final syllables of the high-tone span were always open syllables with short vowels.
There were four sentence classes: HH, LH, HL, and LL. These four classes differ in whether they have a long high tone span or a short one, and whether the high tone span consists of a boundary tone or a lexical tone. Twenty sentences of each of the four types were produced by each speaker, for a total of 80 sentences per speaker, and 800 sentences for the study. The test sentences are listed in the Appendix.
The HH sentences (listed in  in the Appendix) have a lexical high tone in the verb and also in the following complement. The verb does not belong to an ‘inherently focused’ inflection class, and the complement does not have an augment, so the verb and the complement belong to the same tone group, as defined in Hyman and Katamba (2010). High Tone Plateauing, as in Figure 1, thus applies, yielding a high tone span extending from the underlying verb-stem-initial high-toned syllable to the underlying high-toned syllable in the complement. The HH sentences have a long high-tone span with a lexical high tone.
The LH sentences (listed in  in the Appendix) have a verb with no lexical high tones, and a complement with one lexical high tone on a non-final syllable. The verb and the complement belong to the same tone phrase, as defined above, so they are subject to High Tone Anticipation, as in Figure 2. This yields a high-tone span extending from the second syllable of the verb (i.e., the first syllable of the verb stem) to the underlying position of the high tone in the complement. The LH sentences have a long high-tone span with a lexical high tone, like the HH sentences.
The HL sentences (listed in  in the Appendix) have a lexical high tone on the verb stem and none in the complement. A high tone in that position cannot spread leftward, since it is at the beginning of the tone phrase as defined above. As a result, the high tone in these sentences is realized on a single syllable. The HL sentences have a short high-tone span with a lexical high tone.
The LL sentences (listed in  in the Appendix) have no lexical high tones either in the verb or the complement. In such a case, an intonational H% is inserted and spread leftward under the conditions of High Tone Anticipation. The LL sentences have a long high-tone span with an intonational high tone.
Recordings were made in a quiet room on the campus of Makerere University in Kampala, Uganda, using a Shure SM10A head-mounted microphone and a Marantz PMD670 solid-state recorder, with a sampling rate of 44.1 kHz and 16-bit amplitude resolution.
The sentences were presented to the participants in a PowerPoint slideshow on a laptop computer, with each sentence on a separate slide. The slides were shuffled into a quasi-random order. Speakers were instructed to read each sentence to themselves first, and then to produce it without internal pauses, as a statement and a separate utterance (rather than as a member of a list). They were told that if they were not satisfied with their initial production, they could keep saying the sentence until they felt they had it right. They proceeded at their own pace through the sentences, but were instructed to finish saying a sentence before pressing the key to switch to the next one. Where the speaker had produced a sentence more than once, the last one was selected for analysis, unless it had a clear internal pause or slip.
The LL sentences were elicited in a separate block, since it was assumed on the basis of the previous literature that the final H% was optional, and that extra instructions would be required to elicit it. Speakers were presented with an example LL sentence, and were asked if they could say the sentence with more than one pitch pattern, but with the same meaning. The intention was to elicit the two pronunciations that have been described in the literature: one with all low-toned syllables, and one with the optional H% plateau. Speakers were then to be instructed to produce each sentence first with one pronunciation and then with the other.
However, it turned out that most speakers were unaware of the two options. Only three of the 10 speakers felt they had more than one pronunciation with the same meaning, and they were asked to produce each sentence both ways. The speakers who had only one pronunciation were asked to produce the LL sentences with that pronunciation.
The speakers who had just one pronunciation consistently produced the LL forms with a high-tone span extending from the second syllable of the verb to the final syllable of the sentence. This was also the pronunciation provided by all speakers for two instances of a sentence of the LL class that was mistakenly included in the main recording block without any separate instructions. For two of the speakers who felt they had two pronunciations of the LL sentences, both pronunciations had the final H% plateau, and the two pronunciations differed just in that one was produced with a greater pitch range than the other. Only one speaker produced the expected pair of pronunciations: one entirely low-toned, and the other with a high-toned plateau extending from the second syllable of the verb to the end of the sentence. It is probably worth noting that this speaker has been trained in Luganda phonetics, and was aware of how Luganda tone has been described.
For the speakers whose pronunciations differed just in pitch range, the production with the more reduced pitch range was selected for analysis, since it was produced more fluently and with less hesitation than the more emphatic, expanded-range productions. For the speaker with two tonally distinct pronunciations, the one with H% was selected for the analysis, since these are the only ones with a high-tone span to measure. For this speaker there were two tokens of LL sentences for which no H%-final pronunciation was offered. These two tokens were excluded from the dataset, leaving 798 utterances for the analysis.
Measurements were made in Praat (Boersma & Weenink, 2013). The onset and offset of the following segmental intervals were marked in the annotation tier: the consonant at the beginning of the first high-toned syllable in the high tone span (C1), the vowel of that syllable (V1), the consonant following V1 (C1a), the consonant at the beginning of the last high-toned syllable in the span (C2), and the vowel of that syllable (V2). The duration of each of these intervals was obtained, and the onset timepoints for C1 and C2 were selected as reference points for the timing of f0 events. The onset of each consonant interval was set at the endpoint of the decline in waveform amplitude from the preceding vowel, while the offset of each consonant interval was set at the point at which amplitude began to rise at the end of the constriction interval. The vowel intervals were the intervals between consonant intervals. In the long-span conditions, the first syllable of the span was distinct from the final syllable, but in the short-span condition, the first and last syllable of the span were the same syllable, so C1 was the same interval as C2, and V1 was the same as V2.
A PitchTier representation of the soundfile was constructed using the autocorrelation method, with 5 Hz smoothing. The f0 rise was defined as the interval of pitch points in the PitchTier which overlapped with the C1-V1 syllable and in which f0 was monotonically rising. The rise onset was the first pitch point that was followed by a point with a higher f0 value, and the rise offset was the first subsequent point that was followed by a point with an equal or lower f0 value. The f0 fall was defined as the monotonically falling interval, following the rise, that overlapped with the span-final C2-V2 syllable. In the short-span HL condition, the rise offset was often the same point as the fall onset, in which case that single point was the f0 maximum of the rise-fall pattern. The f0 plateau was the interval of raised f0 points extending from the rise offset to the fall onset.
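The rise and fall definitions above amount to a small search over the PitchTier points. The Python sketch below shows one way they could be implemented; it is not the script used in this study. It assumes the pitch points have been exported from the PitchTier as (time in s, f0 in Hz) pairs in time order, treats "monotonic" as strictly rising or strictly falling from one point to the next, and takes the first qualifying run in each case. All function names are hypothetical.

```python
import operator
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (time in s, f0 in Hz), exported from a PitchTier (assumed)
Run = Tuple[int, int]        # (onset index, offset index) into the point list


def monotonic_runs(f0s: List[float], step: Callable[[float, float], bool]) -> List[Run]:
    """Maximal runs in which each next point satisfies step(next, current)."""
    runs, i, n = [], 0, len(f0s)
    while i < n - 1:
        if step(f0s[i + 1], f0s[i]):
            j = i + 1
            # Extend until the next point no longer continues the movement.
            while j < n - 1 and step(f0s[j + 1], f0s[j]):
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


def overlaps(points: List[Point], run: Run, start: float, end: float) -> bool:
    """True if the run's time span overlaps the interval [start, end]."""
    return points[run[0]][0] <= end and points[run[1]][0] >= start


def find_rise(points: List[Point], c1_on: float, v1_off: float) -> Optional[Run]:
    """First strictly rising run that overlaps the span-initial C1-V1 syllable."""
    f0s = [f0 for _, f0 in points]
    for run in monotonic_runs(f0s, operator.gt):
        if overlaps(points, run, c1_on, v1_off):
            return run
    return None


def find_fall(points: List[Point], c2_on: float, v2_off: float,
              rise_offset_idx: int) -> Optional[Run]:
    """First strictly falling run that starts at or after the rise offset
    and overlaps the span-final C2-V2 syllable."""
    f0s = [f0 for _, f0 in points]
    for run in monotonic_runs(f0s, operator.lt):
        if run[0] >= rise_offset_idx and overlaps(points, run, c2_on, v2_off):
            return run
    return None
```

Because the fall is allowed to start at the rise offset itself, a short-span (HL) token can yield a rise offset and fall onset at the same point, the f0 maximum described above. With the made-up points `[(0.00, 100), (0.05, 105), (0.10, 112), (0.15, 118), (0.20, 116), (0.25, 108), (0.30, 101), (0.35, 101)]` and a single syllable from 0.05 to 0.20 s, `find_rise` returns `(0, 3)` and `find_fall(..., 3)` returns `(3, 6)`, so index 3 is both rise offset and fall onset.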
The following f0 measurements were obtained:
|(6)|(a)|Rise onset value: the f0 value at rise onset|
||(b)|Rise excursion: the f0 value at rise offset minus the value at rise onset|
||(c)|Plateau excursion: the f0 value at rise offset minus the value at fall onset|
||(d)|Fall excursion: the f0 value at fall onset minus the value at fall offset|
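The measures in (6) are simple differences between the f0 values at the four landmarks of the span. The Python sketch below, with hypothetical names and made-up Hz values (not data from this study), makes the sign conventions explicit: under (6c), a positive plateau excursion means that f0 drops between rise offset and fall onset.

```python
from dataclasses import dataclass


@dataclass
class SpanLandmarks:
    """f0 values (Hz) at the four landmarks of one high-tone span."""
    rise_onset: float
    rise_offset: float
    fall_onset: float
    fall_offset: float


def f0_level_measures(s: SpanLandmarks) -> dict:
    return {
        "rise_onset_value": s.rise_onset,                  # (6a)
        "rise_excursion": s.rise_offset - s.rise_onset,    # (6b)
        # (6c): positive when f0 drops across the plateau (declination)
        "plateau_excursion": s.rise_offset - s.fall_onset,
        "fall_excursion": s.fall_onset - s.fall_offset,    # (6d)
    }


# Illustrative values only.
m = f0_level_measures(SpanLandmarks(100.0, 120.0, 116.0, 95.0))
print(m)
```

This prints a rise excursion of 20.0, a plateau excursion of 4.0, and a fall excursion of 21.0. The three excursions also determine the net change across the span: fall offset minus rise onset equals rise excursion minus plateau excursion minus fall excursion (here 20 − 4 − 21 = −5 Hz), which is how a fall larger than the rise leaves the post-H low tone below the pre-H one.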
F0 timing measurements were made relative to the test syllable onset: the onset of C1 for the f0 rise, and the onset of C2 for the f0 fall. The following f0 timing measurements were obtained:
|(7)|(a)|Peak delay: rise offset minus C1 onset|
||(b)|Relative peak delay: peak delay divided by C1-V1 duration|
||(c)|Fall delay: fall onset minus C2 onset|
||(d)|Relative fall delay: fall delay divided by C2-V2 duration|
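The timing measures in (7) can be computed directly from the annotated boundaries. The sketch below uses a hypothetical function name and made-up times in milliseconds; in the short-span HL condition, where C1 is the same interval as C2, the same onset and syllable duration would simply be passed for both events. Since the test syllables are open CV syllables, a relative delay of 1.0 corresponds to the end of the vowel.

```python
def f0_timing_measures(rise_offset_t: float, c1_onset: float, c1v1_dur: float,
                       fall_onset_t: float, c2_onset: float, c2v2_dur: float) -> dict:
    """Timing measures (7a-d); all times and durations in one unit (here ms)."""
    peak_delay = rise_offset_t - c1_onset    # (7a)
    fall_delay = fall_onset_t - c2_onset     # (7c)
    return {
        "peak_delay": peak_delay,
        "relative_peak_delay": peak_delay / c1v1_dur,   # (7b)
        "fall_delay": fall_delay,
        "relative_fall_delay": fall_delay / c2v2_dur,   # (7d)
    }


# Illustrative values only: the peak falls three quarters of the way
# through C1-V1, and the fall starts 100 ms after C2 onset.
t = f0_timing_measures(1250.0, 1100.0, 200.0, 1900.0, 1800.0, 160.0)
```

Normalizing by syllable duration allows tokens with different segment durations to be compared on one scale, although the statistical analysis also enters the raw segment durations as fixed effects.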
Generally, there was a final fall in f0 at the end of LL sentences, but in 32 cases, f0 remained at the same high level to the end of the utterance, or actually rose at the end. Of these cases, 29 were produced by two speakers, S4 and S5, who thus had such productions in the majority of their LL sentences. The cases without a final fall were evenly distributed across the 5 LL sentences. These 32 cases are excluded from analyses concerned with the final f0 fall, leaving 766 utterances for these analyses. Sample annotated displays are given below in Figures 3–6.
With regard to the f0 rise and fall excursions in (6), the findings of Fainleib and Selkirk (2014) would lead us to expect greater rise and fall excursions in the lexical tone spans (HH, LH, HL) than in the intonational tone span (LL). In addition, if long tone spans are hyperarticulated compared to short ones, we would expect the long spans (HH, LH, LL) to have greater rise and fall excursions than the short span (HL). On the basis of Fainleib and Selkirk (2014), LL is also expected to have a greater plateau excursion (i.e., a larger f0 drop from rise offset to fall onset) than the lexical tone conditions (HH, LH, HL), reflecting declination during the plateau.
If long high-tone spans have slower f0 transitions than short high-tone spans, we would expect the long spans to have greater peak delay than the short ones. At the other end of the span, slower f0 transitions would be reflected in a lower mean fall delay (i.e., an earlier start to the f0 fall) for the long tone spans compared to the short ones.
2.6. Statistical analysis
Mixed-model analyses were conducted using the lme4 (Bates, Maechler, Bolker, & Walker, 2014) and lmerTest (Kuznetsova, Brockhoff, & Bojesen, 2014) packages in R (R Core Team, 2017). The fixed effects were tone (LH, HH, HL, LL) and, for the analyses of peak and fall delay, the segmental durations of C1 and V1, or of C2 and V2. Tone was treatment-coded with HH as the reference level, so that each of the other three classes was compared to HH. Participant and sentence were included as random intercepts, with a by-participant random slope for tone. To explore further pairwise comparisons among the tone conditions, the resulting mixed model was analyzed using lsmeans (Lenth & Love, 2018). The alpha level was .05.
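To make the coding of tone concrete: with HH as the reference level, each token gets three 0/1 indicators (LH, HL, LL), and an HH token has all three set to 0. The Python sketch below, with hypothetical function names and made-up values, shows that coding and, for the simplified case of a least-squares model with tone as its only predictor and no random effects, what the coefficients mean: the intercept is the HH mean, and each Tone coefficient is that class's mean minus the HH mean. It is not a substitute for the lme4 model described above, whose estimates also reflect the duration covariates and the random effects.

```python
from statistics import mean

TONE_LEVELS = ("HH", "LH", "HL", "LL")


def treatment_code(tone: str, reference: str = "HH") -> dict:
    """0/1 indicator for each non-reference tone class."""
    return {f"Tone:{lvl}": int(tone == lvl) for lvl in TONE_LEVELS if lvl != reference}


def tone_only_coefficients(rows, reference: str = "HH") -> dict:
    """Least-squares coefficients for y ~ tone alone under treatment coding,
    which reduce to differences between class means."""
    by_tone = {}
    for tone, y in rows:
        by_tone.setdefault(tone, []).append(y)
    baseline = mean(by_tone[reference])
    coefs = {"(Intercept)": baseline}
    for lvl in TONE_LEVELS:
        if lvl != reference and lvl in by_tone:
            coefs[f"Tone:{lvl}"] = mean(by_tone[lvl]) - baseline
    return coefs


# Made-up normalized excursions: LL lower than HH gives a negative Tone:LL.
coefs = tone_only_coefficients([("HH", 1.0), ("HH", 3.0), ("LL", 0.0), ("LL", 1.0)])
```

Here `coefs` is `{"(Intercept)": 2.0, "Tone:LL": -1.5}`, the same sign pattern as a negative β for Tone: LL.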
Representative measurement displays are presented in Figures 3–6. In each figure, a spectrogram in (a) has the segmental intervals (C1, V1, C2, V2) indicated. The figure in (b) is the corresponding smoothed PitchTier representation, with highlighting of the f0 rise interval. In these PitchTier images, the pitch range varies from utterance to utterance to fit the pitch points in the display.
In Figure 3, it can be seen that in the HL condition, the f0 rise takes up all of the high-toned syllable (the C1-V1 interval), and the f0 fall begins as soon as the rise is done. In Figures 4 and 5, on the other hand, we see the long high-tone spans of the HH and LH conditions, respectively, in which the f0 fall comes several syllables after the f0 rise. Figure 6 is an example of the LL condition, with a long high-tone span extending from the second syllable of the verb to the final syllable of the utterance.
3.1. F0 level
Figure 7 gives the mean normalized f0 for each tone class at each major position in the high tone span: rise onset, rise offset, fall onset, and fall offset. For comparison across speakers, the f0 values were converted to z-scores relative to the mean and standard deviation for each speaker.
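Per-speaker z-scoring subtracts each speaker's mean f0 and divides by that speaker's standard deviation, so that 0 is the speaker's average and ±1 is one standard deviation away from it. The sketch below assumes that the mean and standard deviation are computed over all of a speaker's measured f0 values and uses the sample standard deviation; the text does not specify either choice, so both are assumptions. The function name and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_speaker(records):
    """records: (speaker, f0_hz) pairs. Returns z-scores in the same order.
    Each speaker needs at least two values for stdev()."""
    values = defaultdict(list)
    for speaker, f0 in records:
        values[speaker].append(f0)
    params = {spk: (mean(v), stdev(v)) for spk, v in values.items()}
    return [(f0 - params[spk][0]) / params[spk][1] for spk, f0 in records]


# Made-up values: S2 has a higher voice than S1.
z = zscore_by_speaker([("S1", 100.0), ("S1", 120.0), ("S1", 140.0),
                       ("S2", 200.0), ("S2", 260.0)])
```

Because each speaker is rescaled separately, the S1 values become −1, 0, and 1, and the S2 values about ±0.71, placing speakers with different voice ranges on a common scale before averaging.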
The mean normalized values are quite similar for the three lexical high-tone classes: HH, HL, and LH. For these classes, the f0 rise is smaller in extent than the following f0 fall, so that the low-tone level after H is on average 15 Hz lower than the low-tone level before H (cf. Tucker, 1962: 129). The f0 plateau between the rise offset and the fall onset is quite flat in the lexical high-tone classes, with a mean difference of 1.6 Hz in HH, 0.0 Hz in HL, and 0.4 Hz in LH.
In the intonational tone class LL, on the other hand, the rise onset, rise offset, and fall onset are all substantially lower than in the other tone classes. The f0 rise is greater than the following f0 fall, leading to a mean rise of 4 Hz from the rise onset to the fall offset. Between the rise offset and the fall onset, there is a mean f0 drop of 4.2 Hz.
In each of the analyses of f0 level, there is a significant main effect of Tone: LL, and no other significant main effect. The negative coefficients (β) for this factor indicate that the rise onset value, the rise excursion, and the fall excursion were all significantly lower in the intonational LL class than in the default HH class. Post-hoc pairwise comparisons using lsmeans found that the means for these measurements were significantly lower in LL than in each of the three other classes, and that no other pairs differed significantly.
On the other hand, there is a positive coefficient associated with Tone: LL in the analysis of plateau excursion, indicating that the f0 drop over the course of the f0 plateau was significantly greater in LL than in HH. Post-hoc pairwise comparisons using lsmeans found that mean plateau excursion was significantly greater in LL than in each of the three other classes, and that no other pairs differed significantly.
3.1.2. Discussion of the f0 level results
There was no significant difference in f0 scaling between short and long high-tone spans. Thus there was no evidence of hyperarticulation of long high-tone spans or of undershoot in short ones. The Luganda facts, then, are comparable to those cases in the literature in which pitch movement is not truncated under time pressure (Ladd et al., 1999; Fougeron & Jun, 1998; Grabe, 1998; Grabe et al., 2000). In Luganda, the f0 value of a high tone is not affected by the length of its span.
On the other hand, rise excursion and fall excursion were significantly smaller in the intonational LL tone class than in the lexical high-tone classes (HH, LH, HL), while declination within the high-tone span was greater in the intonational class than in the lexical classes. These results support the characterization of Luganda H% in Fainleib and Selkirk (2014). More generally, the results provide support for the claim that boundary tones can be subject to different phonetic implementation patterns than other tones of the same tone level (Pierrehumbert, 1980; Pierrehumbert & Beckman, 1988; Myers, 1996).
A reviewer suggests that the differences in f0 level between the boundary H% and the lexical H conditions might be due to confounding factors. For example, the mean number of syllables preceding the high tone span varied by tone class: 5.0 (HH), 4.7 (HL), 4.0 (LH), 4.0 (LL). However, when we added the number of syllables as a factor to the analyses of f0 value, it was not found to be a significant factor in any of these analyses. The same was true of other potential factors: the duration of the utterance up to the onset of the first high-toned syllable, the mean length of the utterance, or the height of the vowel in V1.
The LL condition also differed from the other conditions in being elicited in a separate block with distinct instructions. The distinct instructions turned out to be unnecessary, and indeed they must have added somewhat to the speakers' confusion. But we are convinced that this misstep on our part did not affect the results. First, it is difficult to see how the instructions or the separation into a block could consistently lead to a smaller f0 excursion for the LL condition compared to the other conditions. Moreover, the reduced pitch excursion and increased plateau excursion were also observed in other sentences that lacked lexical high tones in the verb phrase. These include the sentences in Fainleib and Selkirk (2014), which were elicited without special instructions. They also include the two tokens per speaker of such a sentence that were mistakenly included in the main recording block and were produced by the same speakers as in this study. Finally, the pattern also appears in many such sentences recorded in a subsequent session for another study, which were included in a larger block without special instructions. We conclude that the reported f0 contour is the normal production for sentences ending in a sequence of syllables without lexical high tones, and is not an artefact of our elicitation strategy.
Fainleib and Selkirk (2014) attributed the phonetic differences between lexical and intonational H to an anticipatory dissimilatory effect (Xu & Wang, 2001), i.e., the raising of H before L, as in Kamba (Clements, 1983), Japanese (Poser, 1984), Hausa (Inkelas, Leben, & Cobler, 1987), Thai (Potisuk, Gandour, & Harper, 1997), Bimoba (Snider, 1998), Mandarin (Xu & Wang, 2001), Yoruba (Laniran & Clements, 2003), and in Engenni and Cahi Rimi (Hyman, 1993). Hyman (1982) and Hyman and Katamba (2010) posit a low tone inserted at the end of each lexical high-tone span in Luganda, but not at the end of a boundary high-tone span. The raising of the lexical high tone before this L tone can account for the higher f0 level for lexical H compared to intonational H. Moreover, if we assume a general f0 declination over the course of a phrase (Pierrehumbert & Beckman, 1988), then the flatter f0 plateau in the lexical high tone spans can be attributed to the anticipatory dissimilatory effect counteracting and nullifying declination in the lexical spans but not the intonational spans.
Alternatively, the difference between the lexical and intonational high tones could be attributed to final lowering—the lowering of f0 values toward the end of a phrase (Poser, 1984; Liberman & Pierrehumbert, 1984; Pierrehumbert & Beckman, 1988; Herman, 1996). The decline in f0 over the course of the intonational high-tone span would in this view be due to the limitation of final lowering to the final syllables of the span. Lowering of phrase-final high tones to mid is attested in the Bantu languages Kukuya (Paulian, 1975) and Kombe (Elimelich, 1976), while lowering of phrase-final high to low tone is found in Rimi (Olson, 1964) and Kikuyu (Clements, 1984).
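The two accounts can be compared with a toy model. The following sketch is purely illustrative: the function `plateau_f0`, its linear declination, its linear anticipatory-raising ramp, and every parameter value are assumptions introduced here, not quantities estimated in this study. It shows only that either mechanism can produce the qualitative pattern observed: a flat plateau when anticipatory raising before a following L offsets declination, and a sloping plateau when there is no following L (declination only) or when lowering is confined to the final syllables.

```python
def plateau_f0(n_syllables, start_f0=150.0, declination=3.0,
               raising=0.0, final_lowering=0.0, lowered_syllables=2):
    """Toy f0 value per syllable across a high plateau (hypothetical model).

    declination: f0 drop per syllable from phrase-level declination.
    raising: anticipatory raising before a following L, growing linearly
             from 0 on the first syllable to its full size on the last.
    final_lowering: extra lowering applied only to the last
                    `lowered_syllables` syllables of the span.
    """
    values = []
    for i in range(n_syllables):
        f0 = start_f0 - declination * i
        if n_syllables > 1:
            # Linear ramp of anticipatory raising toward the span end.
            f0 += raising * i / (n_syllables - 1)
        if i >= n_syllables - lowered_syllables:
            # Final lowering restricted to the span-final syllables.
            f0 -= final_lowering
        values.append(f0)
    return values
```

Under these assumptions, `plateau_f0(4, raising=9.0)` returns the flat plateau `[150.0, 150.0, 150.0, 150.0]` (a lexical span under the dissimilation account), `plateau_f0(4)` returns `[150.0, 147.0, 144.0, 141.0]` (an intonational span under the same account), and `plateau_f0(4, declination=0.0, final_lowering=10.0)` returns `[150.0, 150.0, 140.0, 140.0]` (an intonational span under the final-lowering account). Since both mechanisms yield a declining intonational span, the overall shape of that span alone does not decide between them.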
The experiment reported here does not provide evidence that decides between these two possible explanations or rules out other alternatives; it was designed only to test whether these effects exist. To test the claim that the effect is due to final lowering, for example, we would need to compare sentences with final H% to sentences with phrase-final lexical H. Such a comparison was included in subsequent experiments, the results of which remain to be determined.
Interestingly, the difference in f0 peak level between boundary H% and lexical H is anticipated in the low-toned or toneless material preceding the f0 rise. The rise onset f0 level is the f0 value at the point at which f0 begins to rise toward the peak, and Table 2 showed that this value was significantly higher before the higher peak of the lexical H than before the lower peak of the boundary H%. The height of the upcoming f0 peak thus has assimilatory (or coarticulatory) effects on the f0 level of the preceding baseline interval.
The final H% boundary tone in Luganda has been described as optional (Meeussen, 1965; Kalema, 1977; Hyman, 1982; Hyman & Katamba, 2010). However, in this study, final H%, as manifested by an interval of raised f0 extending from the second syllable of the verb phrase to the end of the sentence, was present in every sentence ending in a sequence of syllables without lexical high tones. It was present in every instance of the LL condition, except for one alternate pronunciation of each sentence by one speaker (S4). It could be that this final H% is obligatory for these speakers in this particular discourse situation, involving isolated statements. Or it could be that the final H% satisfies a phonological requirement that a sentence end in high tone. The fact that it has previously been described as optional might reflect a change in the language from the time of the earlier descriptions to the present day, or it could be that the reduced pitch range of the H% has led to it being overlooked in some cases in previous descriptions.
3.2. F0 timing
Figure 8 presents the time course of normalized f0 as a function of the proportion of interval duration. Each point in Figure 8a represents a sample at a 10% increment of the total duration of the C1-V1-C1a interval at the beginning of the high-tone span, while each point in Figure 8b represents a 10% increment in the duration of the C2-V2 interval at the end of the span.
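Sampling f0 at fixed proportions of an interval, as in Figure 8, is a form of time normalization. This section does not state how values between pitch-track frames were obtained, so the following is a sketch assuming linear interpolation between frames; the function name and arguments are hypothetical, and the f0 normalization applied before plotting is not reproduced here.

```python
from bisect import bisect_right


def sample_at_proportions(times, f0, start, end, step=0.1):
    """Sample an f0 track at fixed proportions of the interval [start, end].

    times, f0: parallel lists of frame times (s, ascending) and f0 values.
    Returns f0 linearly interpolated at start + p * (end - start)
    for p = 0.0, 0.1, ..., 1.0 (11 points when step = 0.1).
    """
    n_steps = round(1 / step)
    samples = []
    for k in range(n_steps + 1):
        t = start + (k / n_steps) * (end - start)
        j = bisect_right(times, t)  # times[j-1] <= t < times[j]
        if j == 0:
            samples.append(f0[0])    # before the first frame: hold first value
        elif j == len(times):
            samples.append(f0[-1])   # at or after the last frame: hold last value
        else:
            t0, t1 = times[j - 1], times[j]
            y0, y1 = f0[j - 1], f0[j]
            samples.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
    return samples
```

For a pitch track with `times=[0.0, 1.0]` and `f0=[100.0, 200.0]` and an interval from 0.5 to 1.0 s, the sketch returns 11 values rising from 150.0 to 200.0 in steps of about 5. Because each interval is divided into the same number of steps regardless of its duration, contours from syllables of different lengths can be averaged point by point.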
Throughout the first half of the C1-V1-C1a interval in Figure 8a, f0 rises in all tone classes, though at a lower f0 level in the LL intonational tone class. Then at about 60% of the C1-V1-C1a interval (corresponding to the end of the C1-V1 syllable), f0 in the short-span HL class begins to descend, while in the long-span HH and LH classes it does not.
At the other end of the high-tone span, shown in Figure 8b, the f0 fall for the long-span classes HH, LH, and LL begins in the span-final C2-V2 syllable. The span-final syllable in the short-span HL condition is the same as the span-initial syllable, so for that class we see the same f0 rise as in Figure 8a. The f0 fall for the HL class began at the end of the syllable, at the completion of the rise, while in the other classes it began early in the syllable. In the LL class, the fall onset was earlier than in any of the other classes.
Mean relative peak delay (Figure 9a) was less in HL (0.97) than in the other classes: HH (1.62), LH (1.55), LL (1.41). The f0 peak was attained close to the end of the vowel in the initial high-toned syllable in the short-span condition, while that point was in the following syllable in the long-span conditions. Mean relative fall delay, displayed in Figure 9b, was 1.00 for HL, indicating that the fall began on average at the end of the vowel (at the end of the rise), while it was lower in the lexical long high-span classes—HH (0.19) and LH (0.32)—indicating that in these classes the fall began early in the span-final syllable. The LL class had the lowest mean fall delay (0.02).
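The reported values are consistent with a measure in which the timing of an f0 event is expressed in units of the duration of the relevant CV syllable, so that 0 marks the consonant onset and 1 the end of the vowel. The sketch below assumes exactly that definition; it is an interpretation of the values above rather than the study's stated formula, and the function name and arguments are hypothetical. Peak delay would use the peak time and the span-initial C1-V1 syllable; fall delay would use the fall-onset time and the span-final C2-V2 syllable.

```python
def relative_delay(event_time, syllable_onset, vowel_offset):
    """Time of an f0 event relative to a CV syllable (assumed definition).

    0.0 = onset of the consonant, 1.0 = end of the vowel; values above 1.0
    lie after the vowel, negative values before the syllable.
    """
    duration = vowel_offset - syllable_onset
    if duration <= 0:
        raise ValueError("vowel offset must follow syllable onset")
    return (event_time - syllable_onset) / duration


# Peak delay: relative_delay(peak_time, c1_onset, v1_offset)
# Fall delay: relative_delay(fall_onset_time, c2_onset, v2_offset)
```

With a span-initial syllable from 0.20 s to 0.35 s, a peak at 0.35 s gives a relative peak delay of 1.0 (the end of the vowel, close to the HL mean of 0.97), while a peak at 0.44 s gives about 1.6, a point beyond the end of the vowel comparable to the long-span means.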
Thus, in both the rise and the subsequent fall, the short-span HL class has the edge of the f0 plateau closest to the outer edge of the tone span.
| C1 duration (ms) | 0.67 | 27.0 | 9.8 | <0.001 | * |
| V1 duration (ms) | 0.33 | 99.2 | 3.4 | <0.001 | * |
| C2 duration (ms) | 1.14 | 53.2 | 13.8 | <0.001 | * |
| V2 duration (ms) | 0.53 | 463.4 | 5.5 | <0.001 | * |
In Table 6, presenting the analysis of peak delay, there are significant main effects for the segmental interval durations C1 and V1, indicating that the f0 rise is timed with respect to the segments of the span-initial syllable, as found previously by Myers et al. (2018). When those segments are longer, peak delay is correspondingly longer. There are similar segmental main effects in Table 7, indicating the same temporal correlation between the timing of the f0 fall and the segments of the last syllable in the high-toned span.
There is also a significant main effect for Tone: HL in both Tables 6 and 7. Taking into account the coefficient (β), the results indicate that the f0 rise ends earlier in the span-initial syllable in the short-span HL class than in the default long-span HH class, and the f0 fall begins later in the span-final syllable in HL than in HH. In pairwise post-hoc comparisons, HL has a significantly lower mean peak delay than any of the three other tone classes, and a significantly greater mean fall delay. The edge of the f0 plateau is thus significantly closer to the edge of the high-toned syllable sequence in HL than in the other conditions, in both the rise and the fall.
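The interpretation of the Tone: HL coefficient relies on treatment (dummy) coding with HH as the default level: each tone coefficient estimates the difference between one tone class and HH, holding the segmental durations constant. A minimal sketch of that coding and of the resulting fixed-effects prediction follows. The function names and the example coefficient values are hypothetical, and any random effects in the fitted models are omitted.

```python
TONE_LEVELS = ["HH", "HL", "LH", "LL"]  # HH is the reference (default) level


def treatment_code(tone, reference="HH"):
    """Return 0/1 dummy columns {"Tone: X": ...} for each non-reference level."""
    if tone not in TONE_LEVELS:
        raise ValueError(f"unknown tone class: {tone}")
    return {f"Tone: {level}": int(tone == level)
            for level in TONE_LEVELS if level != reference}


def predicted_delay(intercept, betas, tone, durations):
    """Fixed-effects linear predictor: intercept + tone dummies + duration slopes.

    betas: coefficient per column name ("Tone: HL", "C1", ...); columns
    without a coefficient contribute nothing.
    durations: segment durations in ms keyed by the same names as their slopes.
    """
    value = intercept
    for name, x in treatment_code(tone).items():
        value += betas.get(name, 0.0) * x
    for name, d in durations.items():
        value += betas.get(name, 0.0) * d
    return value
```

With an invented intercept of 1.0, a `Tone: HL` coefficient of -0.5, and a C1 slope of 0.01 per ms, a 20 ms C1 gives a predicted delay of 1.2 for HH and 0.7 for HL. A negative Tone: HL coefficient for peak delay thus corresponds to an earlier rise offset in HL than in HH at the same segment durations.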
The symmetry between the rise and fall does not extend to the other factors, however. In Table 6, there is a main effect for Tone: LH in the analysis of peak delay, and in pairwise comparisons LH has a significantly greater mean peak delay than any other class. In Table 7, on the other hand, there is a main effect for Tone: LL, and it is LL that has a significantly lower mean fall delay than any other class.
3.2.2. Discussion of the f0 timing results
There was a significantly shorter peak delay and a significantly longer fall delay in the short high-tone HL spans than in the long high-tone spans (HH, LH, LL). Thus in the short-span condition, both outer edges of the high-f0 plateau were closer to the outer edges of the corresponding syllables than in the long spans. In the short high-tone span HL condition, the rise has to be rushed to be completed in time to move on to the next target, while the fall is pushed later in the syllable because the rise must be completed before the fall can begin. In the long high-tone spans, on the other hand, the rise and the fall are spread out enough that they do not impede each other, with the result that each is drawn out more relative to the tone-bearing syllables.
The difference between short and long spans is parallel to those cases of tone realization in which f0 targets are maintained under time pressure, but the transitions are compressed in duration (Ladd et al., 1999; Fougeron & Jun, 2000; Grabe, 1998; Grabe et al., 2000). As in the production of long consonants or vowels, the production of a long high tone span is characterized by a slower rate of articulatory adjustment than the production of a short counterpart (Löfqvist, 2005). The long high tones of Luganda are an interesting special case in this regard, in that the laryngeal adjustments that yield raised f0 are unbounded in duration, unlike other ‘long’ articulatory gestures in speech.
There were two unexpected effects. With respect to peak delay, LH had a significantly higher mean than any other class. With respect to fall delay, LL had a significantly lower mean than any other class.
The early onset of the fall in LL could be related to the small fall excursion established in Section 3.1.1. Smaller displacement in articulation is associated with lower maximum velocity of movement (Kelso, Vatikiotis-Bateson, Saltzman, & Kay, 1985; Munhall, Ostry, & Parush, 1985; Kollia, Gracco, & Harris, 1995). But if this were the explanation of the effect, peak delay in the rise would also be expected to be lower in LL than in the other classes, which was not the case.
Alternatively, the early onset of the fall in LL could be related to the fact that the fall is optional. It was noted in Section 2.3 that of the 198 instances of LL, 166 had a final fall and 32 did not. Two speakers had a final fall in only a minority of their LL tokens. On the other hand, in the other conditions there was always a fall after the peak, and indeed the fall in those classes was greater in extent than the initial rise (Figure 5). This distinction is reflected in the representations of Hyman and Katamba (2010), who posit a low tone following the lexical tones, and no such tone following the boundary H%. It could be, then, that the final fall often found in the LL condition is due to final lowering rather than to a target L, and so operates on a different time frame.
The fact that LH had a higher mean peak delay than any of the other tone classes is more puzzling. The only difference between HH and LH is that the initial syllable of the high-tone span is an underlyingly high-toned syllable in HH, and not in LH. If the two are phonetically distinct, then this underlying information must be preserved in the surface, phonetically interpreted representation, as suggested for theory-internal reasons by Prince and Smolensky (1993) and McCarthy (2003). The greater peak delay in the LH case might then reflect a lesser urgency in marking a derived initial high-toned syllable that is redundant and predictable.
Short high-tone spans in Luganda do not differ from long high-tone spans in f0 level or f0 excursion, but they do differ in timing. The f0 transitions in short high-tone classes are shorter than in long high-tone classes, reflecting the greater time pressures in the short high-tone span. The temporal compression of the transitions in the short spans parallels the response to time pressures that has been found in other languages due to variation in speaking rate or the duration of the voiced sonorant interval.
Intonational H% tone-spans in Luganda have a smaller f0 rise and fall excursion than lexical H tone spans. This confirms the claim of Fainleib and Selkirk (2014) that these two categories are phonetically distinct, contrary to the previous literature. This suggests that it is worth investigating further how boundary tones differ in phonetic implementation from other lexical and intonational tones.