There is mounting evidence suggesting that temporal information is necessary in representations of lexical tone. Gestural models of tone provide a natural entry point to linking abstract association with physical realization, but remain underdeveloped. We present the results of two acoustic production studies on two dialects of Serbian, a lexical pitch accent language. In the Belgrade dialect, pitch accents are aligned relatively late in the tone-bearing unit, while in the Valjevo dialect, pitch accents are phonetically retracted, sometimes into the preceding syllable. We varied the phonetic duration of syllable onsets of candidate tone-bearing units in falling (experiment 1) and rising (experiment 2) pitch accents, and measured the effects on the timing of F0 excursions. Consistent interactions between F0 excursions and the segmental content indicate that the phonological system of abstract tone association is the same in both dialects, despite differences in temporal alignment. We argue that this apparent mismatch between association and alignment can be expressed straightforwardly in the Articulatory Phonology framework by allowing tone gestures to coordinate with other gestures in all the ways that segmental gestures can, rather than restricting tone to c-center coordination.
The question of how to generate the phonetic timing of tone contours from a temporally impoverished phonological representation has long been a topic of debate in phonological theory (e.g.,
We present data from two acoustic studies on two varieties of Serbian, one with late alignment of accentual peaks and one with early alignment, and show that they share an abstract phonological system but have distinct phonetic realizations. Articulatory Phonology is well equipped to capture these facts: Both abstract phonological relationships and phonetic realization can be derived from coordinative relationships, where the presence of a coordinative relationship between two gestures indicates phonological association and the precise nature of that coordinative relationship predicts the phonetics. We expand on the gestural model of tone and show that tonal coordination is more diverse than what has been previously hypothesized, and propose that tone gestures can be coordinated in all the same manners as segmental gestures.
In Autosegmental theories of tone, tones reside in a separate tier from segments, and are ‘associated’ to the segmental content of a word (
In addition, phonetic information can also play a role in the analysis of a given language’s TBU. Specifically, the association of a tone to a TBU implies some sort of overlap in time (
Examples of two phonologically identical HL tones that produce two phonetically distinct contours, with left alignment (left) and right alignment (right).
However, it is not uncommon for pitch targets to occur outside their TBU, thus calling into question the assumption that phonetic alignment of tone targets and phonological association of tones are tightly paired (see discussion in
There are comparatively fewer cases of early tonal target achievement, but there are documented cases. In tonal crowding, peaks occur early relative to the TBU; however, this term specifically describes circumstances where pitch targets are shifted to the left due to tonal pressures from the right, such as the addition of a boundary tone (
In recent years, it has become increasingly clear that temporal information should be included in phonological representation. Contrastive alignment within a syllable was hypothesized to not be possible in lexical tone languages (
Further evidence for the necessity of timing information in tonal representation comes from systems of intonation. Contrasts in alignment in intonation have led to the introduction of the star convention in Autosegmental-Metrical theory (
Furthermore, it is unclear that targets are the only crucial point to consider in the relationship between TBUs and tone. Tone targets have been preferred as points of alignment in accounts of lexical tone, but there is accumulating evidence that the onsets of F0 movement are reliably timed (within a particular tone of a given language) as well. For example, the early versus late rises in English described by Pierrehumbert (
Gestures are, importantly, specified in both space and time, and provide natural points for alignment. Recently, tone has fruitfully been treated as a gesture (e.g.,
Timing information comes from two sources, and the acoustic linearization can then be derived from the combined spatial and temporal details included in a gestural constellation. First, any individual gesture is specified with some stiffness, which is an abstraction of duration analogous to spring stiffness (
In more restrictive versions of AP, such as the coupled oscillator model (
Coordinative schematics for c-center, with either two onset consonants or a simple onset with tone gesture. Solid lines indicate onset-to-onset (in-phase) coordination; dashed lines represent onset-to-target (anti-phase) coordination.
This same structure has not been verified in intonation languages thus far (
However, the body of literature addressing pitch as a potential articulatory gesture remains relatively small, and thus the c-center hypothesis for tone is largely untested. Only a few languages have been investigated: Prior to the development of the c-center hypothesis for tone, the author is only aware of work on Mandarin and some speculation on Thai (
Furthermore, as discussed in the previous section, evidence from studies in the Autosegmental(-Metrical) framework suggests that gestural targets, as well as gestural onsets, are also a potential point of temporal stability between tones and their TBUs. Gafos (
This suggests that a gestural model more similar to that proposed in Gafos (
Serbian
The names of the accents are indicative of the bundle of prosodic characteristics that have been included in the term ‘accent.’ The length descriptors refer to the phonological length of the vowel in the ‘accented’ (stressed) syllable: Short accents have a short vowel, and long accents have a long vowel. The pitch descriptor refers, generally speaking, to the pitch contour of the ‘accented’ syllable: Syllables with falling accents start high and fall moving into the following syllable, while syllables with rising accents start low and rise moving into the following syllable. Schemata for the four accent types are provided in blue in
Schemata of the F0 movements for the four accent types on trisyllabic words, comparing Belgrade (red) and Valjevo (blue) dialects. Contours adapted from
There are currently two major autosegmental proposals for the representation of the Serbian accentual system. For this paper, we take as a starting point the analysis presented by Inkelas and Zec (
A breakdown of the features of the four Serbian accents as they occur on initial syllables, following
Accent | Vowel | Stress | Pitch | Phonology | Orthography | Gloss |
---|---|---|---|---|---|---|
short falling | short | initial | initial | /ˈlu | lȕka | ‘onion. |
short rising | short | initial | second | /ˈlula | lùla | ‘smoking pipe’ |
long falling | long | initial | initial | /ˈlu: | Lûka | ‘Luka (name)’ |
long rising | long | initial | second | /ˈlu:ka | lúka | ‘harbor’ |
The main alternative analysis utilizes the star convention from Autosegmental-Metrical theory to express the contrast between rising and falling accents, and crucially associates both pitch accents to the stressed syllable, indicating that the TBU is always the stressed syllable. Smiljanić (
Of particular scientific interest are the Belgrade and Valjevo dialects of Serbian, which are typically assumed to have the same system of tonal contrast, but with different temporal alignment. Zec and Zsiga (
In this study, we consider three hypotheses to capture the differences between the Belgrade and Valjevo dialects of Serbian. Under the first two hypotheses, the dialects share a common system of TBUs. These two hypotheses differ in which syllable serves as the TBU, most crucially in rising accents.
Alternatively, it could be the case that the Belgrade and Valjevo systems have diverged regarding which syllable is the TBU. This hypothesis corresponds to a scenario where the retracted peaks in Valjevo Serbian are reflective of a shift in TBU, rather than solely a phonetic shift.
In order to determine what the TBU is, we will examine the temporal interactions and points of temporal stability between segments and tones by varying the segmental content of the potential TBUs. The c-center hypothesis for tone only generates Hypotheses 2 and 3—specifically, the c-center structure for tone cannot generate Valjevo rising accents if they are realized fully before the segmental TBU, which is the case in Hypothesis 1, but could potentially produce contrastive alignment within one TBU, which is the case in Hypotheses 2 and 3. An expanded gestural model for tone, on the other hand, is able to generate all three scenarios. Thus, if the patterns of temporal stability and interaction support Hypothesis 1, it is necessary to expand the gestural model of tone.
In order to address these questions, we present the results of two acoustic studies on these dialects of Serbian. In the first study, we examine the interaction between segment timing and the timing of falling accent contours, which are uncontroversially associated to the initial syllable. In the second study, we compare the timing of falling accents, which provide a baseline of expected interactions between TBUs and tones, to the timing of rising accent contours, which are more controversial and can distinguish between the hypotheses.
In this experiment, we focus on the falling accent contour and establish the relationship between the segmental content of a TBU and F0 timing. The falling accents are rather uncontroversial compared to rising accents; both the Inkelas and Zec (
Data was collected from a total of 24 participants (13 Belgrade, 11 Valjevo). Speakers of the Belgrade dialect were all born and raised in Belgrade, though typically one or both parents were from elsewhere. Speakers of the Valjevo dialect had all been raised in Valjevo, but were currently studying in Belgrade. Since living in Belgrade for an extended amount of time affects the realization of accent, the Valjevo speakers were all young university students who still had family ties in Valjevo and frequently visited home. One Valjevo participant was excluded after data collection as they had already accommodated their accentual production to the Belgrade dialect, as determined by trained Serbian phoneticians (one native Valjevo speaker, one native Belgrade speaker).
Several additional exclusions had to be made. Most were due to frequent errors: either improper focus placement (seven participants; this included broad focus, narrow focus on the carrier word, and list intonation), which made it impossible to find accentual peaks in the target word, or errors in segmental production, accentual production, or both (three participants). Accentual errors were largely due to the use of nonce words in the study, some of which share an orthographic representation with existing words that have different accents. There were two particularly problematic words: First, the word
The final dataset includes data from 13 participants: five speakers of Belgrade Serbian (ages 19–39; 3M, 2F) and eight speakers of Valjevo Serbian (ages 19–22; 2M, 6F).
Target words were formed from three base words, where the first syllable onset of these words was varied to create a set of five rhyming words, using three simple onsets /r, l, m/ and two complex onsets /mr, ml/:
Since Serbian typically does not mark any aspect of accent in its orthography, participants received a list of the words that would be used in the study. The list marked the accents using dictionary conventions and grouped the words together to make clear which accent each had. Participants were informed that each nonce word was supposed to be a ‘perfect rhyme’ with the real word it resembled, i.e., that the two had the same pitch accent. They were also told that the nonce words were supposed to refer to other things in the same lexical category as the real word; e.g., since
In order to reduce boredom and consequent list intonation, there were two carrier phrases in this study:
During the experiment, participants first heard a spoken prompt, which was recorded by a native speaker of Belgrade Serbian. The prompt either claimed that there were no instances of a lexical category (e.g.,
The 50 sentences were put in random order and then split into two groups of 25, creating two blocks to prevent fatigue (thus, one round of the experiment had two blocks with 25 trials each). After the two blocks were completed, the 50 sentences were randomized again, instead of repeating the first random order. The experiment repeated for three rounds, for a total of 150 sentences.
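The randomization scheme described above can be sketched as follows. This is a minimal illustration in Python, not the PsychoPy script used in the study; the function name `make_rounds`, the seed, and the placeholder sentence labels are assumptions introduced here for illustration.

```python
import random


def make_rounds(sentences, n_rounds=3, n_blocks=2, seed=None):
    """Return a list of rounds; each round is a list of blocks of trials.

    Every round draws a fresh random order of all sentences and then
    splits that order into equally sized consecutive blocks.
    """
    if len(sentences) % n_blocks != 0:
        raise ValueError("sentences must divide evenly into blocks")
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    block_size = len(sentences) // n_blocks
    rounds = []
    for _ in range(n_rounds):
        order = list(sentences)
        rng.shuffle(order)  # new random order for every round
        blocks = [order[i:i + block_size]
                  for i in range(0, len(order), block_size)]
        rounds.append(blocks)
    return rounds


# Placeholder labels standing in for the 50 stimulus sentences.
sentences = [f"sentence_{i:02d}" for i in range(50)]
rounds = make_rounds(sentences, seed=1)
total_trials = sum(len(block) for rnd in rounds for block in rnd)
print(len(rounds), [len(b) for b in rounds[0]], total_trials)
```

With 50 sentences, three rounds, and two blocks per round, this yields blocks of 25 trials and 150 trials in total, matching the counts given in the text.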
The experiment was presented in PsychoPy. Due to difficulties triggering clean recordings with native PsychoPy tools, the experiment was recorded in Audacity as one sound file with a Samson GoMic.
Data was initially aligned with the Montreal Forced Aligner (
F0 was collected using Praat’s ‘Get Pitch’ function, and smoothed with a bandwidth of 10 Hz. The corrected text grids and F0 tracks were then processed with a Matlab script. Pitch track landmarking was semi-automated using a Matlab script that found F0 extrema located within certain segment-based boundaries. Boundaries were refined by iteratively hand-checking the pitch landmarking to ensure that the correct local extrema were found. Due to the dialect-based timing differences, the boundaries were ultimately defined slightly differently for the Belgrade and Valjevo dialects. For example, the leftmost boundary was the acoustic onset of the target word for both dialects, but the rightmost boundary was the beginning of the second syllable nucleus for the Belgrade dialect and the beginning of the second syllable onset for the Valjevo dialect. Dialect-specific boundaries ensured that the algorithm was given a sufficiently large region in which to search for target word contours, while still avoiding potential extraneous peaks associated with the carrier phrase or phrase-final intonation. The resulting pitch landmarks were then used to bound where further landmarks (i.e., maximum onset velocity, F0 valley, and F0 gesture onset) could be located.
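The bounded search for F0 extrema can be illustrated with a short sketch. It is written in Python rather than Matlab and is not the script used in the study. The function names, the toy contour, and the choice to search for the valley only between the left boundary and the peak are assumptions made here for illustration.

```python
def right_boundary(dialect, syl2_onset_start, syl2_nucleus_start):
    """Right edge of the peak search window, following the text:
    second-syllable nucleus start for Belgrade, second-syllable onset
    start for Valjevo."""
    if dialect == "Belgrade":
        return syl2_nucleus_start
    if dialect == "Valjevo":
        return syl2_onset_start
    raise ValueError(f"unknown dialect: {dialect}")


def find_peak_and_valley(times, f0, left, right):
    """Locate the F0 maximum inside [left, right] and the minimum before it.

    times, f0 : equal-length sequences (ms, Hz); unvoiced frames are None.
    Returns (peak_time, valley_time); valley_time is None when no voiced
    frame precedes the peak inside the window.
    """
    window = [(t, f) for t, f in zip(times, f0)
              if left <= t <= right and f is not None]
    if not window:
        raise ValueError("no voiced frames inside the search window")
    peak_time, _ = max(window, key=lambda tf: tf[1])
    # Assumption: the valley is searched only to the left of the peak.
    before_peak = [(t, f) for t, f in window if t < peak_time]
    if not before_peak:
        return peak_time, None
    valley_time, _ = min(before_peak, key=lambda tf: tf[1])
    return peak_time, valley_time


# Toy contour (hypothetical values): falling carrier, valley, target peak.
times = list(range(0, 200, 10))
f0 = [210, 205, 200, 195, 190, 185, 180, 185, 195, 210,
      225, 235, 240, 238, 230, 220, None, None, 200, 195]
word_onset = 40  # left boundary: acoustic onset of the target word
right = right_boundary("Belgrade", syl2_onset_start=150,
                       syl2_nucleus_start=170)
peak, valley = find_peak_and_valley(times, f0, word_onset, right)
print(peak, valley)
```

Restricting the search to a window that starts at the target word onset keeps the higher F0 values of the carrier phrase (here, the first frames of the toy contour) from being picked as the accentual peak.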
After pitch landmarking was completed, each trial was verified by hand for reasonable tracking. Several trials had to be excluded due to irreconcilably incorrect pitch tracking, or pitch tracks that did not clearly show minima or maxima. Only F0 trajectories with clear zero-velocity points between the carrier and target words can be used to determine the start and duration of the pitch excursion, as otherwise it is impossible to determine where the upward pitch excursion for the target word begins. F0 minima were less reliable than F0 maxima due to intonational differences within and between participants. Various intonational patterns completely obscured this minimum, including focus on the carrier word rather than on the target word. Two examples of such tokens from the Belgrade dialect are given in
Examples of F0 shapes that were removed for the analysis of F0 onsets.
Examples of F0 shapes that allow analysis of F0 onsets.
In the Belgrade dialect, out of a possible 450 trials, 427 had suitable peak marking; of those, 401 also had suitable minima. In the Valjevo dialect, out of a possible 720 trials, 693 had suitable peak marking; of those, 456 also had suitable minima. Some participants had more data excluded than others, but with one exception, the remaining data was balanced across syllable onset and accent within participant.
As the absolute F0 peak is less stable and prone to small fluctuations, peak timing was measured using the gestural release rather than the actual target F0 peak (
Data was analyzed with linear mixed-effects models in R (
Dependent variables are H offset (timing of accentual peak relative to the beginning of the word), the timing of the onset of the pitch excursion (interval between the start of the upward pitch trajectory and the beginning of the word), and the duration of the pitch excursion (interval between H offset and start of pitch excursion). A schematic of these variables is provided in
A schema of the dependent variables used in analysis. The blue line is a schematized short falling accent, with black dots to mark the start (leftmost) and H offset (rightmost) of the pitch excursion.
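The three dependent variables can be computed from three landmark times, as in the following sketch. The function name and the example landmark values are hypothetical; times are assumed to be in milliseconds measured from the start of the recording.

```python
def pitch_excursion_measures(word_onset, excursion_start, h_offset):
    """Return the three dependent variables in the units of the inputs.

    word_onset      : acoustic beginning of the target word
    excursion_start : start of the upward pitch trajectory
    h_offset        : time of the accentual peak landmark (H offset)
    """
    return {
        # timing of the accentual peak relative to the beginning of the word
        "h_offset": h_offset - word_onset,
        # start of the pitch excursion relative to the beginning of the word
        "excursion_onset": excursion_start - word_onset,
        # interval between the start of the excursion and H offset
        "excursion_duration": h_offset - excursion_start,
    }


# Hypothetical landmarks: the rise begins 20 ms before the word starts.
measures = pitch_excursion_measures(word_onset=1000,
                                    excursion_start=980,
                                    h_offset=1120)
print(measures)
```

By construction, H offset equals the excursion onset plus the excursion duration, so an effect of syllable onset duration on H offset must arise from a later excursion start, a longer excursion, or both.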
As different syllable onsets were chosen in order to compare effects of syllable onset complexity versus phonetic duration, we will first present the acoustic duration characteristics of the onsets used, which differ in duration as intended. Including dialect as a predictor of syllable onset duration did not significantly improve model fit (
In both dialects, H offset occurs later in words with longer syllable onset duration; these patterns are illustrated in
Scatter plots comparing the relationships between varied syllable onset durations and H offset in falling words (both dialects).
Summary of the LME model for H offset:
Term | Estimate | Std. Error | t | p |
---|---|---|---|---|
(Intercept) | 67.36 | 10.29 | 6.55 | <0.0001*** |
OnsetDuration | 0.68 | 0.02 | 31.55 | <0.0001*** |
ShortAccent | 17.69 | 2.79 | 6.35 | <0.0001*** |
Valjevo | –29.94 | 12.88 | –2.32 | 0.04* |
ShortAccent:Valjevo | –11.88 | 3.55 | –3.34 | 0.0009*** |
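To make the fixed-effect estimates concrete, the sketch below computes fitted H offsets from the values in the table. It rests on several assumptions not stated in the text: treatment (dummy) coding with the Belgrade dialect and long accents as reference levels, syllable onset duration entered in milliseconds without centering, and random effects ignored. The function name is hypothetical.

```python
# Fixed-effect estimates copied from the H offset model summary.
H_OFFSET_COEFS = {
    "Intercept": 67.36,
    "OnsetDuration": 0.68,
    "ShortAccent": 17.69,
    "Valjevo": -29.94,
    "ShortAccent:Valjevo": -11.88,
}


def predict_h_offset(onset_ms, short_accent, valjevo, coefs=H_OFFSET_COEFS):
    """Fitted H offset (ms) under the assumed dummy coding."""
    s = 1 if short_accent else 0
    v = 1 if valjevo else 0
    return (coefs["Intercept"]
            + coefs["OnsetDuration"] * onset_ms
            + coefs["ShortAccent"] * s
            + coefs["Valjevo"] * v
            + coefs["ShortAccent:Valjevo"] * s * v)


# Fitted values for a hypothetical 100 ms syllable onset.
for dialect, is_valjevo in (("Belgrade", False), ("Valjevo", True)):
    for accent, is_short in (("long", False), ("short", True)):
        value = predict_h_offset(100, is_short, is_valjevo)
        print(dialect, accent, round(value, 2))
```

Under these assumptions, each additional millisecond of syllable onset delays H offset by about 0.68 ms in both dialects, while the dialect terms shift the whole contour earlier in Valjevo.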
In models for the Belgrade dialect alone, syllable onset duration as a single fixed effect significantly improves the model; words with longer syllable onsets have later H offsets (
Patterns are similar in the Valjevo dialect. The addition of syllable onset duration significantly improves the fit of the model; words with longer syllable onsets have later H offsets (
In all, both dialects show effects of syllable onset duration on the timing of H achievement. In a model that includes both dialects, adding dialect as a fixed effect significantly improves the fit of the model; as described previously in the literature, pitch peaks overall occur earlier in the TBU in the Valjevo dialect (104.0 ± 8.54 ms) than in the Belgrade dialect (142.0 ± 10.80 ms). Adding the interaction between dialect and syllable onset duration does not significantly improve the model (
Scatter plots comparing the relationships between varied syllable onset durations and start of the pitch excursion in falling words (both dialects).
The Valjevo dialect behaves somewhat similarly, though the pitch excursion moves less in response to syllable onsets than in the Belgrade dialect. The addition of syllable onset duration significantly improves the fit of the model in the Valjevo dialect; pitch excursions start slightly later in words with longer syllable onsets (
In a model that considers both dialects together, adding syllable onset duration significantly improves model fit. Adding dialect does not improve the model fit (
Excursion start timing:
Term | Estimate | Std. Error | t | p |
---|---|---|---|---|
(Intercept) | –33.88 | 5.36 | –6.32 | <0.0001*** |
OnsetDuration | 0.40 | 0.03 | 13.83 | <0.0001*** |
Valjevo | 22.79 | 7.07 | 3.23 | 0.003** |
OnsetDuration:Valjevo | –0.28 | 0.04 | –7.22 | <0.0001*** |
Scatter plots comparing the relationships between varied syllable onset durations and excursion duration in falling words (both dialects).
In the Valjevo dialect, syllable onset duration significantly improves the fit of the model; words with longer syllable onsets have longer pitch excursions (
When considering both dialects together, the addition of dialect marginally improves the fit of the model (
Excursion duration:
Term | Estimate | Std. Error | t | p |
---|---|---|---|---|
(Intercept) | 104.72 | 9.15 | 11.44 | <0.0001*** |
OnsetDuration | 0.29 | 0.04 | 7.89 | <0.0001*** |
ShortAccent | 15.50 | 2.85 | 5.43 | <0.0001*** |
Valjevo | –46.77 | 12.03 | –3.89 | 0.0008*** |
ShortAccent:Valjevo | –8.73 | 3.99 | –2.19 | 0.03* |
OnsetDuration:Valjevo | 0.21 | 0.05 | 4.22 | <0.0001*** |
This experiment provided a baseline for the behavior of pitch accents and their TBUs in Serbian. In both dialects, H offset is later when the lexical H is associated to a syllable with longer syllable onset. However, the two dialects achieve this in different ways. In the Belgrade dialect, pitch excursions both start later and get longer with increases in syllable onset duration. In the Valjevo dialect, the pitch excursions start only slightly later with increased syllable onset duration, and the bulk of the H offset timing comes from stretching the pitch excursion with increased syllable onset duration. These patterns of ‘stretching’ suggest that both dialects have some form of H gesture target coordination, but differ on how the onset of the H gesture is coordinated with the syllable. As the association of falling accents is uncontroversial, this indicates that the two dialects express the same phonological association with different coordinative schemes. Despite these small differences, however, in both dialects it is true that the timing of the TBU provides timing information to the accentual pitch excursion. As rising accents may differ in phonological representation between the two dialects, we can compare this baseline of how falling accents behave in each dialect to the behavior of rising accents in each dialect in Experiment 2.
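The different strategies can be made explicit by combining the onset-duration slopes from the excursion start and excursion duration models. The sketch below assumes treatment coding with Belgrade as the reference level, so that the Valjevo slope is the main effect plus the interaction term; the function name is hypothetical. Because the models were fit separately, the summed slopes are only expected to approximate the slope in the H offset model.

```python
# Onset-duration slopes (ms of F0 timing per ms of syllable onset),
# taken from the excursion start and excursion duration model summaries.
START_SLOPE = {"Belgrade": 0.40, "Valjevo": 0.40 - 0.28}
DURATION_SLOPE = {"Belgrade": 0.29, "Valjevo": 0.29 + 0.21}


def stretching_share(dialect):
    """Proportion of the H offset shift that comes from a longer excursion."""
    start = START_SLOPE[dialect]
    duration = DURATION_SLOPE[dialect]
    return duration / (start + duration)


for dialect in ("Belgrade", "Valjevo"):
    total = START_SLOPE[dialect] + DURATION_SLOPE[dialect]
    print(dialect, round(total, 2), round(stretching_share(dialect), 2))
```

On these estimates, roughly two fifths of the H offset shift in Belgrade comes from stretching the excursion, compared with about four fifths in Valjevo, which is in line with the description above.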
In order to probe the association of the H tone in rising accents in Belgrade and Valjevo Serbian, we examine the effects of syllable onset on the coordination and timing of the rising accent in two rising accent variations: First, when only the stressed syllable varies in onset complexity, and second, when only the post-tonic syllable (proposed TBU) varies in onset complexity.
Participants were recruited according to the same criteria as in Experiment 1. Data was collected from 19 participants (10 Belgrade, nine Valjevo). Some data had to be excluded due to consistent production errors, either segmental or accentual (three participants) or due to an alternative lexical accent
The target words for Experiment 2 were very similar to those used in Experiment 1, but focused solely on short accents. This serves to control the duration of the experiment and keep the statistical models simpler, and also makes for a more straightforward comparison between rising and falling accents, as the length descriptor refers to the length of the stressed vowel, not the length of the vowel with the H tone.
As in Experiment 1, the target words were formed from three real words, with one syllable onset varied (using /r, l, m, mr, ml/) to make four additional nonce words. The three base words are
To reduce boredom and ensure that participants were paying attention throughout, two different stimulus frames were used:
As in Experiment 1, participants first heard a context prompt (recorded in advance by a native speaker of Valjevo Serbian who has been living in Belgrade for 20 years) and then read the given response. In order to prevent overlap, rushing, and list intonation, the written response only appeared on the screen after the context prompt ended. The context prompt asked the participant if they wanted or had asked for a certain object; the object in the question was a semantically plausible replacement for the target word, and had the same accent and syllable number as the target words. The same prompts were used for all words in an accentual group, so it was not possible to fully anticipate what the response was, as there are always five possible responses for any given prompt. Two example questions and responses are presented in
Example contexts and responses.
There were 15 target phrases total (three accent types × five syllable onsets), with two questions for each target phrase. As in Experiment 1, the order of presentation was fully randomized for every round of the experiment. For this experiment, the 30 prompt questions were put in random order and then split down the middle to make two blocks (thus, two blocks with 15 sentences each). After the two blocks were completed, the 30 sentences were randomized again, instead of repeating the first random order. The sentences were repeated five times, for a total of 150 trials.
The experiment was presented using PsychoPy. Participants were recorded in a quiet room, using either a TASCAM DR-100mkII microphone (four participants, all Belgrade speakers), a Sennheiser noise-canceling headset (four participants, all Valjevo speakers), or a Shure head-mounted microphone (four Belgrade speakers, two Valjevo speakers).
Data was processed and segmented as in Experiment 1. Single trials were excluded if there were significant disfluencies or segmental or accentual errors (participants who were entirely excluded tended to have at least 30 errors in the experiment; participants who had occasional trials excluded did not exceed 10 errorful trials). Some additional data cleaning was necessary for the analysis of the pitch excursions. Out of a possible 1,200 Belgrade tokens, 1,163 had a clear maximum and were retained for an analysis of H achievement. Of those, 1,073 also had clear minima and were retained. Out of a possible 900 tokens in the Valjevo dialect, 882 tokens had a clear maximum, and 846 tokens additionally had clear minima.
Data was analyzed in R using the same procedures as Experiment 1. Models had fixed effects of TBU syllable onset duration (duration of the onset of the syllable proposed to be the TBU in the
The dependent variables are the same as in Experiment 1: H offset, timing of F0 excursion onset, duration of F0 excursion. However, for the rising accents the syllable that the timing of the pitch excursion is compared to is the proposed TBU (i.e., the second syllable), rather than the first syllable. This is illustrated in
A schema of the dependent variables used in analysis, as marked on falling
The same set of onsets was used in Experiment 2 as in Experiment 1; however, as Serbian uses duration as a correlate of stress, it is also necessary to verify the durational differences between onsets in unstressed syllables. As for Experiment 1, the addition of dialect as a fixed effect does not significantly improve the model (
There is a similar pattern for the duration of the nucleus of the syllable with the varied syllable onset. Prosodic template as a fixed effect significantly improves the fit of the model (
The patterns of H offset timing are illustrated for both dialects in
Scatter plots comparing the relationships between varied syllable onset durations and H offset in H+Stress, Stress-only, and H-only words (both dialects).
In the Belgrade dialect, the duration of the varied syllable onset as a single predictor significantly improves the fit of the model; accentual peaks occur later when the syllable onset is longer (
In contrast, when using the duration of the TBU syllable onset, all templates have a positive relationship between syllable onset duration and H offset. TBU syllable onset duration as a single fixed effect significantly improves the fit of the model. This model (
Accentual peaks pattern similarly in the Valjevo dialect. The duration of the varied syllable onset as a single predictor significantly improves the fit of the model; accentual peaks occur later when the syllable onset is longer (
As in the Belgrade dialect, when using the duration of the TBU syllable onset, all prosodic templates have a positive relationship between syllable onset duration and H offset. TBU syllable onset duration significantly improves the fit of the model, and this model (
Overall, the Belgrade and Valjevo dialects look similar to each other. In a model including both dialects, adding dialect as a fixed effect significantly improves the fit of the model; Valjevo peaks occur earlier relative to the start of the syllable (
Summary of the LME model for H offset, using the duration of the TBU syllable onset. Model:
Term | Estimate | Std. Error | t | p |
---|---|---|---|---|
(Intercept) | 53.79 | 9.16 | 5.87 | <0.0001*** |
OnsetDuration | 0.54 | 0.05 | 11.79 | <0.0001*** |
H+Stress | 27.47 | 4.94 | 5.56 | <0.0001*** |
Stress-only | –3.39 | 7.57 | –0.45 | 0.65 |
Valjevo | –62.54 | 13.46 | –4.65 | 0.0002*** |
OnsetDuration:H+Stress | 0.34 | 0.05 | 6.83 | <0.0001*** |
OnsetDuration:Stress-only | 0.002 | 0.13 | 0.01 | 0.99 |
OnsetDuration:Valjevo | –0.13 | 0.05 | –2.66 | 0.008** |
H+Stress:Valjevo | 42.34 | 3.62 | 11.69 | <0.0001*** |
Stress-only:Valjevo | –13.23 | 3.60 | –3.67 | 0.0002*** |
For the excursion characteristics, we will focus on the duration of the TBU syllable onset, as it provides a better prediction of the data than the varied syllable onset.
Scatter plots comparing the relationships between varied syllable onset durations and start of the pitch excursion in H+Stress, Stress-only, and H-only words (both dialects).
In the Belgrade dialect, the addition of prosodic template significantly improves the fit of the model. Pitch excursions in words in the H-only template start the earliest (–35.3 ± 9.70 ms); they are close to, but significantly different from, excursions in words in the Stress-only template (–22.3 ± 9.70 ms,
The patterns differ in the Valjevo dialect. The addition of prosodic template significantly improves the fit of the model. Pitch excursions start the earliest in rising words; for the H-only template, pitch excursions start 100.81 ± 9.89 ms before the start of the TBU syllable, and in the Stress-only template 150.39 ± 9.90 ms before the start of the TBU syllable. Pitch excursions in the H+Stress template start approximately concurrently with the start of the TBU (7.24 ± 9.95 ms). All templates are significantly different from each other (
For models considering both dialects, the addition of dialect significantly improves the fit of the model. The interaction between dialect and syllable onset duration also significantly improves the model fit, as does the interaction between dialect and prosodic template. Although overall pitch excursions start earlier in the Valjevo dialect than in the Belgrade dialect, the effect of dialect is larger for rising accent words (where accentual peaks are retracted from the second syllable in the Valjevo dialect) than for falling accent words (where accentual peaks still occur in the stressed syllable in both dialects). The three-way interaction between dialect, prosodic template, and syllable onset duration also significantly improves model fit. As suggested by the individual dialect models, syllable onset duration affects the timing of the start of the excursion in H+Stress words but not in the two rising accent words in the Belgrade dialect; this differs from the Valjevo dialect, where syllable onset duration does not affect the start of the pitch excursion in any template. These results parallel those from Experiment 1. The full model summary is available in
Full LME model for the start of pitch excursion, with the duration of the TBU syllable onset. Model:
Term | Estimate | Std. Error | t | p |
---|---|---|---|---|
(Intercept) | –44.70 | 11.15 | –4.00 | 0.0002*** |
OnsetDuration | 0.12 | 0.09 | 1.29 | 0.20 |
H+Stress | 28.51 | 10.48 | 2.72 | 0.007** |
Stress-only | –7.89 | 16.43 | –0.48 | 0.63 |
Valjevo | –62.77 | 16.64 | –3.77 | 0.0005*** |
OnsetDuration:H+Stress | 0.45 | 0.11 | 3.96 | <0.0001*** |
OnsetDuration:Stress-only | 0.47 | 0.29 | 1.59 | 0.11 |
OnsetDuration:Valjevo | –0.03 | 0.14 | –0.25 | 0.81 |
H+Stress:Valjevo | 70.42 | 15.61 | 4.51 | <0.0001*** |
Stress-only:Valjevo | –22.60 | 24.73 | –0.91 | 0.36 |
OnsetDuration:H+Stress:Valjevo | –0.38 | 0.17 | –2.21 | 0.03* |
OnsetDuration:Stress-only:Valjevo | –0.80 | 0.45 | –1.78 | 0.08 |
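Because the model contains a three-way interaction, the effect of syllable onset duration in a given cell is the sum of several terms. The sketch below adds up the relevant estimates from the table to obtain one slope per dialect and prosodic template. It assumes treatment coding with the Belgrade dialect and the H-only template as reference levels, which is inferred from the term names rather than stated in the text; the function name is hypothetical.

```python
# Onset-duration terms copied from the excursion start model summary.
SLOPE_TERMS = {
    "OnsetDuration": 0.12,
    "OnsetDuration:H+Stress": 0.45,
    "OnsetDuration:Stress-only": 0.47,
    "OnsetDuration:Valjevo": -0.03,
    "OnsetDuration:H+Stress:Valjevo": -0.38,
    "OnsetDuration:Stress-only:Valjevo": -0.80,
}
TEMPLATES = ("H-only", "H+Stress", "Stress-only")
DIALECTS = ("Belgrade", "Valjevo")


def onset_slope(template, dialect, terms=SLOPE_TERMS):
    """Point estimate of the onset-duration slope for one design cell."""
    if template not in TEMPLATES or dialect not in DIALECTS:
        raise ValueError("unknown template or dialect")
    slope = terms["OnsetDuration"]  # reference cell: Belgrade, H-only
    if template != "H-only":
        slope += terms[f"OnsetDuration:{template}"]
    if dialect == "Valjevo":
        slope += terms["OnsetDuration:Valjevo"]
        if template != "H-only":
            slope += terms[f"OnsetDuration:{template}:Valjevo"]
    return slope


for dialect in DIALECTS:
    for template in TEMPLATES:
        print(dialect, template, round(onset_slope(template, dialect), 2))
```

These are point estimates only. Whether a given slope differs reliably from zero depends on its standard error, which this sum does not provide.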
Scatter plots comparing the relationships between varied syllable onset durations and excursion duration in H+Stress, Stress-only, and H-only words (both dialects).
The Valjevo dialect patterns slightly differently from the Belgrade dialect in the duration of its pitch excursions. The addition of prosodic template significantly improves the fit of the model. Unlike in the Belgrade dialect, however, words in the H-only (123.0 ± 17.50 ms) and the H+Stress (126.0 ± 17.50 ms) templates have the shortest pitch excursions (no significant difference between the two,
In a model that considers both dialects together, dialect as a fixed effect does not significantly improve the fit of the model (
Full LME model for the duration of the pitch excursion, with the duration of the TBU syllable onset. Model:
Term | Estimate | Std. Error | t value | p |
(Intercept) | 100.22 | 12.505 | 8.02 | <0.0001*** |
OnsetDuration | 0.41 | 0.09 | 4.60 | <0.0001*** |
H+Stress | –0.79 | 9.87 | –0.08 | 0.94 |
Stress-only | 11.70 | 15.49 | 0.76 | 0.45 |
Valjevo | –1.48 | 18.78 | –0.08 | 0.94 |
OnsetDuration:H+Stress | –0.11 | 0.11 | –1.00 | 0.32 |
OnsetDuration:Stress-only | –0.64 | 0.28 | –2.30 | 0.02* |
OnsetDuration:Valjevo | –0.08 | 0.13 | –0.64 | 0.52 |
H+Stress:Valjevo | –26.00 | 14.71 | –1.77 | 0.08 |
Stress-only:Valjevo | 5.21 | 23.31 | 0.22 | 0.82 |
OnsetDuration:H+Stress:Valjevo | 0.34 | 0.16 | 2.14 | 0.03* |
OnsetDuration:Stress-only:Valjevo | 0.92 | 0.42 | 2.17 | 0.03* |
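The onset-duration effects in the two fixed-effects tables are easier to compare once the interaction terms are summed into one slope per dialect and template. The following Python sketch does only that arithmetic, using the estimates copied from the tables above. It assumes treatment coding with Belgrade and the template that has no coefficient of its own (taken here to be H-only) as reference levels; the tables imply this but do not state it. The names `START`, `DURATION`, and `simple_slope` are illustrative and not part of the original analysis.

```python
# Onset-duration terms copied from the two fixed-effects tables above.
# ASSUMPTION: treatment (dummy) coding, with Belgrade and the template that
# has no coefficient of its own (taken here to be H-only) as reference levels.
START = {  # model for the start of the pitch excursion
    "OnsetDuration": 0.12,
    "OnsetDuration:H+Stress": 0.45,
    "OnsetDuration:Stress-only": 0.47,
    "OnsetDuration:Valjevo": -0.03,
    "OnsetDuration:H+Stress:Valjevo": -0.38,
    "OnsetDuration:Stress-only:Valjevo": -0.80,
}
DURATION = {  # model for the duration of the pitch excursion
    "OnsetDuration": 0.41,
    "OnsetDuration:H+Stress": -0.11,
    "OnsetDuration:Stress-only": -0.64,
    "OnsetDuration:Valjevo": -0.08,
    "OnsetDuration:H+Stress:Valjevo": 0.34,
    "OnsetDuration:Stress-only:Valjevo": 0.92,
}

REFERENCE_TEMPLATE = "H-only"   # assumed reference level
REFERENCE_DIALECT = "Belgrade"  # assumed reference level


def simple_slope(coefs, template, dialect):
    """Sum the onset-duration terms that apply to one template/dialect cell."""
    terms = ["OnsetDuration"]
    if template != REFERENCE_TEMPLATE:
        terms.append(f"OnsetDuration:{template}")
    if dialect != REFERENCE_DIALECT:
        terms.append(f"OnsetDuration:{dialect}")
        if template != REFERENCE_TEMPLATE:
            terms.append(f"OnsetDuration:{template}:{dialect}")
    return round(sum(coefs[t] for t in terms), 2)


if __name__ == "__main__":
    for dialect in ("Belgrade", "Valjevo"):
        for template in ("H-only", "H+Stress", "Stress-only"):
            print(dialect, template,
                  "start:", simple_slope(START, template, dialect),
                  "duration:", simple_slope(DURATION, template, dialect))
```

Under these assumptions, the H+Stress slope for the start of the excursion comes out at 0.57 in Belgrade and 0.16 in Valjevo, while the slope for excursion duration comes out at 0.30 and 0.56, respectively. These are point estimates only; they carry no standard errors and are not significance tests.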
Taken together, these results suggest that Valjevo accentual peaks occur earlier not via shorter pitch excursions, but rather by starting the pitch excursions earlier relative to the TBU. However, in both dialects, the pitch accents receive timing information (duration in the case of Valjevo; both duration and initial timing in Belgrade) from the TBU proposed in Inkelas and Zec (
The results from these two experiments are consistent both with each other and with the Inkelas and Zec (
In Experiment 2, we compared the behavior of rising accents to falling accents in both dialects. The falling accents confirmed the findings from Experiment 1; peak timing is influenced by TBU characteristics in both dialects, but that timing is achieved through different timing strategies in each dialect. In both dialects, the interactions between tones and segments were parallel in rising accents when considering the segments of the post-tonic syllable, rather than the tonic syllable, indicating that the post-tonic syllable is the TBU for the lexical H in rising accents in both dialects. This supports Hypothesis 1, which is that both dialects follow the Inkelas and Zec (
Critically, the effect of the post-tonic syllable onset was found in both dialects, even though rising accent peaks in the Valjevo dialect were retracted, in some cases fully into the tonic syllable. Note that it is likely that ‘true’ peaks in fact occurred earlier than what was documented in this study, as H offset timing used the 20% offset velocity threshold for consistency, which places H offset timing slightly later than either gestural target achievement or gestural peak. Thus, the Belgrade and Valjevo dialects share the abstract aspects of their phonology, but do not share phonetic realization.
The c-center hypothesis for tone fails to predict two aspects of the data presented above. First, it is impossible for the same c-center coordinative structure to produce the two alignment strategies exhibited by the two dialects: The Belgrade dialect achieves peak alignment by changing both the timing of the start of the pitch excursion and the duration of the pitch excursion; the Valjevo dialect mainly utilizes the duration of the pitch excursion, keeping the onset of the gesture relatively constant. The behavior of Valjevo H gestures more closely resembles in-phase coordination between the intonational tone gesture and the syllable onset, which has previously been documented only in the small set of intonation systems examined articulatorily (
Second, the c-center hypothesis for tone does not predict the temporal retraction of Valjevo rising pitch accents, nor does a more general onsets-only theory of gestural coordination, as in the coupled oscillator model (
We can now build coordinative models for both the Belgrade and the Valjevo dialects. In the Belgrade falling accent, the H gesture must be coordinated such that the start of the pitch excursion shifts rightward as the syllable onset grows longer. This pattern is most simply generated by coordinating the onset of the tone gesture with the onset of the vowel as a centering gesture, while the onset consonants are coordinated essentially in a c-center structure (onset-to-onset coordination with the vowel, and onset-to-target coordination between multiple C gestures). This differs from the c-center models proposed for Mandarin and Thai in that the tone is coordinated with the ‘centering’ gesture, rather than behaving as a consonant-like gesture that displaces away from the c-center. This structure generates a rightward shift of the tone gesture onset that is only approximately 50% of the duration of the syllable onset, as the syllable onset consonants displace away from the onset of the H gesture in both directions, producing only a partial rightward shift relative to the acoustic start of the word. The timing of the gestural onset of the H gesture is affected by the c-center structure, just as the vowel gesture would be affected. This configuration is illustrated in
Coordinative topology models of tonal representation for Belgrade and Valjevo falling accents. The altered c-center structure in the Belgrade dialect produces the bidirectional displacement of the consonant gestures away from the tone gesture. In contrast, the Valjevo tone gesture is coordinated with the first onset consonant only and thus the onset of the tone gesture does not shift with additional consonants.
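The contrast between the two falling-accent topologies can be made concrete with a toy timing calculation. The Python sketch below is an idealization, not the model fitted in this study. It assumes that the onset consonants form one contiguous span centered on the point to which the vowel and tone onsets are timed (the Belgrade case), or that the tone onset is timed directly to the start of the first consonant (the Valjevo case). The function names, the `fixed_lag_ms` parameter, and the 60 ms and 160 ms onset durations are hypothetical illustration values.

```python
def tone_onset_lag(onset_ms, coordination, fixed_lag_ms=0.0):
    """Lag (ms) from the acoustic start of the syllable onset to the tone onset.

    coordination:
      "vowel_center"    - tone onset timed to the vowel onset, which sits at the
                          center of the onset-consonant span (Belgrade sketch)
      "first_consonant" - tone onset timed to the onset of the first consonant
                          gesture (Valjevo sketch)
    """
    if coordination == "vowel_center":
        # Consonants spread symmetrically around the centering point, so only
        # half of the onset span precedes the tone onset.
        return onset_ms / 2.0 + fixed_lag_ms
    if coordination == "first_consonant":
        # The tone starts with the first consonant regardless of how much
        # consonantal material follows.
        return fixed_lag_ms
    raise ValueError(f"unknown coordination: {coordination!r}")


def shift_per_ms(coordination, short_ms=60.0, long_ms=160.0):
    """Change in tone-onset lag per ms of added onset duration."""
    delta = (tone_onset_lag(long_ms, coordination)
             - tone_onset_lag(short_ms, coordination))
    return delta / (long_ms - short_ms)


if __name__ == "__main__":
    print("Belgrade sketch:", shift_per_ms("vowel_center"))
    print("Valjevo sketch:", shift_per_ms("first_consonant"))
```

In this idealization the Belgrade-style topology shifts the tone onset by half of any added onset duration, and the Valjevo-style topology does not shift it at all, which mirrors the qualitative pattern described for the two dialects.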
In contrast, the onset of the H gesture in the Valjevo falling accent was largely unaffected by syllable onset duration, a timing pattern most simply generated by an onset-to-onset coordinative relationship between the H gesture and the first consonant gesture of the syllable onset, illustrated in
To capture target timing (H offset in this study), there are two possibilities. First, tone gestures can have some sort of ‘base’ stiffness which provides a degree of timing, but is then influenced by the gestures it is coordinated with. Alternatively, we can specify that gestures have target coordination as well as onset coordination, similarly to proposals of gestural anchoring (
Finally, an additional model configuration is necessary for the rising accent in Valjevo Serbian, which takes advantage of two additional points of coordination as offered by Gafos (
Coordinative topology model of tonal representation and resulting gestural score for Valjevo Serbian rising accent, showing both the stressed and post-stress (H-bearing) syllable. The bending dotted line denotes that the gestural target of the H gesture is coordinated with the vowel gesture/c-center of the next syllable.
The relatively stable timing of the H gesture onset regardless of syllable onset duration indicates that the onset of the H gesture may be coordinated to some other gesture, but it is unclear which gesture that may be. Although an onset-to-onset relationship between the H gesture and the first consonant gesture of the syllable would produce a symmetric analysis of falling and rising accents in the Valjevo dialect, such coordination seems unlikely: Compare a mean H gesture onset – acoustic syllable onset lag of –100.9 ms (
Based on these findings, we propose that an articulatory TBU corresponds to a constellation of segmental gestures that a tone gesture is included in. In this model, the existence of a coordinative relationship plays the role of phonological association and the precise parameters determine the output phonetics. To be clear, the articulatory TBU is not only the segmental gesture that the tone gesture is directly coordinated with, but rather the entire constellation in which the tone gesture participates. In the case of Serbian, the tone gesture is included in a syllable-sized constellation of gestures. Thus, a tone gesture may be directly coordinated with just the syllable onset gesture, or just the vowel gesture, but through that coordination is incorporated with the rest of the syllable as well.
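One way to picture this definition is as a graph in which gestures are nodes and coordinative relationships are edges. The articulatory TBU is then the connected set of gestures reachable from the tone gesture, not only its direct neighbors. The Python sketch below illustrates this for a single syllable with a two-consonant onset. The gesture labels, the edge lists, and the helper names are assumptions made for illustration; in particular, the segmental edges are assumed to be the same in both dialects.

```python
from collections import defaultdict

# Each edge: (gesture A, gesture B, type of coordinative relationship).
# ASSUMPTION: one syllable with onset consonants C1 and C2, a vowel V, and a
# tone gesture H; segmental edges are taken to be identical in both dialects.
SEGMENTAL = [
    ("C1", "V", "onset-to-onset"),
    ("C2", "V", "onset-to-onset"),
    ("C1", "C2", "onset-to-target"),
]
BELGRADE_FALLING = SEGMENTAL + [("H", "V", "onset-to-onset")]
VALJEVO_FALLING = SEGMENTAL + [("H", "C1", "onset-to-onset")]


def build_graph(edges):
    """Undirected adjacency map from a list of coordination edges."""
    graph = defaultdict(set)
    for a, b, _relation in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def direct_partners(edges, gesture):
    """Gestures that share a coordinative relationship with `gesture`."""
    return set(build_graph(edges)[gesture])


def constellation(edges, gesture):
    """All gestures connected to `gesture`, directly or indirectly."""
    graph = build_graph(edges)
    seen, stack = {gesture}, [gesture]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    for name, edges in (("Belgrade", BELGRADE_FALLING),
                        ("Valjevo", VALJEVO_FALLING)):
        print(name, "direct:", sorted(direct_partners(edges, "H")),
              "constellation:", sorted(constellation(edges, "H")))
```

In this sketch the tone gesture has a different direct partner in each dialect (the vowel in Belgrade, the first consonant in Valjevo), yet the constellation it belongs to is the same syllable-sized set in both.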
Framing the articulatory TBU as the whole constellation, rather than only the gestures with direct coordinative relationships, has two advantages. First, it captures the exchange of timing information between gestures that are not directly coordinated with each other. For example, as shown in these two studies, pitch gestures shift and stretch in accordance with the duration of the entire syllable onset, even in the case of complex syllable onsets, where not all of the segmental gestures have a direct coordinative relationship with the tone gesture. This conceptualization also suggests a more equitable relationship between segments and tone in terms of speech planning and control, rather than tones simply being ‘carried’ by or even ‘aligned to’ segments. There is mounting evidence for this prediction, in that tones can also influence temporal aspects of segments: for example, differences in duration between contour versus level tones (e.g., Mandarin) or between rising versus falling tones (
Second, this formulation preserves the insights on phonological patterning from both traditional tonal phonology and more recent articulatory work. For example, Gao’s (
Using gestural models provides tonal phonology with useful tools for expressing relationships between tones and their TBUs. The timing patterns of lexical pitch accents in the Belgrade and Valjevo dialects of Serbian indicate that an articulatory conceptualization of the TBU, with potential coordination with gestural targets as well as gestural onsets, better predicts existing tone data than more restrictive models that only allow the coordination of gestural onsets. Allowing coordination with the same gestural landmarks as those proposed for segmental gestures both unifies the treatment of segmental and pitch gestures and predicts a greater range of tone patterns that may be found across languages, including those observed in Valjevo Serbian. This highlights the need for more cross-linguistic work on tone under an Articulatory Phonology lens, as tones do vary in how they coordinate with segmental gestures, as demonstrated in this paper.
In this list we are providing the accentual symbols according to the Serbian tradition; for the rest of this chapter, we will provide IPA when necessary alongside the orthography.
As spoken in Belgrade.
In phrase-final position, both Belgrade and Valjevo dialects shift a H peak onto the preceding syllable due to tonal crowding.
The full experiment also included a set of short rising words (base word
Producing
Note that <v> in Serbian represents the approximant [
The syllabification of ò
I would like to thank Draga Zec and Elizabeth Zsiga for their valuable insight on Serbian pitch accent, as well as Andrej Bjelaković and Biljana Čubrović for their help recruiting participants and collecting data. I am also extremely grateful to all of the speakers who agreed to participate in this research.
The author has no competing interests to declare.