1. Introduction
Tonal contrasts in many languages are restricted to syllables with greater overall sonority, since such syllables are better able to carry the fundamental frequency (f0) information that is critical for conveying tone distinctions (Hyman, 1988; Gordon, 2001a; Zhang, 2002). The dimensions contributing to sonority as it relates to lexical tone are varied and include syllable structure, stress, proximity to a word boundary, and word length, all of which affect the duration of the sonorous portion of the syllable and hence the capacity of a syllable to support tone contrasts. Syllables with a long vowel (or, in some languages, also a sonorant coda), stressed syllables, word-final syllables, and/or syllables in shorter words are thus privileged licensers of phonological tone distinctions cross-linguistically. Of these various predictors of tone-bearing ability, the typologically most pervasive is syllable structure. In a survey of 187 languages with lexical tone, Zhang finds that well over half (104 languages) asymmetrically permit a richer range of tone contrasts in syllable types with greater overall sonority: those containing a long vowel and, in many languages, also those ending in a sonorant coda consonant. Languages that permit a full range of tones on all syllable types are relatively rare (only 22 in Zhang’s survey), including on syllables that are relatively impoverished in aggregate sonority, such as those containing a short vowel, whether open or closed by an obstruent. In this paper, we focus on one of these rare cases, the South Slavic language Croatian, which is apparently quite permissive in allowing tone contours on all syllable types.
In the typology of syllable-driven tone restrictions, syllables containing a long vowel are the most privileged in terms of tone-bearing ability. In many languages, contour tones are thus restricted to syllables containing a long vowel. This restriction is illustrated by Krongo (Kadu; Sudan) (Reh, 1985), in which a tone left stranded by an optional apocope process reassociates to a syllable with a long vowel but not to a syllable containing a short vowel: /àbáːnà/ → àbâːn ‘strike’ with tonal reassociation of the stranded low to yield a falling tone vs. /tùkúlì/ → tùkúl ‘side (of body)’ and /náŋgùrúʃì/ → náŋgùrúʃ without tonal relinking, i.e., *tùkûl, *náŋgùrûʃ. The greater capacity of long vowels to carry contour tones relative to short vowels is phonetically motivated: Long vowels possess greater total periodic energy than short vowels, including in the low frequency harmonics that are important in the perception of f0 (House, 1990; Gordon, 2001a; Zhang, 2002). The aggregate tone-bearing capacity of a syllable rime containing a long vowel is thus enhanced relative to that of a rime with a short vowel.
A variant of this pattern is found in languages, e.g., Kiowa (Tanoan; United States) (Watkins, 1984), with a less stringent criterion for licensing contour tones, permitting them both on syllables containing a long vowel and on syllables containing a short vowel followed by a sonorant coda. This restriction is also phonetically motivated: because a short vowel followed by a sonorant coda possesses greater total periodic energy in the rime than a short vowel either in an open syllable or in a syllable containing an obstruent coda, it is better equipped to carry tone contours (Gordon, 2001a; Zhang, 2002).
Zhang’s (2002) survey further indicates that the nature of the tone is also predictive of distributional restrictions. Contour tones, which are composed of two phonological tones, are less restricted than complex tones, which consist of three (or more) phonological tones. Two other relevant factors are the tonal slope and the direction of movement. Tones requiring greater f0 excursions, e.g., low to high or high to low, are typologically more restricted than those with smaller transitions, e.g., low to mid or mid to low. Finally, rising tones are dispreferred relative to falling tones, in keeping with the greater time needed both to implement f0 rises in production and to recognize them in perception. Kiowa, for example, has falling tones but lacks rising tones, and rising tones are also rare in Krongo compared to other tones. The typological patterns observed by Zhang are mirrored by language-internal frequency data: tones that are cross-linguistically rarer also tend to be less frequent than more common tones within the languages that permit them (Gordon, 2016).
Our study investigates one of the relatively rare cases of tone distribution, that of Croatian, a language that allows, in many varieties, contour tones (falling and rising pitch accents) on a full suite of syllables, including short-voweled syllables that are either open (CV) or contain an obstruent coda (CVO): cf. Croatian (Neo-Štokavian/standard) pȁra [pâra] ‘steam’ ‒ pàra [pǎra] ‘money’ ‒ pȃra [pâːra] ‘of the pair’ ‒ pára [pǎːra] ‘scratch’, pȁrka [pârka] ‘of the park’ ‒ màrka [mǎrka] ‘stamp’ and pȁtka [pâtka] ‘duck’ ‒ kràtka [krǎtka] ‘short (fem.)’.1 In particular, we explore the acoustic realization of the Croatian tones in the phonetically under-researched Split variety to assess whether speakers make any phonetic accommodations in the timing and scaling of pitch peaks to facilitate the production of contour tones in syllables inherently less well suited than others to supporting tonal distinctions. Section 2 provides background on the current study, beginning with an overview of how different languages treat the interaction of tone and syllable structure, followed by an introduction to tone and syllable structure in the Croatian language. The methodology is presented in Section 3, followed by results in Section 4 and discussion in Section 5. Section 6 concludes the paper.
2. Background
2.1. Phonological and phonetic interactions between tones and syllables across languages
Restrictions on contour tones are typically modeled in phonological theory as constraints on associations between tonal targets and moras, an abstract unit of timing associated with segments in the rime (Woo, 1969; Hyman, 1989). Contour tones are composed of two tone targets, typically a high and a low, where the high precedes the low in a falling tone and follows the low in a rising tone. In contrast, level tones are assumed to consist of a single tone target.
Languages with syllable-based restrictions on contour tones prohibit the association of more than one tone to a single mora. It thus follows that only syllables containing at least two moras may support contour tones in these languages (Figure 1).
Phonemic short vowels are standardly treated as monomoraic and phonemic long vowels as bimoraic, so the primary source of cross-linguistic variation lies in the moraic status of coda consonants. In languages that allow contour tones on syllables closed by a sonorant but not on those closed by an obstruent, e.g., Kiowa, only sonorant codas are moraic. In other languages, e.g., Krongo, no coda consonants are moraic; as a result, only syllables containing a long vowel may support a contour tone. In some languages, syllable-based tone restrictions are manifested primarily as static constraints on the distribution of tones, e.g., Cantonese (Bauer & Benedict, 1997), whereas in other languages, e.g., Krongo and Kiowa, active processes also simplify contour tones in syllables that do not support them.
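The mora-based licensing condition described above can be sketched as a small function. This is a minimal illustration, assuming the syllable-type labels used in this paper (CV, CVV, CVR, CVO, CVVR, CVVO); the function names and the `moraic_codas` parameter are hypothetical, and the two language settings simply encode the Krongo and Kiowa patterns as stated in the text.

```python
def mora_count(syllable_type: str, moraic_codas: frozenset) -> int:
    """Count moras in a syllable-type label such as 'CV', 'CVV', 'CVR' or 'CVVO'.

    Onsets never contribute moras; a short vowel contributes one mora and a
    long vowel two; a coda ('R' = sonorant, 'O' = obstruent) contributes one
    mora only if its class is listed in `moraic_codas`.
    """
    if not syllable_type.startswith("CV"):
        raise ValueError(f"unexpected syllable type: {syllable_type!r}")
    rime = syllable_type[1:]  # drop the onset consonant
    if rime.startswith("VV"):
        moras, coda = 2, rime[2:]
    else:
        moras, coda = 1, rime[1:]
    if coda not in ("", "R", "O"):
        raise ValueError(f"unexpected coda in {syllable_type!r}")
    if coda and coda in moraic_codas:
        moras += 1
    return moras


def licenses_contour(syllable_type: str, moraic_codas: frozenset) -> bool:
    """At most one tone per mora, so a contour (two tones) needs two or more moras."""
    return mora_count(syllable_type, moraic_codas) >= 2


# Coda moraicity settings for the two language types discussed in the text.
KRONGO = frozenset()      # no coda is moraic
KIOWA = frozenset({"R"})  # only sonorant codas are moraic

for syl in ("CV", "CVO", "CVR", "CVV"):
    print(syl, licenses_contour(syl, KRONGO), licenses_contour(syl, KIOWA))
```

A hypothetical setting in which both coda classes are moraic, `frozenset({"R", "O"})`, would license contours on CVO but still not on CV, i.e., the rarer pattern in which obstruent-closed but not open short-voweled syllables carry contour tones.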
Given their perceptual underpinnings, it is not surprising that the categorical restrictions on tone contours in many languages are mirrored by phonetic effects designed to mitigate the challenges of implementing tonal excursions on syllables that provide suboptimal backdrops for those excursions. Such effects are observed both for lexical tone and for intonational tones, even in languages purported to allow a full range of tone contrasts regardless of syllable type. Gordon (2001a) and Zhang (2002) present several case studies of languages revealing varied strategies to facilitate the accommodation of tones on syllables that are intrinsically deficient in their ability to support contour tones. These measures include, in addition to phonologically deleting a tone, as in Krongo, (1) rescaling of tonal targets (i.e., lowering high targets and/or raising low targets) to minimize the extent of f0 excursions, (2) lengthening the syllable with which tones are associated in order to allow more time for the realization of tonal excursions, and (3) shifting one or more tone targets to adjacent syllables in order to minimize tonal crowding. These strategies also shed light on the few languages reported to allow contour tones on syllables closed by an obstruent but not on open syllables containing a short vowel.
Rescaling is a common phenomenon. In Luganda (Bantu; Uganda), which allows contour tones on syllables ending in an obstruent coda, a falling tone associated with a syllable closed by a voiceless obstruent is realized with less of a drop in f0 than a falling tone on other syllable types carrying contours (Dutcher & Paster, 2008). Hausa (Afro-Asiatic; Niger, Nigeria), another language tolerating contour tones on obstruent-closed syllables, employs lengthening of vowels bearing contour tones in order to mitigate tonal crowding in syllables closed by an obstruent (Gordon, 2001a).
Temporal shifting of tones is another well-documented strategy for reducing tone crowding, including in intonation systems, where it has been claimed to underlie typological biases in stress location attributed to avoidance of tonal crowding between pitch accents and boundary tones (Hyman, 1977; Gordon, 2001b). It can even lead to categorical tonal processes as observed in Mbui (Niger-Congo; Cameroon) phrasal tonology: For example, underlying /lɔ̀ɔ́ + wa/ is realized as [lɔ̀ɔ̀ wá] ‘look for me’, where the underlyingly toneless object pronoun wa receives its surface tone from the second component of the low-high contour on the root (Hyman & Schuh, 1974).
All of these temporal and scaling strategies have the effect of mitigating tonal crowding. The same strategies for reducing crowding of tones through rescaling, lengthening, temporal shift, and categorical deletion are observed in intonation systems, where they are modeled in autosegmental metrical theories of intonation in terms of associations between moras and intonational tones, including boundary tones, phrasal tones, and post-lexical pitch accents (Pierrehumbert, 1980; Beckman & Pierrehumbert, 1986). The parallel between lexical and intonational tones in their response to tonal crowding is not surprising given that the realization of both is subject to the same physical constraints.
The tonal crowding factors at play in the realization of tone sequences within syllables also exert themselves across syllables by reducing the extent of f0 excursions between adjacent syllables. This effect can be seen in processes involving either partial or complete assimilation of adjacent tones. For example, in FeʔFeʔ-Bamileke (Atlantic-Congo; Cameroon) (Hyman & Schuh, 1974), a low tone becomes a raised low before a non-low tone: si1 pʉɐ1 ‘without a bag’ vs. si2 mɔh3 ‘without a fire’, where higher numbers represent higher tones. Note that the distinction between the raised low and the low tone is phonemically contrastive in FeʔFeʔ-Bamileke, cf. zɔk1 ‘knee’ vs. ʧɐk2 ‘pot’. Temporal mitigation of tone crowding is also observed in intonation systems. In Chickasaw (Muskogean; Oklahoma) (Gordon, 2008), the final word in a question contains a H* pitch accent followed by a final L% boundary tone. The pitch accent is confined to a three-syllable window at the right edge of the word, and its location is not predictable from stress, since word-final syllables carry primary stress in most cases. The H* occurs on a final syllable only if it contains a long vowel, following the common cross-linguistic pattern whereby a falling contour tone is restricted to the most sonorous syllable type (cf. Krongo), e.g., katahˈtãː ʧi.haː.ˈʃaːH*L% ‘Who are you angry at?’ Otherwise, if the final syllable does not contain a long vowel, H* retracts to a heavy (CVV or CVC) penultimate syllable, e.g., kaˈtaːt ok.ʃitH*.ˈtaL% ‘Who is closing it?’, or, if the penult is light (CV), to the antepenult, e.g., kaˈtaːt honH*.ko.ˈpaL% ‘Who is stealing it?’ Crucially, not only is the phonological alignment of the pitch accent sensitive to syllable type, but the f0 peak associated with H* is also timed progressively earlier within the syllable the closer the syllable carrying the pitch accent is to L%, i.e., earliest in ʧi.haː.ˈʃaːH*L% and latest in honH*.ko.ˈpaL%, a pattern that reflects the pressure to increase the distance between H* and L% both within and across syllables.
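The two cross-syllable patterns just described can be stated as simple rules. The sketch below is a hypothetical encoding, not the analyses of Gordon (2008) or Hyman and Schuh (1974): syllables are given as strings with stress marks removed and "ː" marking vowel length, a syllable counts as heavy if it has a long vowel or ends in a consonant (CVV or CVC), and the behavior for words shorter than three syllables is our own assumption. The FeʔFeʔ-Bamileke rule uses the numeric tone notation from the text.

```python
VOWELS = set("aeiou")  # simplified vowel inventory (assumption); nasal vowels omitted
LENGTH = "ː"


def is_long(syllable: str) -> bool:
    """A syllable counts as long-voweled if it contains the length mark."""
    return LENGTH in syllable


def is_heavy(syllable: str) -> bool:
    """Heavy = CVV or CVC: a long vowel, or a syllable ending in a consonant."""
    return is_long(syllable) or syllable[-1] not in VOWELS


def h_star_index(syllables: list) -> int:
    """Index of the syllable carrying H* in a question-final Chickasaw word."""
    n = len(syllables)
    if n == 0:
        raise ValueError("empty word")
    if is_long(syllables[-1]):
        return n - 1  # final long vowel: H* and L% share the final syllable
    if n >= 2 and is_heavy(syllables[-2]):
        return n - 2  # otherwise H* retracts to a heavy penult
    return max(n - 3, 0)  # otherwise the antepenult (fallback for short words is assumed)


def raise_lows(tones: list) -> list:
    """FeʔFeʔ-Bamileke: low (1) becomes raised low (2) before a non-low tone.

    Applied simultaneously to the underlying sequence (an assumption).
    """
    return [2 if t == 1 and i + 1 < len(tones) and tones[i + 1] > 1 else t
            for i, t in enumerate(tones)]


print(h_star_index(["ʧi", "haː", "ʃaː"]))  # 2 (final)
print(h_star_index(["ok", "ʃit", "ta"]))   # 1 (heavy penult)
print(h_star_index(["hon", "ko", "pa"]))   # 0 (antepenult)
print(raise_lows([1, 1]), raise_lows([1, 3]))  # [1, 1] [2, 3]
```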
These cross-syllable strategies to mitigate tonal crowding indicate that constraints against tautosyllabic contour tones are merely one instantiation of a more general tendency to minimize f0 excursions, a pressure that itself belongs to the broader class of coarticulatory processes affecting tone. Although not universal (see Hyman & Schuh, 1974, and Hyman, 2014, on dissimilatory tone processes), tonal coarticulation is a typologically pervasive phenomenon operating on phonological tone, whether lexical or intonational.
2.2. Tone and syllable structure in Croatian
This paper explores the role of syllable structure in the realization of tone contrasts in the Split variety of Croatian (Rešetar, 1900; Browne & McCawley, 1965; Magner, 1978; Inkelas & Zec, 1988; Smiljanić, 2004; Godjevac, 2005; Kapović, 2015). Croatian belongs to a cluster of closely related national languages, together with Bosnian, Serbian, and Montenegrin, collectively known by the abbreviation BCMS. Like Standard Croatian, Split Croatian has a contrast between a rising and a falling tone on stressed non-final syllables, where the realization of the complete contour often extends into the post-tonic syllable. In the Croatian linguistic orthography, tone type is crossed with vowel length to yield four categories, short falling, short rising, long falling, and long rising: pȁra (SF) ‘steam’, pàra (SR) ‘money’, Lȗka (LF) ‘Luke’, lúka (LR) ‘port’. We refer to these categories as “prosodemes” throughout the paper. As the examples in Table 1 show, the falling and rising tones are both possible on all syllable types, including syllables containing a long vowel (CVV), as well as those containing a short vowel with a sonorant coda (CVR), an obstruent coda (CVO), or no coda at all (CV).
Table 1. Examples of tone type by syllable structure. Note that CV denotes a Consonant-Short Vowel syllable type with no coda consonant, CVO denotes a Consonant-Short Vowel-Obstruent syllable type, CVR denotes a Consonant-Short Vowel-Resonant (i.e., sonorant) syllable type, and CVV denotes a Consonant-Long Vowel syllable type either with or without a coda consonant.
| Syllable type | Falling | Rising |
|---|---|---|
| CV | kȕka ['kûka] ‘hook’ | kùka ['kǔka] ‘of the hip’ |
| CVO | pȁtka ['pâtka] ‘duck’ | kràtka ['krǎtka] ‘short (fem.)’ |
| CVR | pȁrka ['pârka] ‘of the park’ | màrka ['mǎrka] ‘stamp’ |
| CVV | rȃdio ['râːdio] ‘radio’ | rádio ['rǎːdio] ‘worked’ |
Croatian is typically regarded as a case of a pitch-accent language in which tone has a smaller functional load relative to prototypical tone languages. As in other pitch accent languages (and unlike canonical tone languages), there is a single accent per prosodic word in Croatian, and the location of the pitch accent is positionally limited to the stressed syllable in a word, though its phonetic exponents may extend beyond the tonic syllable. The vast majority of languages in Zhang’s (2002) survey are canonical tone languages rather than pitch-accent languages. Typological data on syllable-based tone restrictions in pitch-accent languages, in which tone plays a more limited role in signaling distinctions, is sparser, likely due in part to analytic indeterminacy about the phonological treatment of tone in these languages (see Hyman, 2006, on the taxonomy of pitch accent in relation to tone). Because pitch-accent languages typically observe culminativity—the requirement that there be a single syllable per prosodic word that is more prominent than others—they are often amenable to alternative analyses that resemble those used to account for stress. Particularly relevant to the present paper is the possibility of analyzing pitch accents as a single tone target, typically a high tone, rather than a sequence of multiple tone targets. As such, issues of tonal crowding are potentially less acute in pitch-accent languages. Nevertheless, evidence suggests that the same types of syllable-sensitive constraints on tone are operative in pitch-accent languages as in more prototypical tone languages, perhaps because lexical pitch accents may have two (or more) tonal targets in practice, even if a theoretically more parsimonious account appealing to a single tone target is, in principle, available. 
In fact, both the Krongo-type and Kiowa-type sonority-driven tone restrictions are observed in linguistic relatives of Croatian that are typically classified as pitch-accent languages. Long vowels are the privileged licensers of tone in some South Slavic varieties, e.g., standard Slovene (Carlton, 1991, pp. 312–322; Greenberg, 2003; Kapović, 2015, pp. 81–85), while both long vowels and syllables closed by a sonorant preferentially support tonal distinctions in the Baltic branch of Balto-Slavic: Lithuanian (Senn, 1957–66; Kenstowicz, 1972) and Latvian (Derksen, 1996).
As for other pitch-accent languages, different analyses of the lexical tone contrasts in Croatian have been proposed, although we are not aware of formal treatments of tone specifically for the Split variety. Previous analyses of Croatian tone and, more broadly, of tonal BCMS varieties differ both in the assumed lexical tone targets and in the temporal docking of these tones. Some of these analytic possibilities are shown schematically in Figure 2 for disyllabic words, drawing on insights from other instrumental analyses of BCMS tone (e.g., Purcell, 1973; Lehiste & Ivić, 1986; Smiljanić & Hualde, 2000; Smiljanić, 2004; Godjevac, 2005; Pletikos, 2008; Zsiga & Zec, 2012; Zintchenko Jurlina, 2013, 2019; Langston, 2018).2 On a phonological level, falling prosodemes consist of a high tone followed by a low tone and rising prosodemes of a low tone followed by a high tone, though the actual realization of the prosodemes varies with context. It must be stressed, however, that different Neo-Štokavian (BCMS) varieties can have phonetically very different systems. The basic systems can be the same, i.e., the opposition of short and long “falling” and “rising” tones, but their phonetic makeup can vary widely, such that the “rising” tones may not actually be phonetically rising. Results from the analysis of one regional variety therefore cannot be generalized, nor should different varieties be approached as the same system except in very abstract terms, a mistake often made in the previous literature (Kapović, 2015, pp. 687–688).
In the simplest possible phonological analysis, the tones comprising both the falling and rising prosodemes may be assumed to fall within the stressed syllable. This analysis, which we term the monosyllabic accent analysis (Figure 2a), is reflected in the traditional Western South Slavic linguistic orthography and is also consistent with the strong typological affinity of both lexical tones and postlexical pitch accents to stressed syllables. Following our earlier discussion, closed syllables and those containing a long vowel may be assumed to be bimoraic, though nothing crucial hinges on this decision for purposes of the phonetic analysis; for these syllable types, there is a one-to-one mapping between tones and moras phonologically, though the acoustic realization of tone may differ between the syllable types. On the other hand, open syllables containing a short vowel are monomoraic and thus display a phonological association between two tones and a single mora.
A variant of this approach, which we designate the disyllabic accent analysis (Figure 2b), assumes that the first tone of the bitonal sequence is realized in the stressed syllable and the second trailing tone surfaces in the post-tonic syllable. This approach is consistent with the observation that the second tonal target, in particular for the rising tone, is characteristically realized on the post-tonic syllable in certain prosodic contexts.
A third possibility, which we label the monotonal accent analysis (Figure 2c), is that only the high tones are underlyingly specified and that the low tones are supplied on the surface by a default post-lexical rule of low tone insertion (Inkelas & Zec, 1988; Zsiga & Zec, 2012). Under this approach, the underlying contrast between falling and rising tone is in the location of the high tone: early in the first syllable for the falling tone, and either late in the first syllable or in the post-tonic syllable (as depicted in Figure 2c), in the case of the rising tone.
Finally, because the post-tonic syllable is typically associated with a fall in f0 in many contexts, yet another analytic option is to assume that both accents have an L target to the right of the stressed syllable at the boundary of a prosodic domain larger than the syllable. We term this approach the Final L account (Figure 2d); the final L could potentially be linked to different domains. An association with the right edge of the immediately post-tonic syllable could be construed as a foot-based final L, given that the stressed and post-tonic syllable are amenable to analysis as a metrical foot (see Köhnlein & Cameron, 2024, on foot-based analyses of pitch accent linked to stress). Other possibilities include a linkage with the right edge of the prosodic word or a phrase, as proposed by Zsiga and Zec (2012) in their study of the Belgrade variety of BCMS. It should be noted, however, that the modern Belgrade dialect is at a very advanced stage of tone loss, and the great majority of its speakers no longer have a pitch-accent system (unlike, e.g., Split or Novi Sad, a neighboring city in Serbia; see Kapović, 2023, p. 247).
It may be noted that the Final L analysis has subvariants. For the falling accent, one could assume that the stressed syllable also has a low target, as in the monosyllabic analysis, or that the only low target occurs after the stressed syllable, as in the disyllabic analysis. For the rising accent, one could likewise assume that the high target is either in the stressed syllable, as in the monosyllabic analysis, or that it is realized in the post-tonic syllable, as in the disyllabic analysis. For expository simplicity, the representations in Figure 2d adopt the disyllabic approaches to both accents in combination with the low target in the post-tonic syllable.
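To make the comparison concrete, the analyses in Figures 2a to 2c can be written as lists of lexical tone targets docked to the stressed syllable (0) or the post-tonic syllable (1). This is a schematic encoding of the prose descriptions above, not a formalism proposed in the literature; the data structure and function names are hypothetical. For the monotonal analysis, the rising accent's H is placed in the post-tonic syllable as depicted in Figure 2c, although a late H in the stressed syllable is also possible under that account. Figure 2d is left out because the docking of its final L depends on which domain (foot, prosodic word, or phrase) it is linked to. Counting the targets on the stressed syllable shows where the one-tone-per-mora constraint of Section 2.1 would be violated if that syllable is monomoraic (a short-voweled CV).

```python
# Each target is (tone, syllable), where 0 = stressed syllable, 1 = post-tonic syllable.
# Only lexically specified targets are listed: under the monotonal analysis the
# low tones are supplied later by a default rule and so are absent here.
ANALYSES = {
    "monosyllabic": {"falling": [("H", 0), ("L", 0)], "rising": [("L", 0), ("H", 0)]},
    "disyllabic":   {"falling": [("H", 0), ("L", 1)], "rising": [("L", 0), ("H", 1)]},
    "monotonal":    {"falling": [("H", 0)],           "rising": [("H", 1)]},
}


def targets_on_stressed(analysis: str, prosodeme: str) -> int:
    """Number of lexical tone targets docked to the stressed syllable."""
    return sum(1 for _tone, syl in ANALYSES[analysis][prosodeme] if syl == 0)


def crowds_monomoraic_syllable(analysis: str, prosodeme: str) -> bool:
    """True if a monomoraic stressed syllable would have to carry more than one tone."""
    return targets_on_stressed(analysis, prosodeme) > 1


for name in ANALYSES:
    print(name, {p: crowds_monomoraic_syllable(name, p) for p in ("falling", "rising")})
```

Only the monosyllabic analysis places two tones on a single mora in short open syllables, which is why it is the analysis under which tonal crowding in CV syllables is expected to be most acute.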
In this paper, we explore which of the analyses in Figure 2 is most compatible with the Split variety of Croatian across different syllable types that diverge in their ability to phonetically support tone excursions. It may be noted that the four analyses depicted in Figure 2 do not represent an exhaustive set of possibilities for treating the accent distinction. Different combinations of timing (monosyllabic vs. disyllabic) and accent composition (monotonal vs. bitonal) are conceivable for either the rising or falling accent, and the two accent types do not necessarily pattern together in either their timing or tonal composition. Other potential analyses will be considered in the context of the discussion of the phonetic results. As a working hypothesis, however, we adopt the monosyllabic analysis assumed in traditional phonological analyses of Croatian in which the falling and rising prosodemes are represented, respectively, as HL and LH sequences aligned with the stressed syllables (as in Figure 2a).
Importantly, regardless of the analysis ultimately adopted, the tonal contrast is only two-way, as there is no level accent to contrast with the falling and rising accents, given that the accents are historically linked to stress. In particular, the rising accent arose from words that shifted their stress one syllable to the left but retained the original location of the pitch properties associated with stress. The result was a phonetically rising tone on the newly stressed syllable. In this way, the original contrast in stress was transformed into one of tone, as observed in modern standard (Neo-Štokavian) word pairs like jȁgoda [ˈjâɡoda] ‘strawberry’ with inherited initial stress and falling tone vs. lòpata [ˈlǒpata] ‘shovel’ with original second-syllable stress that now also has initial stress and rising tone.3
2.3. Tone and syllable structure in Split Croatian
This paper examines the timing of f0 peaks and troughs associated with the prosodemes of Split Croatian to (1) assess the impact of syllable structure on temporal and scaling aspects of their realization and (2) evaluate the phonetic evidence for the different analyses of accent advanced in Section 2.2. The Split variety was chosen for the purposes of another study,4 which considered this variety in light of the historical development of lexical pitch accent from Proto-Slavic.5 In the present study, we test the following hypotheses about the realization of tones, which are guided by the typology of tone restrictions dependent on syllable structure and assume the monosyllabic analysis of Croatian tone:
H1: Composition and Alignment of Tone Targets: Tone targets for both the falling and rising prosodeme will be associated with the stressed syllable. Phonological tone targets (f0 peaks and troughs) will be diagnosed through their association with f0 maxima (for high tones) and f0 minima (for low tones) that display less variability in their temporal alignment than f0 landmarks resulting from interpolation between phonological targets.
H2: Timing: Tone targets will be timed differently for syllable types depending on how conducive they are to supporting a tonal contour, i.e., a sequence of two phonological tone targets. We hypothesize that open syllables containing a short vowel (CV) and closed syllables containing a coda obstruent (CVO) are least amenable to contours. By contrast, syllables containing a long vowel (CVV) are most amenable to tonal contours, with syllables closed by a sonorant (CVR) ranking in between CVV and CV/CVO in terms of conduciveness to supporting multiple tone targets. In terms of direction, we see two possibilities: The first possibility is that the degree of temporal separation between High and Low targets will be increased for syllable types less well suited to supporting contour tones. In this situation, the phonetic realization of a contour tone is more likely to extend beyond the syllable with which the tone is phonologically associated for lower sonority syllable types. The second possibility is that the degree of temporal separation between High and Low targets will be decreased for syllable types less well suited to supporting contour tones, as the tonal targets are forced to be realized in a shorter time frame. Either way, one may expect a difference according to syllable type.
H3: Scaling: Tone excursions, i.e., the difference between the maximum and minimum f0 values, will be smaller in syllables least equipped to carry contour tones. Once again, we hypothesize that open syllables containing a short vowel (CV) and closed syllables containing a coda obstruent (CVO) are least amenable to contours. By contrast, syllables containing a long vowel (CVV) are most amenable to tonal contours, with syllables closed by a sonorant (CVR) ranking in between CVV and CV/CVO in terms of conduciveness to tonal contours. That is, the difference in f0 level between High and Low targets will be decreased for syllable types less well suited to supporting contour tones. Thus, for the difference between maximum and minimum f0 values, CV/CVO < CVR < CVV.
H4: Timing and Scaling: Tone targets associated with rising tones will be more prone to greater temporal separation and reduced f0 excursions than tone targets associated with falling tones—that is, rising tones will be particularly sensitive to differences in syllable structure because they take longer to realize than falling tones (Sundberg, 1979).
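The quantities referred to in H1 to H3 can be made explicit with a short measurement sketch. It assumes that an f0 track (time in seconds, f0 in Hz) has already been extracted for an analysis window spanning the stressed and post-tonic syllables, with unvoiced frames removed; the window choice, the class and function names, and the sample numbers are hypothetical and are not the procedure or data of this study. Temporal separation (H2) is the distance between the f0 maximum and minimum, excursion (H3) is their difference in f0, and the standard deviation of alignment times across tokens serves as the H1 diagnostic.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Landmarks:
    t_max: float   # time of the f0 maximum (s)
    f0_max: float  # f0 at the maximum (Hz)
    t_min: float   # time of the f0 minimum (s)
    f0_min: float  # f0 at the minimum (Hz)

    @property
    def separation(self) -> float:
        """Temporal separation between peak and trough (H2)."""
        return abs(self.t_max - self.t_min)

    @property
    def excursion(self) -> float:
        """Difference between maximum and minimum f0 (H3)."""
        return self.f0_max - self.f0_min


def find_landmarks(times: list, f0: list) -> Landmarks:
    """Locate the f0 maximum and minimum within one token's analysis window."""
    if not times or len(times) != len(f0):
        raise ValueError("times and f0 must be non-empty and of equal length")
    i_max = max(range(len(f0)), key=lambda i: f0[i])
    i_min = min(range(len(f0)), key=lambda i: f0[i])
    return Landmarks(times[i_max], f0[i_max], times[i_min], f0[i_min])


def alignment_sd(alignment_times: list) -> float:
    """H1: a genuine tonal target should be aligned less variably across tokens."""
    return stdev(alignment_times)


def h3_ordering_holds(excursions_by_type: dict) -> bool:
    """Check CV/CVO < CVR < CVV on the mean excursion per syllable type."""
    m = {k: mean(v) for k, v in excursions_by_type.items()}
    return max(m["CV"], m["CVO"]) < m["CVR"] < m["CVV"]


# Toy falling contour (invented numbers, for illustration only).
lm = find_landmarks([0.00, 0.05, 0.10, 0.15], [180.0, 210.0, 190.0, 150.0])
print(lm.t_max, lm.t_min, lm.excursion)  # 0.05 0.15 60.0
```

Excursions here are in Hz, matching the wording of H3; a perceptually scaled unit such as semitones would be an equally reasonable choice.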
It may be noted that these hypotheses are not mutually exclusive. It is thus possible that a combination of pitch rescaling and temporal shift could be employed to minimize tonal crowding or that the rising and falling prosodemes could behave asymmetrically with respect to their phonological alignment of tones or the strategies they adopt to reduce crowding of tones.
Because Croatian lacks phonemic level tones, there is less pressure to realize the two-way tone contrast as a distinction between a rising and falling tone, since one of the two could easily be rescaled to minimize the tone excursion, effectively recasting the contrast as one between a contour tone and a level tone. In practice and in keeping with H4, one would expect the rising tone to be most prone to leveling out.
Furthermore, because tonal contrasts are limited to the stressed syllable in Croatian, there is additional room in which to maximize the temporal separation of the low and high tone targets that comprise the two contour tones: HL in the case of the falling accent and LH in the case of the rising accent. This differs from more canonical tone languages, in which lexical tone contrasts can be associated with adjacent syllables.
In short, the absence of a contrast between contour tones and level tones, coupled with the confinement of tone distinctions to the stressed syllable in Croatian, provides considerable leeway for increasing temporal separation between tone targets and/or rescaling them to minimize the articulatory and perceptual demands of contour tones.
3. Methods
3.1. Speakers and recordings
Thirteen speakers (including one male) of the Split variety of Croatian were recorded in March 2022 at the Department of Phonetics recording studio at the Faculty of Humanities and Social Sciences, University of Zagreb, under the supervision of a recording technician and author MK. Speakers were recorded using an AKG C414 B-ULS microphone and an RME Fireface UFX soundcard. Recordings were saved as mono WAV files with a 44.1 kHz sampling rate and 16-bit depth. The collected data were also used in another study of the acoustic characteristics of tone and stress in Split Croatian.
All speakers were born in Split between 1996 and 2002 (most between 1999 and 2002) and arrived in Zagreb for their university studies between 2015 and 2021. Although the Split variety is not the standard variety, it is a prestigious and well-known variety in Croatia, associated with popular music and with various celebrities in the country. With more than 160,000 inhabitants, Split is the second-largest city in Croatia and the main city of southern Croatia and the Dalmatian coast. Thus, the speakers in our recordings were not speaking a stigmatized variety of the language in the university setting and were encouraged to pronounce the words as they normally do. Author MK verified that the speakers spoke the Split variety of Croatian during the recording.
3.2. Stimuli
Speakers read a list of 65 real Croatian words that illustrated the four prosodemes. Four of these words contained a syllabic /r/ in the stressed syllable, and we will not consider these words in the present study due to difficulties involving segmentation of this sound. This leaves us with 61 lexical items in the present study.
Words were read in isolation, with no carrier phrase. The words were either stand-alone lexical items or prosodic words with enclitics (Croatian has an extensive system of enclitics and proclitics). All words contained lexical stress on the first syllable. Words contained at least three syllables and up to six syllables (15 words with three syllables, 23 words with four syllables, 19 words with five syllables and four words with six syllables). The purpose of having longer words was to make sure there were at least two syllables following the initial stressed syllable to allow for any phrasal tones that may occur in these single-word utterances.
In setting up the wordlist, we endeavored to find words in which the lexical pitch accent occurred in six different syllable types: an open syllable containing a phonemic short vowel (CV); an open syllable containing a phonemic long vowel (CVV); a short vowel syllable closed by a sonorant (CVR); a long vowel syllable closed by a sonorant (CVVR); a short vowel syllable closed by an obstruent (CVO); and a long vowel syllable closed by an obstruent (CVVO). Syllable structure is notoriously difficult to diagnose in Croatian, as the diagnostics typically used for other languages are not applicable. In particular, the common strategy of comparing word-initial and word-medial clusters is not informative because a rich array of initial clusters is permitted in Croatian (Kapović, 2023, pp. 216–222), including many that speakers might treat as spanning a syllable boundary when they occur intervocalically. We therefore adopted the typologically most common strategy for syllabifying intervocalic consonant clusters, according to which at least one consonant belongs to a coda and at least one to an onset. In practice, this means that a CC cluster, the only type of cluster found in our data, is split by a syllable boundary. Syllables were distributed as follows in our data (Table 2; see also Appendix A for the wordlist):
Table 2. Distribution of syllable types in our data.
| Syllable type | Falling | Rising |
|---|---|---|
| CV | 6 | 4 |
| CVO | 8 | 10 |
| CVR | 1 | 4 |
| CVV | 11 | 17 |
| CVVO | 3 | 7 |
| CVVR | 3 | 1 |
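The syllabification convention described above can be stated as a small decision rule over the consonants between the stressed vowel and the following vowel. The R sketch below (R being the environment used for the analyses) is illustrative only: the function name `classify_stressed_syllable`, the ASCII segment labels (e.g., "ch", "nj") and the sonorant inventory are our assumptions, not the coding actually used for the wordlist; in particular, the status of /v/ is left open.

```r
# Hypothetical classifier for the stressed (word-initial) syllable.
# segs: character vector of segments; long_vowel: is the stressed vowel phonemically long?
# The vowel and sonorant inventories are illustrative assumptions, not exhaustive.
classify_stressed_syllable <- function(segs, long_vowel,
                                       vowels = c("a", "e", "i", "o", "u"),
                                       sonorants = c("m", "n", "nj", "l", "lj", "r", "j")) {
  v <- which(segs %in% vowels)
  if (length(v) < 2) stop("need at least two vowels")
  # Consonants between the stressed vowel and the next vowel
  medial <- if (v[2] - v[1] > 1) segs[(v[1] + 1):(v[2] - 1)] else character(0)
  if (length(medial) > 2) stop("only CC clusters occur in the wordlist")
  nucleus <- if (long_vowel) "CVV" else "CV"
  if (length(medial) < 2) return(nucleus)  # V.CV: a single consonant is an onset
  # VC.CV: the first consonant of the cluster closes the stressed syllable
  paste0(nucleus, if (medial[1] %in% sonorants) "R" else "O")
}

classify_stressed_syllable(c("b", "o", "l", "n", "i", "ch", "a", "r", "k", "a"), TRUE)  # "CVVR"
classify_stressed_syllable(c("k", "l", "a", "ts", "k", "a", "l", "i"), FALSE)          # "CVO"
```

Word-initial onset clusters (such as /kl/ in the second call) are ignored by design, since only the consonants after the stressed vowel bear on the coda.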
A word of caution is needed at this point. There is an overall paucity of words containing a target syllable closed by a sonorant. This is a reflection of the lexical situation of the language, particularly for the trisyllabic and longer words that we targeted. As a consequence, results for syllables closed by a sonorant coda must be interpreted with caution.
The wordlist was randomized, and each speaker read four repetitions of the list. (One speaker did not produce the word klȁckali‿su‿se [Short Falling] ‘they were see-sawing’, because this word was added to the list after that speaker had been recorded.) This yields 3172 possible word tokens (61 words × 4 repetitions × 13 speakers), of which four (those of the speaker who did not read this word) are missing.
One final note concerns vowel quantity in the post-tonic syllable. Croatian contrasts long and short vowels in all positions of the word except before the stressed syllable. In the present study, we were not able to control for the length of the post-tonic vowel, given the other factors we attempted to control. For this reason, both long and short vowels occur in the syllable following the stressed syllable in our wordlist. We attempted to include post-tonic vowel quantity as a factor in our inferential statistical analyses, but the resulting models did not converge (see Section 3.4).
3.3. Data analysis
Phonetic transcriptions of the words were imported from a spreadsheet and used for preliminary phonetic segmentation with the Munich AUtomatic Segmentation system (MAUS: Kisler et al., 2017) pipeline function MAUS->PHO2SYL. Manual correction of the phonetic MAUS labelling was conducted using the EMU Speech Database Management System (Winkelmann et al., 2017; Winkelmann et al., 2019), interfaced with the R statistical software package (R Core Team, 2020). The Snack signal processor (Sjölander, 2014) was used for calculating formants within the VoiceSauce software package, which was also used to extract voicing measures, including STRAIGHT f0 (Shue, 2010; Vicenik et al., 2020). All of these signal measures were extracted at a sample rate of 1000 Hz (i.e., every 1 ms). Plots were generated using the ggplot2 package in R (Wickham, 2016).
3.3.1. Identifying peaks and troughs in the f0 traces
To identify peaks and troughs in the f0 traces, we used the vowel (of either the stressed or the post-stressed syllable) as the window of analysis. To maximize our chances of finding a peak and/or trough within this window, we extended it by 10 ms before and 10 ms after the vowel. We could not use the whole syllable as the window because many of the consonants on either side of the vowel were (voiceless) obstruents, with either no f0 or unreliable f0 traces.
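A minimal sketch of the analysis window, assuming the f0 track is a data frame with a `time_ms` column at 1 ms steps (matching the 1000 Hz extraction rate) and that unvoiced frames are coded as 0 or `NA`. The helper `vowel_window` is hypothetical and is not part of VoiceSauce or emuR.

```r
# Hypothetical helper: f0 samples from the vowel plus a margin (default 10 ms) on each side.
# track: data frame with time_ms (1 ms steps) and f0 (Hz).
# Assumption: unvoiced frames are coded as 0 or NA and are dropped.
vowel_window <- function(track, vowel_start_ms, vowel_end_ms, pad_ms = 10) {
  in_window <- track$time_ms >= vowel_start_ms - pad_ms &
               track$time_ms <= vowel_end_ms + pad_ms
  voiced <- !is.na(track$f0) & track$f0 > 0
  track[in_window & voiced, ]
}

# Toy track: 0-300 ms at 200 Hz, with a short unvoiced stretch at 95-99 ms
track <- data.frame(time_ms = 0:300, f0 = ifelse(0:300 %in% 95:99, 0, 200))
win <- vowel_window(track, vowel_start_ms = 100, vowel_end_ms = 200)
range(win$time_ms)  # 90 210
```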
Dominant peaks and troughs were identified by progressively smoothing the signal until only two roots (zero crossings) of the first derivative of the smoothed signal remained. This strategy eliminated peaks and troughs caused by noise while selecting the strongest peaks and troughs. We chose two as the target number of roots since we considered it possible to find both a maximum and a minimum in each vowel window, based on our examination of pitch traces in a related study using the same database (from which Figure 3, presented further below, is taken). In cases where a smoothing iteration produced fewer than two roots, the function returned the roots (e.g., three or four) from the previous smoothing iteration. However, if the first smoothing iteration identified two or fewer roots, the function returned the root(s) from this first smooth.
Smoothing was implemented using the smooth.spline() function in base R (stats package), which can also predict the first derivative of the fitted spline, while the rootSolve package was used to identify the roots (Soetaert, 2009; Soetaert & Herman, 2009). In 17 instances (counting stressed and post-stressed vowels together), the procedure failed to converge, i.e., it did not reach the target of two roots even after 20 smoothing iterations.
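The progressive-smoothing search can be sketched in base R as follows. Two details are assumptions rather than the authors' code: smoothing is increased by stepping the `spar` argument of `smooth.spline()` along a fixed schedule (the text does not say how smoothing was increased), and roots are located from sign changes of the predicted first derivative on a dense grid instead of with rootSolve. The helper names are hypothetical.

```r
# Zero crossings of a sampled first derivative, located by linear interpolation.
# A + to - sign change marks a peak (maximum); a - to + change marks a trough (minimum).
derivative_roots <- function(x, d) {
  i <- which(d[-length(d)] * d[-1] < 0)
  data.frame(time = x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i]),
             type = ifelse(d[i] > 0, "max", "min"),
             stringsAsFactors = FALSE)
}

# Hypothetical re-implementation of the progressive smoothing step.
# Assumption: smoothing is increased by raising spar on a fixed schedule.
find_turning_points <- function(t, f0, target = 2, max_iter = 20,
                                spar_seq = seq(0.1, 1.2, length.out = max_iter)) {
  grid <- seq(min(t), max(t), length.out = 1000)
  previous <- NULL
  for (k in seq_len(max_iter)) {
    fit <- smooth.spline(t, f0, spar = spar_seq[k])
    roots <- derivative_roots(grid, predict(fit, grid, deriv = 1)$y)
    roots$f0 <- if (nrow(roots) > 0) predict(fit, roots$time)$y else numeric(0)
    n <- nrow(roots)
    if (n == target || (k == 1 && n < target)) return(roots)  # done, or sparse from the start
    if (n < target) return(previous)  # overshot: keep the previous, less smoothed result
    previous <- roots
  }
  NULL  # no convergence within max_iter iterations
}

# One-period toy contour: peak near 37.5 ms, trough near 112.5 ms
t <- seq(0, 0.15, by = 0.001)
f0 <- 200 + 30 * sin(2 * pi * t / 0.15)
tp <- find_turning_points(t, f0)
```

On a clean contour like this one the first smooth already has two roots and is returned directly; adding high-frequency ripple (e.g., `f0 + 3 * sin(2 * pi * t / 0.005)`) should make the search run for more iterations before the count drops to two. A `NULL` result corresponds to the non-converging cases mentioned above.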
Examination of the data returned by the smoothing function showed that most tokens contained two or three turning points, and a smaller number contained only one. Very few tokens contained zero turning points (i.e., a monotonic contour) or four or more turning points; these tokens were discarded. In addition, of the tokens that contained only one turning point, very few contained a pitch minimum as opposed to a pitch maximum, so tokens with only a single minimum were also discarded. Moreover, where three turning points were found, the last one tended to occur very late in the token. Given that we had added 10 ms to the end of the vowel window, we discarded the third turning point in such tokens and kept only the first two.
Finally, we removed any turning points which had an f0 value of less than 100 Hz or greater than 300 Hz, since these were likely to be a result of tracking errors. All of the above filtering procedures left 3000 tokens where the stressed syllable was analyzed, and 2678 tokens where the post-stressed syllable was analyzed from the original 3172 tokens submitted to the smoother. Table 3 shows the number of tokens for each prosodeme in the two syllable positions (stress and post-stress).
Table 3. Number of tokens by prosodeme (falling or rising accent, short or long stressed vowel) and syllable position (stress or post-stress).
| Accent | Stress, short | Stress, long | Post-stress, short | Post-stress, long |
|---|---|---|---|---|
| Falling | 701 | 523 | 646 | 487 |
| Rising | 904 | 872 | 782 | 763 |
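The token-level filtering rules described above can be collected into one function. This is a sketch, assuming each token's turning points arrive as a data frame with `time`, `f0` and `type` columns; the name `filter_turning_points` is hypothetical, and dropping a token whose remaining turning points all fall outside 100 to 300 Hz is our assumption about how such cases were handled.

```r
# Hypothetical per-token filter mirroring the rules described in the text.
# tp: data frame of one token's turning points (columns time, f0, type = "max"/"min").
filter_turning_points <- function(tp, f0_range = c(100, 300)) {
  n <- nrow(tp)
  if (n == 0 || n >= 4) return(NULL)            # monotonic or overly wiggly tokens
  if (n == 1 && tp$type == "min") return(NULL)  # a lone minimum is discarded
  if (n == 3) tp <- tp[1:2, ]                   # the late third turning point is dropped
  keep <- tp$f0 >= f0_range[1] & tp$f0 <= f0_range[2]
  tp <- tp[keep, , drop = FALSE]                # outside 100-300 Hz: likely tracking errors
  if (nrow(tp) == 0) NULL else tp               # assumption: empty tokens are dropped
}

tp3 <- data.frame(time = c(0.1, 0.5, 0.9), f0 = c(220, 180, 190),
                  type = c("max", "min", "max"), stringsAsFactors = FALSE)
filter_turning_points(tp3)  # keeps the first two rows
```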
Table 4 shows the number of turning points used for statistical analysis after the above filtering procedure. Note that this table combines tokens in which only one turning point was found in a syllable with tokens in which two turning points were found, so no two cells can be meaningfully added. The table simply shows that, for falling pitch accents, a maximum was somewhat more often found in the stressed syllable than a minimum; for rising pitch accents, both maxima and minima were regularly found in the stressed syllable; and, for rising pitch accents, a maximum was more often found than a minimum in the post-stressed syllable. None of this precludes the possibility that both a maximum and a minimum were found in a given syllable (or that neither was found).
Table 4. Number of individual minima and maxima found for each pitch accent type, in each syllable position.
| Accent | Stress, minimum | Stress, maximum | Post-stress, minimum | Post-stress, maximum |
|---|---|---|---|---|
| Falling | 1020 | 1223 | 1080 | 1133 |
| Rising | 1720 | 1776 | 1302 | 1543 |
3.4. Statistics
Linear Mixed Effects (LME) analyses were conducted using the nlme package of R (Pinheiro et al., 2016). LME models allow us to set speaker and word as random effects in the data analysis and are robust against differing numbers of tokens in each cell, as is the case in the present study.
For vowel duration, we used the following command:
lme(Duration~SyllableWeight, data=data, random=~1|speaker/words)
where SyllableWeight was one of Open; (Closed-by-)Sonorant; or (Closed-by-)Obstruent. We conducted separate tests for the Short Falling, Short Rising, Long Falling and Long Rising prosodemes (i.e., four prosodemes); and for the stress and post-stressed syllables (two syllable positions). This gives us eight separate tests (four prosodemes multiplied by two syllable positions). As a result, we set a significance level of p < 0.00625 after Bonferroni correction (0.05 divided by eight tests). These results are presented in Appendix C.
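The Bonferroni threshold is the nominal alpha divided by the number of tests. The sketch below shows that comparing raw p-values with 0.05/8 is equivalent to comparing the output of base R's `p.adjust(..., method = "bonferroni")` with 0.05; the p-values are made up for illustration.

```r
# Family-wise alpha for eight separate models (4 prosodemes x 2 syllable positions)
n_tests <- 4 * 2
alpha_bonf <- 0.05 / n_tests  # 0.00625

# Equivalent view: multiply each raw p-value by the number of tests, compare with 0.05
p_raw <- c(0.001, 0.004, 0.006, 0.010, 0.030, 0.200, 0.500, 0.900)  # made-up p-values
sig_by_threshold <- p_raw < alpha_bonf
sig_by_adjustment <- p.adjust(p_raw, method = "bonferroni") < 0.05
```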
For the maxima and minima (i.e., pitch peaks and troughs), we used the following command:
lme(Measure~Quantity*SyllableWeight, data=data, random=~1|speaker/words)
where Quantity was either Long or Short (length of the stressed vowel); and SyllableWeight was one of Open; (Closed-by-)Sonorant; or (Closed-by-)Obstruent. As can be seen, for studying the pitch minima and maxima, we treated Quantity as an independent variable, since we expected that the dependent variable would be strongly affected by whether the stressed vowel was long or short. We conducted separate tests for the falling and rising pitch accents (i.e., two accents); for the stress and post-stressed syllables (two syllable positions); and for pitch minima and maxima (two turning points). This, once again, gives us eight separate tests (two accents multiplied by two syllable positions multiplied by two turning points). As a result, we once again set a significance level of p < 0.00625 after Bonferroni correction.
The measures examined in these LME models were the f0 values (in Hz); the relative timing of the turning points (i.e., timing in normalized time, a dimensionless value between zero and one); and the absolute timing of the turning points (i.e., timing without normalization, in milliseconds). These results are presented in Appendix D.
Although it would have been desirable to include post-stress vowel length as a factor in these LME models, the models did not converge when we did so. Models that included a by-speaker random slope for another variable also failed to converge. We included the single male speaker together with the female speakers, since examination of the f0 data suggested that he did not pattern obviously differently from the other speakers and did not necessarily have the lowest f0 of the 13 speakers. Indeed, the speaker random intercepts estimated by the various LME models we ran showed that at least one female speaker had a lower f0 than the male speaker. This is consistent with the sociolinguistic observation that female speakers of Croatian tend to have a low f0.
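For readers who want to see the model structure in action, the sketch below fits the pitch model on synthetic data with nlme. Everything about the data (12 words, effect sizes, variance components) is invented; treating obstruent-closed syllables and long vowels as reference levels is our assumption, suggested by the way estimates are reported in Section 4. The nested random intercepts `~ 1 | speaker/words` follow the model formulas given above.

```r
library(nlme)  # recommended package distributed with R

set.seed(1)
speakers <- sprintf("s%02d", 1:13)
# 12 synthetic words, each with a fixed stressed-vowel Quantity and SyllableWeight
word_info <- data.frame(
  words = sprintf("w%02d", 1:12),
  Quantity = rep(c("Long", "Short"), each = 6),
  SyllableWeight = rep(c("Open", "Obstruent", "Sonorant"), times = 4),
  stringsAsFactors = FALSE
)
d <- merge(expand.grid(speaker = speakers, words = word_info$words, rep = 1:4,
                       stringsAsFactors = FALSE), word_info, by = "words")
d$speaker <- factor(d$speaker)
d$words <- factor(d$words)
d$Quantity <- factor(d$Quantity, levels = c("Long", "Short"))
d$SyllableWeight <- factor(d$SyllableWeight, levels = c("Obstruent", "Open", "Sonorant"))

# Synthetic f0 maxima (Hz): speaker, word-within-speaker and token-level variation
sw <- interaction(d$speaker, d$words)
d$Measure <- 200 - 9 * (d$Quantity == "Short") +
  rnorm(13, 0, 20)[as.integer(d$speaker)] +
  rnorm(nlevels(sw), 0, 5)[as.integer(sw)] +
  rnorm(nrow(d), 0, 8)

fit <- lme(Measure ~ Quantity * SyllableWeight, data = d,
           random = ~ 1 | speaker/words)
summary(fit)$tTable  # fixed-effect estimates with t- and p-values
anova(fit)           # F-tests; compare p-values with the Bonferroni threshold 0.00625
```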
4. Results
The four prosodemes of Split Croatian are shown in Figure 3. The falling pitch accent is shown in navy and the rising pitch accent in orange/brown; long pitch accents are plotted with solid lines and short pitch accents with dotted lines. The f0 information plotted here is from the vowel only and is based on Generalized Additive Model (GAM) smooths (see footnote 6) of the f0 traces from the 13 speakers of Split Croatian whose data are presented in the current study (see Section 3 for further details). Both the stressed vowel (left panel) and the post-stress vowel (right panel) are plotted.
It can be seen that the falling pitch accents are characterized by a high fall on the stressed syllable (left panel), with quite low pitch on the post-stressed syllable. By contrast, the rising pitch accent is characterized, after an initial low fall, by fairly level pitch on the stressed syllable, followed by a longer, high fall on the post-stressed syllable. The overall much smaller pitch excursion associated with the rising tone (a re-scaling effect), and the fact that the high target for the rise is reached in the post-stressed syllable rather than in the stressed syllable itself (a temporal separation effect observed in previous work on other varieties of Croatian, e.g., Lehiste & Ivić, 1986; Inkelas & Zec, 1988; Smiljanić, 2004), are consistent with the typological bias against rising tones relative to falling tones and support Hypothesis 4. Based on our experience with pitch patterns in phrasal contexts, we consider the slightly higher pitch at the left edge of the stressed syllable to be a post-lexical pitch accent in this variety: it produces a slight fall at the start of the vowel for the rising accent and, presumably, a higher fall for the falling accent (given that the high fall of the falling accent in the stressed syllable has a greater f0 range than the high fall in the post-stressed syllable of the rising accent). This interpretation of the left-edge f0 results is in line with Godjevac (2005), who posits left-edge post-lexical word tones for Serbo-Croatian. Note, however, that we consider the sharp drop in f0 at the end of the stressed syllable of the short rising accent (dotted orange line) to be an artefact of the GAM smoothing process, since there is no consistent auditory impression of such a sharp drop. For the reader’s interest, GAM-smoothed f0 trajectories for the individual words, color-coded for syllable type, are presented in Appendix B.
Our remaining presentation of results is in three parts. In the first part, we consider the vowel duration results according to syllable type. In the second part, we present pitch minima and maxima (examining both f0 values and timing) regardless of syllable type. This presentation functions partly as a “sanity check” to ensure that our peak-finding procedure is locating the expected peaks and troughs in the f0 traces, based on our visual examination of f0 traces in the language as presented in Figure 3, above. It also serves as an informal indication of what the speakers’ f0 targets are likely to be for the lexical pitch accents of the language. We follow Bruce (1977) and Liberman and Pierrehumbert (1984) in assuming that if a pitch peak or trough at a particular location shows less variability across tokens than peaks or troughs at other locations, then that particular peak or trough reflects a phonological target. In the third part, we examine the pitch minima and maxima (f0 values and timing) according to syllable type, for the rising and falling accents separately.
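The variability criterion can be made concrete: for each accent, compute the spread of turning-point timing at each location and take the least variable location as the candidate target. The toy timings below are deterministic and invented, constructed so that the falling accent's peak is tight in the stressed syllable and the rising accent's in the post-stressed syllable, mirroring the qualitative pattern reported in Section 4.2; they are not the study's data.

```r
# Toy, deterministic timings (normalized 0-1) of f0 maxima: one tight and one loose cluster per accent
k <- 1:100
peaks <- data.frame(
  accent   = rep(c("Falling", "Rising"), each = 200),
  position = rep(rep(c("stress", "post-stress"), each = 100), times = 2),
  time     = c(0.20 + 0.03 * sin(k), 0.50 + 0.20 * sin(k),   # Falling
               0.60 + 0.20 * sin(k), 0.20 + 0.03 * sin(k)),  # Rising
  stringsAsFactors = FALSE
)
# Spread (standard deviation) of peak timing at each location
spread <- aggregate(time ~ accent + position, data = peaks, FUN = sd)
names(spread)[3] <- "sd_time"
# Per accent, the least variable location is the candidate phonological target
spread <- spread[order(spread$accent, spread$sd_time), ]
target <- spread[!duplicated(spread$accent), c("accent", "position")]
```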
4.1. Duration
Figure 4 shows the vowel duration according to syllable type, and LME results are presented in Appendix C. As one would expect, in the stressed syllable (the left column), the long vowels have greater duration than the short vowels, as shown by solid versus dashed lines. A first, broad inspection of these boxplots suggests that any differences according to syllable type are not extensive. However, a closer inspection of the boxplots in this left column shows that the long vowels are longer when they are in an open syllable (solid brown boxes in the middle). The LME results estimate that long vowels in open syllables are about 22 ms longer (than in obstruent-closed syllables) for the falling accent and about 16 ms longer for the rising accent. No such effect applies in the case of short vowels in open syllables. However, the LME results do suggest that in the case of the short rising prosodeme, the vowel is about 12 ms longer in a sonorant-closed syllable than in an obstruent-closed syllable.
Figure 4. Vowel duration according to syllable type (shown in different colours). Rows show the pitch accents (falling and rising); line type shows the quantity of the stressed vowel (long and short); columns show the syllable positions (stressed and post-stressed syllables). Note that long/short refers to the quantity of the stressed syllable.
Turning now to the post-stressed syllable (right column), the reader is reminded that the long/short coding in these panels refers to the quantity of the stressed syllable, not of the post-stressed syllable. The reader is also reminded that we were unable to control for quantity in the post-tonic syllable (i.e., whether the post-stress vowel is long or short), given the other factors that we sought to control. With these caveats in mind, the vowel following a short rising syllable (broken-line boxes in the bottom right panel) is longer than the vowel following a long rising syllable (solid boxes in the same panel). We suggest that this is because the fall on the post-stressed syllable is a defining feature of the rising prosodeme in this variety of Croatian; that is, a slightly longer post-stress vowel is needed to fully realize the final fall of the short rising prosodeme. Otherwise, effects of syllable type on the post-stressed vowel are minimal. The LME results estimate that, for the long rising prosodeme, the post-stress vowel is about 22 ms shorter after a sonorant-closed stressed syllable than after an obstruent-closed one (solid brown box in the bottom right panel). However, there is only one lexical item in our wordlist with a sonorant coda on a long rising prosodeme, the word bólničārka [bǒːlnit͡ʃaːrka] ‘nurse’, which has a short post-tonic vowel /i/. We therefore consider this result non-informative for our purposes.
In summary, the data show minimal effects of syllable type on either stressed or post-stress vowel duration. The exception is the long prosodemes (falling and rising), in which the stressed vowel is longer in an open syllable. We attribute this to the common typological pattern of open-syllable vowel lengthening (Maddieson, 1985), here limited to phonemic long vowels, which, unlike short vowels, are free to lengthen without jeopardizing the phonemic contrast in vowel length.
4.2. Pitch minima and maxima regardless of syllable type
We now turn to the timing and f0 levels of the accents. In this section we offer informal observations of the results, regardless of syllable type, to make sure that the maxima and minima we are finding are in line with expectations. Figure 5 shows the f0 values of the minima and maxima across the four prosodemes of Split Croatian. Minima are shown on the top row and maxima in the bottom row; the stressed vowel is shown in the left column and the post-stress vowel in the right column.
In the bottom left panel, we can see that the maxima in the stressed syllable have a higher f0 for the falling accents (blue boxes) than the rising accents (brown boxes), for both the long and short vowels (denoted by linetype). This is in line with expectations based on the f0 traces shown in Figure 3. At the same time, the maxima in the post-stressed syllable have a higher f0 for the rising accents than for the falling accents (bottom right panel). This is also in line with expectations based on Figure 3.
Figure 6 shows the normalized time values (from zero to one) of the minima and maxima across the four prosodemes. Time was normalized within the get_trackdata() function in the emuR package. For reasons of space, we do not plot or discuss absolute time (in ms) here; broadly speaking, the absolute-time results mirror those for normalized time.
Figure 6. Normalized time (from zero to one) for minima (top row) and maxima (bottom row) of stressed (left) and post-stress (right) vowels. Rising and falling pitch accents are shown by color, long and short vowels by line type. Note that long/short refers to the quantity of the stressed syllable.
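The two timing measures reduce to simple arithmetic. In this sketch the reference interval is passed in explicitly; whether the normalization (done inside emuR's `get_trackdata()`) used the bare vowel or the vowel plus the 10 ms margins is not stated, so the choice of interval is an assumption, and the helper functions are hypothetical rather than emuR's code.

```r
# Relative timing: 0 = start of the reference interval, 1 = its end (dimensionless)
relative_time <- function(t_ms, start_ms, end_ms) (t_ms - start_ms) / (end_ms - start_ms)

# Absolute timing: milliseconds elapsed since the start of the reference interval
absolute_time <- function(t_ms, start_ms) t_ms - start_ms

# Example: vowel from 100 to 200 ms, padded by 10 ms on each side (assumed reference)
win_start <- 100 - 10
win_end <- 200 + 10
peak_ms <- 114
relative_time(peak_ms, win_start, win_end)  # 0.2
absolute_time(peak_ms, win_start)           # 24
```

A peak at 0.2 on this scale corresponds to the "about 20% of the vowel" position discussed below, if the padded window is the reference.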
In the stressed syllable (left column), maxima for the rising accents (brown boxes, bottom left) occur later than maxima for the falling accents (blue boxes, also bottom left). The maximum for the falling accent occurs quite early, at about 20% of the vowel duration. This is what we would expect for a rising versus a falling pitch accent docked onto the stressed syllable.
Conversely, the maxima in the post-stressed syllable (bottom right) occur earlier in the rising accent than in the falling accent; this is likewise expected, given that the rising accent has a clear high fall on the post-stressed syllable in the Split variety. The maximum for the rising accent similarly occurs quite early in the post-stressed syllable, also at about 20% of the vowel duration.
Importantly, the variability in the timing of the maxima differs between the rising and falling accents. The falling accent shows less variability in the timing of the maximum in the stressed syllable, while the rising accent shows less variability in the timing of the maximum in the post-stressed syllable. This reflects the importance of the stressed-syllable peak for the falling accent and of the post-stressed-syllable peak for the rising accent.
Considering now the minima of the stressed syllable (top left panel), we see that although there is less of a difference in the means between rising and falling accents, there is a clear pattern of less variability in the timing of the minimum of the rising accent. The minimum of the rising accent seems to be timed for about the halfway point of the stressed vowel, whereas the timing of the minimum of the falling accent is much more variable. As for the post-stressed syllable (top right panel), we see that the minimum for the rising accent tends to occur later than the minimum for the falling accent. This is in line with the fact that the f0 maximum for the rising tone is not reached until the post-stress syllable, which forces the immediately following minimum to be realized relatively late in the same syllable.
The f0 values in Figure 5 and the timing results in Figure 6 provide insight into the analysis of phonological targets for the two accents. For the falling accent, the f0 maximum is higher in the stressed syllable than the post-tonic syllable and is less variable in its timing, occurring consistently early, a pattern that is consistent with an early H target in the stressed syllable. Furthermore, the f0 minimum is lower and less variable in the post-tonic syllable than in the stressed syllable for the falling accent, suggesting the possibility of an L target in the post-tonic syllable. These results are consistent with both the Disyllabic (Figure 2b) and the Final L (Figure 2d) analyses and are inconsistent with both the Monosyllabic (Figure 2a) and the Monotonal (Figure 2c) analyses.
The rising accent is more probative in distinguishing between the Disyllabic and the Final L analyses. The f0 minimum, though only slightly lower than the f0 maximum, is consistent in its alignment, occurring shortly before the midpoint of the stressed vowel; this suggests the presence of an L target in the first syllable, contrary to the prediction of the Monotonal account. Furthermore, the presence of an f0 maximum on the post-tonic syllable suggests an H target early in the post-tonic syllable. There is also a clear downward trend in pitch for the rising tone following the H target in the post-tonic syllable. Overall, the f0 results for the rising tone are most compatible with the Final L analysis positing an LHL sequence, with the first L docking on the stressed syllable, the H on the post-tonic syllable, and the second L either at the end of the post-tonic syllable or to its right.
One remaining question concerns the domain with which the final L is associated: the foot, the word, or the phrase. Our data do not allow a definitive analysis, though they suggest an asymmetry between the falling and rising tones. For the falling tone, the relatively early realization of the f0 minimum (at around the halfway point) in the post-stressed syllable is consistent with an L target in the post-stressed syllable. In contrast, the f0 minimum for the rising accent is reached quite late in the post-stressed syllable. One possibility is a foot-based approach (Köhnlein & Cameron, 2024), in which the first two syllables form a trochaic foot characterized by a final L in the weak (post-tonic) syllable, preceded by an H for the falling accent and by an LH sequence for the rising accent. For both accents, the tone(s) preceding the weak-syllable L are plausibly phonologically aligned with the stressed syllable; in the case of the rising accent, however, tonal crowding attributable to the initial L forces the trailing H to be realized early in the post-tonic syllable.
The asymmetry between the falling and rising accents accords with f0 data in other studies (e.g., Pletikos, 2007; Zsiga & Zec, 2012), showing a pitch trough in the post-stressed syllable followed by a low plateau through the end of the word in words with a falling accent but a continuous fall in pitch through the end of a word with a rising accent. Clearer insight into the source of the final low tone after the rising accent must await further acoustic study of syllables to the right of the immediately post-tonic one in words and phrases of differing lengths.
4.3. Pitch minima and maxima according to syllable type
We now turn to the timing and f0 of the accents according to syllable type. We present results separately for the falling and rising pitch accents. LME results for the data in this section are presented in Appendix D.
4.3.1. Falling
Figure 7 shows the f0 maxima and minima for the falling pitch accents according to syllable type. (The reader is reminded that, in all cases, open~obstruent~sonorant refers to the structure of the stressed syllable, not of the post-stressed syllable.)
Looking at the top two panels, the f0 minimum falls on the post-tonic syllable across the different types of stressed syllable, in keeping with the overall result in Figure 5. Likewise, the bottom two panels confirm a consistent f0 peak within the stressed syllable. In the top left panel (minima of the stressed syllable), the f0 minimum is higher for short vowels than for long vowels (by about 11 Hz, according to the LME estimate). At the same time, the f0 maximum (bottom left panel) is lower for short vowels than for long vowels (by about 9 Hz, according to the LME estimate). The f0 fall is therefore greater in long vowels than in short vowels, by about 20 Hz. Given that f0 values for our speakers are around 200 Hz, this represents a difference of about 10%. Thus, there is a re-scaling of f0 values in the stressed syllable of the falling accent based on vowel length, in support of Hypothesis 3.
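The 20 Hz figure combines the two LME estimates. Writing $H$ and $L$ for the f0 maximum and minimum in the stressed syllable, with subscripts $\ell$ (long vowel) and $s$ (short vowel), and $\Delta = H - L$ for the size of the fall, the arithmetic is as follows (a back-of-the-envelope sum that assumes the two estimates, which come from separate models, can simply be added):

$$
\begin{aligned}
\Delta_\ell - \Delta_s &= (H_\ell - L_\ell) - (H_s - L_s) \\
&= (H_\ell - H_s) + (L_s - L_\ell) \\
&\approx 9\ \text{Hz} + 11\ \text{Hz} = 20\ \text{Hz}, \qquad \frac{20\ \text{Hz}}{200\ \text{Hz}} = 0.10 .
\end{aligned}
$$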
However, the presence or type of coda consonant appears to have little effect on f0 minima and maxima in the stressed and post-stressed syllables. The exception is an effect of open syllables on the f0 maximum in the stressed syllable, which, however, applies only to long vowels: these have a lower f0 maximum in open syllables than in closed syllables. This pitch scaling effect is not predicted, since all long vowels are phonologically well suited to accommodating the targets of a contour tone. In fact, because long vowels in open syllables are longer than those in closed syllables (Figure 4), they should be especially well suited to accommodating contour tones without any need for rescaling to minimize the pitch excursion between high and low targets. The LME results also estimate a lower f0 minimum in the post-stressed syllable after short stressed vowels, by about 5 Hz, a pattern that is inconsistent with predictions, given that short vowels offer less time for pitch excursion than long vowels.
Figure 8 shows the relative timing of maxima and minima for the falling pitch accents according to syllable type. Across all syllable types, the timing of the phonological high target in the stressed syllable and of the low target in the post-tonic syllable is less variable than that of their counterparts in the other syllable. Furthermore, there is a clear effect of vowel length on the timing of the maximum in the stressed syllable (bottom left panel): according to the LME estimate, the f0 maximum occurs 0.12 later (in normalized time, from zero to one) in short vowels than in long vowels. It is also important to note the very low variability in this panel compared to the other panels. At the same time, the variability in the timing of the f0 maximum in the stressed syllable is greater for short vowels than for long vowels. This suggests greater timing stability for long vowels, within an overall greater timing stability for this particular pitch event compared to other pitch events.
In terms of syllable type, perhaps the most obvious effects can be seen for the minimum in post-stressed syllables (top right panel): the minimum is timed later following an open syllable (brown boxes in the middle), by about 0.11 in normalized time according to the LME estimate. In addition, the LME model estimates that the maximum in the stressed syllable (bottom left panel) is also timed later in open syllables, by about 0.07, though this is less obvious in the plot. In general, one could tentatively assume that pitch events are timed later in open syllables than in closed syllables. This difference does not depend on vowel length, contrary to Hypothesis 3, according to which one might expect the High pitch target of the falling tone to be realized earlier in an open stressed syllable with a short vowel in order to allow sufficient time for the pitch fall.
In summary, for falling accents, we see that pitch is re-scaled in the stressed syllable, with a lesser fall in short vowels than in long vowels (pitch floor raised, and ceiling lowered). At the same time, the pitch peak is timed later in short vowels than in long vowels, a result that perhaps runs counter to Hypothesis 3, which would predict an earlier realization of the High target in order to allow adequate time for the fall in a short vowel. Any effects of presence/absence of coda consonant or type of coda seem to be limited to a slightly later timing of the pitch minimum in post-stressed syllables following an open stressed syllable. This difference is mainly attributed to stressed long vowels and is also not predicted by Hypothesis 2, according to which an earlier realization of the post-stress minimum would be expected following short vowels in open syllables in order to increase temporal separation from the high target in the stressed syllables.
4.3.2. Rising
Figure 9 shows the f0 maxima and minima for the rising pitch accents according to syllable type (parallel to Figure 7 for the falling pitch accents).
A visual examination of these plots shows no difference according to syllable type or vowel length in the stressed syllable (left column). Most apparent is the tendency for the f0 maximum to be considerably higher in the post-tonic syllable than in the stressed syllable in the open syllable condition, whereas the maximum value is similar across the two syllables in the other conditions. This result suggests that the overall alignment of the H tone of the rising accent in the post-tonic syllable observed earlier in Figure 6 is largely driven by cases in which the rising accent is associated with an open stressed syllable. The parallel behavior of short and long vowels in open syllables is not predicted by Hypothesis 2, since the two syllable types differ considerably in their sonority. The LME results also indicate some minor effects: a slightly lower minimum and a slightly lower maximum in open syllables, and a slightly higher maximum for short vowels. However, these differences are all around 4 Hz, against an overall estimate of about 180 Hz, so they are unlikely to be perceived. The LME results show no significant differences for either syllable type or vowel length in the case of the post-stress minima (top right panel); however, they do suggest a higher f0 maximum in the post-stressed syllable following short vowels (by about 8 Hz) and following an open syllable (by about 12 Hz). These effects on the maximum can be seen in the plot (bottom right panel), but the f0 differences are once again small and therefore unlikely to be perceived.
Figure 10 shows the relative timing of maxima and minima for the rising pitch accents according to syllable type (parallel to Figure 8 for the falling pitch accents).
Considering first the stressed syllable, there are no significant differences in the timing of the pitch minima (top left panel) according to either vowel length or syllable type. As regards the timing of the pitch maxima (bottom left panel), the LME results suggest that the maximum occurs earlier in short vowels than in long vowels (by about 0.14 in normalized time). However, there is considerable variability in timing, so no firm conclusions about the timing of pitch targets for the rising accent should be drawn. There are also significant LME results involving sonorant-closed syllables, including an interaction with vowel length; however, the reader is reminded that there is only one lexical item for the long rising pitch accent in a sonorant-closed syllable (the word for ‘nurse’), so it would be unwise to consider this a robust result (it is the solid orange box with the noticeably lower mean).
Turning now to the post-stressed syllable, we see a noticeably later timing of the post-stress minimum following an open syllable (top right panel); the LME results estimate that it occurs 0.08 later (on a scale from zero to one). Similarly, the LME results estimate that the post-stress maximum (bottom right panel) occurs 0.12 earlier following an open syllable (there is once again an interaction involving the sonorant-closed condition, which we disregard for the reasons given above). It therefore seems that the final fall is temporally extended following an open syllable, with the peak occurring earlier and the trough later in the post-stressed syllable. One may also observe the overall lower variability in the timing of the post-stress maximum following open syllables, which, together with the higher f0 maximum in the post-tonic syllable relative to a stressed open syllable in Figure 9, supports the alignment of the H component of the rising tone with the post-tonic syllable. However, because a similar pattern is observed for both short and long vowels in open syllables, it does not support the link, predicted by Hypothesis 2, between the sonority of a syllable and the temporal realization of tone targets. The reduced variability in the timing of the f0 maximum in the post-tonic syllable does, however, present an interesting contrast with the f0 values in Figure 8.
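Normalized time expresses the position of a pitch event as a proportion of the vowel, with 0 at vowel onset and 1 at vowel offset. The Python sketch below assumes normalization by vowel onset and offset, which is consistent with the text's references to events timed at a proportion of the vowel; the helper names are hypothetical. Multiplying a normalized shift by a vowel duration gives a rough sense of its size in milliseconds. Here, the 90.1 ms post-stress intercept for the long rising accent from Appendix C is used purely for illustration, since actual vowel durations vary by condition.

```python
def normalized_time(t_event: float, t_onset: float, t_offset: float) -> float:
    """Position of a pitch event in a vowel: 0 at vowel onset, 1 at vowel offset."""
    duration = t_offset - t_onset
    if duration <= 0:
        raise ValueError("vowel offset must come after vowel onset")
    return (t_event - t_onset) / duration


def normalized_shift_to_ms(shift: float, vowel_duration_ms: float) -> float:
    """Approximate size, in ms, of a normalized-time difference for a vowel of given duration."""
    return shift * vowel_duration_ms


# A pitch minimum 61 ms into a vowel spanning 100-200 ms lies at normalized time 0.61.
print(normalized_time(161.0, 100.0, 200.0))

# A 0.08 later post-stress minimum on a vowel of roughly 90 ms is about 7 ms.
print(round(normalized_shift_to_ms(0.08, 90.1), 1))
```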
In summary, syllable type and vowel length have no appreciable effect on the f0 values themselves. However, in the stressed syllable the pitch peak is timed earlier in short vowels than in long vowels. In the post-stressed syllable, a preceding open syllable temporally extends the post-tonic fall, which starts earlier and finishes later; it is not clear why such an effect should arise. It is also important to note that the minimum in the stressed syllable is timed at around halfway through the vowel regardless of any suprasegmental factors, and that the maximum is timed early in the post-stress vowel, again largely regardless of suprasegmental factors. The relatively low variability in the timing of these two pitch targets is once again evident in Figure 10.
5. Discussion
Broadly speaking, one is left with the impression that the timing of important pitch minima and maxima in this lexical pitch accent language is largely fixed regardless of syllable type and to an extent even regardless of vowel duration. For the falling accent, there is a clear early maximum in the stressed syllable. A minimum is seen at around the midpoint of the post-tonic syllable; however, as seen in Figure 6, this low target (top right panel) does not appear to be as precisely controlled as the early maximum (bottom left panel).
For the rising accent, there is a clear early minimum in the stressed syllable (top left panel of Figure 6), followed by a clear early maximum in the post-stress syllable (bottom right panel). There is also a minimum quite late in the post-stress syllable for this lexical pitch accent (top right panel), but, as with the falling accent, this second low target does not appear to be as precisely controlled as the low target in the stressed syllable or the high target in the post-stress syllable.
There are several ways one could interpret these results. One way involves treating the following L tone as an intonational tone. For the falling accent, one could posit a simple H target, with any final L tone treated as an intonational tone aligned either with the right edge of the foot or with the right edge of some larger prosodic unit (word, phrase, etc.). This would lead to the sequence H Lx for the falling accent (where x is agnostic as to whether the tone is podal, phrasal, or intonational).
Similarly, for the rising accent, one could posit a simple LH target, with any final L tone treated as an intonational tone, giving the sequence LH Lx for this accent. The main argument for an intonational analysis of the L tone following the pitch accent is the high variability of the Lx target compared with the H (falling accent) and LH (rising accent) targets. If such an approach is adopted, the Lx tone would more plausibly be aligned with a smaller prosodic unit such as the foot than with a larger one, given that it occurs quite close to the preceding H tone.
The other approach is to treat the trailing L as a property of the pitch accent itself, giving the sequence HL for the falling tone and LHL for the rising tone. The main argument in favor of this interpretation is that, for the falling accent, the L tone in the post-stress syllable occurs at around the midpoint of the vowel, which could be considered early for an intonational tone. In the case of the rising tone, the post-stress L tone occurs at around the three-quarter mark of the vowel, which could also be considered very tight timing for an intonational tone. This analysis is more in the spirit of Godjevac’s analysis, which posits H*+L for the falling tone and L*+H for the rising tone (note that the rising tone is not L*+HL in her system).
To complicate this discussion somewhat, the timing of the L tone in the post-stress syllable differs between the two lexical pitch accents, being later for the rising accent than for the falling accent. If this post-stress L tone were intonational, the default expectation would be that it aligns similarly for both lexical pitch accents, presumably at the right edge of the post-tonic syllable. Yet this is not what seems to be happening. One interpretation of this timing difference is that an L is part of the falling accent (giving HL) but not of the rising accent (leaving the rising accent as LH rather than LHL). This does not preclude an intonational Lx tone in both cases, giving HL Lx for the falling tone and LH Lx for the rising tone. An alternative interpretation, which likewise treats the rising accent as LH rather than LHL, is that the default placement of the intonational low is at the midpoint of the post-stress vowel, but that the late realization of the high in the rising accent pushes the intonational low rightward through tonal crowding.
Our data do not allow us to tease out the behavior of the following L tone, which we suggest as an area for future research. That said, if one accepts that a fall is produced on the post-stress syllable of the rising accent, one might be tempted to view any such fall as a remnant of the historical stress retraction: the durational cues to stress were retracted, while the pitch movement remained in its original place.
On a separate point, we note that the initial rise of the rising pitch accent on the stressed syllable is shallow. As mentioned above, we attribute this to a post-lexical high pitch accent at the left edge of the word, in line with Godjevac’s analysis of Serbo-Croatian (yielding a word-initial %H H(L) sequence for the falling accent and a %H LH(L) sequence for the rising accent).
Crucially, timing patterns for the phonological peaks and troughs for the two tones are not conditioned by syllable structure, at least not in a way that suggests sensitivity to the intrinsic ability of different syllable types to support f0 information. The f0 maximum of the falling tone thus does not occur any earlier in a syllable that is open as opposed to one that ends in a coda obstruent. Nor do any peaks or troughs associated with the rising tone reliably shift based on syllable structure.
However, there is evidence for re-scaling of the f0 range in short-vowel falling accents, with a higher minimum and a lower maximum f0 in the stressed syllable relative to long-vowel falling accents, as predicted. In addition, the f0 peak of the falling accent is later (and more variable) for short vowels than for long vowels, as also predicted. On the other hand, as mentioned above, there is no consistent effect of syllable structure (open or closed) or of coda type (obstruent or sonorant).
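The re-scaling can be made concrete by combining the treatment-coded fixed-effect estimates for the stressed syllable of the falling accent in Appendix D into a predicted value for each Quantity × Syllable Weight cell. The reference levels there are Long and Obstruent, and the "f0 (Hz)" row is the intercept. The Python sketch below does this. Because the minimum and maximum come from separately fitted models, the difference between them is only an approximate f0 excursion derived here for illustration, not a result reported in the analysis. The dictionary keys and the function name are hypothetical.

```python
# Fixed-effect estimates (Hz), falling accent, stressed syllable, from Appendix D.
# Reference levels: Quantity = Long, Syllable Weight = Obstruent.
STRESS_MIN = {
    "Intercept": 181.75,
    "QuantityShort": 11.01,
    "SyllWeightOpen": 1.57,
    "SyllWeightSonorant": 4.15,
    "QuantityShort:SyllWeightOpen": -1.68,
    "QuantityShort:SyllWeightSonorant": -5.48,
}
STRESS_MAX = {
    "Intercept": 214.68,
    "QuantityShort": -8.97,
    "SyllWeightOpen": -12.42,
    "SyllWeightSonorant": -6.28,
    "QuantityShort:SyllWeightOpen": 11.15,
    "QuantityShort:SyllWeightSonorant": 1.69,
}


def cell_mean(coefs: dict[str, float], quantity: str, weight: str) -> float:
    """Sum the treatment-coded terms that apply to one Quantity x Syllable Weight cell."""
    value = coefs["Intercept"]
    if quantity == "Short":
        value += coefs["QuantityShort"]
    if weight != "Obstruent":  # Obstruent is the reference level: no extra term
        value += coefs[f"SyllWeight{weight}"]
        if quantity == "Short":
            value += coefs[f"QuantityShort:SyllWeight{weight}"]
    return value


for weight in ("Obstruent", "Open", "Sonorant"):
    for quantity in ("Long", "Short"):
        excursion = cell_mean(STRESS_MAX, quantity, weight) - cell_mean(STRESS_MIN, quantity, weight)
        print(f"{quantity:<5} {weight:<9} predicted max - min = {excursion:5.1f} Hz")
```

Under these estimates, the predicted excursion is smaller for short than for long vowels in every syllable type (for example, about 13 Hz vs. about 33 Hz in obstruent-closed syllables).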
In the case of the rising pitch accent, there is a blanket scaling effect such that all the pitch excursions are smaller than for the falling pitch accent. In addition, the high target for the rising accent is realized in the post-stressed syllable, which has the effect of mitigating tonal crowding between the low and high targets comprising the rise. Both of these effects were predicted on the basis of rising tones taking longer to execute than falling tones, an asymmetry that is mirrored by a typological bias against rising tones relative to falling tones. However, neither timing nor scaling is sensitive to syllable structure or vowel length for the rising accent. The timing of the f0 minimum for the rising accent is, in fact, remarkably stable across vowel length and syllable structure (at around 50% of vowel length).
One may then ask why there is no effect of syllable structure, and only a limited effect of vowel length, on the temporal realization or scaling of the accents, with the latter confined to the asymmetric scaling of the high and low targets for short vs. long vowels in the falling accent. There are two possible answers to this question. The first relates to the temporal location of the pitch events and their frequency range. The f0 maximum for the falling tone occurs quite early in the vowel, at around the 20% timepoint, while the low target for the falling tone occurs in the post-tonic syllable. It is thus possible that syllable structure is less relevant. In other words, if the high target were located later in the stressed syllable and/or the low target occurred in the stressed syllable itself, the f0 maximum might be shifted leftwards when followed by an (obstruent) coda consonant. Relatedly, for the rising tone, the rise in the stressed syllable is small in f0 terms, and the peak instead occurs (early) in the post-stressed syllable. As a result, the syllable structure of the stressed syllable is unlikely to affect f0 targets in the stressed syllable.
The second possible reason for the relatively fixed timing of the Croatian pitch accents relates to syllable structure. As noted above, Croatian is extremely permissive in terms of syllable onsets (Kapović, 2023, pp. 216–221): stop-stop clusters are permitted word-initially (e.g., ptica ‘bird’), as are stop-nasal clusters (e.g., dno ‘bottom’), nasal-lateral clusters (e.g., mlijeko ‘milk’), and /tl/ clusters (e.g., tlak ‘pressure’), among others. This permissiveness may make word-internal syllable boundaries ambiguous and may interact in some way with the ambisyllabicity of word-internal consonants. These issues, of course, require further study. At this point, we simply suggest that, given the permissiveness of syllable onsets, timing pitch targets relatively early in the vowel avoids any timing problems that could arise from following consonant clusters. This could be seen as a chicken-and-egg situation, in that it is not clear whether the pitch accents are timed early because of the ambiguous syllable affiliation of word-internal consonants, or whether the language is so permissive in its syllable onsets precisely because the pitch targets are fixed. We consider the former explanation more likely, given that pitch targets tend to vary considerably across dialects, as exemplified by our study of the Split dialect compared with other Croatian dialects.
The fact that vowel length seems to be the only property of the syllable that affects the realization of the accents, and only in a limited way for the falling accent, accords with the importance of vowel length as a phonemically contrastive feature of Croatian, unlike syllable structure, which is not contrastive. The psychological salience of vowel length is also reflected in the orthographic separation of prosodemes by vowel length as well as by pitch accent type.
There is also one intriguing result in our data that does not follow from any of the hypotheses linking overall sonority to the capacity to support contour tones: namely, after open syllables with a rising tone, the f0 maximum on the post-stressed syllable is higher than after closed syllables, regardless of vowel length. This is perhaps the clearest effect of syllable structure we have observed, and it occurs on the post-stressed syllable of a pitch accent whose f0 contour is largely realized on that syllable. One could hypothesize that, with only one consonant between the vowels, the speaker is better able to reach the ideal target on the second vowel. Viewed another way, the high target for the rising tone is not obscured by the single intervening consonant following an open syllable, whereas following a closed syllable the actual pitch peak occurs during the two-consonant interval separating the stressed and post-stress vowels. On this account, one could assume that the number of intervening consonants is relevant rather than their nature (obstruent or sonorant). This possibility has not, to our knowledge, been fully explored, but it could be tested by including intervocalic (voiced) consonant spans in the pitch measurements.
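The consonant-count idea above can be operationalized over the phonemic transcriptions in Appendix A. The Python sketch below rests on three assumptions: the broad transcriptions use only the vowel letters a, e, i, o, u; affricates are written with a tie bar; and the first vowel is the stressed one, which holds for every item in Appendix A. The function names are hypothetical.

```python
VOWELS = set("aeiou")
LENGTH = "\u02d0"  # IPA length mark, as in aː
TIE = "\u0361"     # combining tie bar, as in t͡s


def segments(ipa: str) -> list[str]:
    """Split a broad IPA transcription into segments, skipping spaces.

    Long vowels (aː) and tied affricates (t͡s, t͡ʃ, d͡ʑ) are kept as single segments.
    """
    segs: list[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i] == " ":
            i += 1
            continue
        seg = ipa[i]
        i += 1
        if i + 1 < len(ipa) and ipa[i] == TIE:  # tie bar joins the next character
            seg += ipa[i : i + 2]
            i += 2
        if i < len(ipa) and ipa[i] == LENGTH:  # attach a following length mark
            seg += ipa[i]
            i += 1
        segs.append(seg)
    return segs


def consonants_between_first_two_vowels(ipa: str) -> int:
    """Number of consonant segments between the first vowel and the next vowel."""
    vowel_idx = [k for k, seg in enumerate(segments(ipa)) if seg[0] in VOWELS]
    if len(vowel_idx) < 2:
        raise ValueError("transcription needs at least two vowels")
    return vowel_idx[1] - vowel_idx[0] - 1


for word in ["biːraːni su", "vraːt͡ɕeni su", "truːdnit͡sa", "boːlnit͡ʃaːrka"]:
    print(word, consonants_between_first_two_vowels(word))
```

Applied to Appendix A, this count is 1 for every item classed as Open and 2 for every Obstruent- or Sonorant-closed item, so syllable type and the number of intervening consonants are confounded in the present word list.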
6. Conclusion
Results of an acoustic analysis of pitch timing and scaling of the rising and falling accents of Split Croatian support the existence of stable phonological targets for the two accents. The falling accent is consistently associated with a high f0 target early in the stressed syllable, followed by a gradual fall to a low target in the immediately post-tonic syllable. The rising accent, in contrast, carries a low tone on the stressed syllable, a high tone early in the post-tonic syllable, and another low tone linked to the right edge of a prosodic domain that likely extends beyond the immediately post-tonic syllable; the result is an f0 rise followed by a fall.
Our data offer relatively little evidence of tonal accommodation based on syllable structure. The most important finding concerns tone type: the degree of f0 excursion for the rising tone is reduced relative to that of the falling tone, and its pitch peak and subsequent fall are realized to the right of the stressed syllable, in keeping with the greater difficulty of implementing and perceiving rising pitch contours compared with falling ones. The presence or type of coda consonant does not affect the acoustic realization of the tone distinction. Vowel length exerts only a minor influence on pitch, in the form of a reduced pitch difference between the high and low targets of the falling tone on short vowels relative to long vowels. The overall absence of an effect of syllable structure on pitch is plausibly attributed to the relative indeterminacy of syllable boundaries in Croatian, which undermines any potential role for the syllable in conditioning stable timing or scaling effects in the implementation of the pitch distinction.
Appendices
Appendix A. List of words used for this study. For prosodeme, LF – Long Falling, LR – Long Rising, SF – Short Falling, SR – Short Rising.
| Prosodeme | Orthography (with accents) | Phonemic | Stressed syllable type | Post-tonic vowel length | Translation |
| LF | bȋrāni su | biːraːni su | Open | Long | they’re chosen |
| LF | plȃćenīci | plaːt͡ɕeniːt͡si | Open | Short | mercenaries |
| LF | sȗđeno īm ̮je | suːd͡ʑeno iːm je | Open | Short | it is meant to be |
| LF | vȇzāni su | veːzaːni su | Open | Long | they’re tied |
| LF | vrȃćeni su | vraːt͡ɕeni su | Open | Short | they’re returned |
| LF | kȗpljeni su | kuːpʎeni su | Obstruent | Short | they’re bought |
| LF | stȇgnūti su | steːɡnuːti su | Obstruent | Long | they’re tightened together |
| LF | trȗdnica | truːdnit͡sa | Obstruent | Short | pregnant woman |
| LF | dȋrnūti su | diːrnuːti su | Sonorant | Long | they’re touched |
| LF | Kȃrlovčāni | kaːrlovt͡ʃaːni | Sonorant | Short | inhabitants of Karlovac |
| LF | Vȋnkovčānka | viːnkovt͡ʃaːnka | Sonorant | Short | (female) inhabitant of Vinkovci |
| LR | Béčānka | beːt͡ʃaːnka | Open | Long | (female) inhabitant of Vienna |
| LR | Bráčānka | braːt͡ʃaːnka | Open | Long | (female) inhabitant of Brač |
| LR | bránili su ga | braːnili su ɡa | Open | Short | they defended him |
| LR | Hvárānka | hvaːraːnka | Open | Long | (female) inhabitant of Hvar |
| LR | náčēlnica | naːt͡ʃeːlnit͡sa | Open | Long | (female) head of country |
| LR | príčali su mu | priːt͡ʃali su mu | Open | Short | they told him |
| LR | rádili su ga | raːdili su ɡa | Open | Short | they did it |
| LR | vézali su ga | veːzali su ɡa | Open | Short | they tied him up |
| LR | vrátili su se | vraːtili su se | Open | Short | they returned |
| LR | jávljali su mu se | jaːvʎali su mu se | Obstruent | Short | they got in touch with him |
| LR | jávljānje | jaːvʎaːɲe | Obstruent | Long | getting in touch |
| LR | lúdnica | luːdnit͡sa | Obstruent | Short | madhouse |
| LR | pívnica | piːvnit͡sa | Obstruent | Short | brewery |
| LR | skítnica | skiːtnit͡sa | Obstruent | Short | vagabond |
| LR | súdnica | suːdnit͡sa | Obstruent | Short | courtroom |
| LR | trúbljēnje | truːbʎeːɲe | Obstruent | Long | trumpeting |
| LR | bólničārka | boːlnit͡ʃaːrka | Sonorant | Short | nurse |
| SF | gȍdina | ɡodina | Open | Short | year |
| SF | jȁgodica | jaɡodit͡sa | Open | Short | little strawberry |
| SF | pȍpīli su ga | popiːli su ɡa | Open | Long | they drank it |
| SF | prȉjatelji | prijateʎi | Open | Short | friends |
| SF | prȍdāli su ga | prodaːli su ɡa | Open | Long | they sold him |
| SF | rȕšili su ga | ruʃili su ɡa | Open | Short | they tore him down |
| SF | bȍckali su ga | bot͡skali su ɡa | Obstruent | Short | they pricked him |
| SF | klȁckali su se | klat͡skali su se | Obstruent | Short | they were see-sawing |
| SF | lȕpkali su ga | lupkali su ɡa | Obstruent | Short | they kicked him a bit |
| SF | nȁzvāli su ga | nazvaːli su ɡa | Obstruent | Long | they called him |
| SF | prȁvljeni su | pravʎeni su | Obstruent | Short | they’re made |
| SF | slȁvljeni su | slavʎeni su | Obstruent | Short | they’re celebrated |
| SF | stȁvljeni su | stavʎeni su | Obstruent | Short | they’re put |
| SF | tȉpkali su mu | tipkali su mu | Obstruent | Short | they typed to him |
| SF | slȍmljeni su | slomʎeni su | Sonorant | Short | they’re broken |
| SR | lòpatica | lopatit͡sa | Open | Short | spatula/shoulder blade |
| SR | òčūvāno je | ot͡ʃuːvaːno je | Open | Long | it is preserved |
| SR | Slàvōnija | slavoːnija | Open | Long | Slavonia |
| SR | vèčerali su | vet͡ʃerali su | Open | Short | they ate dinner |
| SR | àsfāltnī | asfaːltniː | Obstruent | Long | asphalt (adj.) |
| SR | àtlāntskī | atlaːntskiː | Obstruent | Long | Atlantic (adj.) |
| SR | dìvljali su | divʎali su | Obstruent | Short | they raved |
| SR | ìspipali su ga | ispipali su ɡa | Obstruent | Short | they groped him |
| SR | ìzvadili su ga | izvadili su ɡa | Obstruent | Short | they took him out |
| SR | nàpravili su ga | napravili su ɡa | Obstruent | Short | they made him |
| SR | nèprāvda | nepraːvda | Obstruent | Long | injustice |
| SR | pètljali su se | petʎali su se | Obstruent | Short | they got mixed in |
| SR | škàkljali su ga | ʃkakʎali su ɡa | Obstruent | Short | they tickled him |
| SR | zàbrānjeni su | zabraːɲeni su | Obstruent | Long | they’re forbidden |
| SR | Dàlmācija | dalmaːt͡sija | Sonorant | Long | Dalmatia |
| SR | grànčica | ɡrant͡ʃit͡sa | Sonorant | Short | little branch |
| SR | màlvāzija | malvaːzija | Sonorant | Long | type of wine |
| SR | tàmničāri | tamnit͡ʃaːri | Sonorant | Short | jailers |
Appendix B. GAM-smoothed f0 traces plotted separately for each word. Data are from 13 speakers. F0 traces are presented separately for the stress syllable and the post-stress syllable. F0 traces are colour-coded according to syllable type: Open (navy), Obstruent (brown) and Sonorant (cyan).
Appendix C. Results from a Linear Mixed Effects Model examining the effect of Syllable Weight on Duration. Separate tests are conducted for Rising and Falling accents; Short and Long prosodemes (i.e., quantity of the stress vowel); and Stress and Post-stress syllables. The reference for Syllable Weight is Obstruent. Significance is marked in bold, and is set at 0.00625 (a Bonferroni adjustment of 0.05 based on eight separate tests: two accents, times two stress vowel quantities, times two syllable positions). The R command was:
lme(Duration~SyllableWeight, data=data, random=~1|speaker/words)
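The per-test threshold follows from dividing the family-wise level by the number of tests. A short Python sketch of this Bonferroni arithmetic (the function name is hypothetical), applied to the eight tests described above and to the two Long Rising post-stress p-values in the table:

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Two accents x two stress-vowel quantities x two syllable positions.
n_tests = 2 * 2 * 2
alpha = bonferroni_alpha(0.05, n_tests)
print(f"{n_tests} tests -> per-test alpha = {alpha:.5f}")

# Long Rising, post-stress syllable (p-values from the Appendix C table).
p_values = {"SyllWeight: Open": 0.4138, "SyllWeight: Sonorant": 0.0034}
for term, p in p_values.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"{term}: p = {p} -> {verdict}")
```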
| STRESS SYLLABLE | Estimate | Std. Error | DF | t-value | p-value |
| Short Falling | | | | | |
| Duration (ms) | 91.8 | 2.94 | 507 | 30.87 | <0.0001 |
| SyllWeight: Open | –2.9 | 2.96 | 179 | –1.00 | 0.3185 |
| SyllWeight: Sonorant | –9.9 | 5.85 | 179 | –1.69 | 0.0919 |
| Long Falling | | | | | |
| Duration (ms) | 132.2 | 5.84 | 380 | 22.62 | <0.0001 |
| SyllWeight: Open | 22.4 | 4.17 | 128 | 5.38 | <0.0001 |
| SyllWeight: Sonorant | 8.1 | 4.64 | 128 | 1.74 | 0.0832 |
| Short Rising | | | | | |
| Duration (ms) | 96.5 | 3.19 | 670 | 30.24 | <0.0001 |
| SyllWeight: Open | 1.9 | 2.69 | 219 | 0.71 | 0.4752 |
| SyllWeight: Sonorant | 11.9 | 2.69 | 219 | 4.44 | <0.0001 |
| Long Rising | | | | | |
| Duration (ms) | 137.4 | 5.13 | 651 | 26.76 | <0.0001 |
| SyllWeight: Open | 16.3 | 2.31 | 206 | 7.05 | <0.0001 |
| SyllWeight: Sonorant | 7.4 | 4.90 | 206 | 1.51 | 0.1318 |
| POST-STRESS SYLLABLE | | | | | |
| Short Falling | | | | | |
| Duration (ms) | 78.0 | 3.43 | 454 | 22.70 | <0.0001 |
| SyllWeight: Open | –1.9 | 1.89 | 177 | –1.04 | 0.2967 |
| SyllWeight: Sonorant | 8.6 | 3.64 | 177 | 2.36 | 0.0191 |
| Long Falling | | | | | |
| Duration (ms) | 76.1 | 3.85 | 345 | 19.74 | <0.0001 |
| SyllWeight: Open | 6.7 | 3.39 | 127 | 1.97 | 0.0500 |
| SyllWeight: Sonorant | –9.5 | 3.78 | 127 | –2.51 | 0.0133 |
| Short Rising | | | | | |
| Duration (ms) | 99.8 | 4.88 | 551 | 20.45 | <0.0001 |
| SyllWeight: Open | 5.1 | 5.81 | 216 | 0.88 | 0.3760 |
| SyllWeight: Sonorant | 9.5 | 5.81 | 216 | 1.65 | 0.1002 |
| Long Rising | | | | | |
| Duration (ms) | 90.1 | 4.07 | 1196 | 22.14 | <0.0001 |
| SyllWeight: Open | –2.8 | 3.48 | 205 | –0.81 | 0.4138 |
| SyllWeight: Sonorant | –21.7 | 7.33 | 205 | –2.96 | 0.0034 |
Appendix D. Results from a Linear Mixed Effects Model examining the interaction between Quantity and Syllable Weight for the measures examined in this study. Separate tests are conducted for Rising and Falling accents; Stress and Post-stress syllables; and pitch Minima and Maxima. The reference for Quantity is Long; and the reference for Syllable Weight is Obstruent. Significance is marked in bold, and is set at 0.00625 (a Bonferroni adjustment of 0.05 based on eight separate tests: two accents, times two syllable positions, times two pitch events). The R command was:
lme(Measure~Quantity*SyllableWeight, data=data, random=~1|speaker/words)
| Falling: Stress syllable Minimum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 181.75 | 7.947 | 695 | 22.87 | <0.0001 |
| QuantityShort | 11.01 | 2.694 | 307 | 4.09 | <0.0001 |
| SyllWeightOpen | 1.57 | 2.874 | 307 | 0.55 | 0.5851 |
| SyllWeightSonorant | 4.15 | 3.193 | 307 | 1.30 | 0.1944 |
| QuantityShort: SyllWeightOpen | –1.68 | 3.527 | 307 | –0.48 | 0.6349 |
| QuantityShort: SyllWeightSonorant | –5.48 | 5.06 | 307 | –1.08 | 0.2797 |
| Normalized Time | 0.57 | 0.041 | 695 | 13.84 | <0.0001 |
| QuantityShort | –0.06 | 0.045 | 307 | –1.27 | 0.2042 |
| SyllWeightOpen | –0.01 | 0.048 | 307 | –0.27 | 0.7872 |
| SyllWeightSonorant | 0.00 | 0.054 | 307 | –0.04 | 0.9645 |
| QuantityShort: SyllWeightOpen | –0.03 | 0.059 | 307 | –0.44 | 0.6605 |
| QuantityShort: SyllWeightSonorant | 0.08 | 0.085 | 307 | 0.99 | 0.3214 |
| Raw Time (ms) | 83.29 | 5.853 | 695 | 14.23 | <0.0001 |
| QuantityShort | –29.89 | 6.490 | 307 | –4.60 | <0.0001 |
| SyllWeightOpen | 13.33 | 6.923 | 307 | 1.93 | 0.0551 |
| SyllWeightSonorant | 7.00 | 7.694 | 307 | 0.91 | 0.3634 |
| QuantityShort: SyllWeightOpen | –15.17 | 8.501 | 307 | –1.78 | 0.0753 |
| QuantityShort: SyllWeightSonorant | –2.50 | 12.201 | 307 | –0.20 | 0.8378 |
| Falling: Stress syllable Maximum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 214.68 | 7.673 | 886 | 27.98 | <0.0001 |
| QuantityShort | –8.97 | 2.106 | 319 | –4.26 | <0.0001 |
| SyllWeightOpen | –12.42 | 2.261 | 319 | –5.49 | <0.0001 |
| SyllWeightSonorant | –6.28 | 2.504 | 319 | –2.51 | 0.0126 |
| QuantityShort: SyllWeightOpen | 11.15 | 2.804 | 319 | 3.98 | <0.0001 |
| QuantityShort: SyllWeightSonorant | 1.69 | 4.161 | 319 | 0.41 | 0.6855 |
| Normalized Time | 0.16 | 0.022 | 886 | 7.42 | <0.0001 |
| QuantityShort | 0.12 | 0.026 | 319 | 4.55 | <0.0001 |
| SyllWeightOpen | 0.07 | 0.028 | 319 | 2.41 | 0.0166 |
| SyllWeightSonorant | 0.05 | 0.030 | 319 | 1.67 | 0.0961 |
| QuantityShort: SyllWeightOpen | –0.04 | 0.034 | 319 | –1.13 | 0.2579 |
| QuantityShort: SyllWeightSonorant | –0.01 | 0.051 | 319 | –0.21 | 0.8312 |
| Raw Time (ms) | 24.15 | 2.950 | 886 | 8.19 | <0.0001 |
| QuantityShort | 5.91 | 3.286 | 319 | 1.80 | 0.0730 |
| SyllWeightOpen | 15.03 | 3.525 | 319 | 4.26 | <0.0001 |
| SyllWeightSonorant | 9.06 | 3.894 | 319 | 2.33 | 0.0206 |
| QuantityShort: SyllWeightOpen | –12.25 | 4.368 | 319 | –2.81 | 0.0053 |
| QuantityShort: SyllWeightSonorant | –7.63 | 6.475 | 319 | –1.18 | 0.2398 |
| Falling: Post-Stress syllable Minimum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 165.57 | 7.222 | 751 | 22.92 | <0.0001 |
| QuantityShort | –4.64 | 1.326 | 311 | –3.50 | <0.0001 |
| SyllWeightOpen | –3.22 | 1.430 | 311 | –2.25 | 0.0250 |
| SyllWeightSonorant | –2.60 | 1.610 | 311 | –1.61 | 0.1074 |
| QuantityShort: SyllWeightOpen | 1.43 | 1.797 | 311 | 0.80 | 0.4252 |
| QuantityShort: SyllWeightSonorant | 0.78 | 2.585 | 311 | 0.30 | 0.7626 |
| Normalized Time | 0.53 | 0.026 | 751 | 20.38 | <0.0001 |
| QuantityShort | 0.06 | 0.030 | 311 | 2.08 | 0.0385 |
| SyllWeightOpen | 0.11 | 0.032 | 311 | 3.30 | 0.0011 |
| SyllWeightSonorant | –0.01 | 0.036 | 311 | –0.35 | 0.7258 |
| QuantityShort: SyllWeightOpen | –0.08 | 0.040 | 311 | –2.07 | 0.0390 |
| QuantityShort: SyllWeightSonorant | –0.07 | 0.058 | 311 | –1.24 | 0.2158 |
| Raw Time (ms) | 49.59 | 3.341 | 751 | 14.84 | <0.0001 |
| QuantityShort | 6.82 | 3.315 | 311 | 2.06 | 0.0404 |
| SyllWeightOpen | 14.31 | 3.577 | 311 | 4.00 | <0.0001 |
| SyllWeightSonorant | –6.05 | 4.018 | 311 | –1.51 | 0.1332 |
| QuantityShort: SyllWeightOpen | –13.03 | 4.482 | 311 | –2.91 | 0.0039 |
| QuantityShort: SyllWeightSonorant | 2.70 | 6.466 | 311 | 0.42 | 0.6769 |
| Falling: Post-Stress syllable Maximum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 170.60 | 7.719 | 799 | 22.10 | <0.0001 |
| QuantityShort | 0.78 | 2.345 | 316 | 0.33 | 0.7390 |
| SyllWeightOpen | 3.78 | 2.526 | 316 | 1.50 | 0.1358 |
| SyllWeightSonorant | 2.15 | 2.815 | 316 | 0.76 | 0.4458 |
| QuantityShort: SyllWeightOpen | –2.07 | 3.154 | 316 | –0.66 | 0.5129 |
| QuantityShort: SyllWeightSonorant | –11.31 | 4.561 | 316 | –2.48 | 0.0137 |
| Normalized Time | 0.43 | 0.029 | 799 | 14.68 | <0.0001 |
| QuantityShort | –0.04 | 0.031 | 316 | –1.38 | 0.1689 |
| SyllWeightOpen | –0.05 | 0.033 | 316 | –1.49 | 0.1370 |
| SyllWeightSonorant | –0.08 | 0.037 | 316 | –2.12 | 0.0351 |
| QuantityShort: SyllWeightOpen | 0.08 | 0.042 | 316 | 1.99 | 0.0472 |
| QuantityShort: SyllWeightSonorant | 0.16 | 0.060 | 316 | 2.67 | 0.0080 |
| Raw Time (ms) | 41.49 | 3.472 | 799 | 11.95 | <0.0001 |
| QuantityShort | –3.21 | 3.392 | 316 | –0.95 | 0.3452 |
| SyllWeightOpen | –2.28 | 3.657 | 316 | –0.62 | 0.5337 |
| SyllWeightSonorant | –10.99 | 4.072 | 316 | –2.70 | 0.0073 |
| QuantityShort: SyllWeightOpen | 4.76 | 4.572 | 316 | 1.04 | 0.2985 |
| QuantityShort: SyllWeightSonorant | 22.91 | 6.589 | 316 | 3.48 | 0.0006 |
| Rising: Stress syllable Minimum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 182.97 | 7.034 | 1265 | 26.01 | <0.0001 |
| QuantityShort | 1.32 | 0.847 | 437 | 1.56 | 0.1202 |
| SyllWeightOpen | –4.26 | 0.859 | 437 | –4.96 | <0.0001 |
| SyllWeightSonorant | –0.64 | 1.83 | 437 | –0.35 | 0.7262 |
| QuantityShort: SyllWeightOpen | 2.15 | 1.342 | 437 | 1.60 | 0.1104 |
| QuantityShort: SyllWeightSonorant | –3.12 | 2.096 | 437 | –1.49 | 0.1378 |
| Normalized Time | 0.50 | 0.023 | 1265 | 21.98 | <0.0001 |
| QuantityShort | 0.00 | 0.017 | 437 | 0.14 | 0.8858 |
| SyllWeightOpen | –0.02 | 0.017 | 437 | –0.87 | 0.3841 |
| SyllWeightSonorant | 0.02 | 0.037 | 437 | 0.43 | 0.6667 |
| QuantityShort: SyllWeightOpen | 0.03 | 0.027 | 437 | 1.00 | 0.3191 |
| QuantityShort: SyllWeightSonorant | –0.01 | 0.043 | 437 | –0.31 | 0.7534 |
| Raw Time (ms) | 76.57 | 3.068 | 1265 | 24.95 | <0.0001 |
| QuantityShort | –18.78 | 2.635 | 437 | –7.13 | <0.0001 |
| SyllWeightOpen | 5.70 | 2.672 | 437 | 2.13 | 0.0335 |
| SyllWeightSonorant | 7.80 | 5.693 | 437 | 1.37 | 0.1716 |
| QuantityShort: SyllWeightOpen | –4.18 | 4.178 | 437 | –1.00 | 0.3180 |
| QuantityShort: SyllWeightSonorant | –1.61 | 6.524 | 437 | –0.25 | 0.8052 |
| Rising: Stress syllable Maximum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 189.64 | 7.210 | 1321 | 26.30 | <0.0001 |
| QuantityShort | 4.09 | 1.073 | 437 | 3.81 | 0.0002 |
| SyllWeightOpen | –4.55 | 1.096 | 437 | –4.16 | <0.0001 |
| SyllWeightSonorant | –2.51 | 2.328 | 437 | –1.08 | 0.2814 |
| QuantityShort: SyllWeightOpen | 0.10 | 1.695 | 437 | 0.06 | 0.9535 |
| QuantityShort: SyllWeightSonorant | –4.73 | 2.664 | 437 | –1.77 | 0.0768 |
| Normalized Time | 0.56 | 0.025 | 1321 | 22.36 | <0.0001 |
| QuantityShort | –0.14 | 0.024 | 437 | –5.93 | <0.0001 |
| SyllWeightOpen | 0.01 | 0.024 | 437 | 0.35 | 0.7272 |
| SyllWeightSonorant | –0.19 | 0.051 | 437 | –3.76 | 0.0002 |
| QuantityShort: SyllWeightOpen | 0.04 | 0.037 | 437 | 1.00 | 0.3186 |
| QuantityShort: SyllWeightSonorant | 0.25 | 0.059 | 437 | 4.19 | <0.0001 |
| Raw Time (ms) | 89.20 | 4.898 | 1321 | 18.21 | <0.0001 |
| QuantityShort | –40.44 | 4.063 | 437 | –9.95 | <0.0001 |
| SyllWeightOpen | 10.06 | 4.151 | 437 | 2.42 | 0.0157 |
| SyllWeightSonorant | –28.91 | 8.817 | 437 | –3.28 | 0.0011 |
| QuantityShort: SyllWeightOpen | –1.90 | 6.419 | 437 | –0.30 | 0.7670 |
| QuantityShort: SyllWeightSonorant | 40.61 | 10.09 | 437 | 4.02 | <0.0001 |
| Rising: Post-Stress syllable Minimum | Estimate | Std. Error | DF | t-value | p-value |
| f0 (Hz) | 174.94 | 7.751 | 869 | 22.57 | <0.0001 |
| QuantityShort | –2.48 | 1.689 | 415 | –1.47 | 0.1431 |
| SyllWeightOpen | –2.27 | 1.693 | 415 | –1.34 | 0.1800 |
| SyllWeightSonorant | 6.76 | 3.441 | 415 | 1.97 | 0.0501 |
| QuantityShort: SyllWeightOpen | 2.66 | 2.638 | 415 | 1.01 | 0.3130 |
| QuantityShort: SyllWeightSonorant | –9.14 | 3.968 | 415 | –2.30 | 0.0218 |
| Normalized Time | 0.61 | 0.020 | 869 | 30.18 | <0.0001 |
| QuantityShort | 0.04 | 0.026 | 415 | 1.46 | 0.1443 |
| SyllWeightOpen | 0.08 | 0.026 | 415 | 3.14 | 0.0018 |
| SyllWeightSonorant | 0.04 | 0.051 | 415 | 0.86 | 0.3909 |
| QuantityShort: SyllWeightOpen | –0.04 | 0.040 | 415 | –1.07 | 0.2871 |
| QuantityShort: SyllWeightSonorant | –0.02 | 0.059 | 415 | –0.33 | 0.7447 |
| Raw Time (ms) | 67.70 | 4.504 | 869 | 15.03 | <0.0001 |
| QuantityShort | 10.39 | 4.583 | 415 | 2.27 | 0.0238 |
| SyllWeightOpen | 7.42 | 4.602 | 415 | 1.61 | 0.1078 |
| SyllWeightSonorant | –10.83 | 9.401 | 415 | –1.15 | 0.2498 |
| QuantityShort: SyllWeightOpen | 1.82 | 7.165 | 415 | 0.25 | 0.7999 |
| QuantityShort: SyllWeightSonorant | 20.49 | 10.833 | 415 | 1.89 | 0.0593 |
| Post-Stress syllable Maximum | | | | | | |
| f0 (Hz) | 186.15 | 8.083 | 1092 | 23.03 | <0.0001 | |
| QuantityShort | 7.83 | 2.112 | 433 | 3.71 | 0.0002 | |
| SyllWeightOpen | 12.47 | 2.134 | 433 | 5.84 | <0.0001 | |
| SyllWeightSonorant | 7.41 | 4.450 | 433 | 1.66 | 0.0967 | |
| QuantityShort: SyllWeightOpen | –4.51 | 3.290 | 433 | –1.37 | 0.1712 | |
| QuantityShort: SyllWeightSonorant | –12.99 | 5.100 | 433 | –2.55 | 0.0112 | |
| Normalized Time | 0.34 | 0.023 | 1092 | 14.42 | <0.0001 | |
| QuantityShort | –0.04 | 0.024 | 433 | –1.58 | 0.1140 | |
| SyllWeightOpen | –0.12 | 0.024 | 433 | –4.74 | <0.0001 | |
| SyllWeightSonorant | –0.11 | 0.051 | 433 | –2.22 | 0.0268 | |
| QuantityShort: SyllWeightOpen | 0.01 | 0.038 | 433 | 0.30 | 0.7633 | |
| QuantityShort: SyllWeightSonorant | 0.18 | 0.058 | 433 | 3.09 | 0.0021 | |
| Raw Time (ms) | 37.20 | 3.009 | 1092 | 12.36 | <0.0001 | |
| QuantityShort | –1.98 | 3.528 | 433 | –0.56 | 0.5756 | |
| SyllWeightOpen | –13.67 | 3.558 | 433 | –3.84 | <0.0001 | |
| SyllWeightSonorant | –17.14 | 7.405 | 433 | –2.31 | 0.0211 | |
| QuantityShort: SyllWeightOpen | 2.36 | 5.488 | 433 | 0.43 | 0.6680 | |
| QuantityShort: SyllWeightSonorant | 32.43 | 8.489 | 433 | 3.82 | 0.0002 | |
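As a reading aid for the table above, the following sketch shows how treatment-coded fixed effects combine into a predicted value for each Quantity × SyllWeight cell, using the Raw Time (ms) estimates of the stress-syllable maximum model. It assumes, without confirmation from the model specification, that default treatment coding was used, with Long as the reference level for Quantity and a syllable closed by an obstruent (given the hypothetical label "Obstruent" here) as the reference level for SyllWeight. The predictions are population-level only, since random effects are ignored, and the function name `cell_prediction` is illustrative rather than part of the authors' analysis.

```python
# Fixed-effect estimates from the "Raw Time (ms)" rows of the stress-syllable
# maximum model in the table above.
# ASSUMPTION: treatment (dummy) coding with reference levels Quantity = "Long"
# and SyllWeight = "Obstruent" (a hypothetical label for an obstruent-closed syllable).
RAW_TIME_STRESS_MAX = {
    "(Intercept)": 89.20,
    "QuantityShort": -40.44,
    "SyllWeightOpen": 10.06,
    "SyllWeightSonorant": -28.91,
    "QuantityShort:SyllWeightOpen": -1.90,
    "QuantityShort:SyllWeightSonorant": 40.61,
}


def cell_prediction(quantity: str, weight: str,
                    coefs: dict[str, float] = RAW_TIME_STRESS_MAX) -> float:
    """Population-level prediction for one Quantity x SyllWeight cell.

    Sums the intercept, any applicable main effects, and the matching
    interaction term; random effects are ignored.
    """
    short = quantity == "Short"
    value = coefs["(Intercept)"]
    if short:
        value += coefs["QuantityShort"]
    if weight != "Obstruent":  # the reference level has no SyllWeight term
        value += coefs[f"SyllWeight{weight}"]
        if short:
            value += coefs[f"QuantityShort:SyllWeight{weight}"]
    return round(value, 2)


if __name__ == "__main__":
    for quantity in ("Long", "Short"):
        for weight in ("Obstruent", "Open", "Sonorant"):
            print(f"{quantity:<5} {weight:<9} {cell_prediction(quantity, weight):6.2f} ms")
```

Under these assumptions, the sonorant-coda adjustment to the timing of the f0 maximum is –28.91 ms for long vowels but –28.91 + 40.61 = +11.70 ms for short vowels, which is what the QuantityShort: SyllWeightSonorant interaction term encodes.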
Notes
- The standard Western South Slavic accentological notation differs from the IPA notation of pitch and quantity (although notational variants also exist within Slavic linguistics). For the reader’s reference, we provide the following table:

| Western South Slavicist | IPA | Description | Our paper |
|---|---|---|---|
| e | e | unstressed short vowel | – |
| ē | eː | unstressed long vowel | – |
| è | ě | short vowel, rising tone | SR |
| é | ěː | long vowel, rising tone | LR |
| ȅ | ê | short vowel, falling tone | SF |
| ȇ | êː | long vowel, falling tone | LF |

- Croatian varieties vary in the functional load of tone, with some even lacking lexical tone. For instance, in the urban dialects of Zagreb and Rijeka, Croatian (which in its modern forms is not much different from the Split variety) is spoken with a stress accent without any tone or length distinctions (Kapović, 2023, pp. 245–247). In these dialects, all the words in Table 1 thus simply have stress on the first syllable.
- Because the rising accent originated as a reflex of leftward stress shift, it is limited to non-final syllables. This is a case in which all else is not equal: for historical reasons, tonal contrasts are more, rather than less, restricted in final syllables.
- In the Split variety of Croatian, the short rising prosodeme has an optional variant called a “double accent,” in which the initial rising tone on the stressed syllable is followed by an f0 fall on the post-stressed syllable. Duration and energy cues to stress are spread more evenly across the stressed and post-stressed syllables; the interested reader is referred to Tabain et al. (2022) for further information regarding this optional variant. For the purposes of the present study, we simply note the possibility of this variant but do not investigate it further, since it is relatively rare in our data and highly speaker-dependent.
- BCMS belongs to the South Slavic branch, which also includes Slovene, Macedonian, and Bulgarian. Together with the West and East Slavic languages, the South Slavic languages are ultimately descended from Proto-Slavic.
- To produce this plot, f0 values below 50 Hz or above 350 Hz were removed, since these were likely to represent tracking errors for the speakers in our database. The tracked f0 data are plotted across the entire vowel duration and are not time-normalized. Details of the custom predict function used in this plot can be found at https://gist.github.com/richardbeare/b679c38dcb644ec50ea34ac061200504.
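The exclusion step described in the last note can be sketched as a simple range filter. This is a minimal Python illustration, not the authors' own processing (which used the custom predict function linked above); it assumes each f0 track is a list of (time in ms, f0 in Hz) pairs, and the function name `drop_tracking_errors` is hypothetical. Samples at exactly 50 Hz or 350 Hz are kept, since the note only excludes values below 50 Hz or above 350 Hz.

```python
# Hypothetical sketch: drop f0 samples outside a plausible tracking range.
# Assumes each track is a list of (time_ms, f0_hz) pairs; names are illustrative.
F0_FLOOR_HZ = 50.0     # values below this are treated as likely tracking errors
F0_CEILING_HZ = 350.0  # values above this are treated as likely tracking errors


def drop_tracking_errors(
    track: list[tuple[float, float]],
    floor_hz: float = F0_FLOOR_HZ,
    ceiling_hz: float = F0_CEILING_HZ,
) -> list[tuple[float, float]]:
    """Return only the samples whose f0 lies in [floor_hz, ceiling_hz].

    Time stamps are left unchanged (no time normalization is applied).
    """
    return [(t, f0) for t, f0 in track if floor_hz <= f0 <= ceiling_hz]


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    toy_track = [(0.0, 42.0), (5.0, 180.5), (10.0, 350.0), (15.0, 612.3)]
    print(drop_tracking_errors(toy_track))  # [(5.0, 180.5), (10.0, 350.0)]
```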
Acknowledgements
Thank you to two anonymous reviewers as well as the associate editor and editor for their helpful comments on earlier versions of this paper. We would like to thank our speakers for their time and dedication to language research (Ena Marinković, Ivana Ujević, Jelena Pocedulić, Josipa Papeš, Josipa Teskera, Magdalena Andromak, Mia Mršić, Nuša Vrdoljak, Tea Domić, and four other speakers who did not wish to have their full names published), and Jordan Bićanić for help with the recordings. The ethics protocols for this work were approved by the Office of Research at the University of California, Santa Barbara. Financial support was provided by the School of Humanities and Social Sciences and the Centre for Research on Language Diversity at La Trobe University.
Competing interests
The authors have no competing interests to declare.
References
Browne, E. W., & McCawley, J. D. (1965). Srpskohrvatski akcenat [Serbo-Croatian accent]. Zbornik za filologiju i lingvistiku [Journal of Philology and Linguistics], 8, 147–151.
Bruce, G. (1977). Swedish word accents in sentence perspective. Gleerup.
Carlton, T. R. (1991). Introduction to the phonological history of the Slavic languages. Slavica Publishers.
Derksen, R. (1996). Metatony in Baltic. Rodopi. http://doi.org/10.1163/9789004653740
Dutcher, K., & Paster, M. (2008). Contour tone distribution in Luganda. In N. Abner, & J. Bishop (Eds.), Proceedings of the 27th West Coast Conference on Formal Linguistics (pp. 123–131). Cascadilla Proceedings Project.
Godjevac, S. (2005). Transcribing Serbo-Croatian intonation. In S.-A. Jun (Ed.), Prosodic typology (pp. 146–171). Oxford University Press. http://doi.org/10.1093/acprof:oso/9780199249633.003.0006
Gordon, M. (2001a). A typology of contour tone restrictions. Studies in Language, 25, 405–444. http://doi.org/10.1075/sl.25.3.03gor
Gordon, M. (2001b). The tonal basis of final weight criteria. Chicago Linguistic Society, 36, 141–156.
Gordon, M. (2008). Pitch accent timing and scaling in Chickasaw. Journal of Phonetics, 36, 521–535. http://doi.org/10.1016/j.wocn.2006.10.003
Gordon, M. (2016). Phonological typology. Oxford University Press. http://doi.org/10.1093/acprof:oso/9780199669004.001.0001
Greenberg, M. L. (2003). Word prosody in Slovene from a typological perspective. Sprachtypologie und Universalienforschung [Language Typology and Universals Research], 56(3), 234–251. http://doi.org/10.1524/stuf.2003.56.3.234
House, D. (1990). Tonal perception in speech. Lund University Press.
Hyman, L. (1977). On the nature of linguistic stress. In L. Hyman (Ed.), USC studies in stress and accent (pp. 37–82). USC Linguistics Department.
Hyman, L. (1988). Syllable structure constraints on tonal contours. Linguistique Africaine, 1, 49–60.
Hyman, L., & Schuh, R. G. (1974). Universals of tone rules: Evidence from West Africa. Linguistic Inquiry, 5, 81–115.
Inkelas, S., & Zec, D. (1988). Serbo-Croatian pitch accent: The interaction of tone, stress, and intonation. Language, 64, 227–248. http://doi.org/10.2307/415433
Kapović, M. (2015). Povijest hrvatske akcentuacije. Fonetika [The history of Croatian accentuation. Phonetics]. Matica hrvatska.
Kapović, M. (2023). Uvod u fonologiju [Introduction to phonology]. Sandorf.
Kenstowicz, M. (1972). Lithuanian phonology. Studies in the Linguistic Sciences, 2, 1–85.
Kisler, T., Reichel, U., & Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, 45, 326–347. http://doi.org/10.1016/j.csl.2017.01.005
Köhnlein, B., & Cameron, I. S. (2024). What word-prosodic typology is missing: Motivating foot structure as an analytical tool for syllable-internal prosodic oppositions. Natural Language and Linguistic Theory, 42, 1043–1079. http://doi.org/10.1007/s11049-023-09602-4
Langston, K. (2018). Prescriptive accentual norms vs. usage in Croatian: An acoustic study of standard pronunciation. Journal of Slavic Linguistics, 26(2), 245–305. http://doi.org/10.1353/jsl.2018.0009
Lehiste, I., & Ivić, P. (1986). Word and sentence prosody in Serbocroatian. MIT Press.
Liberman, M., & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In M. Aronoff, & R. Oehrle (Eds.), Language sound structure: Studies in phonology presented to Morris Halle (pp. 157–233). MIT Press.
Maddieson, I. (1985). Phonetic cues to syllabification. In V. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 203–221). Academic Press.
Magner, T. F. (1978). City dialects in Yugoslavia. American Contributions to the Eighth International Congress of Slavists (Zagreb and Ljubljana, September 3–9, 1978) (pp. 465–482). Slavica Publishers.
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., & R Core Team (2016). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-128. http://CRAN.R-project.org/package=nlme
Pletikos, E. (2008). Akustički opis hrvatske prozodije riječi [An acoustic description of Croatian word prosody] [Doctoral dissertation, University of Zagreb].
Purcell, E. T. (1973). The realizations of Serbo-Croatian accents in sentence environments. Hamburger phonetische Beiträge 8. Helmut Buske Verlag.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Reh, M. (1985). Die Krongo-Sprache (Nìinò-mó-dì): Beschreibung, Texte, Wörterverzeichnis. D. Reimer.
Rešetar, M. (1900). Die serbokroatische Betonung südwestlicher Mundarten. Alfred Hölder, K. u. K. Hof- und Universitäts-Buchhändler.
Senn, A. (1957–66). Handbuch der litauischen Sprache [Handbook of the Lithuanian language]. C. Winter.
Sjölander, K. (2014). Snack sound toolkit [Programming library]. KTH Royal Institute of Technology. http://www.speech.kth.se/snack
Smiljanić, R. (2004). Lexical, pragmatic, and positional effects on prosody in two dialects of Croatian and Serbian: An acoustic study. Routledge.
Smiljanić, R., & Hualde, J. I. (2000). Lexical and pragmatic functions of tonal alignment in two Serbo-Croatian dialects. In A. Okrent & J. Boyle (Eds.), Chicago Linguistic Society 36.1 (pp. 469–482). Chicago Linguistic Society.
Soetaert, K. (2009). rootSolve: Nonlinear root finding, equilibrium and steady-state analysis of ordinary differential equations. R package version 1.6. http://doi.org/10.32614/CRAN.package.rootSolve
Soetaert, K., & Herman, P. M. J. (2009). A practical guide to ecological modelling: Using R as a simulation platform. Springer. http://doi.org/10.1007/978-1-4020-8624-3
Tabain, M., Kapović, M., Gordon, M., Gregory, A., & Beare, R. (2022). A preliminary study of lexical pitch accents in the Split dialect of Croatian. 18th Australasian International Conference on Speech Science and Technology (pp. 206–210). Canberra, Australia.
Vicenik, C., Lin, S., Keating, P., & Shue, Y.-L. (2020). Online documentation for VoiceSauce. http://www.phonetics.ucla.edu/voicesauce/documentation/index.html
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. http://doi.org/10.1007/978-3-319-24277-4_9
Winkelmann, R., Harrington, J., & Jänsch, K. (2017). EMU-SDMS: Advanced speech database management and analysis in R. Computer Speech & Language, 45, 392–410. http://doi.org/10.1016/j.csl.2017.01.002
Winkelmann, R., Jaensch, K., Cassidy, S., & Harrington, J. (2019). emuR: Main Package of the EMU Speech Database Management System. R package version 2.0.4.
Zhang, J. (2002). The effects of duration and sonority on contour tone distribution: Typological survey and formal analysis. Routledge.
Zintchenko Jurlina, J. (2013). Wortakzent im Kroatischen [Word accent in Croatian] [Master’s thesis, Johann Wolfgang Goethe-Universität Frankfurt am Main].
Zintchenko Jurlina, J. (2019). The production of lexical tone in Croatian [Doctoral dissertation, Johann Wolfgang Goethe-Universität Frankfurt am Main].
Zsiga, E., & Zec, D. (2012). Contextual evidence for the representation of pitch accents in Standard Serbian. Language and Speech, 56, 69–104. http://doi.org/10.1177/0023830912440792