1 Introduction. Why we adopt AM/ToBI notation

The advent of the Autosegmental-Metrical framework of phonological analysis of intonation (henceforth, AM framework; Beckman et al., 2005; Gussenhoven, 2004; Jun, 2005; Ladd, 2008a; Pierrehumbert, 1980; Pierrehumbert & Beckman, 1988, among others) has considerably deepened our knowledge of the intonational phonology of many languages. Later, ToBI (Tones and Break Indices) was developed as a consensus system for labelling spoken utterances to mark phonologically contrastive intonational events based on the Autosegmental-Metrical model of intonational phonology. Even though it was originally designed for English (Pitrelli et al., 1994), it has become a general framework for the development of prosodic annotation systems at the phonological level. The use of the ToBI annotation conventions has become widespread and ToBI annotation systems have been developed for a number of typologically diverse languages (e.g., Beckman et al., 2005, for English; Gussenhoven, 2005, for Dutch; Grice et al., 2005, for German; Venditti, 2005, for Japanese; Arvaniti & Baltazani, 2005, for Greek; Jun, 2000, for Korean; Gordon, 2005, for Chickasaw; Khan, 2014, for Bengali; Elordieta & Hualde, 2014, for Basque; Frota, 2014, for Portuguese; Prieto, 2014, for Catalan; Beckman et al., 2002, and Hualde & Prieto, 2015, for Spanish). Two recent volumes edited by Sun-Ah Jun (2005, 2014) include detailed surveys of the intonational phonology of a set of 13 and 14 typologically diverse languages, respectively. This work has demonstrated that the AM framework of prosodic analysis and its formalization in ToBI annotation systems can be applied successfully to typologically diverse languages.

A few words are in order regarding the status of ToBI labels. In its origin, ToBI was a system of conventions designed to allow searches within prosodically annotated English corpora (see Beckman et al., 2005; Pitrelli et al., 1994). In much recent work, however, ToBI-style labels have been used to provide a phonological analysis of the intonational system of individual languages. In this sense, it can be said that ToBI-style notation has acquired the status of a language-specific phonological representation of intonational events. Thus, the ToBI annotation systems proposed for each language are assumed to be based on a well-established body of research on intonational phonology for that language. ToBI annotations in a given language thus reflect the current state of knowledge of intonational phonology in that language. Up to now, one of the clear benefits of transcribing prosodic information in a particular language with ToBI is that the system allows for a continuous assessment of the contrastive prosodic patterns of the target languages.

Nowadays, there is an ample consensus among researchers (and developers of ToBI systems across languages) on the basic tenets of the AM model, namely that prominence and phrasing are two key aspects of the intonation systems of languages. Connected to these two notions, a set of phonologically contrastive pitch events—pitch accents and boundary tones, respectively for prominence and phrasing—may be defined. Thus the ToBI conventions establish four layers of labeling (words, tones, break indices, and miscellaneous information) which are aligned with the speech signal. In the tones tier, intonation contours in a language like English are described as a sequence of phonologically distinctive tonal units (represented with High and Low targets and their combinations) that are associated with metrically prominent units and with phrasal boundaries. This phonological representation of tones is mapped onto phonetic representation through language-particular implementation rules. The break index tier represents the prosodic structure of the language through numerical indices that indicate degrees of disjuncture between any two words.
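As a rough illustration of the four-tier layout just described, the following Python sketch stores each tier as a list of time-anchored labels and groups tone labels under the word they fall in. The class names, the `tones_by_word` helper, the assumption that each word label marks the right edge of its word, and the example utterance with its time stamps and labels are all hypothetical choices made for illustration. They are not part of any ToBI specification discussed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Label:
    """One label anchored at a time point (in seconds) in the speech signal."""
    time: float
    text: str


@dataclass
class ToBIAnnotation:
    """The four labeling layers named in the text, sharing one time axis."""
    words: List[Label] = field(default_factory=list)
    tones: List[Label] = field(default_factory=list)
    break_indices: List[Label] = field(default_factory=list)
    misc: List[Label] = field(default_factory=list)

    def tones_by_word(self) -> List[Tuple[str, List[str]]]:
        """Attach each tone to the first word whose right edge is at or after it.

        Assumption: word labels are placed at the end of each word.
        Tones later than the last word label are silently skipped.
        """
        ordered = sorted(self.words, key=lambda w: w.time)
        grouped = [(w.text, []) for w in ordered]
        for tone in sorted(self.tones, key=lambda t: t.time):
            for word, (_, bucket) in zip(ordered, grouped):
                if tone.time <= word.time:
                    bucket.append(tone.text)
                    break
        return grouped


# Hypothetical utterance; times and labels are invented for illustration only.
example = ToBIAnnotation(
    words=[Label(0.55, "Marianna"), Label(0.80, "made"),
           Label(0.90, "the"), Label(1.60, "marmalade")],
    tones=[Label(0.30, "H*"), Label(1.05, "H*"), Label(1.60, "L-L%")],
    break_indices=[Label(0.55, "1"), Label(0.80, "1"),
                   Label(0.90, "1"), Label(1.60, "4")],
)

print(example.tones_by_word())
```

Keeping the tiers separate but time-aligned means that a second tones tier (for instance, one broad phonetic and one phonological) could be added without disturbing the others.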

An important research goal of the prosodic analysis of diverse languages, and indeed of AM/ToBI studies, is to “establish a complete picture of a prosodic typology” (Jun, 2005, p. 5, Jun, 2014). Some authors have argued that prosodic typology can be performed on crosslinguistic comparisons of prosodic systems described at the underlying, phonological level (e.g., Gussenhoven, 2007, 2011; Hyman, 2012, and references therein).1 However, in spite of decades of work on the intonation of several languages within the common framework provided by the Autosegmental-Metrical model, studies on comparative prosodic typology are still very scarce (see Jun’s (2005) model of prosodic typology, and its revision in Jun, 2014). Ladd (2008b, p. 373), in his review of Jun’s (2005) edited volume, highlighted the problems of proposing a typology based on the comparison of abstract categories only: “The problem is that in order to do typology, you have to have a set of agreed descriptions cast in comparable terms. That kind of consensus is still lacking in the description of prosody. The broad AM approach is certainly leading us toward such a consensus, but we’re not there yet —it’s only the practical and collegial cohesiveness of the ToBI movement that makes the progress seem greater than it is. Eventually we will have to confront the disagreements more explicitly”.2 In our view some of the difficulties that comparative work on prosody is facing nowadays arise in great part from two characteristics of the actual application of the AM model to different languages. A first problem is the adoption of language-specific phonological labels and phonetic implementation rules that do not necessarily take into account crosslinguistic transparency. As suggested by Ladd (2008b) some years ago, there is still a need for consistent phonological analyses within and across languages. 
A second issue is the ambiguity present in some ToBI annotations, which often seem to be a compromise between broad phonetic and phonological levels of transcription (see Sections 4 and 5 below). Jun and Fletcher (2014, pp. 518–519) pointed out that in order to avoid such ambiguities (and also the problem of building typological comparisons on language-specific phonological labels), ToBI systems of one language should include very careful descriptions of the phonetic realizations of each tonal category together with the contexts where the surface form is realized. The fact is that, at present, it is difficult for a linguist to find out how, say, Spanish differs from English or Greek in their global patterns of intonation.3

We claim that cross-language and typological comparisons can be performed both at the phonological and at the broad phonetic levels. This is why we argue for the inclusion of two complementary levels of prosodic representation, namely broad phonetic and phonological, and for the use of cross-linguistically transparent and consistent labels at the phonetic/surface level. The idea of incorporating two levels of prosodic representation is not new and is found in other work within the AM framework and ToBI, where a phonetic tier label with temporary annotations is mentioned (e.g., Beckman et al., 2005; Jun & Fletcher, 2014). Jun and Fletcher (2014, p. 518) explicitly mention that the same labels “could also be used as ‘temporary’ labels as a guideline for deciding tonal categories and symbols when analyzing F0 contours in the AM framework before finalizing distinctive categories of the target language”. Moreover, the ToBI analyses of Korean (Jun, 2000, 2005) and French (Delais-Roussarie et al., 2015; Jun & Fougeron, 2000, 2002) also incorporate two levels of tonal transcription where a number of distinct, but not contrastive, contours are annotated as surface representations of a single underlying tonal sequence. Following up on those proposals, we would like to argue for including both levels of representation. Given that the AM framework and ToBI provide the tools needed for describing intonation across languages, we propose that the same set of ToBI symbols can be used for broad phonetic representation and phonological analysis. Researchers doing intonational analyses of a language could thus unambiguously use these two levels of transcription for two reasons: (a) as a temporary step towards establishing a phonological analysis; and more importantly, (b) to clarify the mapping between the phonological and the phonetic levels of representation.

The main goal of this paper is to provide a set of arguments for a two-level approach to prosodic annotation, namely broad phonetic and phonological, and to motivate the need for the development of an International Prosodic Alphabet (IPrA). We will thus distinguish phonetic from phonological representation and define “broad phonetic transcription” as a form of transcription that includes a certain amount of redundant, phonologically non-contrastive detail that is nevertheless a systematic aspect of the language. For instance, Sp. de día /de ˈdia/ ‘during the day’ would be represented as [de ˈðia] in broad phonetic transcription.4 A narrow phonetic transcription adds more phonetic detail, which may pertain to a single utterance, e.g., by using diacritics. A phonological (or phonemic) transcription, on the other hand, omits all predictable phonetic detail (see, e.g., Trask, 1996, under ‘phonemic transcription’). In our view, being explicit regarding the level of prosodic analysis has two main advantages, namely: (a) to clarify the relationship between the phonetic forms and phonological categories within a given language; and (b) to allow for a more transparent intonational comparison across languages, including crosslinguistic comparisons both at the phonological and the surface levels of representation.

The remainder of this paper is organized as follows. Section 2 points out current problems of “portability” in ToBI notation. Section 3 reports on some recent efforts to increase ToBI portability across Romance languages. Sections 4 and 5 offer some arguments for why broad phonetic transcriptions are useful in the analysis of intonational contours, and specifically why both broad phonetic and phonological representations are needed to successfully describe neutralization of prosodic contrasts, allophones in complementary distribution, as well as phonetic implementation rules. Other advantages of distinguishing two levels of prosodic analysis (e.g., broad phonetic level and abstract phonological analysis) are pointed out in Section 6. In Section 7 we discuss why we propose to adopt AM/ToBI notation for the broad phonetic transcription of intonation instead of adapting the symbols and diacritics that the IPA provides for the transcription of tonal contours.

2 Current problems of “portability” in ToBI notation

While the focus in the development of ToBI annotation systems for specific languages has been the characterization of the intonational phonology of the target language, less emphasis has been placed on what we may call the portability issue, i.e., the potential use of a generally accepted set of intonational labels that can be common across languages.

From the start of the development of the system, it was made clear that ToBI was not an International Phonetic Alphabet for prosody, and since intonation and prosodic organization differ from language to language (and even from dialect to dialect within a language), the ToBI system developed for a given language variety cannot be directly used for another language variety (see Jun, 2005, p. 2). In our view, one of the reasons for the present difficulties in cross-linguistic comparison is precisely the lack of a definition within the ToBI framework of a set of universally accepted transcription units that researchers can use. Thus, in Jun (2014) tonal patterns for each language are described by taking the ToBI inventory used for each language directly, which takes for granted a high degree of comparability across labels. However, since the ToBI labels represent abstract phonological units, they might not be comparable at the surface phonetic level. Sometimes, in fact, very different notations have been proposed for what looks like essentially the same pitch contour. A good example is the use of the L+H* vs the H*+L label for the same surface pitch contour, the much discussed on-ramp and off-ramp analyses (Gussenhoven, 2004; Ladd, 2008a, p. 99). Other times the same label is used for phonetically rather different contours, a fact to which we will return below. That is, arguably we find ourselves in the situation that Ladd (2008a, p. 129) refers to when he states that “[i]f we give identical phonological analyses to markedly different contours it makes cross-language and cross-dialect comparison […] at best difficult and at worst meaningless”.

Perhaps because the portability issue has not received sufficient attention, in the last two decades the development of different ToBI systems has revealed a significant lack of consensus in the definition of the basic set of intonation labels across languages (and across dialects of the same language).

Minimally, the same intonational label should refer to the same (or a similar) definition. Yet this is not always the case. Let us consider, to begin with, the case of boundary tones associated with the end of intonational phrases. Table 1 shows a comparison of the transcription labels used for similar phonetic realizations of a sustained level tone and a final rise across a handful of ToBI systems. Crucially, in all those systems the same three-level contrast is said to obtain among a low, a mid, and a high boundary tone. The labels !H% and H% are used in three or four of these ToBI systems for the transcription of the sustained level tone and the high rise, respectively, but the rest of the systems use other labels. From a practical point of view, what this means is that the common vocative chant (which is realized as a rising movement associated with the accented syllable followed by a fall to a sustained level tone) can be transcribed very differently across languages (e.g., L+H* !H% in Portuguese, L+H* M% in Spanish, and L*+H !H!H% in Greek). These notational differences might depend either on label choice, as in the Spanish vs. Catalan notation for the vocative chant, or on language-specific phonological motivations, as in the difference between English and Dutch; see Table 1.5

Language | Final sustained pitch | Final rise
English (Beckman et al., 2005) | H-L% | H-H%
German (Grice et al., 2005) | !H-% | ^H%
Dutch (Gussenhoven, 2005) | Absence of boundary tone | H%
Greek (Arvaniti & Baltazani, 2005, p. 95) | !H!H% | H-H%
Spanish (Prieto & Roseano, 2010) | M% | H%
Portuguese (Frota, 2014) | !H% | H%
Catalan (Prieto, 2014) | !H% | H%
Serbian/Croatian (Godjevac, 2005, p. 152) | HL% | H%

Table 1. Comparison of the transcription labels used for the sustained level phrase-final pitch (second column) and the final rise (last column).

Indeed even though the same surface pitch contours can be analyzed as genuinely distinct phonological tunes (and indeed different researchers can have different theoretical motivations to posit different tonal labels at the phonological level for what looks like the same pitch contour at the phonetic level, as in the analyses presented in Table 1; see also Arvaniti, 2016), we claim that having access to a level of broad phonetic representation of tones where the labels are easily comparable can help improve on analytical accuracy. As stated before, comparative work is difficult (if not impossible) if labels have different interpretations in different languages or analyses.6
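The comparability problem can be made concrete with a small Python sketch that encodes the labels of Table 1 and groups the languages by the label each system uses for a given surface contour. The dictionary keys, the function name `languages_by_label`, and the use of `None` for Dutch (where Table 1 reports the absence of a boundary tone) are our own illustrative choices. The sketch only restates the table and is not an analysis of any of the systems.

```python
# Labels copied from Table 1. The two surface descriptions are the table's
# column headings; None encodes "absence of boundary tone" (Dutch).
BOUNDARY_LABELS = {
    "English": {"final sustained pitch": "H-L%", "final rise": "H-H%"},
    "German": {"final sustained pitch": "!H-%", "final rise": "^H%"},
    "Dutch": {"final sustained pitch": None, "final rise": "H%"},
    "Greek": {"final sustained pitch": "!H!H%", "final rise": "H-H%"},
    "Spanish": {"final sustained pitch": "M%", "final rise": "H%"},
    "Portuguese": {"final sustained pitch": "!H%", "final rise": "H%"},
    "Catalan": {"final sustained pitch": "!H%", "final rise": "H%"},
    "Serbian/Croatian": {"final sustained pitch": "HL%", "final rise": "H%"},
}


def languages_by_label(surface_contour):
    """Group languages by the label their ToBI system uses for one contour."""
    groups = {}
    for language, labels in BOUNDARY_LABELS.items():
        groups.setdefault(labels[surface_contour], []).append(language)
    return groups


for contour in ("final sustained pitch", "final rise"):
    groups = languages_by_label(contour)
    print(f"{contour}: {len(groups)} distinct notations")
    for label, languages in groups.items():
        print(f"  {label}: {', '.join(languages)}")
```

A shared broad phonetic tier would, under this view, amount to one common key per surface contour, with the language-specific phonological labels kept as separate values.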

Let us now consider the ToBI representation of rising pitch accents across languages. As Jun (2014, p. 5) points out, rising tones can be distinguished by their alignment patterns. While L+H* has been typically used for an early rise (i.e., a rising contour in which an F0 peak occurs during the tonic or accented syllable), L*+H has most often (though not exclusively) been used for a late rise (i.e., when the F0 valley occurs during the tonic syllable and the F0 peak is realized in the posttonic). However, within these two broad categories, we can find further differences in alignment. It is in this context that we find language-specific options in the use of the above labels, in such a way that the same ToBI label corresponds to different phonetic realizations in different languages/varieties (or in analyses of the same language by different authors). For example, the L*+H accent in English has been described as “an accent contour that is low for a good portion of the accented syllable and then rises sharply, often into the following unstressed syllable if there is one” (Ladd, 2008a, p. 95, Figure 3.4).7 Similarly, Veilleux et al. (2006) describe L*+H as “bitonal high tone with low tone on accented syllable” and illustrate this accent with contours with the shape of the lefthand example in Figure 1 (see also, e.g., Godjevac, 2005, for Serbian/Croatian). Yet, in the first version of Spanish ToBI (Beckman et al., 2002, p. 33), L*+H was defined as a “late rising accent, with peak after the stressed syllable and valley toward the beginning (prenuclear accent in Mexican Spanish and some Peninsular varieties, focal accent in the Catalan-speaking region of Spain) or toward the middle of the stressed syllable (prenuclear accent in at least some Caribbean varieties)”, and other work on Spanish intonation has used this label for accents with a valley at the very beginning of the stressed syllable (e.g., Face, 2002).
Figure 1 illustrates these two different definitions of the L*+H pitch accent.

Figure 1. Schematic representations of L*+H in different ToBI proposals.

Similarly, L+H* has been used to label pitch accents with different alignment properties. In the latest version of Spanish ToBI, Prieto and Roseano (2010, p. 19) describe L+H* as a “rising pitch movement during the accented syllable with the F0 peak located at the end of this syllable”. By contrast, for Greek ToBI, Arvaniti and Baltazani (2005, p. 113) describe L+H* as a rising pitch accent in which “the H is preceded by a noticeable dip and aligns roughly in the middle of the accented vowel”.8 This is precisely the definition given for the H*+L pitch accent in the most recent version of the Italian ToBI proposal (see Gili-Fivela et al., 2015). Figure 2 shows the different uses of the L+H* label in Spanish and Greek.

Figure 2. Schematic representations of L+H* in different ToBI proposals.

Even though, to our knowledge, no language has been reported to have a four-way contrast in peak alignment, both Catalan and Spanish do show a three-way distinction in alignment at the broad phonetic level (see Prieto, 2014; Prieto et al., 2005). It would thus be advisable to incorporate a standard way to represent the rising pitch accents presented in Figures 1 and 2 at the broad phonetic level of transcription.
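The mismatches discussed in this section can be summarized in a short Python sketch that records, for each cited proposal, the label used and the valley and peak alignment given in its definition, and then reports which labels are paired with more than one alignment description. The alignment strings are our own paraphrases of the quoted definitions, the class and function names are hypothetical, and the comparison is a crude string match. An actual comparison would need measured alignment values rather than paraphrases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccentDefinition:
    source: str   # proposal cited in the text
    label: str    # ToBI label used in that proposal
    valley: str   # paraphrase of where the F0 valley falls
    peak: str     # paraphrase of where the F0 peak falls


# Paraphrased from the definitions quoted in this section. For Beckman et al.
# (2002) only the "valley toward the beginning" variant is encoded.
DEFINITIONS = [
    AccentDefinition("English (Ladd, 2008a)", "L*+H",
                     "low for much of the accented syllable",
                     "often in the following unstressed syllable"),
    AccentDefinition("Spanish (Beckman et al., 2002)", "L*+H",
                     "toward the beginning of the stressed syllable",
                     "after the stressed syllable"),
    AccentDefinition("Spanish (Prieto & Roseano, 2010)", "L+H*",
                     "not specified in the quoted definition",
                     "end of the accented syllable"),
    AccentDefinition("Greek (Arvaniti & Baltazani, 2005)", "L+H*",
                     "noticeable dip before the H",
                     "middle of the accented vowel"),
]


def labels_with_divergent_definitions(definitions):
    """Return labels that are paired with more than one (valley, peak) description."""
    shapes = {}
    for definition in definitions:
        shapes.setdefault(definition.label, set()).add(
            (definition.valley, definition.peak)
        )
    return sorted(label for label, seen in shapes.items() if len(seen) > 1)


print(labels_with_divergent_definitions(DEFINITIONS))
```

A standard broad phonetic notation would in effect replace the free-text alignment fields with a small closed set of agreed values.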

The set of examples examined in this section shows that comparing ToBI notations (and their corresponding surface intonation patterns) across different languages currently requires a good deal of detective work. In our view, the fact that phonological contrasts can be genuinely different across languages does not preclude the possibility that all languages can share the same analytical labels at the level of the broad phonetic transcription. In the next section we examine two recent initiatives that have sought a consensus among researchers for the notation of related languages and dialects. The eventual adoption of a truly international system of labels will require, of course, a much wider consensus. Jun and Fletcher (2014) presented an independently developed proposal in the same spirit, which we will review in more detail in Section 7.

3 Recent efforts to increase portability across languages

In this section, we briefly review two successful initiatives in the development of ToBI prosodic standards for languages and language families with a great amount of dialectal variation. These two main initiatives will be assessed, as well as Jun and Fletcher’s (2014) recommendations for prosodic transcription of languages that do not have a fully developed ToBI system.

First, we will consider the volume edited by Prieto and Roseano (2010), which develops a common Sp_ToBI proposal suitable for ten geographical varieties of Spanish. The geographical varieties analyzed in this volume, each of which was studied by a different researcher or team of researchers, are the following: Argentinian (Gabriel et al.), Canarian (Cabrera and Vizcaíno), Castilian (Estebas and Prieto), Cantabrian (López Bobo and Cuevas), Chilean (Ortiz et al.), Dominican Cibaeño (Willis), Ecuadorian Andean (O’Rourke), Mexican (de-la-Mota et al.), Puerto Rican (Armstrong) and Venezuelan Andean Spanish (Astruc et al.). Data for all varieties were obtained from the same Discourse Completion Task (semi-spontaneous speech), which was adapted to each dialect. The revised Sp_ToBI proposal was based on previous Sp_ToBI proposals (Beckman et al., 2002; Estebas-Vilaplana & Prieto, 2008; Face & Prieto, 2007), as well as on a systematic comparison of the intonational systems of the ten Spanish dialects included in the study. The Sp_ToBI system set of symbols was found to be sufficient to successfully transcribe the intonation contours documented in all ten Spanish varieties.9 Crucially, this required the use of broad phonetic transcriptions, where some distinctions that were not reported to be contrastive in the specific varieties were nevertheless transcribed. For instance, the fact that, as Beckman et al. (2002) note, in Caribbean Spanish rising prenuclear accents typically have a later valley than in other dialects was directly reflected in the choice of labels. Care was taken to use the same labels across language varieties for contours that are phonetically realized in the same way. This was an important step in the standardization of prosodic analysis across Spanish dialects.

The second initiative we want to mention here is represented by the volume edited by Frota and Prieto (2015), which includes prosodic analyses of nine Romance languages and their dialects. The overarching goal of the volume was to offer a description of the prosodic systems for each language in such a way that descriptions and analyses could be easily compared across languages. The languages examined are Catalan, French, Friulian, Italian, Occitan, Portuguese, Romanian, Sardinian, and Spanish. Some of these languages have considerable dialectal variation regarding intonational contours, e.g., Spanish (Hualde & Prieto, 2015), Italian (Gili-Fivela et al., 2015), Portuguese (Frota et al., 2015), and Catalan (Prieto et al., 2015). Two important conclusions were reached, namely (a) the ToBI system for each language was able to adequately transcribe the intonation contours reported for a whole set of pragmatic meanings; (b) even though work is still needed on the complete portability of the symbols used, the ToBI definitions for intonational and phrasing units for each language did not differ substantially across languages. Moreover, the project emphasized the common use of ToBI labels to represent intonation contours found across Romance languages. For instance, uniform labels were adopted by all authors for the contours schematically represented in Figure 3, all involving a rising movement and analyzed as bitonal accents.10

Figure 3. Schematic representations and proposed transcriptions of bitonal pitch accents involving a rising component.

The transcription of the calling contour can be used to illustrate the agreement reached among the contributors to the Frota and Prieto volume in the transcription of more complex pitch movements. As shown in Table 2, in most of the languages described in the book, the vocative chant is characterized by a rising pitch accent followed by a sustained mid boundary tone, which is uniformly transcribed as L+H* !H%.

Language | Calling contour
Catalan (Prieto et al., 2015) | L+H* !H%
Occitan (Sichel-Bazin et al., 2015) | L+H* !H%
Portuguese (Frota et al., 2015) | L+H* !H%
Romanian (Jitca et al., 2015) | L+H* !H%
Sardinian (Vanrell et al., 2015) | L+H* !H%
Spanish (Hualde & Prieto, 2015) | L+H* !H%

Table 2. Comparison of the transcription labels used for the calling contour across the nine Romance languages included in the volume edited by Frota and Prieto (2015).

Three languages included in the volume (Friulian, French, and Italian) appear to have a slightly different intonation pattern for the vocative chant. In the case of Friulian and Italian, the chosen transcription, L+H* H!H%, indicates the presence of a bitonal boundary tone in which the F0 remains high in the posttonic syllable and is followed by a fall to a mid level in the last posttonic syllable (Gili-Fivela et al., 2015; Roseano et al., 2015). In French, the vocative chant is characterized by a rise to a high pitch level on the preaccentual syllable, followed by sustained pitch associated with the nuclear accented syllable (and postaccentual syllable, if there is one). This contour was transcribed as H+!H* !H% (Delais-Roussarie et al., 2015). The differences in notation thus reflect actual differences at the broad phonetic level. Whether at a deeper, phonological level the vocative contour should be represented in the same manner in these three languages as in the other languages remains an open question.

Frota (2016) describes ways to reconcile system-internal considerations in phonological analyses of intonation with the need to carry out cross-language comparisons by discussing data from different Romance languages (Portuguese, Catalan, Italian, and Spanish). She shows examples where surface differences in pitch scaling which look phonetically similar have to be analyzed as different phonological patterns depending on the language. This situation is parallel to what happens at the segmental level: what phonetically could be transcribed as [ɛ] can correspond to underlying /e/ in Spanish or to /ɛ/ in Catalan. One of her conclusions is that in order to improve on analytic accuracy and cross-language comparability prosody researchers “should make the options and goals explicit (which are primarily to identify the distinctive intonation categories of the target languages) and use the same labels within the same framework in identical ways (that is to express intonation categories)”.

In sum, the two initiatives reported in this section show that it should be possible to work towards establishing a common (and more transparent) set of units for crosslinguistic prosodic analysis. We argue that this can be done by adopting two levels of prosodic representation, broad phonetic and phonological. In our view, it will be of great benefit to adopt a standard use of the different labels, at least at the level of broad phonetic analysis (see Section 4). As mentioned before, several initiatives have proposed systems of low-level prosodic transcription that can be considered the “first pass” before further phonological analysis. Jun and Fletcher (2014) have proposed a set of tonal categories and diacritics that can be used in describing and analyzing intonation contours. These authors point out that the labels that they define “could also be used as ‘temporary’ labels as a guideline for describing tonal categories and symbols when analyzing F0 contours in the AM framework before finalizing distinctive categories of the target language” (p. 518). Roseano and Fernández Planas (2013) proposed a system of phonetic transcription of intonation (based on data from Romance languages analyzed within the AMPER system) that can be easily related to ToBI phonological transcription systems. They offer an automatic transcription system that can extract the phonetic prosodic features of an utterance based on its acoustic features (for some of the correspondences between underlying ToBI pitch accents and their phonetic realizations, see pp. 301ff).

Stemming from the Spanish and Romance ToBI experience, as well as from Jun and Fletcher’s proposal, we suggest the adoption of a complete International Prosodic Alphabet (IPrA), which could contain the complete set of intonational and phrasing contrastive units that have been documented across languages, using schematic contours and standard ToBI labels. As in the case of the IPA, for each language system a subset of main intonational standard labels could be chosen from this common inventory to be used for broad phonetic transcriptions. Having access to these general definitions would be a practical tool with at least two advantages: (a) portability of symbols across languages would be increased, facilitating comparative work, and (b) it would facilitate the initial labeling of databases in new and understudied varieties until the phonological contrasts have been more firmly established.

4 Why broad phonetic transcription is useful (and why ToBI annotations are often done at this level)

Crucially for the coherence of this proposal, and in line with some previous proposals within the AM literature, in this article we argue for two levels of prosodic representation, broad phonetic and phonological. This move offers considerable advantages for intonational transcription and allows for the maintenance of a universally accepted set of labels. In the segmental analysis of speech, it has proven very useful to be able to transcribe at two different levels of description. Broad phonetic transcriptions are very commonly used. Despite the fact that the IPA offers over 160 symbols for transcribing speech, only a small subset of these is used to transcribe any one language. Generally, transcriptions only include easily heard characteristics and ignore an important part of the phonetic detail. For example, a broad phonetic transcription [ˈmiɾə] may be appropriate to represent both English meter (in a non-rhotic variety with flapping of coronal stops) and Catalan mira ‘s/he looks’, even though narrower transcriptions of productions in the two languages would reveal many differences of detail: the flap [ɾ] may not have the same range of realizations in English and Catalan, and the two vowels are also rather different in the two languages in their spectral and durational characteristics. In spite of these substantial differences in phonetic detail, there are insights to be gained by comparing transcriptions at the broad phonetic level. Native speakers of one language learning another language may also establish certain equivalences at this level.

Phonological (phonemic) transcriptions may also reveal commonalities among languages at a deeper level, but they cannot replace broad phonetic transcriptions, both because they may hide surface-true generalizations and because of the considerable degree of subjectivity that phonemic analysis sometimes entails (see Hualde, 2004). To continue with the example given in the preceding paragraph, the consonant that we have represented as [ɾ] in [ˈmiɾə] corresponds to phonemic /ɾ/ in Catalan mira but presumably to phonemic /t/ in English meter. The vowel that we have represented as [ə] may receive quite different phonemizations in English and Catalan, and choice of phonemization for this vowel in each of the two languages may also depend on the analyst. That is, arguably the level of representation that offers the most insight for comparative work is often neither that provided by a narrow phonetic transcription (where differences of detail will always be present) nor a phonological transcription (where we can expect disagreements regarding the best phonemization for the language).

A more abstract, phonemic, analysis (e.g., a representation of English meter [ˈmiɾə] as, say, /ˈmitər/) is also necessary to understand the sound structure of a language. To give another example, in a description of the sound structure of Spanish, it is important to indicate that [ˈehta] and [ˈesta] are not two different words, but two ways of pronouncing the same word esta /ˈesta/ ‘this, fem.’ for many speakers of the language. There are thus good reasons for using two levels of analysis for the segmental structure of utterances.

The same arguments apply at the lexical suprasegmental level: word-level stress, for instance, was completely predictable in Latin (from syllable weight considerations) and thus should be absent from phonological representations in this language. In Spanish, on the other hand, it has become contrastive. Yet, it is very useful to notice that, e.g., the Spanish words /ˈanima/ ‘soul’ and /aˈmiga/ ‘girlfriend’ have stress on the same syllable as the corresponding Latin words /anima/ and /ami:ka/. Phonological transcriptions, per se, do not immediately reveal this fact. To see this we need broad phonetic transcriptions.

Consider also the different levels of prosodic analysis that we may have for the Lekeitio Basque sentences lagunen alabia da ‘she is the daughter of the friend’ and lagúnen alabia da ‘she is the daughter of the friends’, produced as neutral declarative utterances. At the phonological level, the only contrastive prosodic information is that one of the words of the second sentence, lagúnen ‘of the friends’, is lexically accented, whereas all other words are lexically unaccented. The representation in (1a) is useful for many generalizations about the language. In (1b) we add the predictable information (within the phonological system) that, by default, the final syllable of the phrase carries an accent if all words are lexically unaccented. For other purposes, a schematic representation of prototypical contours, as in (1c), is also useful. The ToBI-style autosegmental-metrical representation in (1d) (see Elordieta, 1998; Elordieta & Hualde, 2014; Jun & Elordieta, 1997) is a transcription of such idealized contours and amounts to a broad phonetic transcription. It is important to notice that there is very little that is phonologically contrastive in such a transcription: the initial rise, represented as %LH-, is a noncontrastive element of discourse-initial phrases. The idealized shape of all pitch accents is also invariably H*+L.

    (1) Lekeitio Basque: neutral declarative utterances

The surface phonetic level is an additional essential level of analysis, as much is also to be learned from the quantitative study of the phonetic realizations of pitch patterns across languages.

Notice that the ToBI-style transcriptions that have been proposed for this language are largely redundant in the amount of phonetic information that they contain and, therefore, broad phonetic, rather than phonological. Much of this also applies to Tokyo Japanese ToBI (Venditti, 2005). Transcriptions at this broad-phonetic level of detail, containing a certain amount of phonologically redundant, non-contrastive information, are, nevertheless, extremely useful for many purposes, including cross-linguistic comparison. For instance, labeling all accents as H*+L is redundant for Tokyo Japanese, since there is no other option for pitch accent shapes in this Japanese variety, but this label is useful in order to compare with other Japanese varieties with a lexical choice of pitch accents or with other languages.

Using the same rationale, we propose to make use of two levels of transcription for non-lexical suprasegmental information in languages without lexical pitch features, besides the quantitative study of surface phonetic detail. As in the case of segmental and lexical-suprasegmental transcriptions, both the phonological and the broad phonetic levels of transcription would use the same set of symbols (in our proposal, the IPrA alphabet is based on the well-accepted ToBI phonemic labels).

Arguably, just as standard aspects of the ToBI transcription of pitch-accent languages like Lekeitio Basque and Tokyo Japanese include redundant detail, such as the fact that all pitch accents have the shape H*+L and that there is a phrase-initial rise %LH-, the standard ToBI notation for languages like Spanish also specifies features that may be phonologically meaningless. This may include the shape of prenuclear accents, for which perhaps there is no meaningful contrast. At the phonological level, the only relevant matter may be whether or not there is a prenuclear accent in a given position.

When we compare Spanish varieties, on the other hand, there appear to be different dialectal preferences regarding the shape or the alignment of prenuclear tonal prominences. In most Spanish varieties, prenuclear prominences are typically realized as rises over the stressed syllable, with the tonal peak after the end of this syllable. There are, however, geographical varieties, including Andean Spanish (O’Rourke, 2005), Buenos Aires Spanish (Colantoni & Gurlekian, 2004), and Spanish in contact with Basque (Elordieta & Calleja, 2005), that display a preference for contours with the peak within the stressed syllable. In addition, it has been noted that in Caribbean varieties, the rise tends to start very late within the stressed syllable, most of this syllable having low pitch (Armstrong, 2010; Beckman et al., 2002; Willis, 2010). None of this is phonological, but it is extremely useful to have a broad prosodic level of transcription that reflects these differences in phonetic implementation of prenuclear accents across Spanish varieties.

One of the main advantages of making a distinction between the phonological and the broad phonetic levels of prosodic representation is the clarification of the status of each level of transcription. Even though it is generally assumed that ToBI systems must be based on solid knowledge regarding the phonological contrasts of the language, it is also common practice to propose a ToBI system of a language without having a complete picture of the phonological contrasts existing in that language. Arvaniti (2016) presents a corpus that shows a good amount of intonational variability and highlights “the importance of distinguishing between phonetic realization and phonological representation during analysis”. Precisely this important point is one of the arguments in favor of the IPrA proposal, since it allows for a straightforward clarification of the levels of analysis that exist in current ToBI practices and helps to clarify whether ToBI analyses are performed at the phonological (contrastive) level or at the broad phonetic level.

Another advantage of distinguishing two levels of transcription is to be able to systematically treat certain cases that have been difficult for ToBI phonological annotation, as they may involve phonological-broad phonetic mismatches due to contextual neutralization, allophony, and implementation rules, as we discuss in the next section. We note that proposals incorporating two levels of tonal transcription have been made for Korean (Jun, 2000, 2005) and French (Jun & Fougeron, 2000, 2002), where a number of distinct, but not contrastive, contours are analyzed as surface representations of a single underlying tonal sequence.

5 Why broad phonetic and phonological representations need to be distinguished: Neutralization of contrast, allophones in complementary distribution, phonetic implementation rules

Contrasts between phonemes are often neutralized in specific contexts. For instance, in Catalan /s/ and /z/ contrast between vowels inside words, e.g., ca[z]a ‘house’ vs ca[s]a ‘hunt’, but not word-finally, where voicing depends on the following context. Let us consider the example le[z] amigue[s] ‘the friends (female)’. There are two important facts to note in this example. One has to do with broad phonetics: the first sibilant (before a vowel in the next word) is typically realized as voiced and the second one (before a pause) is realized as voiceless. The other fact is phonological: there is no possible phonological contrast in either case; word-finally there is no contrast in voice. The correct phonological transcription (i.e., should the word-final consonant be represented as /s/, as /z/, or as phonologically unspecified /S/) would depend on the theoretical persuasion of the analyst. The existence of neutralizations of phonemic contrasts in specific positions is a good argument for using two levels of representation, phonemic and broad phonetic (again, in addition to the surface phonetic level).

The same situations arise in the analysis of intonation. For instance, a pitch accent contrast that has been the source of a good amount of inter-transcriber disagreement across several ToBI systems is the contrast between H* and L+H* (see Pitrelli et al., 1994, and Syrdal et al., 2001, for Mainstream American English ToBI, and Escudero et al., 2012, for Catalan ToBI). For American English ToBI, L+H* has been proposed to differ from H* primarily in that it shows a more substantial rising pitch movement (Beckman et al., 2005). However, the issue has not yet been settled. Work by Ladd (2008a) and Ladd and Morton (1997) showed that, at least for British English, the contrast between H* and L+H* might reflect a gradient difference in prominence associated with differences in pitch range, rather than a strictly binary distinction. Steedman (2014) takes this contrast to be abstract, and not necessarily cued by differences in pitch contours. In the end, transcriptional practice has based the decision between H* and L+H* on the availability of the leading tone L at the phonetic level. As Ladd (2008a, p. 96) points out, “the difference between L+H* and H* is therefore often fairly clear if there is a preceding syllable to display the level of the leading L”. One possibility is that H* and L+H* are indeed contrastive units in English, but that the contrast is neutralized in certain contexts, including the context where there is no preceding syllable where the L tone could be realized. Contextual neutralization is a pervasive phenomenon in segmental phonology, and, arguably, its incidence is even greater in the intonational component. The proper understanding of neutralization phenomena is helped by the recognition of two levels of analysis in addition to surface phonetics.

The usefulness of having two levels of notation is also apparent in cases where we have distinct allophones in complementary distribution, like [pʰ] and [p] in English, for instance. Regarding prosody, Grice et al. (2005, p. 72) propose that the German ToBI annotation of the calling contour is (L+)H* !H%, depending on whether the L tone is available at the phonetic level or not. This could be the converse of the English situation. In English, /H*/ and /L+H*/ may be phonologically contrastive units that are neutralized in initial position (with some dialects not having the contrast at all; e.g., Arvaniti & Garding, 2007). In German, instead, there may be a single phonological sequence /(L+)H* !H%/ with two distinct surface realizations.

In the Catalan vocative chant, there also appears to be a single phonological schema, /L+H* !H%/, with two distinct “allo-tunes” in complementary distribution. When the vocative contour is realized over the name Paula, with stress on the initial syllable and an initial plosive, the pitch accent L+H* typically surfaces as H* (see Figure 4, right panel). By contrast, when it is realized over Marina, the L leading tone can be realized in the pretonic syllable (see Figure 4, left panel). Adopting two levels of transcription allows for the transcription of the actual phonetic realization of the utterance, with a reference to the underlying phonological representation.

Figure 4 

Phonological and broad phonetic representations of the vocative chant in Catalan, realized over a three-syllable paroxytonic word (left) and over a two-syllable paroxytonic word (right). This audio content is available at: http://dx.doi.org/10.5334/labphon.11.wav4a and http://dx.doi.org/10.5334/labphon.11.wav4b.
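The relation between the single phonological schema and its two allo-tunes can be sketched in a few lines of Python. This is only an illustrative sketch: the names `TwoLevelLabel` and `vocative_chant` are hypothetical, and the rule is deliberately simplified to one assumption, namely that the leading L surfaces only when at least one syllable precedes the stressed one (the segmental make-up of the onset, also mentioned above, is ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoLevelLabel:
    """A tune transcribed at both levels discussed in the text."""
    phonological: str    # underlying form, written between slashes
    broad_phonetic: str  # surface form, written between brackets


def vocative_chant(pretonic_syllables: int) -> TwoLevelLabel:
    """Pick the allo-tune of /L+H* !H%/ (hypothetical helper).

    Assumption: the leading L is realized only if some syllable
    precedes the stressed one, as in Marina vs. Paula.
    """
    if pretonic_syllables < 0:
        raise ValueError("pretonic_syllables must be non-negative")
    accent = "L+H*" if pretonic_syllables > 0 else "H*"
    return TwoLevelLabel("/L+H* !H%/", f"[{accent} !H%]")


marina = vocative_chant(pretonic_syllables=1)  # Ma-RI-na
paula = vocative_chant(pretonic_syllables=0)   # PAU-la
print(marina.broad_phonetic, paula.broad_phonetic)
```

Both calls return the same phonological form, so a search at the phonological level would retrieve both names, while a search at the broad phonetic level would keep them apart.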

Another source of confusion for labelers (which has been pointed out in some inter-rater reliability analyses, e.g., Cat_ToBI; Escudero et al., 2012) is the transcription of other truncated pitch contours. It is quite common crosslinguistically for a rising-falling intonation sequence such as /L+H* L%/ to be truncated if lexical stress is on the final syllable of the word (see Grice et al., 2005, for southern Italian varieties; Prieto & Ortega-Llebaria, 2009, for Catalan and Peninsular Spanish; Armstrong, 2015, for Puerto Rican Spanish; Gabriel et al., 2010, for Argentinean Spanish; Cabrera Abreu & Vizcaino Ortega, 2010, for Canarian Spanish). One of the proposals within ToBI labeling for transcribing those sequences has been to specifically flag truncation by placing the non-realized phonological target in parentheses (see Grice et al., 2005, p. 384). Figure 5 illustrates our analysis with Catalan examples (proper names). While the utterance-final low boundary tone can attain the tonal baseline of the speaker (e.g., 125 Hz) in a trochaic word like Maria ‘Mary’ (see Figure 5, left panel), it can get truncated in the case of an iambic word like Damià ‘Damian’ (e.g., it is realized at 150 Hz; see Figure 5, right panel).

Figure 5 

Phonological and broad phonetic representations of the contrastive focus contour in Catalan, realized over a paroxytonic word (left) and over an oxytonic word (right). This audio content is available at: http://dx.doi.org/10.5334/labphon.11.wav5a and http://dx.doi.org/10.5334/labphon.11.wav5b.
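The parenthesis convention for truncation mentioned above can be illustrated with a small Python sketch. The function name `mark_truncation` is hypothetical, and the sketch assumes, for simplicity, that truncation always applies when stress is word-final, whereas the text only says that it can apply. It is not meant to reproduce the exact labels used in Figure 5.

```python
def mark_truncation(tune: str, stress_is_final: bool) -> str:
    """Flag a possibly unrealized boundary tone with parentheses.

    `tune` is a space-separated phonological tune such as "L+H* L%".
    Assumption (simplified): the final boundary tone is truncated
    whenever lexical stress falls on the final syllable.
    """
    labels = tune.split()
    if not labels:
        raise ValueError("tune must contain at least one label")
    *accents, boundary = labels
    if stress_is_final:
        boundary = f"({boundary})"
    return " ".join([*accents, boundary])


print(mark_truncation("L+H* L%", stress_is_final=False))  # Maria
print(mark_truncation("L+H* L%", stress_is_final=True))   # Damià
```

Keeping the phonological tune as the input and deriving the flagged label from it makes the mapping between the two levels explicit, rather than leaving it to each transcriber.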

The use of upstep and downstep features, as well as alignment diacritics, has also been a source of confusion in inter-rater consistency tests, a fact that we also attribute to discrepancy between levels of notation. First of all, it appears that some languages have two phonologically distinct height levels for H targets (e.g., see Catalan or Spanish for the contrast between L+H* and L+¡H*; Prieto, 2014, and Hualde & Prieto, 2015, respectively) and also contrasts of alignment (e.g., the contrast between L+H* and L+<H* in Catalan and Spanish). If a language has a paradigmatic contrast between L+H* and L+¡H*, a ToBI phonological representation could include these phonological contrasts in the accent labels. This is the option adopted by Frota and Prieto (2015) in the Romance project. However, this may not be an entirely satisfactory solution, since a consequence is that non-phonological (e.g., phrasal-level) upstep and downstep cannot be incorporated in the ToBI transcription of these languages. Again, the ability to separate phonological form from broad phonetic transcription would clarify the situation. While at the broad phonetic level of representation downstep features would encode both contrastive and non-contrastive downstep, at the phonological level only the phonologically contrastive downstep would be noted.
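As a rough illustration of this division of labor, the following Python sketch stores a scaling diacritic together with a flag stating whether it is contrastive, and prints it at one or both levels accordingly. The class `ScaledAccent` and its fields are hypothetical, and the sketch only handles scaling on a starred H tone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledAccent:
    base: str                  # accent without scaling, e.g. "L+H*"
    scaling: str = ""          # "" (none), "!" (downstep) or "¡" (upstep)
    contrastive: bool = False  # is the scaling phonologically contrastive?

    def _with_scaling(self) -> str:
        # Place the diacritic directly before the starred H tone.
        if "H*" not in self.base:
            raise ValueError("this sketch only scales a starred H tone")
        return self.base.replace("H*", f"{self.scaling}H*", 1)

    def broad_phonetic(self) -> str:
        # Every scaling feature is written, contrastive or not.
        return f"[{self._with_scaling()}]" if self.scaling else f"[{self.base}]"

    def phonological(self) -> str:
        # Only contrastive scaling survives at this level.
        keep = bool(self.scaling) and self.contrastive
        return f"/{self._with_scaling()}/" if keep else f"/{self.base}/"


lexical_upstep = ScaledAccent("L+H*", "¡", contrastive=True)
phrasal_downstep = ScaledAccent("H*", "!", contrastive=False)
print(lexical_upstep.phonological(), lexical_upstep.broad_phonetic())
print(phrasal_downstep.phonological(), phrasal_downstep.broad_phonetic())
```

Whether a given scaling feature is contrastive is an analytical decision for each language; the flag merely records that decision so that the two transcriptions stay consistent with each other.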

Finally, let us discuss another complicated case involving the labeling of H scaling patterns. In Catalan, it has been proposed that there is a phonological distinction between the following two nuclear configurations: L+H* LH% (used in echo questions) vs. L+H* L!H% (in statements of the obvious; see Prieto, 2014; Vanrell, 2011). Yet, in real speech many instances of emphatic obvious assertions can be realized with a very high boundary tone at the end (even higher than in echo questions). That is, optionally the phonological contrast may be neutralized on the surface. Again the question is, how shall we label these examples? From a phonological point of view, we may appeal to the meaning of the contour in a specific instance and label it accordingly. But this would not give us an adequate understanding of the intonational system of the language. If we just have one level of analysis and label these examples phonologically, we are missing an important piece of information about the phonetic realization of these intonation contours, which includes their optional neutralization. Arguably again the most complete prosodic analysis should include two levels of representation: one, phonological, where meaningless (and predictable) variation in alignment and range is ignored and another, broad phonetic, where these differences are represented.

6 Why a more abstract phonological representation is useful for comparative purposes

Let us consider now some facts brought up in recent experimental work on comparative Romance intonation. In the Romance languages, unlike Germanic, there is very little phonological usage of nuclear accent position (Ladd, 2008a; Vallduví, 1990). Almost always, the last content word in the phrase bears the nuclear accent. Retraction of the nuclear accent is, however, possible, if somewhat marked. This retraction may indicate either contrastive focus on a non-phrase-final word (e.g., MARINA lo ha traído, no Juan ‘MARINA brought it, not Juan’) or may be used for other pragmatic effects, as in Bolinger’s (1954) example El TELÉFONO suena ‘the confounded phone has to go on and ring’. A correlate of a non-phrase-final nuclear accent is lack of peak displacement. Thus, in Peninsular Spanish in the broad focus utterance Marina lo ha traído ‘Marina brought it’, the word Marina will show a rising contour with a peak on the last syllable of the word. In the contrastive focus utterance MARINA lo ha traído ‘MARINA brought it’, on the other hand, the accentual peak will be realized within the stressed syllable. Although shifting the nuclear accent has effects on the prosodic contour of the entire utterance, not only on the word bearing the accent, and has both tonal and nontonal cues (see Breen et al., 2010, for English), Vanrell et al. (2013) show that the location of the tonal peak by itself is a strong cue of contrastive, nuclear accent on a non-phrase-final word.

Vanrell et al. (2013) performed the same experiment in Italian. The phenomenon is essentially the same in Spanish and Italian: the peak is retracted when a non-phrase-final word bears contrastive nuclear accent. Interestingly, however, a difference between the Spanish and Italian varieties being compared is that in Italian prenuclear accents are realized without displacement of the peak to the posttonic (as in, e.g., Andean Spanish), so that retraction results in an even earlier peak. The notations that Vanrell et al. (2013) propose are L+>H*¹¹ (regular, prenuclear) vs. L+H* (contrastive focus, retracted) for Spanish/Catalan and L+H* (regular, prenuclear) vs. H*+L (contrastive focus, retracted) for Italian.

At some level, however, for typological purposes, we may want to capture the fact that we have the same phenomenon in both languages, i.e., accent retraction to indicate nuclear accent on a nonfinal word. There can be several competing phonological analyses for this phenomenon. In one analysis, focus would introduce a phrasal boundary after the focalized word, which causes an accent H* to be retracted as a consequence of the insertion of an L tone immediately after it, whether in Spanish or in Italian. The specific surface difference between Italian and Peninsular Spanish would be a matter of the phonetic preference for the implementation of prenuclear accents in different varieties. These differences are, however, also important for comparative work. Specifically, in the case at hand, the striking fact is that a contour that is typical of prenuclear accents in one language (Italian) is interpreted as nuclear accent on a non-phrase-final word in the other language (Spanish). Underlyingly, the phenomenon is arguably the same in both languages, but the surface results are radically different. Both facts are significant. For comparative purposes it should be obvious that broad phonetic labels must be uniform for all languages being compared.

This is parallel to cases like, for instance, the /p/-/b/ contrast in Spanish vs. English, where phrase-initially [p] (voiceless unaspirated stop) counts as /p/ in Spanish but as /b/ in English. At one level, we want to express the fact that both Eng. pin and Sp. pino ‘pine’ start with the same phoneme /p/. At another level, it is also important to note that the phonetic realization of these two phonemes is quite different, so that Sp. /p/ in pino is actually more similar to Eng. /b/ in bin. Importantly, just as at the segmental and lexical-prosodic level, progress in intonational typology will benefit from adopting two levels of transcription and performing accurate comparisons at both levels of representation.

7 Towards an International Prosodic Alphabet

Even though it is not the main goal of the paper to propose a set of labels that can form part of an eventual, community-adopted IPrA¹² (but rather to state the advantages of incorporating two levels of prosodic transcription for prosodic research), in this section we explain how the set of labels for the broad phonetic transcription proposed could be defined and exemplify how such a proposal would work with an example involving bitonal pitch accents.

In spite of the inconsistencies among authors in the use of ToBI intonational labels noted above, the fact is that much agreement already exists among phonologists on how to interpret labels when they are intended at the broad phonetic level. The labels and definitions proposed in Jun & Fletcher (2014, pp. 517–518) seem rather uncontroversial and could be taken as a point of departure. In fact, the IPA already provides equivalent notations in terms of accent marks and tone bars for some of them. Some examples are given in Table 3.

                   Jun & Fletcher (2014)   IPA diacritics   IPA tone bars
F0 peak            H                       á                ˦a
Extra H F0 peak    ^H                                       ˥a
F0 rise            LH                      ǎ                ˩˥a
F0 fall            HL                      â                ˥˩a

Table 3

Notation correspondences between Jun and Fletcher’s (2014) proposal (second column), IPA tone diacritics (third column), and IPA tone bars (last column).

Sufficient consensus seems to already exist regarding the unmarked interpretation of tonal labels, be they ToBI-style autosegmental labels or be they IPA tone marks.

It may be useful at this point to explain that, in our opinion, the existing IPA symbols and diacritics for tonal transcription are not fully adequate for intonational research and, instead, autosegmental labels are more appropriate. First, the only IPA symbols for intonation, “global rise” and “global fall” (in addition to lexical tone and word accent), are clearly not enough to capture all relevant facts about intonation contours. On the other hand, the IPA symbols that were initially devised for the analysis of lexical tone could in principle be adapted for the transcription of intonational contours. In languages without lexical tones, only accented syllables (and for some contours the immediately preceding and/or following syllable) and syllables before a boundary would need a tonal label. To give an example, the American English ToBI transcription in (2a)¹³ would correspond to (2b) = (2c) = (2d) in a more “surfacy” analysis of the contour (syllables with primary stress are underlined):

    (2) Marianna made the marmalade||

In principle, additional diacritics could also be added to the IPA inventory to signal features that have been argued to be important for the adequate description of certain intonational systems, such as the delay or retraction of accentual peaks, although indicating the tone of a syllable preceding or following the accented one may also suffice.

However, there are several important reasons for adopting the AM notation over existing IPA conventions for the transcription of tone. First, one of the main arguments for using autosegmental notation instead of the tonal symbols and diacritics provided by the IPA is the more elegant and conspicuous way in which autosegmental phonology captures the relation between underlying/phonological and broad phonetic levels of description. Autosegmental notation was in fact introduced for the analysis of lexical tone in order to better account for the mapping between the broad phonetic level and the postulated phonological level, including phenomena such as contour formation from underlying sequences of tone, tone spreading, surfacing of tone on different syllables from their lexical sponsor, floating tones, etc. (see, e.g., Goldsmith, 1990). Bruce (1977) demonstrated the usefulness of the autosegmental approach in our understanding of the intonational contours of Swedish, by providing a uniform underlying representation for the two contrastive lexical pitch-accents, in spite of surface variation as lexical and postlexical tones interact.

Applied to intonational patterns, the use of the AM/ToBI labels for both the broad phonetic and phonological levels of representation can capture the sameness of intonational contours across segmental contexts at the phonological level (including cases of truncation and compression) and their corresponding surface realizations at the phonetic level of representation.

Second, the AM model has been recently applied to dozens of typologically diverse languages. An important argument to use the AM/ToBI symbols is that the knowledge that has been recently acquired about prosodic contrasts crosslinguistically can be easily incorporated in the design of the IPrA proposal.

A clear advantage of the AM symbols is in indicating differences of alignment between segments and tonal events. Let us exemplify how the IPrA proposal could work for bitonal LH and HL pitch accents which contrast in alignment properties. Our proposal is that the IPrA set of units (pitch accents, boundary tones, non-F0 features) should be based on the set of contrastive tonal units reported in the typological literature, and should be open to new additions as more data become available. That is, new units can be incorporated if they can be unambiguously shown to be used with a distinctive value in a given language. It is well-known that many languages with lexical stress and pitch accenting have reported phonological differences in pitch alignment between the following pairs of pitch accents: L+H* vs. L*+H (e.g., English, Greek, Catalan, Georgian), between H*+L and H+L* (Swedish, Portuguese, Jamaican Creole), as well as contrasts between L+H* vs. L+<H* in some languages (Catalan, Spanish).¹⁴ The definitions of those bitonal pitch accents would have to involve a clear definition of the alignment properties of the target tonal events. For greater clarity, definitions can be complemented with line diagrams, as in some recent work on the intonation of specific languages by several authors. In Figure 6 we schematically represent a set of potential IPrA units representing alignment contrasts in bitonal rising LH and falling HL pitch accents.¹⁵

Figure 6 

Schematic representations and proposed IPrA units representing alignment contrasts in bitonal LH and HL pitch accents.

IPrA prosodic labels will need to be refined and adapted to our knowledge of reported typological contrasts across languages. Yet we should bear in mind that in some cases IPrA label definitions should be broad enough to encompass differences in phonetic realization, just as in segmental IPA transcription [d] means ‘voiced dental, alveolar, or postalveolar plosive’. Once the IPrA analytical units are accepted and published, these symbols can easily be used in the initial phase of prosodic analysis of a new language (or a new language variety) with lexical stress and pitch accentuation, before setting up their phonological inventories. Let us imagine we would like to analyze an understudied dialectal variety of Spanish. While we know that some varieties have a contrast between L*+H, L+H*, and L+<H* (e.g., Hualde & Prieto, 2015), we do not know yet whether this new Spanish variety has this contrast. Broad phonetic transcriptions of the data using these labels will be useful for two reasons. First, they will provide more information on dialectal differences in the alignment patterns of bitonal pitch accents which are not phonological (see the dialectal variation in Spanish prenuclear pitch accents reported in Section 4). Second, they will represent an important temporary step that researchers can take with the goal of understanding the phonological intonation system of that language variety.

8 Conclusions

One of the original goals driving the initial creation of ToBI notation from the AM framework was to facilitate the labeling of large oral databases for typologically diverse languages. Labeling large speech corpora, though, requires that the corpus of interest be consistently annotated with a standard label set (e.g., Wightman, 2002). Even though ToBI systems have been proposed for some dozens of languages, we are still far from having a universally accepted standard for ToBI prosodic annotation. An important goal of this paper was to motivate the use of two levels of prosodic representation and the development of a set of discrete tonal labels and diacritics (e.g., the IPrA set) that are transparent and consistent at the categorical phonetic level. It is our view that reaching an agreement on prosodic labels would substantially facilitate typological work and thus increase our understanding of the nature and structure of intonation in the languages of the world (e.g., Jun & Fletcher, 2014). These labels could be used in a systematic way (a) as temporary labels before establishing a phonological analysis of tones, (b) as a way to represent allophonic realizations of an underlying tonal category, and (c) as a way to represent hybrid or exceptional tonal categories that are not part of the intonational model of any specific language. In particular, the IPrA tool can be useful for L2 prosodic studies and studies of prosodic contact, since at present it is difficult to transcribe L2 speech when working with two established ToBI systems.

Arvaniti (2016) argues against the need to use an extra level of broad phonetic representation, essentially arguing that “it is not possible for any type of (phonetic) transcription to capture the full gamut of possible variability, while at the same time using an intermediate systematic phonetic level can stop researchers from capturing essential generalizations” (pp. 25–26). First, it is precisely because of the need to clarify the level of analysis at which we are performing intonational analyses that it is crucial to have access to two levels of representation. Indeed, as in the case of segmental analysis, it is not the goal of an IPrA transcription (as it is not the goal of an IPA transcription) to capture the “full gamut of possible variability” which exists in the data, and which can be easily analyzed through access to the acoustic waveforms. The goal of such a transcription is to have access to a general (and intermediate) level of representation that can be easily interpreted by intonation analysts. This level of representation can serve as the basis (for example in the transcription of a new language) to capture the relevant phonological generalizations for that language. Thus, we believe that, even though IPA-based segmental transcriptions are not perfect and do not cover the full amount of phonetic variability, the IPA has served for over a hundred years to facilitate an intermediate (and practical) level of transcription from which both phoneticians and phonologists can refer to detailed acoustic or articulatory analyses and to abstract phonological analyses of the data, respectively. It is our firm belief that this intermediate level of representation of intonational patterns will help clarify the two levels of analysis and thus encourage language researchers to propose more abstract generalizations of the prosodic data (e.g., see Sections 5 and 6). 
Importantly, progress in intonational typology will benefit from adopting two levels of transcription and performing accurate crosslinguistic comparisons at both levels of representation.

In this article, we have argued that greater progress can be achieved by adopting two modifications of current transcriptional practice. In essence, we have argued for two levels of prosodic representation, broad phonetic and phonological, as well as for cross-linguistically transparent and consistent labels. Making these small changes in the way ToBI is applied to different languages can crucially clarify our understanding of how surface tonal patterns and phonological categories are related in each language. The clarification of the level of prosodic analysis will, on the one hand, allow for more abstract phonological analyses of intonation. If transparent labellings are encoded at the broad phonetic level, important generalizations and non-predictable information can be encoded at the phonological level. Conversely, adopting a level of broad phonetic transcription will also help clarify systematic analyses of fine phonetic features (F0 Max, F0 Min, duration, etc.) across languages. Oftentimes, this work does not incorporate a phonological analysis of the pitch contours and uses general IPA terms such as “global rise” or “global fall”. Giving researchers interested in fine phonetic detail access to an IPrA tool for broad phonetic transcription of prosody will help contextualize the results of detailed phonetic analyses across languages. In a nutshell, we regard this proposal as an opportunity to integrate purely phonetic and phonological work on intonation.

In sum, the thrust of this proposal is in keeping with the tenets of the Autosegmental-Metrical analysis of intonation. First, we proposed to make use of a set of intonation units that are universally accepted, such that each language chooses a subset of those. Second, we claimed that progress in intonational typology could benefit from adopting two levels of transcription. Having only one level of transcription may not be sufficient for comparative work in intonation, as typological work also requires attention to matters of broad phonetic detail. The field can also benefit from comparisons at a more abstract level. Making an analogy with segmental (and lexical suprasegmental) transcriptions, we propose that both the phonological and categorical phonetic levels of representation may use the same set of symbols, which can be taken from a complete universal phonetic repertoire (i.e., the IPrA labels).

Competing Interests

The authors declare that they have no competing interests.

Supplementary Files

For accompanying TextGrid, Pitch, and wav files, go to http://dx.doi.org/10.5334/labphon.11.smo.