1 Introduction

In the autosegmental-metrical (AM) model of intonational phonology, intonation refers to the structured variation in phonetic features, primarily pitch, to express phrase-level meanings. At the heart of this approach to intonation, as Ladd (2008, p. 3) remarks, “is the idea that intonation has a phonological organization”. As in other speech sound domains, such as segmental phonology or lexical tone, analyzing intonation involves mapping continuously variable physical parameters to categories. The categories of intonation (pitch accents and boundary tones) are organized in a set of relations and rule-governed distributions that define the intonation system of a language. This system has a specific place in grammar, which is prosodic phrase structure (Gussenhoven, 2004, 2007; Ladd, 2008, a.o.). The present article is concerned with the ways structural, i.e., phonological, pitch patterns (or categories) and surface, i.e., phonetic, pitch patterns are related, and in particular with mismatches between surface pitch similarities or dissimilarities and structural similarities or dissimilarities.

The AM approach has proved fruitful in accounting for intonation in a variety of typologically diverse languages (e.g., Grice, 1995; Gussenhoven, 2012; Hayes & Lahiri, 1991; Hualde, 2002; Jun, 2005a, 2014a; Pierrehumbert & Beckman, 1988) and in comparing languages and language varieties on the basis of shared assumptions and methods (Frota & Prieto, 2015a; Prieto & Roseano, 2010). Within this approach, the transcription of intonation in a given language is usually taken to reflect “the understanding of the intonational and prosodic grammar of the language”, grounded “on rigorous analyses of the intonational phonology” (Beckman et al., 2005, p. 12). At the same time, it has long been recognized that determining the phonological structure of intonation is by no means an easy task. In segmental phonology, for example, the phonetic substance of the signal provides concrete acoustic landmarks that are cues to segmental features like [vowel], [consonant], [high], [nasal], which, combined, yield a discrete phonological representation of the signal (e.g., Stevens, 2002). These cue patterns capture systematic variation in the implementation of segmental categories, say /p/ versus /b/, that has perceptual consequences (Eimas & Corbit, 1973; Phillips et al., 2000). In the case of intonation, the properties of the speech signal that cue phonological contrasts (such as the one between a low tone and a high tone, two intonation categories which are cued by pitch height) are also involved in expressing other kinds of variation, like paralinguistic messages related to emotional state or even individual speaker differences. Thus, it may be difficult to determine whether a phonetic difference or similarity is a reflex of a phonological difference or similarity in intonation. For this, it is crucial to examine how the sound-meaning relations are organized, namely whether they have a gradient/phonetic or a contrastive/phonological nature (Gussenhoven, 2004; Ladd, 2008; Pierrehumbert, 1980). It is also critical to look at the distributions of pitch patterns, given that sound contrasts may be constrained by context (e.g., Steriade, 2007).

In accordance with the view that establishing distributions and contrasts is the driving force of the analysis of any sound system, the mapping from physical realization (pitch tracks) to intonation categories (a given set of pitch accents and boundary tones) seeks to establish the surface pitch patterns, that is, the sequence of tonal targets (highs and lows in the AM approach), and to relate them to structural pitch patterns by determining what the nature of these targets is and what they mean. The ultimate goal is to identify the contrastive units of intonation and how they signal meaning (see Arvaniti, 2016, for a similar view). Importantly, both within and across languages, surface pitch similarities may be apparent and accidental. For example, in Friulian, information-seeking yes-no questions ending in a word with final stress may show a rising contour similar to that found in the non-final elements of a disjunction. However, the similarity is due to the truncation of the low boundary in the yes-no question contour and is thus an effect of constraints on tonal target realization (Roseano et al., 2015). Gussenhoven (2007, pp. 3–5) offers a clear example of a surface pitch similarity across languages which is accidental: in Japanese, an HL sequence appears on accented words; in English, a similar HL sequence may occur in an accented word. However, in Japanese the HL pitch accent is part of the lexical specification of the word and thus affects its meaning, whereas in English the pitch accent is independent of the word and contributes to the expression of sentence-level meaning. The same reasoning naturally applies to surface pitch differences, which need not correspond to contrastive distinctions, as in the case of pitch register variation as a cue to degree of friendliness (Rietveld & Chen, 2006), or the truncated and non-truncated realizations of the yes-no question contour in Southern varieties of Italian (Grice, D’Imperio, et al., 2005). The complex and non-trivial ways in which structural pitch patterns and surface pitch patterns may be related underscore the key role played by distribution, context, contrast, and meaning in the analysis and transcription of intonation. This is a central issue that the analyses discussed in this paper, relating to mismatches between (dis)similarities in the surface pitch forms and in the phonological categories, will illustrate.

Another critical question concerns the role of category-based transcription in cross-language comparison of intonation. As in other speech sound domains of language, intonational analysis is driven by system-internal considerations. The potential for contrast of a given unit needs to be ascertained in relation to the other units that pertain to the same language-specific system (Gili Fivela, 2008; Grice, 1995; Gussenhoven, 2012; Hayes & Lahiri, 1991, a.o.). However, cross-language comparison of intonation is an inescapable goal if one wants to understand how languages and language varieties may differ in their intonation systems, with consequences for prosodic typology. The importance of understanding language- or variety-specific systems while conducting cross-linguistic studies is illustrated, for example, in Grice et al.’s (2005) discussion of nuclear contours in Italian varieties. Grice et al. compare the Bari and Neapolitan narrow focus contours, which look phonetically similar, but they make a shared analysis conditional on preserving the Bari system-internal distinction between the narrow focus and the question pitch accents. Similar observations are made in Gili Fivela (2008), again on different varieties of Italian. The need to balance system-internal considerations and the cross-linguistic study of intonation is also explicitly argued for in Ladd (2008). Comparing the transcription of calling contours and rising statement contours in varieties of English, Ladd shows that it is possible to treat identical phonological analyses “across dialects and languages as representations of ‘the same’ tune” (Ladd, 2008, p. 129). Crucially, as in the segmental domain, cues to a given phonological prosodic category can vary across languages, as well as within languages (e.g., rising-falling contours are realized differently in languages or language varieties that truncate the final low tone and in those where truncation does not occur; Roseano et al., 2015, for Friulian; Grice et al., 2005, for varieties of Italian; Frota et al., 2015, for varieties of Portuguese). The study of comparative intonation thus requires us to address the relation between surface pitch patterns and structural pitch patterns in order to establish what counts as the same contour phonologically and thus what should be given the same transcription, within each language-specific system but with an eye on the systems of other languages or varieties. This task is especially relevant for typological purposes, since, as argued in Hyman (2012), “the central goal of phonological typology is to determine how different languages systematize the phonetic substance available to all languages” (Hyman, 2012, p. 371).

This paper tackles the challenges of transcribing intonation within and across languages by discussing surface and structure in intonational analysis on the basis of data from varieties of Portuguese and other Romance languages such as Catalan, Italian, and Spanish. The paper is organized as follows. Section 2 addresses surface similarities to inquire whether they reflect structural similarities. Two case studies are discussed: the rise-fall pitch pattern of the narrow focus contour, and the calling contour. Section 3 focuses on surface differences to ask whether they reflect structural differences. Again, two case studies are described: tonal alignment, and peak height in nuclear falls. Section 4 highlights the need to go beyond what looks the same and what looks different on the surface, whether the goal is the analysis and transcription of a given language/variety or cross-language comparison. Finally, Section 5 concludes the paper.

2 Surface similarities and structure

Surface pitch similarities may reflect true structural similarities (that is, they may signal the same intonation categories), or they may only reflect apparent similarities, arising from accidentally similar realizations of different categories (as in the Friulian rising contour examples, and the Japanese and English examples mentioned in Section 1). From the point of view that a transcription is an analysis of the intonation system, the first case should be given identical phonological analyses, whereas in the second case different analyses are required as a consequence of differing prosodic properties that emerge through the analysis of the system. In the literature, a number of contours that arguably look the same, especially across languages and language varieties, have been under debate as to how they are analyzed and transcribed. One example of such a debate concerns the rise-fall pitch pattern found in the narrow focus contours in some languages/varieties; another concerns the rise followed by a step down from the high level to a sustained pitch which is found in calling contours in several languages (e.g., Gussenhoven, 2008; Ladd, 2008). These two cases are discussed here on the basis of data from varieties of Portuguese and other Romance languages, to argue that pitch similarities are only apparent in the narrow focus contours but reflect true structural similarities in the calling contours.

2.1 Rise-fall pitch pattern and the narrow focus contour

The first case I address is the rise-fall pitch pattern that has been recurrently described for narrow focus statements in the European varieties of Portuguese (hereafter EP; Cruz, 2013; Fernandes, 2007; Frota, 2000, 2002, 2014; Frota et al., 2015), in most Catalan varieties (Prieto, 2014; Prieto et al., 2009; Prieto et al., 2015), in Spanish (Hualde & Prieto, 2015; Vanrell et al., 2013), and in most Italian varieties (Gili Fivela et al., 2015; Grice et al., 2005). The contour is illustrated in Figure 1, for EP and Catalan. Given the surface pitch pattern similarities and the fact that a phonetically similar contour conveys similar pragmatic meanings across languages/varieties, one might expect identical phonological analyses and transcriptions to have been proposed (along the lines of suggestions, for example, in Ladd, 2008). However, that was not the case, as depicted in Figure 1. System-internal analyses have led to an H*+L L% nuclear contour in some languages/varieties, namely in EP and some Italian varieties (as in Bari, Lecce, and Pisa), and an L+H* L% nuclear contour in other languages/varieties, namely Catalan, Spanish, and some Italian varieties (e.g., Florence, Milan, Turin, Naples). This amounts to saying that the fall was the accentual element of the pitch pattern that was systematized (or phonologized) in some languages, and the rise the element that was systematized in others.

Figure 1 

Left panel: F0 contours of the EP utterance Casaram (‘They got married’), produced first as a neutral declarative and then as a contrastive focus. Right panel: F0 contour of the Catalan contrastive focus utterance Melmelada volen (‘Jam is what they want’). This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav1a and http://dx.doi.org/10.5334/labphon.10.wav1b

The two analyses are reminiscent of the much debated on-ramp and off-ramp analyses of rise-fall pitch patterns in Germanic languages (Baumann et al., 2007; Chen, 2011; Gussenhoven, 2008, 2016; Hanssen et al., 2008; Ritter & Grice, 2015). Importantly, in Germanic as in Romance languages, the two analyses make different predictions with respect to which part of the contour is the most relevant communicatively and perceptually, and which part of the contour is more carefully and precisely produced in cases of enhancement or in different segmental contexts that may impact on tonal realization.

I will consider production first. A closer examination of the phonetics of the contour shows that there is evidence from production that the languages/varieties do differ with respect to what the most adequate intonational analysis is. The peak alignment details are not the same, with the peak being aligned towards the end of the accented syllable in Catalan and Spanish (Prieto, 2014; Vanrell et al., 2013), and closer to the middle of the accented syllable in EP and some varieties of Italian (Frota, 2002; Gili Fivela et al., 2015; Vanrell et al., 2013). Thus, the extent to which the pitch falls in the accented syllable is considerably larger in the latter case. In addition, the presence of an L target before the peak is a clear and consistent feature of the contour in Catalan or Spanish, but no clear rise is systematically found in EP (and in the accentual fall Italian varieties, as described in Gili Fivela et al., 2015). In these languages/varieties the rise is contextual and gradient, and the peak may be realized as the end of a plateau. These differences are illustrated in Figure 2 for EP and Catalan: respectively, a larger fall in the accented syllable preceded by no rise, and a peak aligned towards the end of the accented syllable preceded by a clear rising contour. These facts constitute corroborating production evidence for the accentual fall analysis in one case and the accentual rise analysis in the other.

Figure 2 

Left panel: F0 contour of the EP utterance O pintor cantou uma manhã angelical (‘The artist sang an angelic morning’), with a narrow focus on manhã (an angelic morning, not night). Right panel: F0 contour of the Catalan narrow focus statement Volen melmelada (‘They want jam (not butter)’). This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav2a and http://dx.doi.org/10.5334/labphon.10.wav2b

I will now consider perception and in particular the different predictions made by the two phonological analyses with respect to which part of the contour is the most relevant perceptually. To examine how the Catalan/Spanish-type of contour and the EP-type of contour were perceived by both native and non-native listeners, a perception task elicited judgments of contour shape as rising or falling from 10 native EP listeners and 10 native Catalan and Spanish listeners (5 Spanish and 5 Catalan-Spanish bilinguals). The stimulus set contained four Catalan utterances (two broad focus with the nuclear contour L* L% and two narrow focus with L+H* L%; Prieto, 2014) and four EP utterances (two broad focus with the nuclear contour H+L* L% and two narrow focus with H*+L L%; Frota, 2014), each repeated three times in random order. The narrow focus utterances included pitch patterns such as those in Figure 1 and Figure 2, namely one instance of each kind of phonetic pattern for both Catalan and EP. Half of the participants from each language group listened to the Catalan stimuli first, and half to the EP stimuli. Listeners were told that they would hear utterances from Catalan or from Portuguese, and that the meaning of the words or phrases was not relevant, only the way they sounded. They were specifically instructed to pay special attention to intonation while listening to each stimulus and to respond whether the most salient part (that is, the part that sounded the most important perceptually) was rising or falling. No other explanation of what salience meant was given. Importantly, each utterance contained only one nuclear accent, which was assumed to be perceived as the most prominent part of the utterance. The experiment was a forced-choice task, where repeated listening of utterances was not allowed and participants were asked to respond with their first impression. A total of 480 judgments were obtained (8 utterances × 3 repetitions × 20 listeners). The results are shown in Figure 3.
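The stimulus design described above can be sketched in code. This is a minimal illustration and not the script used in the experiment: the stimulus labels, the `build_trials` function, the seed, and the choice to shuffle within each language block are all assumptions introduced here. It only shows how eight utterances, each repeated three times, yield a randomized trial list for one listener, with the order of the language blocks counterbalanced.

```python
import random

# Hypothetical stimulus labels (language, focus type, item number);
# these are placeholders, not the actual utterance names.
STIMULI = {
    "Catalan": ["cat_broad_1", "cat_broad_2", "cat_narrow_1", "cat_narrow_2"],
    "EP": ["ep_broad_1", "ep_broad_2", "ep_narrow_1", "ep_narrow_2"],
}
REPETITIONS = 3


def build_trials(first_language, seed):
    """Return one listener's trial list: two language blocks, shuffled within block."""
    rng = random.Random(seed)
    second_language = "EP" if first_language == "Catalan" else "Catalan"
    trials = []
    for language in (first_language, second_language):
        block = STIMULI[language] * REPETITIONS  # each utterance three times
        rng.shuffle(block)  # random order within the block (an assumption)
        trials.extend(block)
    return trials


# A listener assigned to the "Catalan first" order.
trials = build_trials("Catalan", seed=1)
print(len(trials))  # 24 trials for this listener
```

Calling `build_trials("EP", seed=...)` for the other half of each listener group gives the counterbalanced order.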

Figure 3 

Percentage of rising responses to the two types of accents by EP and Cat/Sp listeners, across the two languages. Error bars indicate the standard error of the mean.

A repeated measures ANOVA with two within-subject factors, Language (Catalan vs. EP) and Accent Type (broad focus vs. narrow focus), and one between-subject factor, Language Group (EP vs. Catalan/Spanish listeners), revealed a significant effect of Language, with more rising responses overall for the Catalan stimuli (F(1,58) = 22.48, p < .001, η² = .28; mean value of 0.50 for Catalan and 0.35 for EP), a significant effect of Accent Type, with narrow focus accents yielding more rising responses overall than broad focus accents (F(1,58) = 153.19, p < .001, η² = .73; mean of 0.67 for narrow focus and 0.18 for broad focus), and a significant effect of Language Group, with Cat/Sp participants showing more rising responses than EP participants (F(1,58) = 20.32, p < .001, η² = .26; mean of 0.52 for Cat/Sp and 0.33 for EP subjects). A significant interaction between Language and Language Group was found, with EP participants behaving differently across the EP and Catalan stimuli, whereas Cat/Sp participants gave similar proportions of rising responses for both sets of stimuli (F(1,58) = 6.94, p < .05). A significant interaction was also found between Language, Accent Type, and Language Group (F(1,58) = 12.86, p < .01), with the narrow focus accent of Catalan being perceived as rising by both groups of listeners, whereas the narrow focus accent of EP was perceived as falling by EP listeners and rising by Cat/Sp listeners.¹
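As a consistency check on the reported effect sizes, the sketch below recomputes them from the F values and degrees of freedom given above, assuming (this is an inference, not something stated in the text) that the reported values are partial eta squared, i.e. F × df_effect / (F × df_effect + df_error). The function name `partial_eta_squared` is introduced here for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values for the three main effects, all with df = (1, 58), as reported above.
reported_f = {"Language": 22.48, "Accent Type": 153.19, "Language Group": 20.32}

effect_sizes = {
    name: round(partial_eta_squared(f, 1, 58), 2) for name, f in reported_f.items()
}
print(effect_sizes)
# {'Language': 0.28, 'Accent Type': 0.73, 'Language Group': 0.26}
```

Under this assumption the recomputed values match the three effect sizes reported for the main effects.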

The perception results provide additional evidence suggesting a different intonational analysis of the two contours, given the main effect of language and the significant interactions. First, native perception of the narrow focus contour was different: falling for EP (bar 7 in Figure 3), rising for Cat/Sp (bar 4). This is in line with the accentual fall analysis in EP and the accentual rise analysis in Catalan and Spanish. Second, EP participants were highly sensitive to the difference between the EP and Catalan narrow focus pitch patterns, which were judged as falling and rising, respectively (bars 7 and 3). Although Cat/Sp participants produced overall more rising judgments for all accent types than EP participants, and judged the EP narrow focus contour as rising more than 60% of the time, their responses to the EP focus contour were still different from their responses to the Catalan focus contour (shown in bars 8 and 4; mean = 0.87, z = –2.64, p < .01). By contrast, Cat/Sp participants judged the broad focus contours from the two languages (bars 2 and 6) similarly as falling (z = –.81, p = .42). Given that participants were specifically directed to attend to a particular aspect of the sound shape of utterances, namely rising/falling intonation, and that the utterances both within and across languages included a variety of segments and words, it is unlikely that their responses could have been influenced by factors such as segmental differences or word meanings. Moreover, during debriefing, participants often mentioned the rising/falling contour shapes as being more or less evident, but never mentioned other aspects like word meaning, particular segments, or phrase meanings and utterance function. In short, the current results suggest that the predictions made by the two phonological analyses with respect to which part of the contour is the most salient perceptually are borne out.
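The relation between the z values and the p values reported above can be illustrated as follows. The text does not name the test that produced the z statistics, so the two-tailed standard normal conversion used here is an assumption, and `two_tailed_p` is a helper introduced for illustration only.

```python
import math


def two_tailed_p(z):
    """Two-tailed p value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


p_focus = two_tailed_p(-2.64)  # Cat/Sp listeners: EP vs. Catalan narrow focus
p_broad = two_tailed_p(-0.81)  # Cat/Sp listeners: EP vs. Catalan broad focus

print(round(p_focus, 3), round(p_broad, 2))  # 0.008 0.42
```

Under this assumption, the first comparison falls below the .01 threshold and the second reproduces the reported p = .42.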

The present findings, which are in line with the different phonological analyses of the rise-fall pitch pattern in EP and in Catalan (and Spanish), also relate nicely to earlier findings on the perception of the rise-fall pattern in different varieties of Italian (Gili Fivela, 2013), although the picture seems more complex in Italian. Speakers from Florence interpreted the pitch patterns differently from speakers from Pisa and Lecce, consistent with the different phonological analyses proposed for these Italian varieties.

In this section, I discussed the surface similarities of the rise-fall pitch pattern that has been recurrently found in narrow focus statements across several Romance languages and varieties. Based on evidence from production and perception, both within and across languages, I argued for a distinction between the EP and the Catalan narrow focus contours that supports the H*+L analysis for the former and the L+H* analysis for the latter, a distinction akin to that found between EP and Spanish, or between certain varieties of Italian. In this case, surface pitch similarities are only apparent, and what may look like the same contour phonetically in fact corresponds to accidentally similar realizations of different categories. In other words, languages can differ in which part of the rise-fall pitch contour is phonologically relevant.

2.2 The calling contour

The second case of a pitch contour that arguably looks the same, but that has been under debate as to how it is analyzed and transcribed, is the calling contour known as vocative chant. Calling contours have been shown to have strong similarities in many languages, in particular in the case of the chanted version characterized by a rise followed by a step down from the high level to a sustained pitch (Ladd, 2008). In most varieties of Portuguese, and also in Catalan and Spanish (Frota, 2014; Frota et al., 2015; Hualde & Prieto, 2015; Prieto, 2014; Prieto et al., 2015), a similar pitch pattern is found with a rising movement into a peak on the accented syllable and a following step down into a sustained final pitch that spreads in the post-tonic stretch. The vocative chant contour is illustrated in Figure 4 for Brazilian Portuguese (the variety from Bahia, in the Northeast of Brazil) and Catalan. Given the strong surface similarities in the pitch pattern and the fact that a phonetically similar contour conveys the meaning of calling across languages/varieties, identical phonological analyses and transcriptions might have been expected. However, analyses have varied across Romance languages, especially with regard to the nature of the final boundary tone (which has also been the main point of disagreement in the transcriptions of calling contours in non-Romance languages; Ladd, 2008).

Figure 4 

Left panel: F0 contour of the Brazilian Portuguese vocative chant Marina! (‘Marina!’); Right panel: F0 contour of the Catalan vocative Maria! (‘Maria!’). This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav4a and http://dx.doi.org/10.5334/labphon.10.wav4b

As observed in Ladd (2008), the final pitch in the vocative chant contour has received many different analyses, namely HL% (MAE_ToBI; Beckman et al., 2005), absence of boundary tone (ToDI; Gussenhoven, 2005), or !H% (GToBI; Grice, Baumann, & Benzmüller, 2005). For the Romance languages under consideration, the alternative analyses that have been proposed are !H% and M% (e.g., Frota, 2014, for Portuguese; Prieto, 2014, for Catalan; Prieto & Roseano, 2010, for Spanish). Importantly, the phonetics of the sustained boundary tone is strikingly similar in these languages: the somewhat lower pitch after the accentual peak stays level until the boundary, and the spreading of the sustained pitch goes together with the lengthening of the syllable(s) in the post-tonic stretch. Furthermore, the sustained final pitch contour was found to contrast in these languages with another type of calling contour characterized by a final pitch fall (although there are pragmatic differences in how the languages may use the two calling contours; Frota & Prieto, 2015b). In the case of Catalan, recent production and perception studies have shown that the vocative chant contour contrasts not only with a final fall calling contour but also with a final rise calling contour (Borràs-Comes et al., 2015). Given these facts, and in the absence of system-internal restrictions that could argue in favor of one analysis over the other, it seems that the differences between analyses are driven by theoretical options on how to represent final sustained pitch, and not by phonological differences in the vocative chant contours.

Unlike in the narrow focus contour, the vocative chant illustrates a case in which surface pitch similarities reflect true structural similarities (that is, they signal the same intonation category). From the point of view that a transcription is an analysis of the intonation system, in the case of the vocative chant, and contrary to the narrow focus contour discussed in the previous section, a similar analysis and transcription is thus called for. In fact, such an agreement in the phonological analysis has been reached among the authors of the Catalan, Portuguese, and Spanish chapters of Frota & Prieto (2015a), within the common goal of providing a phonological notation of intonation.

3 Surface differences and structure

Surface pitch differences may reflect true structural differences (that is, they may signal different intonation categories), or they may only reflect apparent dissimilarities that result from distributional and contextual effects (as in the Italian truncation example mentioned in Section 1), or from sound-meaning relations with a gradient, non-contrastive nature (as in the case of pitch register variation as a cue to degree of friendliness, also mentioned in Section 1). Again, from the point of view that a transcription is an analysis of the intonation system, which ultimately aims to identify the contrastive intonation categories of a given language and establish how they signal meaning, only the first type of phonetic differences (i.e., those that signal different intonation categories) needs to be reflected in transcription. Both the alignment and scaling of tonal targets are phonetic dimensions that can be systematized in different ways across and within languages. In some instances, alignment and scaling differences may express phonological distinctions; in other cases, they may arise from contextual or gradient effects (e.g., Ladd, 2008, chap. 5). Many studies have addressed the nature of pitch timing and pitch scaling differences in several languages (Borràs-Comes et al., 2014; Chen, 2003; D’Imperio & House, 1997; Gussenhoven, 1999; Ladd & Morton, 1997; Makarova, 2007; Niebuhr & Kohler, 2004; Pierrehumbert & Steele, 1989; Savino & Grice, 2011, among others). In this section, surface differences in tone alignment and peak height in nuclear falls are discussed on the basis of data from European Portuguese and Majorcan Catalan. First, it will be shown that alignment in European Portuguese displays both contextual and contrastive effects, which need to be disentangled given the phonetic nature of the former and the phonological nature of the latter. Second, surface differences in peak height will be discussed. These phonetic differences will be shown to be apparent dissimilarities with a gradient nature in EP. By contrast, in Majorcan Catalan surface pitch differences reflect true structural differences, as peak height has a contrastive nature signaling different intonation categories.

3.1 Tone alignment in Portuguese nuclear falls

Besides the nuclear fall pattern discussed in Section 2.1, European Portuguese has another type of nuclear fall. The different surface alignment patterns in the nuclear falls of EP are the focus of the present section, as an illustration of pitch contours that look different but in which only some of the phonetic differences signal different intonation categories.

Detailed analysis of the nuclear falls in production studies has shown the following two patterns (Frota, 2000, 2002): a fall where the low target aligns with the stressed syllable and is immediately preceded by a peak; and a fall where the high target aligns with the stressed syllable and is immediately followed by a low target. Crucially, the two contours differ with respect to the location of the peak and the fall relative to the nuclear syllable. Pragmatically, they are used to convey different meanings: a broad focus reading or a topic reading in the former case, and a narrow/contrastive focus reading in the latter case. The two contours have been respectively analyzed as H+L* and H*+L. In Figure 1 (left panel) above, the intonational difference is illustrated in a one-word utterance. In Figure 5, the two nuclear falls are shown in a multiword utterance with final nucleus.

Figure 5 

F0 contour of the utterance As angolanas ofereceram especiarias aos jornalistas (‘The Angolan girls offered spices to the journalists’) produced as a neutral statement (left panel) and with narrow focus on jornalistas (right panel). This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav5a and http://dx.doi.org/10.5334/labphon.10.wav5b

Given the phonetic difference found in production data and the way it systematically correlates with meaning, the two pitch falls constitute a good candidate for a categorical distinction in alignment patterns that is part of the intonation system of the language. Perception experiments examined how the sound-meaning relations are organized in EP nuclear falls, namely whether they have a gradient/phonetic or a contrastive/phonological nature (Frota, 2012). Results from semantically motivated perception tasks (context-matching identification, semantic scaling, and context-matching discrimination tasks) provided evidence for a discontinuity in the perception of the phonetic alignment continuum between an early peak fall category (H+L*) and a late peak fall category (H*+L), showing that the difference in EP nuclear falls is primarily an alignment contrast phonologically encoded in the intonation system. In other words, findings from production and perception converge in pointing to a surface difference in alignment patterns that reflects a true structural difference.

However, further detailed instrumental study of peak alignment in nuclear falls revealed other alignment surface differences that do not affect the meaningful alignment contrast just described (Frota, 2000, 2002). These differences are depicted in Figure 6. The peak of the early peak fall (H+L*) shows later alignment (i.e., it is realized after the onset of the nuclear syllable) when the nuclear fall signals an initial topic phrase. In this case, the nuclear word is immediately preceded by an intonational phrase (IP) boundary. However, if the nuclear word is final in the IP, as in neutral statements or final multiword topic phrases, a pattern of early peak alignment is obtained instead (with the peak realized before the nuclear syllable onset). Similarly, variation in peak alignment is also found in the late peak fall (H*+L). When the narrow focused word is final in the IP and preceded by prenuclear elements (as in Figure 5, right panel), the peak aligns closer to the nuclear syllable onset. By contrast, the peak shows later alignment into the nuclear syllable if the narrow focused word initiates the IP (as in Figure 1, left panel). All these peak alignment differences are clearly context-dependent and predictable given the distribution of the nuclear falls as initial or final nuclei: late initial peak placement is triggered by a preceding prosodic edge, and early final peak placement is triggered by a following prosodic edge (see Cangemi & Grice, 2016, for discussion of a similar type of contextually determined variation and a similar understanding of how it may be dealt with in intonation transcription).

Figure 6 

The timing intervals from H to nuclear syllable onset (HtoS0) and L to nuclear vowel offset (LtoV1) for H+L* (in initial and final nuclear words) and H*+L (in initial and final nuclear words). Adapted from Frota (2002).

The study of alignment in EP nuclear falls demonstrates that some of the timing variation in intonation, but not all aspects of it, may be phonologized, resulting in discrete contrasts that are meaning-related. It is the task of phonological analysis to determine which surface timing differences are reflexes of true structural differences, with welcome implications for the understanding of the categories of intonation and their transcription.
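As an illustration only, the two timing intervals plotted in Figure 6 can be derived from four hand-labelled time points per nuclear fall. The sketch below is not the measurement procedure of Frota (2002). The field names, the sign convention (negative values mean that the tonal target precedes the segmental landmark), and the example values are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class NuclearFall:
    """Hand-labelled time points (in seconds) for one nuclear fall.

    All field names are hypothetical; they are not taken from the source study.
    """
    t_h: float             # time of the F0 peak (H)
    t_l: float             # time of the F0 valley (L)
    t_syll_onset: float    # onset of the nuclear syllable (S0)
    t_vowel_offset: float  # offset of the nuclear vowel (V1)


def h_to_s0_ms(fall: NuclearFall) -> float:
    """HtoS0 in ms. Assumed convention: negative = peak before syllable onset."""
    return (fall.t_h - fall.t_syll_onset) * 1000.0


def l_to_v1_ms(fall: NuclearFall) -> float:
    """LtoV1 in ms. Assumed convention: negative = L before vowel offset."""
    return (fall.t_l - fall.t_vowel_offset) * 1000.0


def peak_side(fall: NuclearFall) -> str:
    """Report on which side of the nuclear syllable onset the peak falls."""
    return "before onset" if h_to_s0_ms(fall) < 0 else "at or after onset"


# Invented values, for illustration only (not data from the study).
example = NuclearFall(t_h=1.180, t_l=1.320, t_syll_onset=1.200, t_vowel_offset=1.350)
print(round(h_to_s0_ms(example), 1), round(l_to_v1_ms(example), 1), peak_side(example))
```

Measures of this kind describe surface timing only. Whether a given difference in HtoS0 is contrastive or contextually predictable still has to be established through distributional and perceptual evidence, as discussed above.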

3.2 Peak height in nuclear falls

The second illustration of surface pitch differences that may reflect true structural differences (and thus signal different intonation categories), or that may only reflect apparent dissimilarities arising from contextual or gradient effects, concerns peak height in nuclear falls. Like alignment, variation in the scaling of tonal targets yields phonetic differences that may signal either contrastive distinctions or gradient effects. A common gradient effect found across languages is the expansion of pitch range and higher peak scaling correlated with a gradual increase in emphasis (Gussenhoven, 2004; Liberman & Pierrehumbert, 1984). Although common, this effect is not universal. For example, peak height was found to be lower in narrow or contrastive focus accents than in broad focus accents in Dutch and Italian (Hanssen et al., 2008; Vanrell et al., 2013). In this section, I describe surface peak height differences in EP and Majorcan Catalan nuclear falls to discuss whether these phonetic differences have a contrastive nature or constitute apparent dissimilarities arising from different realizations of the same intonation category.

3.2.1 Peak height in European Portuguese nuclear falls

Peak height differences in EP nuclear falls have been reported (Frota, 2000, 2002): the peak in the narrow focus accent (H*+L) tends to be scaled higher than the peak in the broad focus accent (H+L*).

An example of higher scaling of the peak is shown in Figure 7 (left panel). A comparison between Figure 7 (left panel) and Figure 5 (left panel), which illustrates the broad focus counterpart of the same utterance, would suggest that a higher peak could be a distinctive feature of the narrow focus accent. However, the narrow focus rendition of the same utterance in Figure 5 (right panel), reproduced here as the right panel in Figure 7, does not exhibit the higher or upstepped peak feature. In fact, instrumental analysis has shown that peak scaling is not a robust cue to distinguish between the broad and narrow focus accents, both within and across speakers (Frota, 2000). A similar result, with notable inter-speaker variation, has been recently reported for Catalan and Spanish (Vanrell et al., 2013). Additional evidence for the non-phonological nature of peak height variation in EP nuclear falls comes from perception. Frota et al. (2014) manipulated peak height in the declarative nuclear accent (H+L*) to match the higher peak values found in the falling nuclear accent (H+L*) in yes-no questions. They concluded that the higher peaks had no impact on the perception of declarative utterances as statements. In short, both production and perception data show that surface peak height differences in EP nuclear falls constitute apparent dissimilarities arising from different realizations of the same intonation category.

Figure 7 

F0 contour of the statement As angolanas ofereceram especiarias aos jornalistas (‘The Angolan girls offered spices to the journalists’) with narrow focus on jornalistas, showing a higher (upstepped) peak (left panel). The right panel reproduces the right panel from Figure 5 above, for ease of comparison. This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav7a and http://dx.doi.org/10.5334/labphon.10.wav5b

3.2.2 Peak height in Majorcan Catalan nuclear falls

Unlike in EP, in Majorcan Catalan phonetic differences in peak height in the H+L* nuclear fall have been reported to be systematically related to the distinction between information-seeking and confirmation-seeking yes-no questions (Prieto et al., 2015; Vanrell, 2011; Vanrell et al., 2012). While both types of questions are characterized by a falling nuclear pitch accent H+L*, the information-seeking question has a higher (upstepped) H tone (Figure 8, left panel) and the confirmation-seeking question a lower (non-upstepped) H tone (Figure 8, right panel). In addition to production evidence for an intonational contrast in peak height, evidence from perception shows that Majorcan Catalan listeners distinguish information- and confirmation-seeking questions on the basis of the difference in pitch scaling of the peak (Vanrell et al., 2012). These results indicate that the peak height difference in Majorcan Catalan nuclear falls is a scaling contrast phonologically encoded in the intonation system by means of two distinct pitch accents (¡H+L* and H+L*). In other words, both production and perception data support the view that peak height differences in Majorcan Catalan reflect true structural differences.

Figure 8 

F0 contour of the information- (left panel) and confirmation-seeking (right panel) productions of the yes-no question Teniu mandarines? (‘Do you have tangerines?’), in Majorcan Catalan. This audio content is available at: http://dx.doi.org/10.5334/labphon.10.wav8a and http://dx.doi.org/10.5334/labphon.10.wav8b

In this section, it was shown that surface differences in pitch scaling, which may look phonetically alike across languages, may signal different intonation categories in some languages, and reflect realizations of the same intonation category in other languages. This was illustrated by nuclear falls in Majorcan Catalan and in EP. In EP, phonetic variation in the scaling of the peak in H+L* L% (broad focus statement) or in H*+L L% (narrow focus statement) is speaker dependent and non-distinctive. In Majorcan Catalan, phonetic variation in peak height in H+L* L% conveys a discrete intonational contrast between information-seeking and confirmation-seeking questions. Therefore, only in the latter case do surface peak height differences correspond to true structural differences. Consequently, from the point of view that a transcription is an analysis of the intonation system, only the latter differences need to be reflected in intonation transcription by distinct pitch accent labels.

4 Discussion

In this paper, the challenges of transcribing intonation within and across languages were addressed by discussing surface and structure in intonational analysis. I have underscored the need to go beyond what looks the same and what looks different on the surface, whether the goal is the analysis and transcription of a given language/variety or cross-language comparison.

The cases examined in Sections 2 and 3 have shown that mismatches between surface (i.e., phonetic) pitch similarities or dissimilarities and structural (i.e., phonological) similarities or dissimilarities are not only frequent, but critical for our understanding of the categories of intonation and their transcription (see Table 1).

                      Structure: Different                        Structure: Similar
Surface: Similar      Narrow focus (§2.1)                         Calling contour (§2.2)
Surface: Different    Alignment EP (§3.1); Scaling MC (§3.2.2)    Scaling EP (§3.2.1)

Table 1

Bird's-eye view of the evidence discussed in the current paper.

For example, the rise-fall pitch pattern that has been found in narrow focus statements in several Romance languages, despite the phonetic pitch similarities and the similar pragmatic meanings, was shown to reflect different intonation categories across languages (Section 2.1). Crucially, languages were found to differ with respect to which part of the pitch contour is phonologically relevant: the rise in Catalan and Spanish (L+H* L%), and the fall in European Portuguese (H*+L L%). Likewise, surface pitch differences may signal different intonation categories or reflect different realizations of the same intonation category (possibly arising from contextual effects, among other factors). For example, phonetic differences in peak height in nuclear falls within and across languages, despite their potential for distinctiveness, were shown to be contrastive in some languages and phonetic variants (e.g., intra- and inter-speaker variation, gradient effects) in others. The pitch accent contrast between ¡H+L* and H+L*, which is a phonological distinction in peak height, is a language-specific property of Majorcan Catalan (Section 3.2.2). Unlike in Majorcan Catalan (MC), in European Portuguese (EP) the pitch accent H+L* (or H*+L for that matter) may vary in peak height with no phonological consequences (Section 3.2.1). Therefore, we need to question what looks the same and what looks different to establish what counts as the ‘same’ contour, and thus should be assigned the same label, within and across languages or varieties. In Sections 2 and 3, I have illustrated how the definitions of ‘same’ and ‘different’ contours emerge through the analysis of intonation systems, where the distribution and context of surface tonal patterns, as well as contrast and meaning, play a decisive role. Laboratory studies and the use of experimental techniques, both in production and perception, are instrumental in identifying tonal categories from phonetic pitch patterns. This applies both to the analysis of a given target language/variety and to comparisons of analyses across languages/varieties. In this respect, phonological labeling based on a system-internal analysis and phonological labeling taking into account cross-language/variety comparison work in similar ways.

The view, which I have explicitly argued for here, that the relation between surface and structure is at the heart of any analysis and transcription of intonation has implications for cross-language or dialect work. Along the lines of Ladd (2008), it was shown that it is possible to treat identical phonological analyses as representations of the same contour (as in the case of the calling contour L+H* !H% in Catalan, Spanish, and Portuguese; Section 2.2). This is a desirable outcome for cross-language/variety comparison. In the same vein, it was shown that different phonological analyses are treated as representations of different contours (as in L+H* L% versus H*+L L%). By making our options and goals explicit (namely, to identify the distinctive intonation categories of the target language(s)) and by using the same labels within the same framework in identical ways (that is, to express intonation categories), we are taking a step towards analytic accuracy and cross-language comparability.

Although the task of finding categories of intonation is facilitated if the target language is otherwise well-studied, it is also true that the analysis of an under-studied language may follow similar fundamental goals and a similar approach (as shown in Arvaniti, 2016). As Jun and Fletcher (2014) observed, “one of the primary goals should be to discover what the significant categories are for the variety or language in question” (p. 506). Current knowledge of the intonation systems of various languages and overview studies of prosodic typology (Gussenhoven, 2015; Hyman, 2012; Jun, 2005b, 2014b; Ladd, 2001) are still limited given the few descriptions of intonation systems available. Nevertheless, they offer valuable guidance by informing us of the types of questions we must ask and the range of possibilities of what we may expect to find when analyzing intonation systems. Among the dimensions of prosodic variation known to be relevant to intonation are, for example, the properties of word prosody (presence/absence of stress and lexical pitch), the properties of prosodic structure (the kinds of prosodic domains and prominence relations, important to establish the relevant heads and/or edges), the types of distinctive pitch events (pitch accents and edge tones, whether alignment is distinctive, whether scaling is paradigmatically contrastive), the prosodic domain for the distribution of pitch events (which determines a denser or sparser distribution), or the constraints on tune realization (tonal string adjustments, like truncation and compression, and segmental string adjustments, like lengthening and epenthesis). Importantly, by focusing on the relation between surface and structure, we may strive to avoid what Jun and Fletcher (2014) have called “one of the pitfalls of intonational study”, namely “to assume that similar pitch patterns between two languages […] can be accounted for in exactly the same way” (p. 508).

The AM framework provides the necessary tools for the task of finding the categories of intonation, by offering a constrained model of intonational analysis together with a set of labels that have been shown to be able to capture and describe the contrastive categories within a given target language. Importantly, the tools are flexible enough to be used in interim analyses that are necessarily part of a full-fledged analysis of any intonation system (e.g., through the use of a phonetic tier label where temporary annotations or hypotheses yet to be evaluated are made, basically resorting to the same set of symbols; Beckman et al., 2005; Jun & Fletcher, 2014). Finally, as argued in this article, the AM approach also offers the tools for cross-language or variety studies, given that phonological labeling based on a system-internal analysis and phonological labeling taking into account cross-language/variety comparison can be conducted in similar ways.

5 Conclusion

In the autosegmental-metrical framework, intonation is the structured variation in phonetic features, primarily pitch, used to express phrase-level meanings. Within this approach, the transcription of intonation requires an analysis of the intonational phonology and reflects our view of the intonation system (Beckman et al., 2005; Gussenhoven, 2004, 2007; Ladd, 2008). Assuming this approach, the present article focused on the ways structural (phonological) pitch patterns and surface (phonetic) pitch patterns are related, and in particular on mismatches between surface pitch similarities or dissimilarities and structural similarities or dissimilarities, both within and across languages/varieties. I have argued that the relation between surface and structure is at the heart of any analysis and transcription of intonation, fostering the goal of understanding how the surface forms signal the contrastive categories of the language which relate to differences in meaning. Such analysis, in my view, needs to be conducted within each language-specific system but taking into account the systems of other languages or varieties (in particular, those known to be related to the target language for which full descriptions are available). As I hope the case studies examined in this paper have illustrated, by making our options and goals explicit (which are primarily to identify the distinctive intonation categories of the target languages) and by using the same labels within the same framework in identical ways (that is, to express intonation categories), we are taking an important step towards analytic accuracy and cross-language comparability.

Last but not least, autosegmental-metrical work has led to considerable progress in the phonological analysis of intonation in an increasing number of languages and varieties, comparative studies included, and it is fair to emphasize the potential of the AM approach to continuously boost our understanding of the intonation systems of language.

Competing Interests

The author declares that they have no competing interests.