The Importance of a Distributional Approach to Categoriality in Autosegmental-Metrical Accounts of Intonation

When annotating a speech signal using an autosegmental-metrical model of intonation, transcribers associate portions of the F0 contour with labels from a finite inventory of tonal categories. In the models we are concerned with here, these categories have the status of phonological units (phonological form), bridging the intrinsic variability of the speech signal (substance) with the intrinsic fuzziness of post-lexical function (meaning). This, together with the relatively small size of the label inventory, precludes a one-to-one relationship between form and substance, and/or between form and function. A Neapolitan Italian corpus of read speech is used to investigate the distributional properties of two pitch accents that have been studied extensively with respect to substance (the alignment of F0 peaks) and meaning (sentence modality). Although there is a general consensus that peaks in this variety are aligned earlier in declaratives than in interrogatives, evidence is provided of contexts in which the converse is true, i.e., in which interrogative peaks are even earlier than their declarative counterparts. In this respect, interrogatives have a richer internal structure than declaratives. We argue that differences in how variably a prosodic category is encoded can be dealt with in an intonation transcription system, as long as this system relates phonological form (the choice of pitch accent in this case) both to phonetic substance and to meaning in a transparent way.


Background
Intonation transcription within the autosegmental-metrical framework entails the use of discrete and symbolic labels, which in most cases (e.g., Jun, 2005) refer to phonological units. When opting for a particular label, human transcribers rely on their interpretation of both phonetic substance and meaning. As a result, labels used in intonation transcription refer to phonological units bridging the intrinsic variability of the speech signal (substance) with the intrinsic fuzziness of postlexical function (meaning). This, together with the small size of the label inventory, precludes a one-to-one relationship (mapping) between form and substance, and/or between form and function (e.g., Nolan, 2008).
But what is the use of a transcription practice that does not employ any such mapping? One solution to the problem of mapping is to report speakers' and listeners' preferences (or most typical behaviour) in terms of percentages (Baumann et al., 2006; Schafer et al., 2000, amongst others), e.g., meaning A is expressed by category X 80% of the time, this same meaning is expressed by category Y 15% of the time and by category Z 5% of the time. Such an approach employs what has recently been referred to as statistical gradience (Ladd, 2014).
Another solution, and the one we are primarily concerned with in this paper, is to document the variability of phonetic parameters within the proposed categories (physical gradience; Ladd, 2014). We refer to this as a distributional approach, reflecting the importance of how the various phonetic parameters are distributed within a phonological category. If implemented with full awareness, such an approach would provide an ideal frame for the incorporation of recent developments in our understanding of the relationship between phonetics and phonology (Pierrehumbert et al., 2000) and of the dynamics between the continuous and discrete poles of linguistic knowledge (Gafos & Benus, 2006). Moreover, it would make the concept of mapping superfluous (Ohala, 1990), since the three dimensions of meaning, form, and substance would not be separable in the first place. Just as form and meaning have been traditionally linked in linguistic theory (de Saussure, 1916), a model of intonation requires a third dimension, substance, that relates equally to both form and meaning (see also Cole & Shattuck-Hufnagel, this issue). Figure 1 sketches the three dimensions as sides of the same structure (here a triangle).

Rationale
Whereas in traditional generative phonology categories are defined by the presence or absence of certain features, in a distributional approach phonological categories can be thought of as clusters in a multidimensional phonetic space (see Coleman, 2003, for a discussion in terms of contrasts, rather than categories). Such clusters can differ as to their internal structure, for example in terms of the presence or absence of sub-clusters (a sub-cluster can be seen as corresponding to an allophonic variant), or of their degree of compactness (the less compact they are, the more variation there is across individual tokens). In fact, in such an approach categories are expected to exhibit differences in internal structure, and such differences are in turn expected to bear on the functioning of the system as a whole. This includes how easily categories are acquired and accessed, and how prone they are to be redefined across time. In this sense, variability in encoding is seen as a resource for insights into category structure. This view is complementary to the one put forth by Cole and Shattuck-Hufnagel (this issue), in which prosodic transcription methods capitalizing on prosodic variability are proposed.
In the following, we focus on differences in internal structure across two intonational categories in Neapolitan Italian. This variety of Italian has been studied extensively (see D'Imperio, 2002; Cangemi, 2014, and references therein), especially with respect to the distinction between two pitch accents (form), the alignment of F0 peaks (substance), and sentence modality (meaning). A corpus of read speech (Section 2.1) is used to investigate distributional properties at three levels of granularity.
First, we explore overall measures of dispersion in the fundamental frequency contours across sentence modalities (Section 2.2). We show that, independently of focus placement, interrogatives display more variable contours than declaratives, and that this is not an artefact of durational differences. Here we relate meaning to substance.
Second, we look at sub-clustering within each sentence modality (Section 2.3). This is done by looking at phonetic variability in the encoding of this functional contrast. There are already indications in other varieties of Italian that (polar) interrogatives have a more complex internal structure than declaratives. For instance, in Bari Italian, the bias and the expectations of the speaker when asking the question can have an effect on both the pitch accent and the boundary tone (Grice & Savino, 2003; Savino, 2014a; Savino & Grice, 2011). Moreover, whereas declaratives consistently show final falls in all varieties of Italian (Gili Fivela et al., 2015), polar interrogatives display either final rises or final falls. Differences in the final boundary tone are found not only across regional varieties (see Savino, 2012, for a recent comprehensive overview), but also within a single variety, perhaps as a function of speaking style (see Grice et al., 1997, for Bari Italian). The lack of a 1:1 mapping between final F0 movement and sentence modality is common across a range of languages besides Italian, such as German (Kohler, 2004; Kügler, 2003) and Swedish (House, 2005). In Italian, however, intonation bears the functional load of distinguishing between the two sentence modalities, there being no morphosyntactic markers of interrogativity, such as subject-verb inversion or question particles. Thus, especially in the absence of a disambiguating context, if the final F0 is falling, there needs to be a distinction in intonation in an earlier position for the utterance to be interpreted as a question. Examining our corpus of read speech, we find that interrogatives are indeed encoded with either final rises or final falls, indicating a difference in (internal) structural complexity between declaratives and interrogatives. Here we relate form to meaning.
Third, we focus on variability within intonational categories (Section 2.4). This is done by investigating peak alignment within and across pitch accents (relating form to substance). Niebuhr et al. (2011) have shown a great degree of variability across speakers in the encoding of pitch accent contrasts, referring to, for example, 'shapers' and 'aligners', where the shape of a pitch movement can be used instead of the alignment of a peak to signal category membership. Here we explore whether variability of peak alignment is equally great across the two pitch accent types we investigate, and show that peaks are aligned more variably in interrogatives than in declaratives.
In the first part of the corpus analysis (Section 2.2) we thus explore sentence modality and focus placement jointly, by measuring the variability of F0 contours in early, medial, and late focus utterances. In all cases, interrogatives are shown to have more variable F0 contours than declaratives. In the second part (Sections 2.3-4), we concentrate on the last pitch accent in late focus declaratives and interrogatives. In these cases the focus is on the final word in the phrase. This more local analysis yields results which are immediately comparable to those from the exploration of global F0 contours, with interrogatives showing richer internal structure than declaratives, due to sub-clustering and higher dispersion.

Material
Our hypotheses on differences in internal structure are tested on the Danser corpus (Cangemi, 2014, sec. 4.2; Cangemi & D'Imperio, 2013), which features read speech from 21 native speakers of the Neapolitan Italian variety (aged 20-25). Recordings were carried out in a sound-treated booth at the University of Naples "Federico II" Interdepartmental Research Centre for Signal Analysis and Synthesis (CIRASS), using an AKG MicroMic C520 head-mounted microphone connected to a personal computer running Audacity (Audacity Development Team, 2006) through a Shure X2u adapter. Stimuli were prompted on a computer screen using Perceval (André et al., 2003).
The 21 subjects uttered 3 randomized repetitions of 6 contextually determined prosodic variants of 2 sentences after silently reading a contextualization paragraph. The sentences shared the number and structure of syllables, stress position, and syntactic structure, according to the template [CV.CV.CV]S [CV.CV]V [CV#CV.CV]IO, as in Serena vive da Lara 'Serena lives at Lara's'. Contexts presented visually (see Table 1 and the Appendix) induced one of the six combinations of corrective focus placement (on Subject, Verb, or Indirect Object) and sentence modality (Declarative or Polar Interrogative). For example, the context Serena vive da Marina? 'Does Serena live at Marina's?' was used to elicit indirect object-focussed declarative utterances of Serena vive da Lara.
The resulting 756 utterances were isolated from the recording sessions using PRAAT (Boersma & Weenink, 2008) and force-aligned at the phone level using ASSI (Cangemi et al., 2011). Examples are provided in Figure 2. Phone durations and fundamental frequency contours were extracted with PRAAT and analysed with R (Bates et al., 2014; Fox & Weisberg, 2011; R Development Core Team, 2008). Fundamental frequency contours were first time-normalized by extracting F0 values at exactly 50 equally spaced points from each utterance, then Gaussian smoothed (bandwidth 10 Hz).
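The time normalization and smoothing steps can be illustrated with a minimal Python sketch. It is not the PRAAT/R pipeline used for the corpus: the function names, the use of linear interpolation, and the kernel width expressed in samples (`sigma`) are assumptions made for illustration, and unvoiced stretches are not handled.

```python
import math

def time_normalize(times, f0, n_points=50):
    """Linearly interpolate an F0 track at n_points equally spaced times.

    Hypothetical helper: `times` must be increasing, `f0` holds one value per time.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two (time, F0) samples")
    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    out = []
    j = 0  # index of the left neighbour in the original track
    for i in range(n_points):
        t = start + i * step
        # move right until the segment [times[j], times[j + 1]] contains t
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        span = times[j + 1] - times[j]
        w = (t - times[j]) / span if span > 0 else 0.0
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return out

def gaussian_smooth(values, sigma=2.0):
    """Smooth a sequence with a normalized Gaussian kernel (sigma in samples, assumed)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        weights = [math.exp(-0.5 * ((i - k) / sigma) ** 2) for k in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed

# Usage: a toy rise-fall contour reduced to 50 points, then smoothed.
contour = gaussian_smooth(time_normalize([0.0, 0.6, 1.2], [180.0, 260.0, 170.0]))
```

Because every utterance ends up with the same number of samples, contours of different durations can be compared point by point, which is what the dispersion analyses rely on.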

Subject focus
Declarative: Tua zia ti chiede se è Ramona che vive da Lara. Tu rispondi: 'Your aunt asks you if it's Ramona who lives at Lara's. You reply:'
Interrogative: Una delle tue cugine vive da Lara, ma non ricordi quale, quindi chiedi: 'One of your cousins lives at Lara's, but you don't remember which one, so you ask:'

Verb focus
Declarative: Un amico ti chiede se Serena adesso lavora da Lara. Tu rispondi: 'A friend asks you if Serena now works at Lara's. You reply:'
Interrogative: Serena passa molto tempo con Lara, ma non ricordi perché, quindi chiedi: 'Serena spends a lot of time with Lara, but you don't remember why, so you ask:'

Object focus
Declarative: Tua sorella vuole sapere se Serena vive da Marina. Tu rispondi: 'Your sister wonders whether Serena lives at Marina's. You reply:'
Interrogative: Serena è andata a vivere da un'amica, ma non ricordi chi, quindi chiedi: 'Serena has moved in at a friend's, but you don't remember who, so you ask:'

Macroscopic analysis (F0 across utterances): dispersion
A first indication that the degree of variability in realization might be different across sentence modalities comes from the mere visualization of superposed utterance-long time-normalized F0 contours, with sentence modality presented separately. As Figure 3 shows, when contours are pooled across focus placement conditions, interrogatives (right panel) show less homogeneous realizations across speakers, sentences, and repetitions than declaratives (left panel) do. This is confirmed by a Levene's test for homogeneity of variance (p < 0.001) run on contours sampled with 50 equally spaced points in time. An F-test further indicates that variance in interrogatives is 15% higher than in declaratives.
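As a rough illustration of the two dispersion measures just mentioned, the following Python sketch computes a Levene-type W statistic and a simple variance ratio for two groups of F0 values. The function names and the toy values are hypothetical, the median-centred (Brown-Forsythe) variant is an assumption rather than a documented choice of the study, and p-values are omitted because they require the F distribution.

```python
from statistics import mean, median, variance

def levene_w(groups, center=median):
    """Levene-type W statistic for homogeneity of variance across groups.

    center=median gives the Brown-Forsythe variant (assumed here);
    center=mean gives Levene's original statistic.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # absolute deviation of each observation from its own group centre
    devs = []
    for g in groups:
        c = center(g)
        devs.append([abs(x - c) for x in g])
    group_means = [mean(d) for d in devs]
    grand_mean = mean([x for d in devs for x in d])
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((x - m) ** 2 for d, m in zip(devs, group_means) for x in d)
    return (n_total - k) / (k - 1) * between / within

def variance_ratio(a, b):
    """Ratio of sample variances, the quantity an F-test is based on."""
    return variance(a) / variance(b)

# Toy F0 values (Hz) at one normalized time point; not corpus data.
declaratives = [200, 202, 198, 201, 199]
interrogatives = [200, 220, 180, 210, 190]
w = levene_w([declaratives, interrogatives])
ratio = variance_ratio(interrogatives, declaratives)
```

A larger W indicates a larger difference in spread between the groups, and a ratio above 1 means the first group is the more variable one.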
The effect holds when the three focus placement conditions are evaluated separately. Interrogatives with initial, medial, or final contrastive focus have more variable realizations than declaratives with the same focus placement. This is illustrated in Figure 4, and confirmed by further Levene's tests (all p < 0.001).
Even if F0 contours are sampled using the same number of points in time, variability might still surface as an artefact of differences in overall duration, although overall utterance duration did not play a role in signalling sentence modality in a previous study of Neapolitan Italian (Cangemi & D'Imperio, 2011). The role of durational cues was investigated by predicting Utterance Duration with a mixed effects linear model featuring Modality (Declarative, Interrogative), Focus Placement (Subject, Verb, Object) and their interaction as fixed effects, Speaker, Sentence, and Repetition as random intercepts, and by-Speaker and by-Sentence random slopes for Modality, Focus, and their interaction. Neither Modality nor the interaction between Modality and Focus reached significance (all |t| < 1.69). Likelihood ratio tests between the full model and a simpler model (in which Modality and the interaction between Modality and Focus are dropped) show no significant difference (all p > 0.25). These results indicate that the different degree of variability found across sentence modalities is not a by-product of durational differences.
Figure 4 also illustrates the well-documented finding that in Neapolitan Italian, as in other varieties (e.g., Palermo (Grice, 1995) and Bari (Grice & Savino, 1997; Grice et al., 2005)), the final constituent has a pitch peak regardless of the location of focus in the sentence. This peak is usually reduced in range if the focus is earlier: "yes/no questions with early focus present [...] a smaller peak on the last stressed syllable of the intonational phrase" (D'Imperio, 1997, p. 25; see also D'Imperio, 2001). Obviously, peaks are intrinsically more prone to variation than flat stretches. It might thus be that variability is a mere by-product of the number of peaks in an utterance. In the subject- and verb-focus conditions, interrogatives (with two peaks) are thus expected to be more variable than declaratives (which have one). However, in the object-focus condition, both modalities have two peaks, and as we have seen above, interrogatives are still more variable than declaratives. This fact suggests that variability is not simply a matter of how many pitch movements are present. Under this assumption, we tested a more conservative prediction of our hypothesis, according to which interrogatives are more variable than declaratives even when the final prosodic word is analysed separately. Levene's tests support this prediction as well, both for utterances gated before the final prosodic word and, as shown in Figure 5, for the final prosodic word by itself (all p < 0.01).

Microscopic analysis (peak alignment in pitch accents): sub-clustering and dispersion
So far, we have explored the interplay between sub-clustering and dispersion at a macroscopic level, by evaluating variance in F0 values across entire utterances or prosodic words. In this section, we show that these same two sources of variability also operate at a microscopic level. In order to do this, we will focus on peak alignment in the last (nuclear) pitch accent of object-focused utterances. Peak alignment has been shown to be an important cue in distinguishing declaratives from interrogatives in Neapolitan Italian. It is for this reason that we focus on this aspect of phonetic substance here.
Peak alignment was automatically extracted using a procedure in four steps. First, the F0 contours of the last prosodic word were extracted using 50 equally spaced sampling points for each item, thus intrinsically normalizing for durational differences. Then we extracted the number of local maxima; some items (n = 76) had a single maximum, but most had more than one; only very few (n = 13) had more than four, and were discarded from analysis. Visual inspection of the remaining cases with two to four maxima showed that two items contained two non-adjacent samples with the same exact F0 values; these were excluded from analysis. All other items had adjacent maxima, viz. very short plateaux (< 25 ms). In such cases, the peak was located at the end of the plateau (D'Imperio, 2000; Knight & Nolan, 2006). For six items the peaks located in this way surfaced in the first or final fifth of the last prosodic word (hence well away from the medial stressed syllable) and were discarded as artefacts. Since two items were discarded due to disfluencies, the final dataset resulted in 231 items. Figure 7 shows a schematic representation of intonational contours for the two sentence modalities and the two edge tones.
Figure 8 shows the distribution of peak alignment values for declaratives (dashed line) and interrogatives (solid line). In many ways the results mirror those for the overall F0 contours across entire utterances. First, interrogatives show sub-clustering, as can be seen by the bimodal distribution. The shoulder around 35% in normalized time of the last prosodic word is in fact the contribution of interrogatives with final rise, in which the peak is retracted in order to accommodate the following rise (see also Figure 6, right panels, and Figure 7 for actual and schematic pitch contours respectively). Declaratives, on the other hand, show a single peak around 45%. Moreover, the second peak in interrogatives, situated around 65%, has a larger kurtosis than the peak for declaratives. This indicates that even when only final falls are taken into account, dispersion is still higher in interrogatives than declaratives. This patterning is confirmed by Levene's tests contrasting variability of peak alignment in declaratives and interrogatives, both when including (p < 0.01) and excluding (p = 0.01) interrogatives with final rises.
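The peak location procedure described above can be sketched in Python as follows. This is a reading of the prose rather than the original script: treating "maxima" as samples tied at the highest F0 value, the function name `locate_peak`, and the exact handling of the first and final fifth are assumptions.

```python
def locate_peak(f0, max_tied=4):
    """Return peak position as a proportion (0 to 1) of the last prosodic word.

    `f0` is the contour sampled at equally spaced points (50 in the study).
    Returns None when the item would be discarded under the stated rules.
    """
    n = len(f0)
    top = max(f0)
    tied = [i for i, v in enumerate(f0) if v == top]
    if len(tied) > max_tied:
        return None  # more than four maxima: discarded
    if tied[-1] - tied[0] != len(tied) - 1:
        return None  # tied maxima are not adjacent: excluded
    peak_index = tied[-1]  # a short plateau counts as one peak, located at its end
    position = peak_index / (n - 1)
    if position < 0.2 or position > 0.8:
        return None  # first or final fifth of the word: treated as an artefact
    return position

# Usage: a 50-sample rise-fall with a two-sample plateau at samples 20 and 21.
word_contour = [100 + i for i in range(20)] + [120, 120] + [119 - i for i in range(28)]
alignment = locate_peak(word_contour)
```

Expressing the result as a proportion of the word makes values comparable to the normalized-time percentages used for the alignment distributions.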
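One way to make the notion of sub-clustering concrete is to estimate a density over peak alignment values and count its modes. The Python sketch below does this with a plain Gaussian kernel density estimate. Whether Figure 8 was produced this way is an assumption, the bandwidth is arbitrary, and the alignment values are invented for illustration rather than taken from the corpus.

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

def count_modes(density):
    """Count strict local maxima in a density evaluated on a regular grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )

# Invented peak alignment values (% of the last prosodic word).
grid = list(range(0, 101))
one_cluster = [43, 44, 45, 46, 47]
two_clusters = [33, 34, 35, 36, 37, 63, 64, 65, 66, 67]
modes_one = count_modes(gaussian_kde(one_cluster, grid, bandwidth=3.0))
modes_two = count_modes(gaussian_kde(two_clusters, grid, bandwidth=3.0))
```

The mode count depends on the bandwidth: a wider kernel can merge a shoulder into the main peak, so any such count is best read alongside the plotted distribution.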

Summary of results
The exploration of the Neapolitan Italian read speech corpus has shown that pitch contours are more variable in interrogatives than in declaratives. This is true both at a macroscopic level, i.e., in terms of variability of F0 tracks across the entire utterance (see Sections 2.2-3), and at a microscopic level, i.e., in terms of variability of peak alignment within individual pitch accents (Section 2.4).
Microscopic analysis. In focus-final utterances, peak alignment in interrogatives has been shown to be more variable than in declaratives (Section 2.4). This has been ascribed to the same two sources of variability invoked in the macroscopic analysis: the distribution of peak alignment values not only shows two clusters (late and very early peaks), but the late-peak cluster also has a higher kurtosis (i.e., has more dispersed realizations).
In the following, we speculate on some possible causes and consequences of the greater variability in the encoding of interrogatives (Section 3.2). We conclude with a discussion of the implications of our findings for the theory and practice of transcription, in particular prosodic transcription (Section 3.3).

On the sources and consequences of variability in interrogatives
In a distributional approach, one would of course expect variability across realizations of a given category. More importantly for our purposes, there is also no reason to assume that this degree of variability should be the same across different categories. One category might be instantiated by fairly variable tokens, while another category might be encoded more compactly. Our results show indeed that interrogatives are encoded more variably than declaratives. It is important to take a closer look at these differences, since this state of affairs might emerge as a consequence of how categories are organized in a system (and thus provide insights on a language's prosodic system), and in turn be reflected in how such categories are built, used, and updated (and thus generate hypotheses on language acquisition, interaction, and sound change). An extensive discussion of the sources and consequences of such differential variability of categories in intonation is beyond the scope of this paper (for the notion of differential variability in a different domain, see Ho et al., 2008). In the following, we will provide only a brief overview of possible sources and one example of how the notion of differential variability might be useful in other research areas (viz. language contact). Implications for prosodic transcription will be dealt with more extensively in the final section (Section 3.3).
Escandell-Vidal (1998) explored the role of intonation in "procedural encoding" (Wilson & Sperber, 1993) of interrogatives, suggesting that speakers (in this case, of Peninsular Spanish) use different intonation contours in interrogatives in order to guide the listener towards a particular understanding of the content of an utterance. Recent experimental research suggests that intonation might encode a variety of pragmatic biases in interrogatives (Borràs-Comes & Prieto, 2014; Nilsenova, 2002; Savino, 2014a; Savino & Grice, 2011) across languages. These include epistemic (the assessment of the truth of a proposition, Sudo, 2013), evidential (availability of evidence, Büring & Gunlogson, 2000), mirative (surprise or unexpectedness, Peterson, 2010), and doxastic (disbelief, Crespo-Sendra et al., 2013) biases. At this point, it is unclear whether interrogatives are genuinely more prone than declaratives to an intonational encoding of pragmatic biases, or if such a picture is a mere consequence of the recent accumulation of experimental research focusing on interrogatives rather than declaratives. Similarly, one might ask whether intonation research has so far adopted an excessively broad understanding of interrogativity, thus somehow neglecting the development of specific methodological paradigms to effectively tell apart such "nuances". As mentioned earlier (Section 1.2), interrogatives have also been shown to be more variable in terms of regional variation, in particular across varieties of Italian (Savino, 2012). This might in turn motivate the differences found across speech styles (Grice et al., 1997), especially in diglossic contexts, or if speakers have access to a highly stratified repertoire. Similarly, politeness is also argued to play a role in the selection of specific intonation contours for interrogatives (Astruc et al., in press; Cruttenden, 1986).
Given this picture, the notion of differential variability might prove useful in generating new research hypotheses and in accounting for some recent findings. Studies on prosodic accommodation in overt (D'Imperio et al., 2014; D'Imperio & Sneed German, 2015) and covert (Savino, 2014b) imitation tasks show that speakers of Italian varieties can adapt their pitch accent and boundary tone choices in the production of interrogatives. Crucially, Romera and Elordieta (2013) report that prosodic accommodation due to language contact is stronger for interrogatives than for declaratives. They analyze intonation patterns from a corpus of symmetrical semi-directed sociolinguistic interviews, with a native speaker of Majorcan Catalan (also an L2 speaker of the Majorcan variety of Spanish) as interviewer. The subjects were four speakers with monolingual Peninsular Spanish origins, who had been living in Majorca for 5 years on average at the time of the interview. Whereas no subject used Majorcan Spanish intonation in their production of declaratives, all subjects used Majorcan Spanish intonation for interrogatives in at least 25% of their productions. Along the lines of Trudgill (1986), the authors suggest that speakers might accommodate their production of interrogatives more readily to the Majorcan Spanish prototype because these are perceptually more salient, and thus considered to be more characteristic of the target variety. However, if interrogatives were encoded more variably in Peninsular Spanish, the higher degree of accommodation of interrogatives to the Majorcan Spanish target might also stem from the internal structure of the interrogative category itself. Accommodation might be easier if the source category has a rich structure, with sub-clusters and high internal dispersion (as with interrogatives), rather than a very peaked distribution (as with declaratives), resulting in more entrenched productions.

Implications for intonation transcription
The alignment of F0 peaks is an important aspect of phonetic substance that is taken into account when deciding on an inventory of intonational categories, and later when deciding on category membership, in particular pitch accent type. Extensive research on categorical perception has focused on this cue to the interrogative-declarative distinction (D'Imperio & House, 1997). Nonetheless, even when investigating this very distinction, we have identified tonal contexts that can radically affect the alignment of F0 peaks. Taking only the phonetic substance into account, pitch accents in interrogatives with a final rise should pattern with a pitch accent type involving a medial peak (L+H*, first analysed as H*+L, see D'Imperio, 2001; Grice et al., 2005; and Frota, 2016, for discussion). However, taking meaning into account, they should pattern with pitch accents with a late peak (L*+H). At first sight, this might look like a good argument for a broad phonetic level of transcription dealing with substance, and a phonological level of transcription dealing with meaning. However, it is unclear whether we actually hear these peaks as medial, as other cues might be at work, especially since it is not clear how modular our perception of pitch accents and following boundary tones is (Dainora, 2006). It is thus unclear whether we integrate the position of the peak with (possibly language-specific) adjustments made owing to tonal context. In this case the anticipation of the peak would serve to ensure that the tonal contour is realized, despite the lack of sufficient segmental material to bear the tones assigned to it.
An intonation transcription system needs to have mechanisms for dealing with contextually determined variation, i.e., adjustments due to tonal crowding. Adjustments can be made to the articulation rate: slowing down facilitates the accommodation of the tones (Erikson & Alstermark, 1972). However, rate adjustments do not involve a uniform lengthening of segments (Byrd & Saltzman, 2003). Thus, the alignment of intonational peaks will involve more or less dispersion across conditions, depending on the landmarks selected for reference. Alternatively, or in addition, the tones may move closer together, leading to faster pitch movements. In this study a different kind of adjustment was apparent: the whole tonal sequence starts earlier in relation to the segmental string. Anticipation has been found in Tashlhiyt Berber (Grice et al., 2015), in cases where the segmental string was entirely voiceless and thus not tone-bearing. It has also been found in Dutch (Hanssen et al., 2007) in cases of tonal crowding. All of these sources of variation make absolute and relative alignment measures difficult to interpret.
Another source of variation is truncation, a process in which tones undershoot their targets. Naturally, the transcriber is faced with the decision as to whether a tone is there but only partially realised, i.e., truncated, or simply not there at all. Take, for instance, contours that Grabe (2008) analyzed as a truncated fall in German. The decision to refer to them as truncated falls might appear to have been made on the basis of the meaning of a given contour, rather than its phonetic substance (in this case the F0 trace). Thus, a high short level tone on <Schiff> may be interpreted as a truncated fall by virtue of its functional equivalence to <Schiefer> in a declarative sentence, which has a clear fall over the two syllables (Grabe, 2008). However, recent work has shown that the spectral characteristics of voiceless segments can give an impression of pitch (Kohler, 2011; Niebuhr, 2008; Ritter & Roettger, 2014), so that the phonetic substance, in the form of perceptual integration of cues, most probably played a role in the categorization, despite the lack of movement in the fundamental frequency contour. This indicates that multiple cues make it difficult to decide on category membership on the basis of a selective visual representation of the signal, especially when this involves a visual representation of F0. This is to be expected, given that the cues encoding speech in general are acoustically diverse and functionally redundant (see Winter, 2014).
The data presented in this study raise the question as to whether peak alignment is a suitable cue in itself. In fact, it has been treated as an abstraction by Gussenhoven (2004), who shows that peak delay can be used as an enhancement of, or substitute for, peak raising. Tradeoffs between peak alignment and other parameters such as peak scaling or shape have been documented for Russian (Rathcke, 2006) and German (Ambrazaitis & Frid, 2014; Niebuhr, 2007). Furthermore, Niebuhr et al. (2011) show that there are considerable individual differences in the implementation of intonational categories, specifically, that for some speakers the shape of the F0 trajectory may be used, instead of the absolute alignment of a peak, to signal the same category. Therefore we should exercise caution when reducing contrasts to one dimension.
Let us examine a typical contrast in the segmental domain that is frequently discussed as analogous to peak alignment, the distinction between 'voiceless' and 'voiced' oral stops, e.g., between /p/ and /b/ (see also Arvaniti, 2016). This distinction is frequently invoked in the intonation literature in relation to one dimension, voice onset time (VOT). VOT shares with F0 peak alignment the crucial timing of glottal and supralaryngeal gestures, although VOT is concerned with onset of vibration of the vocal folds, whereas peak alignment is concerned with modulating the frequency of vibration.
In many accounts, the terms fortis and lenis are used for this distinction, rather than voiceless and voiced, reflecting the fact that voicing (vocal fold vibration) is not the only cue involved in the contrast. Comparing fortis and lenis plosives acoustically, fortis plosives have a longer closure duration, and a longer and stronger burst, resulting from a longer articulatory constriction duration and a higher intraoral peak pressure. Moreover, in intervocalic position the preceding vowel is longer and the transitions into the vowel are more abrupt (Kohler, 1979; Lisker, 1978; Slis & Cohen, 1969). Despite the term microprosody, the fortis-lenis distinction can have a considerable effect on the F0, and can be used as a cue to voicing, even when voice onset time cues are unambiguous (see, e.g., Kingston, 2007; Whalen et al., 1993).
Furthermore, despite the emphasis in the literature on VOT, the aspiration (i.e., positive VOT) in English is often drastically reduced in weak prosodic positions (such as in the unstressed syllable in 'rapid' or word finally in 'hip') or after a word initial sibilant (in 'spin'). In these cases it is unlikely that a lenis symbol is selected, as the transcriber is aware of the contextually determined variation, and of course keeps the lexical meaning in mind. This is less obviously the case when transcribing obstruents across dialects. Barry and Pützer (1995) point out that transcribers might weight the different cues to the fortis-lenis distinction in different ways, leading to different choices of symbol, even for the same

Figure 1: Form, substance, and meaning in intonation transcription.Three sides of the same triangle.

Figure 8: Distribution of peak alignment for each sentence modality.

Figure 7: Schematic pitch contours on the last prosodic word in declaratives (dashed line) and interrogatives (solid line).

Table 1: Contexts for the elicitation of the six focus/modality combinations for the sentence Serena vive da Lara ('Serena lives at Lara's' / 'Does Serena live at Lara's?').