When annotating a speech signal using an autosegmental-metrical model of intonation, transcribers associate portions of the
Intonation transcription within the autosegmental-metrical framework entails the use of discrete and
But what is the use of a transcription practice that does not employ any such mapping? One solution to the problem of mapping is to report speakers’ and listeners’ preferences—or most typical behaviour—in terms of percentages (
Another solution—the one we are primarily concerned with in this paper—is to document the variability of phonetic parameters
Form, substance, and meaning in intonation transcription. Three sides of the same triangle.
Whereas in traditional generative phonology categories are defined by the presence or absence of certain features, in a distributional approach phonological categories can be thought of as clusters in a multidimensional phonetic space (see
In the following, we focus on differences in internal structure across two intonational categories in Neapolitan Italian. This variety of Italian has been studied extensively (see
First, we explore overall measures of dispersion in the fundamental frequency contours across sentence modalities (Section 2.2). We show that, independently of focus placement, interrogatives display more variable contours than declaratives, and that this is not an artefact of differences in duration. Here we relate meaning to substance.
Second, we look at sub-clustering within each sentence modality (Section 2.3). This is done by looking at phonetic variability in the encoding of this functional contrast. There are already indications in other varieties of Italian that (polar) interrogatives have a more complex internal structure than declaratives. For instance, in Bari Italian, the bias and the expectations of the speaker when asking the question can have an effect on both the pitch accent and the boundary tone (
Third, we focus on variability
In the first part of the corpus analysis (Section 2.2) we thus explore sentence modality and focus placement jointly, by measuring the variability of
Our hypotheses on differences in internal structure are tested on the
The 21 subjects uttered 3 randomized repetitions of 6 contextually determined prosodic variants of 2 sentences after silently reading a contextualization paragraph. The sentences shared the number and structure of syllables, stress position, and syntactic structure, according to the template [CV.CV̀.CV]s [CV̀.CV]V [CV#CV̀.CV]io, as in
Contexts for the elicitation of the six focus/modality combinations for the sentence
| | Declarative | Interrogative |
|---|---|---|
The resulting 756 utterances were isolated from the recording sessions using
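The corpus size follows from a fully crossed design. As a quick arithmetic check, the sketch below enumerates the factors described above. The sentence labels are hypothetical placeholders, since the two test sentences are not reproduced in this excerpt; the focus labels follow the subject, verb, and indirect object positions used in the figures.

```python
from itertools import product

# Factors as described in the text; sentence labels are hypothetical placeholders.
SPEAKERS = range(1, 22)                      # 21 subjects
REPETITIONS = range(1, 4)                    # 3 randomized repetitions
FOCUS = ("subject", "verb", "indirect object")
MODALITY = ("declarative", "interrogative")
SENTENCES = ("sentence_1", "sentence_2")     # hypothetical labels

# The 6 prosodic variants: 3 focus positions crossed with 2 modalities.
variants = list(product(FOCUS, MODALITY))

# Every speaker reads every variant of every sentence 3 times.
design = [
    {"speaker": s, "sentence": sent, "focus": f, "modality": m, "repetition": r}
    for s, sent, (f, m), r in product(SPEAKERS, SENTENCES, variants, REPETITIONS)
]

print(len(variants))  # 6
print(len(design))    # 21 * 2 * 6 * 3 = 756
```

Half of the resulting tokens (378) are interrogatives and half are declaratives, so the two modalities are balanced by construction.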
Spectrogram and fundamental frequency traces with orthographic word-level segmentation for three object-focus utterances of the sentence
A first indication that the degree of variability in realization might be different across sentence modalities comes from the mere visualization of superposed utterance-long time-normalized
Utterance-long time normalised
The effect holds when the three focus placement conditions are evaluated separately. Interrogatives with initial, medial, or final contrastive focus have more variable realizations than declaratives with the same focus placement. This is illustrated in Figure
Utterance-long time normalised
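Superposing contours of different durations presupposes some form of time normalization. The following is a minimal sketch, assuming linear interpolation onto a fixed number of equally spaced points and, as a hypothetical dispersion index, the across-token standard deviation averaged over normalized time. Neither the number of points nor this index is taken from the study, and contours are assumed to have been interpolated through unvoiced stretches beforehand.

```python
from statistics import mean, pstdev

def time_normalize(times, f0, n_points=50):
    """Resample an f0 contour (Hz) onto n_points equally spaced positions
    between its first and last time stamp, by linear interpolation.
    Assumes strictly increasing times and no unvoiced gaps."""
    if len(times) < 2 or len(times) != len(f0):
        raise ValueError("need at least two (time, f0) pairs")
    t0, t1 = times[0], times[-1]
    grid = [t0 + (t1 - t0) * i / (n_points - 1) for i in range(n_points)]
    out, j = [], 0
    for t in grid:
        # Move to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        ta, tb = times[j], times[j + 1]
        w = (t - ta) / (tb - ta)
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return out

def mean_pointwise_sd(contours):
    """Hypothetical dispersion index: the across-token (population) standard
    deviation at each normalized time point, averaged over all points."""
    n_points = len(contours[0])
    return mean(pstdev([c[i] for c in contours]) for i in range(n_points))
```

For example, with `n_points=3`, a 1-s rise from 100 to 200 Hz and a 2-s rise from 120 to 180 Hz both map to three points ([100, 150, 200] and [120, 150, 180]), and the index is about 6.67 Hz. Because duration is factored out, such an index compares the shape of contours rather than their length.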
Even if
Figure
Fundamental frequency contours for final prosodic words (vertical range 75–300 Hz). Declaratives (upper panels) and interrogatives (lower panels) in three focus positions: subject focus (left), verb focus (middle), and indirect object focus (right).
The final prosodic word deserves particular attention, especially since (as we suggested above, Section 1.2) declaratives consistently show final falls, whereas interrogatives display either final rises or final falls. The greater variability in interrogatives might thus reflect (i) the fact that one pragmatic category (interrogative) can be represented by two sub-clusters (final rise vs. final fall), (ii) a higher dispersion of actual realizations in interrogatives, independently of differences in sub-clustering, or both.
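The two readings (i) and (ii) can be made explicit with the law of total variance. This decomposition is offered as an illustration, not as an analysis reported in the study: let $Y$ be an f0 measure for a given modality (for instance, at one normalized time point) and $C$ the sub-cluster a token belongs to (final rise vs. final fall).

$$
\operatorname{Var}(Y) \;=\; \underbrace{\mathbb{E}\!\left[\operatorname{Var}(Y \mid C)\right]}_{\text{dispersion within sub-clusters}} \;+\; \underbrace{\operatorname{Var}\!\left(\mathbb{E}[Y \mid C]\right)}_{\text{separation between sub-clusters}}
$$

Under reading (i), the greater variance of interrogatives would come mainly from the second term, which is non-zero whenever rises and falls differ in their mean contour, even if each sub-cluster is as compact as the declaratives. Under reading (ii), the first term would itself be larger. Restricting the comparison to a single sub-cluster (e.g., final falls only) removes the between-cluster term.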
In order to explore sub-clustering, we automatically classified utterance-final contours into rising and falling. Contours were classified on the basis of the difference between the mean
Automatic classification of final rises.
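The classification criterion is only partly stated in this excerpt: a difference between mean values, compared against an absolute threshold rather than listener-specific just noticeable differences. The following is a minimal sketch under explicit assumptions. The contour is an f0 vector in Hz over the final prosodic word, the compared portions are its last quarter and the quarter preceding it, the difference is expressed in semitones, and the 1-st threshold is hypothetical. None of these settings are taken from the study.

```python
import math
from statistics import mean

def semitones(f_hz, ref_hz):
    """Distance of f_hz from ref_hz in semitones (positive = higher)."""
    return 12 * math.log2(f_hz / ref_hz)

def classify_final_contour(f0_hz, threshold_st=1.0, window=0.25):
    """Label a final contour 'rise' or 'fall' by comparing the mean f0 of its
    last `window` fraction with the mean f0 of the equally long stretch
    just before it. threshold_st and window are hypothetical settings;
    anything that does not rise by more than the threshold counts as 'fall'."""
    n = len(f0_hz)
    k = max(1, int(round(n * window)))
    if n < 2 * k:
        raise ValueError("contour too short for the chosen window")
    early = mean(f0_hz[-2 * k:-k])
    late = mean(f0_hz[-k:])
    return "rise" if semitones(late, early) > threshold_st else "fall"
```

Working on a semitone scale makes the threshold independent of a speaker's overall pitch level, which matters when pooling male and female voices. A binary decision of this kind is what produces the two sub-clusters discussed above.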
The greater variability in interrogatives is not only due to this final rise, however: Levene’s tests on items with final falls confirm that, even in this reduced dataset, interrogatives are realized more variably than declaratives (all
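Levene's test compares groups by the spread of their absolute deviations from each group's centre, which is the property at stake here (dispersion rather than mean level). The following is a minimal pure-Python sketch of the test statistic $W$. Under the null hypothesis of equal variances, $W$ is referred to an $F$ distribution with $k-1$ and $N-k$ degrees of freedom. Which centring variant was used in the study is not stated in this excerpt, and the function name is hypothetical.

```python
from statistics import mean, median

def levene_w(*groups, center="mean"):
    """Levene's W statistic for equality of variances across k groups.
    center="median" gives the Brown-Forsythe variant.
    Raises ZeroDivisionError if every group has constant absolute deviations."""
    loc = median if center == "median" else mean
    # Absolute deviation of each token from its own group's centre.
    z = []
    for g in groups:
        c = loc(g)
        z.append([abs(y - c) for y in g])
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = [mean(zi) for zi in z]
    z_grand = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

For instance, `levene_w([1, 2, 3], [0, 2, 4])` gives 0.8: both groups share the same centre but differ in spread. In practice, `scipy.stats.levene` returns both the statistic and its p-value. Its default is `center='median'` (the Brown-Forsythe variant), so the variant should be stated when reporting results.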
So far, we have explored the interplay between sub-clustering and dispersion at a macroscopic level, by evaluating variance in
Peak alignment was automatically extracted using a procedure in four steps. First, the
Schematic pitch contours on the last prosodic word in declaratives (dashed line) and interrogatives (solid line).
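The four extraction steps are only partly preserved in this excerpt, so the sketch below does not reproduce them. It illustrates the basic measurement under stated assumptions: the f0 maximum is searched within a given time window, and its alignment is expressed as the signed distance from a segmental landmark (for instance, the onset of the stressed vowel). The function name, the search window, and the choice of landmark are hypothetical.

```python
def peak_alignment(times, f0, landmark, search_start, search_end):
    """Signed distance (s) between the f0 maximum found in
    [search_start, search_end] and a segmental landmark.
    Positive values mean that the peak follows the landmark."""
    candidates = [(t, f) for t, f in zip(times, f0) if search_start <= t <= search_end]
    if not candidates:
        raise ValueError("no f0 samples in the search window")
    # max() returns the first sample with the highest f0 in case of ties.
    t_peak, _ = max(candidates, key=lambda tf: tf[1])
    return t_peak - landmark
```

The choice of search window matters. A window that is too narrow can miss late peaks, which are exactly the tokens whose alignment is of interest, while a window that is too wide can pick up an f0 maximum belonging to a neighbouring accent.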
Figure
Distribution of peak alignment for each sentence modality.
The exploration of the Neapolitan Italian read speech corpus has shown that pitch contours are more variable in interrogatives than in declaratives. This is true both at a macroscopic level, i.e., in terms of variability of
In the following, we speculate on some possible causes and consequences of the greater variability in the encoding of interrogatives (Section 3.2). We conclude with a discussion of the implications of our findings for the theory and practice of transcription, in particular prosodic transcription (Section 3.3).
In a distributional approach, one would of course expect variability across realizations of a given category. More importantly for our purposes, there is no reason to assume that the degree of variability should be the same across categories: one category might be instantiated by fairly variable tokens, while another might be encoded more compactly. Our results indeed show that interrogatives are encoded more variably than declaratives. These differences deserve a closer look, since they might emerge as a consequence of how categories are organized in a system (and thus provide insights into a language’s prosodic system), and might in turn be reflected in how such categories are built, used, and updated (and thus generate hypotheses on language acquisition, interaction, and sound change). An extensive discussion of the sources and consequences of such
Escandell-Vidal (
Given this picture, the notion of differential variability might prove useful in generating new research hypotheses and in accounting for some recent findings. Studies on prosodic accommodation in overt (
The alignment of
An intonation transcription system needs to have mechanisms for dealing with contextually determined variation, i.e., adjustments due to tonal crowding. Adjustments can be made to the articulation rate: slowing down facilitates the accommodation of the tones (
Another source of variation is truncation, a process in which tones undershoot their targets. Naturally, the transcriber is faced with the decision as to whether a tone is there but only partially realised, i.e., truncated, or simply not there at all. Take, for instance, contours that Grabe (
The data presented in this study raise the question as to whether peak alignment is a suitable cue in itself. In fact, it has been treated as an abstraction by Gussenhoven (
Let us examine a typical contrast in the segmental domain that is frequently discussed as analogous to peak alignment, the distinction between ‘voiceless’ and ‘voiced’ oral stops, e.g., between /p/ and /b/ (see also
In many accounts, the terms
Furthermore, despite the emphasis on VOT in the literature, aspiration (i.e., positive VOT) in English is often drastically reduced in weak prosodic positions (such as in the unstressed syllable in ‘rapid’ or word-finally in ‘hip’) or after a word-initial sibilant (as in ‘spin’). In these cases it is unlikely that a lenis symbol is selected, since the transcriber is aware of the contextually determined variation and, of course, keeps the lexical meaning in mind.
This is less obviously the case when transcribing obstruents across dialects. Barry and Pützer (
Thus, even in the so-called segmental domain there are problems with categorization. Just as /p/ is selected by the transcriber to represent the sound in ‘spin’ despite its near-zero VOT, L*+H might be selected by the transcriber in the rise-fall-rise case, despite the early alignment of the
This is true of any intonation transcription system, although priorities vary. For instance, within the British school, Crystal prioritizes substance and sees intonation as “the product of the interaction of features from different prosodic systems—tone, pitch-range, loudness, rhythmicality and tempo in particular” (
We used an absolute threshold instead of empirically determined just noticeable differences since the listener-specific sensitivity thresholds reported by ’t Hart (
As in the case of pragmatic biases, we cannot rule out that politeness affects declaratives in equal measure. The effect may therefore concern what happens to pitch accents when they are followed by a final rise instead of a fall, rather than interrogatives per se.
Note that
For accompanying TextGrid and wav files, go to
The work of the first author was supported by the UoC Emerging Group “Dynamic Structuring in Language and Communication”, funded through the Institutional Strategy of the University of Cologne (ZUK 81/1).
The authors declare that they have no competing interests.