1.1 Intonation, stress, and tonal association
In most models of intonation, a distinction is made between tones found at the edges of prosodic constituents and tones that are linked to specific strong elements within prosodic constituents. For instance, Trager and Smith (1951) distinguished between ‘pitch phonemes’ and ‘juncture phonemes.’ Trubetzkoy (1958) distinguished between tones with a culminative function and those with a delimitative function, where culminative tones lend prominence and delimitative tones mark edges. The IPO model (Cohen and ’t Hart, 1968; ’t Hart and Cohen, 1973; ’t Hart and Collier, 1975) distinguished between ‘accent-lending’ and ‘non-accent-lending’ pitch movements. In Autosegmental-Metrical phonological approaches to intonation (e.g., Pierrehumbert, 1980; see Jun, 2014; Arvaniti, 2011; Beckman and Venditti, 2010; Ladd, 2008; Grice, 2006; Beckman, Hirschberg and Shattuck-Hufnagel, 2005; Gussenhoven, 2004 for overviews), a distinction is made between pitch accents and edge tones. An intonational pitch accent is considered a ‘prominence-cueing’ tonal event (cf. Ladd, 2008, p. 34) and co-occurs with a metrically strong syllable of a prosodic constituent (most often the stressed syllable of a word).1 An edge tone, on the other hand, is defined as a tonal event co-occurring with the periphery of a prosodic constituent.
The co-occurrence of a phonological tone with a certain segmental or prosodic landmark can be expressed on the one hand in terms of phonetic alignment and on the other hand in terms of phonological association. Henceforth, alignment is taken to be continuous, referring to the exact position of a tone in relation to a landmark in the segmental string, and association is taken to be discrete, referring to the (phonological) linking of a tone with a phonological constituent.
The last two decades of research investigating phonetic alignment have yielded the insight that certain tonal targets align in a highly systematic way with respect to single or multiple reference points in the segmental string (the latter referred to as ‘segmental anchoring’ by Arvaniti, Ladd and Mennen, 1998). Indeed, many studies have shown that in a number of European languages, specific turning points (local F0 minima and F0 maxima as well as elbows) predictably occur within a small time frame determined by the segmental make-up of the text (e.g., Arvaniti and Ladd, 2009; Mücke, Grice, Becker and Hermes, 2009; Atterer and Ladd, 2004; Arvaniti et al., 1998, and Ladd, 2008, 2006 for an overview). Importantly, patterns of phonetic alignment relative to segments, which are consistent within rather than across language varieties, are typically taken as the phonetic manifestation of phonological association to a structural unit like a stressed syllable or a phrase edge (Arvaniti, Ladd and Mennen, 2000). This, in turn, goes hand in hand with a classification of the tonal event as belonging to a descriptive category like pitch accent or edge tone.
The assumption that association can be defined phonetically in terms of temporal alignment is made explicit, for example, by Pierrehumbert and Beckman (1988, p. 127), who claim that “tones are produced at the phonetic boundary of the node with which they are associated […]”. They go on to state that “the initial and final tone […] are produced at about the same time as the initial and final segments […]”. Thus, the most relevant characteristic of an edge tone would be its alignment with the margins of a prosodic phrase. Similarly, the existence in a language variety of varying alignment patterns, in which the same tonal targets are aligned differently relative to a stressed syllable, has frequently been used to posit a distinction between pitch accent categories (Face and Prieto, 2007; D’Imperio, 2006; Pierrehumbert and Steele, 1989). This kind of argumentation does not, however, entail that the tonal targets of a pitch accent must always be aligned within the temporal interval spanned by the stressed syllable (Grice, 1995; Arvaniti et al., 2000).
Given that pitch accents are usually understood to be phonologically linked to metrically stressed syllables, this category label by definition does not apply to languages that lack stress. It is therefore particularly interesting to investigate tonal behaviour in such languages. Yet, to the best of our knowledge, tonal and segmental alignment have not been systematically investigated in languages that do not exhibit word-level metrical structure, with the notable exceptions of stress-lacking Ambonese Malay (Maskikit-Essed and Gussenhoven, 2016), Tashlhiyt Berber (Grice, Ridouane and Roettger, 2015), and (arguably) French. Among other things, Maskikit-Essed and Gussenhoven (2016) examine phrase-final rise-falls in Ambonese Malay declaratives and find that the alignment of the peak is highly variable with respect to six different landmarks. They argue that there is no syllabic constituent with which the peak (a H target) can be said to associate on the basis of systematic alignment. Instead, the phrase-final rise-fall is analyzed as a boundary tone complex linked to the right phrasal edge, specifically as a succession of phonological phrase and intonation phrase boundary tones. A similar analysis is given for the simple rise that is used phrase-finally in interrogatives.
It merits further investigation whether Tashlhiyt, as another example of a language lacking lexical stress, exhibits systematic phonetic alignment at all, and if so, what the relevant landmarks in the segmental string would be. This, in turn, raises the question of how tonal alignment in Tashlhiyt should be represented phonologically in the absence of word-level metrical structure.
1.2 Earlier work on Tashlhiyt
Tashlhiyt is one of three Berber languages spoken in Morocco and has recently received much interest regarding aspects of prosodic structure and tonal placement, with instrumental studies on metrical structure (Gordon and Nafi, 2012; Roettger, Bruggeman and Grice, 2015) and intonation (Grice et al., 2015; Roettger and Grice, 2015). Several observations concerning the phonological structure of Tashlhiyt suggest that the language only has prominence or accent at the phrasal level (Kossmann, 2012; Dell and Elmedlaoui, 2008, 2002; Aspinion, 1953; Stumme, 1899), that is, it has no lexical stress. Roettger et al. (2015) provided experimental evidence to support these observations: Tashlhiyt vowels (as syllable nuclei) do not exhibit the consistent acoustic-phonetic differences that one would expect to find as an indication of underlying, metrically specified prominence asymmetries (but cf. Gordon and Nafi, 2012 for a contrasting claim).2
Recent work on the intonation of Tashlhiyt has revealed a great deal of variability in tonal placement in phrase-final position. Grice et al. (2015) report on the behaviour of two different H tones: one marks the right edge of polar questions, and the other signals contrastive focus on phrase-final words in declaratives. Despite their different functions, both H tones can dock onto either the penultimate or the final syllable of the phrase (see also Roettger and Grice, 2015 for similar findings on declarative questions). The analysis by Grice et al. (2015) of these H tones as having primary association with the phrase edge and secondary association with a specific syllable sits well with the assumption that the language does not have lexical stress, since this presupposes that there are no a priori lexical anchors with which postlexical tones would associate.3 The present study is concerned with intonational tones in phrase-initial position, specifically with those tones that make up the intonational marking of question words.
1.3 Question intonation
Whether and how wh-words, or question words (henceforth, qwords), are marked prosodically varies cross-linguistically. Ladd (2008) distinguishes two main prosodic patterns of qword interrogatives: one in which the qword is not normally accented, or at least does not get the main phrasal accent (mostly in Germanic languages), and one in which the qword is the most prosodically prominent element in the phrase. The latter pattern seems to be more widely attested as a default cross-linguistically, being reported for a wide range of languages from different families, e.g., Greek (Baltazani, 2002), Moroccan Arabic (Benkirane, 1998), and Romanian and Hungarian (Ladd, 2008).
While the idea of qwords carrying inherent focus used to be problematic for early accounts of qword prosody (see Ladd, 2008, pp. 226–231 for a summary), it is now widely accepted that there may be a mismatch between the assumed focal status of qwords on the one hand and the prosodic marking expected to reflect it on the other. This holds particularly for Germanic languages, in which qwords do not usually attract the main pitch event in the phrase. Yet in certain contexts, qwords in Germanic may still attract a (nuclear) pitch accent. Pitch accent distribution in Germanic qword questions is in fact determined by a number of factors, including paralinguistic considerations and the information status of elements other than the qword (e.g., Chen, 2012 on Dutch). In contrast, in languages with a default nuclear accent on the qword, this accent seems to be compulsory (e.g., Baltazani, 2002, pp. 70f., on Greek).
A final issue relevant to any investigation of question intonation has to do with question length. As Ladd (2008) points out, the accentedness of qwords in languages that canonically mark their qwords with the nuclear accent is primarily a feature of short questions. This suggests that the likelihood of a non-default accent location, or the presence of multiple accents, is linked tightly to phrasing.
In this paper we limit ourselves to a case study on qwords that are in narrow focus in default initial position, as detailed in the next section.
1.4 Qword constructions in Tashlhiyt
Although previous work on Tashlhiyt has looked at polar questions, the present paper is the first investigation into the intonational characteristics of questions with a qword. Most morphologically simple qwords consist of one or two syllables starting with ma, e.g., ma/mad ‘what’ and mani ‘where’. Complex question constituents (henceforth, complex qwords) are formed by man ‘which’ followed by a noun, e.g., man ahuli ‘which sheep’.4 The canonical position for qwords in Berber is phrase-initial (Kossmann, 2012), as it is in Tashlhiyt (Stumme, 1899). Many authors have drawn parallels between initial qwords and cleft constructions, with clefts in Berber being defined as phrase-initial focal constructions (Kossmann, 2012; Stoyanova, 2004; van den Boogert, 1997). Implicit in most of this work is the assumption that qwords are not licensed in any other phrasal position (which is made explicit in Stoyanova, 2004 on Tamazight and Tarifiyt Berber) due to syntactic constraints on focal interpretation. However, a quick glance at qword questions in a corpus of semi-spontaneous speech produced by pairs of speakers doing map tasks (Bruggeman and Roettger, 2017) reveals that, at least in this type of speech, question words also frequently occur in phrase-final position.
1.5 Overview of paper
The rest of this paper is structured as follows. Section 2 is exploratory in nature: first, section 2.1 provides the motivation for the pilot experiment reported on in section 2.2, in which we investigate qualitatively whether and in what contexts qwords in Tashlhiyt are marked prosodically. Following this, section 3 discusses the main experiment, which concentrates on 11 different qwords (five simple and six complex) in initial position and in narrow focus, and addresses in detail the alignment and scaling characteristics of the F0 contour associated with these words. In the discussion in section 4, we take up the question of how to interpret tonal placement in Tashlhiyt phonologically.
2 Exploratory study: Inventory of patterns
Very little is known about the intonation of qword questions in Berber, including Tashlhiyt, and this section lays the groundwork for the more detailed experiment in section 3. The aforementioned qword questions produced in semi-spontaneous speech (map tasks and gameplay) by 10 speakers recorded in Agadir in 2013 and 2015 (Bruggeman and Roettger, 2017) exhibited a range of patterns. Qwords in their default initial position could be marked either with a pitch movement (rising or falling) or with a high level stretch of contour. This variability in pitch contours may have been due to differences in the information structure of these utterances. We therefore ran a pilot experiment to further investigate the influence of context on the prosodic realization of qwords in Tashlhiyt.
It has been proposed that qwords are attributed focal status by virtue of contributing interrogative meaning to the phrase they occur in (Tomioka, 2007, p. 1575; Zubizarreta, 1998; Culicover and Rochemont, 1983). An extension of this idea to the prosodic domain holds that the default marking of qwords by means of a pitch accent, which occurs in a number of languages, reflects this focal status (cf. Ladd, 2008). In contrast, qwords that are embedded in a clause have a different function and interpretation (e.g., Krifka, 2011), notably not (necessarily) being focal. Following this line of thought, embedded qwords should be less likely to be marked with a (nuclear) pitch accent.
This section investigates precisely this issue in Tashlhiyt. It compares the same question word (manaɡu ‘when’)5 in direct interrogatives, i.e., in speech acts expressing a request for information, and in phrases with an embedded qword. In particular, we expect the qword to be in focus in a short direct interrogative and not in focus when embedded in a statement.
In order to elicit natural-sounding instances of direct questions as well as embedding contexts, we scripted a telephone dialogue between two speakers (see Appendix). The dialogue was presented to single participants a few lines at a time on a laptop screen. Participants were familiarized with the content of the conversation beforehand and were then instructed to read and act out both sides. Intonation patterns discussed here reflect data from eight participants: those who took part in the main experiment (see section 3.1.3 for details), plus one who completed only this pilot.
2.2 Question word function and position
As mentioned in the introduction, the default position for qwords in Tashlhiyt is phrase-initial. A typical intonation contour with an initial qword is given in Figure 1a. Qwords may also occur in final position (Figure 1c) in a context where the question is repeated. There was no effect of position: interrogative qwords are characterized by a pitch movement irrespective of their position in the phrase. Specifically, Figures 1a and 1c show that qwords receive a peak both in initial and in final position, with the rising and falling movements flanking the peak also for the most part realized on the qword.
In contrast to qwords in interrogatives, non-interrogative, embedded qwords are not marked by the same pitch event, as can be seen for both medial position (1b) and final position (1d).6
On the other hand, qwords that are in medial (peninitial) position, preceded by a discourse marker, are characterized by the same rising-falling pitch movement as initial and final qwords. Figure 2 shows two intonation patterns that are possible on the phrase imːa manaɡu rad tbdut lχdmt? (‘So when will you start work?’). The preferred way to produce peninitial qwords was with the rise starting prior to the qword, on the discourse marker, as in Figure 2a. The pattern in 2b, a low level stretch of contour before a rise that is initiated at the left edge of the qword, was produced by one speaker only.
2.3 Interim hypotheses: Tonal association
The previous section has shown that qwords are consistently marked intonationally in interrogatives, but not in embedded contexts. Moreover, there is only a minimal effect of phrasal position on the choice of tune that marks the interrogative qword: Irrespective of the position in which it occurs (initial, peninitial, and final position), the qword is invariably marked by (part of) a rise towards a peak or plateau and a subsequent falling movement, with the peak being reached on the qword.
Based on this inventory we might hypothesize that the localized rise-fall consists of a sequence of turning points described in terms of a sequence of a low, a high, and another low target, or LHL. Taking this as our point of departure, we could explore these turning points as potential phonological targets and investigate how they are phonetically aligned with respect to the segmental level. The resulting phonological representation should reflect what remains an invariable part of the qword contour, irrespective of changes in phrasal position (which was discussed in this section), and irrespective of segmental structure (which will be addressed in the main experiment in the next section).
The working hypothesis we propose, however, does not take the initial L turning point into account as an integral part of the qword contour. Two pieces of evidence support this position. Firstly, in peninitial position, most speakers’ rising movement starts before the qword (as exemplified in Figure 2a). This suggests that the initial L in LHL might be an intonational event tied to the left phrasal boundary rather than a target that needs to be realized on the qword itself. Secondly, further support for representing the initial low target separately from the qword tune proper, e.g., as a left intonational phrase boundary tone (%L), comes from the observation that in final position (Figure 1c) the rising movement may be largely realized before the qword.
A better working hypothesis would thus be that the sequence of a high and a low tonal target, HL, forms the essence of the Tashlhiyt qword tune. Of these two targets, the H target seems to be more stably linked to the qword than the subsequent L turning point: in the semi-spontaneous data, the location of the low turning point following the peak varied considerably. Peaks or regions of high pitch, on the other hand, tend to occur on qwords in the semi-spontaneous data much as they do in the present elicited data.
3 Alignment and scaling in initial position
As mentioned in section 1.1, the notion of tonal alignment with specific segmental landmarks has guided much recent work on the description of phonological association patterns. Crucially, anchor points are more often than not defined with reference to stressed or accented syllables (or specific segments within or surrounding these), either in absolute or relative terms. As Tashlhiyt lacks specifications for metrical strength that would allow us to attribute a central role to a stressed syllable, including using it as a point of reference, the question arises whether alignment of postlexical tones is stable, and if so with respect to what segmental or prosodic anchors. Of particular relevance to this discussion is the general question as to where syllable boundaries are. Syllabification in Tashlhiyt has been the subject of much work (and debate) in the past few decades (Ridouane, 2008; Dell and Elmedlaoui, 2002; Coleman, 2001, 1999; Dell and Elmedlaoui, 1985), particularly with respect to syllables without full vowels. In the present data, however, target words contain only full (phonological) vowels and the syllabification used is uncontroversial.
In addition to the alignment of F0 targets, the scaling of these targets is relevant to a characterization of intonation contours. While pitch alignment and pitch scaling are theoretically independent (each can be manipulated separately in production), they often interact in predictable ways, at least in perception (cf. Ladd and Morton, 1997; Gussenhoven, Repp, Rietveld, Rump and Terken, 1997 for early attempts at isolating the effects of F0 scaling on perception). For example, a late peak might be an alternative realization of a higher, earlier peak, as both types of peak have a similar perceptual effect (‘peak delay’ in Gussenhoven, 2004). Similar effects are discussed in detail by Barnes, Brugos, Shattuck-Hufnagel and Veilleux (2012a) and Barnes, Veilleux, Brugos and Shattuck-Hufnagel (2012b), who aim to explain the perceptual equivalence of different accent shapes by appealing to the interaction of scaling and alignment of the relevant region within the contour.
In the experiment presented here we investigate the alignment and scaling of the H that we argued forms an essential part of the Tashlhiyt qword tune, and consider its scaling with respect to the phrase-initial %L. The main questions of interest in this experiment are:
- Q1 Which, if any, segmental landmarks play a role in the location of this tonal target relative to the segmental string?
- Q2 Which characteristics (turning point alignment/scaling properties) are defining prosodic characteristics of qwords?
- Q3 Is there an interaction between alignment and scaling of tonal targets?
It is important to note that recent work on Tashlhiyt, including the present study, is based on field recordings. Thus, the present experiment, although controlled in terms of content, did not take place in a laboratory setting. Recordings were made in a university room in March 2015 in Agadir, Morocco. As Tashlhiyt is still largely a spoken language (national reforms in 2011 have slowly started to change the situation), we were reliant on a specific target group, namely students in the Amazigh (Berber) department, who are competent readers of the language. Our participants had differing local origins within the Tashlhiyt-speaking area of southern Morocco (see section 3.1.3 for details), but formed a homogeneous group in terms of age and socio-economic background.
3.1.1 Speech materials
The qwords used in this experiment are given in Table 1, with carrier sentences given below each target qword. The five simple qwords were chosen based on their sonority. Target questions had either qword-Verb-Object or qword-Verb-Adverb structure and the number of syllables following the qword was always five. Simple qwords varied in number of syllables from one to three, and included CV and CVC syllables. Complex qwords consisted of the interrogative element man ‘which’ followed by either a disyllabic or a trisyllabic noun. Syllabification is given for all polysyllabic simple qwords. It is unclear if resyllabification takes place across the elements of a complex qword constituent, but since it is not relevant to our discussion we leave this issue open.
Table 1. Target questions with simple qwords (left) and complex qwords (right).

| Simple qwords | Complex qwords |
|---|---|
| ma iʃːta ʁ umalu? ‘What does he eat in the shade?’ | man anu nzʕra ʁ umalu? ‘Which well do we see in the shade?’ |
| mad nzʕra ʁ umalu? ‘What do we see in the shade?’ | man ananas nzʕra ʁ umalu? ‘Which pineapple do we see in the shade?’ |
| mani ʁ izʕra ahuli? ‘Where does he see the sheep?’ | man ahuli nzʕra ʁ umalu? ‘Which sheep do we see in the shade?’ |
| manwi nzʕra ʁ umalu? ‘Who do we see in the shade?’ | man tizi nzʕra ahuli? ‘What time do we see the sheep?’ |
| manaɡu nzʕra ahuli? ‘When do we see the sheep?’ (manaɡu: /ma.na.ɡu/, /ma.naɡʷ/) | man tili nzʕra ʁ umalu? ‘Which ewe do we see in the shade?’ |
| | man butili nzʕra ʁ umalu? ‘Which shepherd do we see in the shade?’ |
Participants were first given an explanation in Tashlhiyt as to what the task entailed, and then seated in front of a laptop screen where they reread the instructions (also in Tashlhiyt). They were told to act out the role of a primary school teacher doing a picture-question exercise with their students. In the experiment, they were presented on each subsequent slide with a description of a picture scene, with the relevant picture shown immediately below, and the target question underneath the picture. The instructions were to read the picture description out loud and then produce the question underneath as if they were asking it to their students.
While we have independent reasons for assuming that the qword acts as the default focus of the question (see section 1.3), it was necessary to ensure that speakers consistently treated the qword as the question’s single focus. Thus, to prevent speakers from interpreting other elements in the question as (additional) foci, we created stimuli in which the lexical items in the rest of the question were both textually and visually given. In addition, the context of the task (in which subsequent questions had different phrase-initial qwords) resulted in implicit focus on the qword.
An example gloss is given below for the target item mani, for both the two context sentences and the target question. The first line shows the Latin orthography for Tashlhiyt used for presentation in the experiment, and the second line a phonemic transcription.
Context: ‘The boy there sees a sheep on the road.’ ‘You ask your students:’
Target question: ‘Where does he see the sheep?’
Thus, the following factors ensured that stimuli were treated as constituting a single ‘focus domain’ (cf. Ladd, 2008) with narrow focus on the qword: i) the simple sentence structure, ii) the short length of the phrase, and iii) the textual and visual givenness of all elements other than the qword.
Stimuli were presented in blocks, with each target qword occurring once per block. As the recording session involved a number of other tasks, speakers completed a set of two blocks, with the stimuli in each block in a different semi-randomized order, followed by another task, and finally the same set of two blocks again. Each set was preceded by five practice items. We did not include fillers, in order to keep the total duration of the experiment short. Of the 7 speakers, 2 completed only one set of two blocks, so that their number of repetitions per stimulus is two instead of four, as shown in Table 2. This led to a total of 24 repetitions per target word, i.e., 120 items for the five simple qwords and 144 for the six complex qwords. After the exclusion of disfluent items (misreadings, hesitations) and, in some cases, items affected by interruptions of the recording, 107 simple (89%) and 120 complex qwords (83%) remained.7 Production data were recorded using a PreSonus Audiobox solid-state recorder at a sampling rate of 44.1 kHz and an AKG C420 III head-mounted microphone.
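The repetition counts reported above can be re-derived with a few lines of Python. The sketch below only recomputes totals and retention rates from the numbers given in the text, assuming (as the paragraph states) that a full session yields four repetitions per qword and a partial one yields two; the variable and function names are illustrative, not part of the study.

```python
# Numbers taken from the text; names are illustrative only.
FULL_SPEAKERS = 5      # completed both sets of two blocks
PARTIAL_SPEAKERS = 2   # completed only one set of two blocks
REPS_FULL = 4          # 2 sets x 2 blocks, each qword once per block
REPS_PARTIAL = 2       # 1 set x 2 blocks

reps_per_word = FULL_SPEAKERS * REPS_FULL + PARTIAL_SPEAKERS * REPS_PARTIAL


def totals(n_words, kept):
    """Return (items recorded, percentage retained) for a group of qwords."""
    recorded = reps_per_word * n_words
    return recorded, round(100 * kept / recorded)


simple_total, simple_pct = totals(5, 107)    # five simple qwords
complex_total, complex_pct = totals(6, 120)  # six complex qwords
print(reps_per_word, simple_total, simple_pct, complex_total, complex_pct)
# → 24 120 89 144 83
```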
The original recordings for the task involved 9 speakers. Data from two speakers were excluded, in one case because of reading difficulties and in the other because the speaker was unable to finish the recording. Table 2 gives more detailed information about the remaining speakers (6 female, indicated by ‘f’, and 1 male, indicated by ‘m’). Speakers’ parental origins varied, as did speakers’ own places of birth. The 3 speakers whose place of birth is marked by an asterisk moved to Agadir at some point during their youth; the rest had come to Agadir for their university education. Relevant places of birth are shown on the map in Figure 3. All speakers were native and dominant speakers of Tashlhiyt. They were students at the Institut des Études Amazighes at the Ibn Zohr University of Agadir, and spoke Tashlhiyt regularly with friends and family. All speakers were multilingual, with high or native fluency in Moroccan Arabic and varying fluency in French and English as foreign languages learnt in school.
We based our scaling and alignment measurements on the F0 contour obtained with Praat’s standard pitch tracking algorithm (Boersma and Weenink, 2015), with manual correction of spurious pitch points and tracking errors such as octave jumps.8
We used two measurements for quantifying the properties of the H target: A single absolute F0 maximum, and a measure of a ‘high region.’ It has repeatedly been shown that small and gradual F0 displacement does not lead speakers to perceive pitch differences (’t Hart, 1976; d’Alessandro and Mertens, 1995). Given the possibility of a region in the contour within which slightly varying pitch values are perceptually equivalent, we should entertain the possibility that it is this larger region which is involved in systematic tune-text association, especially in view of the fact that the singling out of an absolute maximum in cases of microprosodic perturbation may be problematic.
In order to quantify this high region, we used a heuristic measure inspired by earlier work on pitch contour plateaux (plateaux being defined as extended regions characterized by minimal pitch displacement). In specific cases, plateaux have been taken to constitute a phonological accent category in contrast to a clear single F0 peak (Niebuhr and Hoekstra, 2015; Knight, 2008; Knight and Nolan, 2006). For example, to quantify high plateau accents in British English, Knight (2008) and Knight and Nolan (2006) use the start and end of a region delimited by F0 values 4% (in Hertz) below the absolute maximum. A slightly different measure is adopted by Niebuhr and Hoekstra (2015), who use a 1-semitone criterion around the peak in their discussion of North Frisian plateaux.9 As Knight and Nolan (2006) suggest that there is little difference between plateaux delimited by values 4% and 6% below the maximum, we chose F0 values 6% below the peak as plateau onset and offset in the present study; a 6% drop in Hertz corresponds to roughly 1.07 semitones, very close to the 1-semitone criterion used by Niebuhr and Hoekstra (2015). Figure 4 schematically depicts the adopted plateau measurements (start and end point). In sum, then, we have one peak measure (the absolute F0 maximum) and two plateau measures (start and end points of the high region delimited by F0 values 6% below the absolute maximum on the qword).
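The peak and plateau measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: it takes a frame-by-frame F0 track (as might be exported from Praat after manual correction), finds the absolute maximum, and extends the high region outward frame by frame until F0 falls more than 6% below the peak. Treating unvoiced frames (`None`) as region boundaries and reporting boundaries at frame resolution, without interpolation, are our own simplifying assumptions; the names `plateau` and `semitones` are hypothetical. The helper `semitones` shows that a 6% drop corresponds to about 1.07 semitones.

```python
import math


def plateau(times, f0, drop=0.06):
    """Return (t_peak, t_start, t_end) for the high region around the F0 maximum.

    times: frame times in seconds; f0: F0 in Hz per frame, None if unvoiced.
    The region is the contiguous run of voiced frames around the absolute
    maximum whose F0 is at least (1 - drop) * maximum. Boundaries are
    returned at frame resolution (no interpolation between frames).
    """
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("no voiced frames")
    i_max = max(voiced, key=lambda i: f0[i])  # first frame with the highest F0
    threshold = (1 - drop) * f0[i_max]

    def inside(i):
        return 0 <= i < len(f0) and f0[i] is not None and f0[i] >= threshold

    start = i_max
    while inside(start - 1):  # walk left until F0 falls below the threshold
        start -= 1
    end = i_max
    while inside(end + 1):    # walk right likewise
        end += 1
    return times[i_max], times[start], times[end]


def semitones(f_high, f_low):
    """Interval between two frequencies in semitones."""
    return 12 * math.log2(f_high / f_low)


# Hypothetical 10-ms contour, not data from the study.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
f0 = [180, 200, 212, 220, 215, 205, 190]
print(plateau(times, f0))            # → (0.03, 0.02, 0.04)
print(round(semitones(1, 0.94), 2))  # → 1.07
```

Because the threshold is a ratio of the peak value, the same 6% criterion yields the same interval in semitones for every speaker, whatever their F0 range.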
3.2 Results and Discussion
3.2.1 Alignment of peaks in simple qwords
This section presents results of peak alignment in the five simple qwords ma, mad, mani, manwi, and manaɡu. Figure 5 shows the relative alignment of the absolute F0 maximum per word for all speakers, with word boundaries indicated by solid lines.10 Each dot represents an individual utterance.
Figure 5 shows that while maxima are not commonly realized in the very first part of the word, the temporal domain across which they are realized is surprisingly large and spans most of the word (except for manaɡu, which has no peaks in the second half of the word).
In order to investigate peak alignment more precisely, we looked at the F0 maxima relative to individual segments. Figure 6 illustrates this for all five qwords for each speaker. The differing number of tokens per speaker is a function of their initial number of repetitions, which was either 2 or 4 (see Table 2), and of subsequent exclusions.
As expected, the distribution of peaks over a large part of the word translates into maxima that occur variably on different segments. In absolute terms, the monosyllabic qwords ma and mad ‘what’ exhibit the greatest degree of uniformity: for these words, all speakers tend to produce maxima on the vowel /a/ across repetitions. In the polysyllabic words, maxima are spread across different segments as well as different syllables. Peaks are observed on any segment from the second to the last in mani and manwi, and from the first to the fourth in manaɡu.11 There is thus no stable segmental anchor for the peak.
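Assigning each F0 maximum to a segment, as in Figure 6, and expressing it as a proportion of word duration, as one might for Figure 5, amounts to an interval lookup. The sketch below assumes that segment boundaries are available as a sorted list of times (e.g., from a manual segmentation) and that relative alignment is normalized by word duration; both the normalization and the function name `locate_peak` are assumptions for illustration, not the authors' documented procedure. The segment times in the example are invented.

```python
from bisect import bisect_right


def locate_peak(t_peak, boundaries, labels):
    """Assign a peak time to a segment and to a position within the word.

    boundaries: sorted boundary times, word onset first, word offset last.
    labels: one segment label per interval (len(boundaries) - 1 items).
    Returns (label, relative_position) with relative_position in [0, 1],
    or (None, None) if the peak falls outside the word.
    """
    onset, offset = boundaries[0], boundaries[-1]
    if not onset <= t_peak <= offset:
        return None, None
    # A peak exactly on an internal boundary is assigned to the following
    # segment; a peak at the word offset is assigned to the last segment.
    idx = min(bisect_right(boundaries, t_peak) - 1, len(labels) - 1)
    return labels[idx], (t_peak - onset) / (offset - onset)


# Hypothetical segmentation of mani (times in seconds, not measured data).
bounds = [0.00, 0.08, 0.20, 0.27, 0.40]
segs = ["m", "a", "n", "i"]
print(locate_peak(0.23, bounds, segs))  # label 'n', relative position ≈ 0.575
```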
Peak distribution overall can be characterized as exhibiting a gradient spread rather than a categorical distribution, with most speakers producing a multitude of alignment patterns. The attested variability can be classified along a number of parameters:
- peaks that align with different syllables (e.g., 3f’s peaks in mani, and 4f’s peaks in manaɡu).
- peaks that align with different segments within the same syllable (e.g., 1f’s and 9f’s peaks in manwi and mani, respectively).
- peaks that align with different syllables and with different segments within one of these syllables (e.g., 3f’s peaks in manaɡu, 9f’s peaks in manwi).
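The three types of variability listed above can be operationalized for one speaker’s repetitions of one word by grouping peak locations by syllable. The sketch below is an illustrative assumption rather than the classification procedure used for Figure 6; peaks are coded as hypothetical (syllable index, segment label) pairs, with segment labels assumed to be unique within a syllable.

```python
from collections import defaultdict


def variability_type(peaks):
    """Classify one speaker's peak locations for one word.

    peaks: list of (syllable_index, segment_label) pairs, one per repetition.
    """
    segments_by_syllable = defaultdict(set)
    for syllable, segment in peaks:
        segments_by_syllable[syllable].add(segment)

    several_syllables = len(segments_by_syllable) > 1
    several_segments_within = any(len(s) > 1 for s in segments_by_syllable.values())

    if several_syllables and several_segments_within:
        return "different syllables and segments"
    if several_syllables:
        return "different syllables"
    if several_segments_within:
        return "different segments, same syllable"
    return "uniform"


# Hypothetical codings for mani (syllable 0 = ma, syllable 1 = ni).
print(variability_type([(0, "a"), (1, "i")]))            # different syllables
print(variability_type([(1, "n"), (1, "i")]))            # different segments, same syllable
print(variability_type([(0, "a"), (1, "n"), (1, "i")]))  # different syllables and segments
```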
With respect to the specific alignment patterns characterizing individual words, peak locations in manaɡu and mani are strikingly similar: both words exhibit a general preference for maxima on the second syllable rather than on the first. In both words, the earliest peaks occur around the segment boundary between /m/ and /a/, and the latest peaks halfway through the second vowel. It appears, therefore, that manaɡu behaves like a disyllabic word in which the final syllable ɡu does not count. A plausible explanation is that the word, despite being produced with a clear final vowel, is treated as if it had no final phonological vowel participating in tune-text association, i.e. as manaɡʷ (see also section 2.1).
Compared to mani and manaɡu, manwi exhibits slightly more categorical peak alignment, which is also reflected in the greatest within-speaker variability (e.g., speakers 4f and 9f produce discretely different peak alignment, i.e. on different syllables, across repetitions). However, two arguments can be brought forward in favour of considering alignment in manwi on a par with that in the other qwords. Firstly, single speakers (with the exception of 3f) may produce any combination of ‘categorical’ peaks and other, more gradiently different peaks. This suggests that manwi, too, is characterized by a gradient distribution of peaks, which is simply obscured by the large microprosodic perturbation caused by the labiovelar approximant. Secondly, a comparison between late peaks in manwi and the right-edge peaks on complex qwords (see section 3.3) shows that manwi’s peaks are not aligned as late as those on complex qwords. This suggests that the former are qualitatively more like the peaks on the other simple qwords than like the possible edge-marking strategy seen in complex qwords.
In sum, then, the alignment results presented so far indicate that there is little systematicity, both within and across speakers, in the alignment of F0 maxima in Tashlhiyt qwords. While peaks may occur on most segments in the word, and align variably within these segments, a consistent feature of all peaks is that they occur within the boundaries of the qword. This degree of variation in alignment is unlike what has been reported for most other languages, where (specific) tonal targets show relatively stable alignment in relation to the segmental string.
As an example of consistent alignment, Atterer and Ladd (2004) found that in rising L*+H accents in two varieties of German, the low turning point characterizing this accent was aligned significantly later in one variety than in the other. Crucially, in absolute terms, this alignment difference spanned roughly half a segment. Speakers of each variety behaved so uniformly that the resulting difference, although realized on a very small temporal scale, functioned as a significant predictor of the variety spoken. The patterns discussed here are of an altogether different nature. It might be objected that the present data come from speakers with a less uniform background than Atterer and Ladd’s participants (although the latter came from dialect regions defined rather broadly as ‘Northern’ and ‘Southern’). However, two points may be raised in defence of treating the present study’s speakers as representing a single variety. Firstly, alignment patterns from the two speakers with the same birthplace (2m and 6f, both of whose parents are also from that town) are not more similar than those of any two other speakers. In a similar vein, the three speakers who grew up in Agadir (1f, 4f, 6f) do not behave more uniformly than any other grouping of speakers. Secondly, even if variability in alignment between the Tashlhiyt speakers could be explained by attributing differences to specific subdialects, we would still not expect the degree of intraspeaker variability exhibited in the present data if there were a predetermined segmental anchor. We conclude that the alignment of the F0 maximum that consistently occurs on qwords is genuinely variable, both within and across speakers.
3.2.2 Alignment of plateaux in simple qwords
Turning now from single F0 turning points to plateaux, we find that the variability seen for single maxima also holds for the plateau measures. Figure 7 depicts, for each target qword produced, the plateau onset and offset (black dots linked by a line) and the location of the absolute maximum within it (orange dot).
A pattern observed across all qwords is one whereby the plateau starts in the second segment (the vowel /a/), and extends across a number of segments within the word, but usually does not cross the right qword boundary. Within this general pattern, plateaux nevertheless exhibit considerable within-word variability in alignment of both onset and offset.
Additionally, a number of idiosyncratic patterns can be identified for individual words. Mad exhibits what are arguably the most consistent alignment patterns across repetitions, with the end of the plateau reached well before the end of the word-final segment /d/. Ma differs from the other qwords in that it may be marked by a high plateau that extends far to the right of the qword boundary (in repetitions by speakers 1f, 4f and 8f).12 Other speakers produce plateaux on ma that are more similar to the narrower, word-internal plateaux observed for the other qwords. Manwi is characterized by yet another kind of variability in the realization of its high region, with both very wide and very narrow plateaux.
Considering all qword plateaux together, every plateau parameter seems to be characterized by gradient variation (with the exception of ma, which shows two categorically different patterns): the plateau onset, the plateau offset, the duration of the plateau, and the peak location within the plateau region. The latter type of variation is crucial for a discussion that takes into account peak shape as an important perceptual cue in listeners’ categorization of contours (cf. Barnes et al., 2012a; Barnes et al., 2012b). In the present data, the varying peak location within the plateau indicates that different peak shapes are being produced, even for plateaux with similar spans. Speaker 6f, for example, produces plateaux on manaɡu that are similarly aligned in all her productions. In one case, however, the peak is aligned differently from the rest, namely near the end of the plateau, indicating a relatively shallow rise to the peak followed by a steep fall. Similar, and even more extreme, variability is seen across repetitions of the same word by most speakers. The different realizations of the high region and, by extension, of peak shape once again suggest that considerable variability is an intrinsic and non-spurious characteristic of intonational patterns in Tashlhiyt.
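The peak location within the plateau can be expressed as a proportion of plateau duration, which gives a rough indication of peak shape: a value near 1, as in the deviant manaɡu token by speaker 6f, suggests that F0 rises slowly across most of the high region and falls steeply after the peak, while a value near 0 suggests the reverse. The sketch below assumes plateau onset, peak, and offset times obtained with the 6% heuristic; the function names and the cut-off values for the labels are hypothetical and serve only for illustration.

```python
def peak_position(onset, peak, offset):
    """Peak position within the plateau: 0 = at onset, 1 = at offset."""
    if not onset <= peak <= offset:
        raise ValueError("peak must lie within the plateau")
    if offset == onset:
        return 0.5  # single-frame plateau: treat the peak as central
    return (peak - onset) / (offset - onset)


def describe_shape(position, early=1 / 3, late=2 / 3):
    """Coarse shape label; the cut-offs are illustrative assumptions."""
    if position >= late:
        return "shallow rise, steep fall"
    if position <= early:
        return "steep rise, shallow fall"
    return "roughly symmetrical"


# Hypothetical plateaux (ms): identical spans, different peak locations.
for onset, peak, offset in [(100, 150, 200), (100, 190, 200)]:
    pos = peak_position(onset, peak, offset)
    print(round(pos, 2), describe_shape(pos))
# 0.5 roughly symmetrical
# 0.9 shallow rise, steep fall
```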
In sum, the search for systematic alignment has proven unfruitful for qwords in Tashlhiyt. Neither different levels of phonological structure below the word (segment or syllable) nor different measures characterizing the contour (the single maximum and a more broadly defined plateau) have revealed a high degree of systematicity in tune-text alignment. In many languages, alignment of tonal targets is clearly defined with respect to phonological units below the lexical level, like the syllable or mora. This is not what we found for the H tonal target occurring on Tashlhiyt qwords. However, one aspect of the qword tune that should not be overlooked is that the H target systematically occurs on the qword. In this sense, it appears to be the word level that forms a domain within which the H tonal target is consistently present. Its phonetic alignment, as reflected by the exact location of a peak or high region within this qword domain, on the other hand, is variable in seemingly all possible ways.
3.3 Complex qwords
Much in line with the patterns attested for simple qwords, complex qwords are also marked by a rise-fall, with a peak typically occurring on the interrogative element man. Again, the alignment of the maximum on these words (Figure 8) exhibits a considerable spread, in this case across /a/ and /n/. The vast majority of maxima are realized towards the end of the vowel /a/, but early alignment within the vowel and later peaks on the following segment /n/ are also attested. All speakers’ productions additionally have in common a relatively steep fall following the H peak, with most of the falling movement achieved within man itself. In line with our earliest observations in section 2.2, a post-peak fall thus seems to be an integral aspect of the qword tune. For complex qwords, too, we can therefore posit the presence of an L target, reached somewhere around the border between man and the following noun.
In addition to the rise-fall early in the complex qword, there may also be a rise at the end of the qword constituent as a whole (followed by a fall on the next word). As an example, contours for the constituent man butili ‘which shepherd’ are shown in Figure 9. This graph shows that 3 out of 7 speakers (1f, 6f, 9f) consistently produce a rise at the right edge of the complex qword across repetitions (indicated with a red line), while the remaining four usually do not. The number of final rises by speaker and question constituent is shown in Table 3. Based on this information, speakers form two groups: one group produces final rises (1f in 100% of her questions, 6f and 9f in most cases, and 4f and 8f variably), while the other never does (2m and 3f). The factors determining the occurrence of a final rise remain unclear, but informal checks with native speakers suggest that the presence of a rise is simply a stylistic variant, possibly reflecting phrasing.
3.4 Pitch scaling and cue interaction
Given the variable alignment of the H target on simple and complex qwords discussed in the previous sections, the question arises as to how variability in the alignment domain affects scaling, and what this tells us about which properties are the most crucial aspects of the qword tune. As could already be seen in Figure 9, scaling of the qword peak varies considerably across subjects, ranging from the male speaker 2m, who produces maxima around 200 Hertz, at the lowest end, to one of the female speakers, 3f, who produces maxima in falsetto of up to 850 Hertz, at the highest end. Consultation with two native speakers who did not participate in the experiments confirmed that the pitch excursions produced by speaker 3f sounded overly dramatic, but we decided not to exclude this speaker for two reasons. Firstly, it proved difficult to judge to what extent the other speakers made use of exaggerated versus ‘normal’ pitch. Secondly, very salient pitch excursions and falsetto are a recurrent feature of Tashlhiyt, both in semi-spontaneous data and in daily speech, as previously observed by Roettger and Grice (2015). We are, however, currently not in a position to interpret the function of extremely high pitch as compared to ‘normal’ high pitch, and thus refrain from excluding speakers on the basis of their use of pitch range.
Returning to the pitch scaling of the individual tones that make up the qword tune, Figure 10 shows the F0 height of the H target as a function of peak distance, i.e. the distance between the left (word/IP) boundary and the maximum. Speakers appear to have individual preferences for this target: four speakers (2m, 3f, 4f and 9f) produce lower maxima with increasing distance between the left boundary and the maximum, two further speakers (1f and 6f) produce more or less the same target height irrespective of its alignment, and the remaining speaker (8f) produces marginally higher targets with later alignment.
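One simple way to summarize such individual preferences, offered here as an illustration rather than as the analysis behind Figure 10, is the ordinary least-squares slope of H height against peak distance for each speaker: a negative slope corresponds to lower maxima with later alignment, a slope near zero to stable height, and a positive slope to higher targets with later alignment. The tolerance separating ‘near zero’ from a trend, the function names, and the toy data are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend(slope, tolerance=0.05):
    """Label a slope in Hertz per ms; the tolerance is an arbitrary assumption."""
    if slope < -tolerance:
        return "lower maxima with later alignment"
    if slope > tolerance:
        return "higher maxima with later alignment"
    return "height independent of alignment"


# Hypothetical data for one speaker: peak distance (ms) and peak F0 (Hertz).
distance_ms = [50, 100, 150]
peak_hz = [400, 380, 360]
s = ols_slope(distance_ms, peak_hz)
print(round(s, 2), trend(s))  # -0.4 lower maxima with later alignment
```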
The pattern of decreasing H target height with later alignment is most interesting in the case of the highest targets (speaker 3f), given that these are also the earliest aligned peaks. In order to produce a rise from the initial %L to an H target shortly after it, this speaker must either produce very steep pitch rises or undershoot the %L target (which is indeed what we find, see below). While the other speakers exhibit somewhat varying interactions between alignment and scaling of the H target, the data overall rule out an interpretation in terms of undershoot of the H target: if H were undershot, we would expect later peaks to be consistently higher. Rather, an analysis that covers the behaviour of all speakers could invoke a requirement that a specific H target level be reached, a level that is attained even in questions with early H alignment. Additionally, the fact that some speakers produce higher peaks with earlier alignment could be taken to reflect the kind of peak delay effect mentioned in section 3.
In the main experiment, qwords were phrase-initial, so that the left qword boundary coincided with the left IP boundary. In these cases, the %L is realized on the qword, unlike in the pilot experiment, where the low turning point was usually realized before the qword when the qword was non-initial (section 2.3). The temporal proximity of these opposite targets requires a steep rise, and given that speakers do not seem to compromise on the height of the H target, this leaves the scaling of %L to be explored.
Figure 11 shows that when the H target on the qword is reached later, the preceding %L target is consistently realized lower, with the strongest effects for the two female speakers with the highest ranges. This indicates that alignment of the H target and scaling of the %L are in a trading relation: In cases that require a steep rise (when H is early), the rise may be truncated by realizing the initial %L somewhat higher. The relative stability of H scaling and the variable truncation of the preceding %L indicate that H is the more important of the two tones in the qword tune.13
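The trading relation can be made concrete with a simple truncation model, which is our illustrative assumption and not a model fitted to the present data: if F0 cannot rise faster than some maximum rate (in semitones per second), and the H target must be reached in full, then the lowest %L that can be realized is the H height minus the maximum rate multiplied by the time available for the rise. The earlier the H, the shorter this interval and the higher the realized %L. The rate of 60 semitones per second, the reference frequency, and the function names are hypothetical.

```python
import math

REF_HZ = 100.0  # arbitrary reference for the semitone scale


def hz_to_st(f_hz):
    return 12 * math.log2(f_hz / REF_HZ)


def st_to_hz(st):
    return REF_HZ * 2 ** (st / 12)


def realized_low(target_low_hz, h_hz, rise_time_s, max_rate_st_per_s=60.0):
    """%L actually realized when H must be reached in full within rise_time_s.

    Assumes F0 cannot rise faster than max_rate_st_per_s (a hypothetical value).
    If the intended %L is too low to allow this, it is truncated (raised).
    """
    lowest_reachable_st = hz_to_st(h_hz) - max_rate_st_per_s * rise_time_s
    return st_to_hz(max(hz_to_st(target_low_hz), lowest_reachable_st))


# Same H (400 Hertz) and intended %L (200 Hertz), H reached late versus early.
for rise_time_s in (0.3, 0.1):
    print(rise_time_s, round(realized_low(200, 400, rise_time_s), 1))
# 0.3 200.0
# 0.1 282.8
```

In this sketch, H height is held fixed and only %L adjusts, mirroring the observation that speakers do not seem to compromise on the height of H.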
It should be noted that we did not quantify the properties of the low turning point following the H target on the qword, as the region in which it occurred was affected by microprosody. Nevertheless, our observations suggest that the fall following the peak usually entailed a drop to a level similar to that of the initial %L, which is maintained until the end of the phrase, as illustrated by the example in Figure 12. As can be seen, the pitch excursion on the qword is by far the most prominent intonational event in the phrase, which is typical of qword questions in this experiment. Although semi-spontaneous qword questions show similar (rising-)falling contours on the qword, the scaling of their F0 targets is less extreme. This difference possibly reflects the fact that the qword was in narrow focus in the present experiment.
4 General discussion
Our point of departure for the investigation of the qword tune in Tashlhiyt was the expectation that there might be no systematic alignment of tonal targets on the qword. We first established that qwords are realized with an HL narrow focus tune preceded by L at the left phrasal boundary. We then investigated the alignment of the maximum (corresponding to the H tone) on the qword in more detail. The alignment of this maximum was variable within the word, at both the syllable and the segmental level, with considerable within- and across-speaker variability. We also determined a plateau, measuring its onset and offset in relation to syllables and segments, which showed a similar degree of variation.
While we did not find the degree of systematic alignment that characterizes pitch movements typical of pitch accents in a number of European languages, alignment of the H tonal target was also not completely unsystematic, as it consistently occurred on the qword. However, there is no systematic alignment of this target with respect to any other constituent, such as a syllable or mora. We interpret this as a lack of tonal association of H to any structural element below the word level. Instead, we suggest that the present alignment patterns are best captured in terms of association to a higher-level unit, in this case the focused qword. This fits in with the assumed lack of word stress in Tashlhiyt, which leads us to predict that no syllable is predetermined as an anchor in the process of tonal association. Figure 13 shows our phonological representation of the qword tunes based on the present findings.
While the variable behaviour of H can thus be accounted for, results concerning the right-edge marking of qwords by a fall, i.e. the L target following H, are more difficult to interpret. As indirectly shown by the plateau results, this L could in some cases be aligned much later (a number of syllables to the right). This matches qword intonation patterns found in semi-spontaneous data, confirming that late-aligned right-edge Ls are common. At the same time, given that in the predominant pattern in the present data the L was aligned at or near the right edge of the qword, we assume that in the case of narrow focus, this tone also associates to the qword.
It is possible to think of the H and L as being characterized by a strong-weak relation (cf. Chen and Xu, 2006), similar to the interpretation of bitonal pitch accents as involving a strength relationship with the starred tone as the stronger one. In this way, both tones could be associated to the qword, but H more strongly so: HsLw. As the variability characterizing the alignment of the low turning point is at least in part categorical (early or late in the case of ma), an interesting question for further research would be whether these patterns reflect a phonological, potentially meaningful difference, or whether the occurrence of late alignment serves some other prosodic or phonological requirement.
Finally, an initial low turning point %L was invoked to explain the initial rise to the H target on the qword. The phrase-initial minimum which we took to represent it was scaled somewhat variably, which, together with the fact that this minimum typically stayed at the left IP edge when another word intervened between the IP edge and the left edge of the qword (section 2), we took as support for an analysis that treats it as a boundary tone associating to the IP, i.e. as %L.
4.2 Modelling tonal association: Is ‘high’ good enough?
Given the discussion above, the answer to the question in this section’s title would be: Yes, ‘high’ is good enough. There seem to be few, if any, constraints on the exact realization (in terms of temporal alignment and scaling) of the H tone occurring on the qword. The only reason why the alignment of the H is not completely free (it tends not to be realized at the very edges of the qword) has to do with realizational constraints on the execution of pitch movements within a limited timespan. The H is preceded by an %L and followed by another L, which is usually located near the right edge of the qword. As there are cases where the following L was in fact not realized near the right edge, the requirement for an L target to mark the right edge seems weaker than the requirement for an H target to be realized on the qword.
The variability in alignment of the H tone in this study makes for an interesting comparison with previous work on Tashlhiyt intonation. Grice et al. (2015) showed that in phrase-final position, the variability in alignment of a local high turning point in the contour (interpreted as an H tone) could be captured probabilistically by appealing to a number of constraints that favoured the H tone occurring on heavier and more sonorous syllables. This variable behaviour of the H tone in final position applied in two different contexts: one in which the local maximum marked the right edge of a polar (yes-no) question, and one in which it marked a contrastively focused word in phrase-final position. This raises a number of questions with respect to the present data. Does the apparent tonal attraction effect of ‘prosodically privileged’ syllables in final position (as in Grice et al., 2015) also apply in initial position, and more specifically, to qwords? The stimuli in Grice et al. (2015) were designed to vary in terms of syllable structure and sonority, factors that vary only in a limited way in Tashlhiyt qwords. Nevertheless, in the present data syllable weight distinguishes two minimal word pairs, namely ma/mad and mani/manwi. There is little reason to believe that the small differences in F0 patterns characterizing these pairs should be attributed to syllable weight, primarily because the peak distributions in the present study and in Grice et al. (2015) are fundamentally different. Grice et al. (2015) found a discrete peak distribution, i.e. the peak was either on the penultimate syllable or on the final syllable, with an attraction effect for peaks to occur on the heavier of the two. This categorical tonal placement was moreover reflected in the authors’ auditory impression that the peak was located either on one syllable or the other, and was systematically aligned in the second part of the syllable irrespective of which syllable it was on.
The present study found a far more gradient distribution, in which peaks also occurred on intervocalic consonants in onset or coda position and did not align systematically with respect to any landmark. Note that the target words in Grice et al. (2015) had either obstruents or liquids in onset position, and that no peaks occurred on any of these onsets. An explanation invoking the unlikelihood of peaks occurring on (voiced) obstruents in onset position may thus account only in part for the difference between the present study and that of Grice et al. (2015). It is still unclear whether the differences in tonal alignment between the two studies are due to the phrasal position of the target words (initial versus final), to the function of the tones (yes-no question modality marking and contrastive final focus versus, in the present study, initial narrow focus), and/or in part to the segmental make-up of the stimuli.
A final point of interest concerns the aforementioned characteristics of the population under investigation (section 1.1). Prior studies as well as the present one suggest that this particular Tashlhiyt population exhibits a high degree of intonational variability no matter how carefully participants are screened for age and socio-economic status (which is controlled for in these studies) and regional background (which is not controlled for, but any differences between speakers did not map consistently onto details of their place of birth). It remains to be tested whether Tashlhiyt speakers in geographically more homogeneous speaker communities will produce intonation patterns that are less variable, keeping in mind that controlled reading experiments are (currently) limited to university students.
If we find that a high pitch region aligns ‘somewhere on the word’, we should ask whether this counts as systematic enough for descriptive purposes. In the case of Tashlhiyt, where the interaction between underlying metrical structure and postlexical intonation is rather unlike that of most other languages, an explanation of this kind might well suffice. Accounts similar to the present one have been proposed for the few other (non-lexical-tonal) languages that are postulated not to have stress. This literature tends to describe the respective prosodic systems in terms of predetermined tonal strings associating sequentially within small phrasal domains like Accentual Phrases or Phonological Phrases (Jun, 2005 for Korean; Post, 2000 and Jun and Fougeron, 2002 for French; Karlsson, 2014 for Mongolian), or even within the IP (Maskikit-Essed and Gussenhoven, 2016; Maskikit and Gussenhoven, 2013 for Ambonese Malay). The present data show that Tashlhiyt employs pitch movement quite freely within prespecified domains, potentially even more freely than some of the aforementioned stressless languages, as descriptions of Mongolian, French, and Korean do attribute a role to the mora or the syllable in predicting the location of intonational tones. The present qword data exhibit what we might call ‘structure-independent alignment’ within a given domain, similar to the alignment patterns observed by Maskikit-Essed and Gussenhoven (2016) for Ambonese Malay.
4.3 Future directions
A first aspect that has received little attention in the present paper, but is undoubtedly pivotal to our understanding of the marking of (qword) focus domains in Tashlhiyt, is the right-edge marking of focus. The presence and alignment of the post-H L target near the right edge of qwords, while variable in the present data, show even greater variability in semi-spontaneous speech, where the L can be aligned further to the right. Contextual and pragmatic factors certainly play a role here. In the present data, in which the qword is in narrow focus, the L might be placed at the right edge of the qword as a marker of this narrow focus domain. Other interrogative contexts with qwords are likely to require prominence on constituents other than the qword, and one possible way to achieve this would be to mark the relevant constituent with the fall that needs to be realized somewhere after the compulsory H on the qword.
Another aspect that merits further investigation is the role that perception plays in the categorization of the contours under investigation. Informal perceptual checks with our participants suggested that differing peak locations were not linked to different interpretations. A systematic investigation would nevertheless prove insightful, especially with respect to work by Barnes et al. (2012a, 2012b). Based on a discussion of European languages, notably English, these authors argue for systematically including contour shape in the categorization of specific accent categories, in order to do more justice to listeners’ perceptual discrimination of accents that do not differ in their usual phonological description but are characterized by slightly differing contours. There are rich possibilities for investigating the sensitivity of Tashlhiyt speakers, and of speakers of any language without stress, to variations in peak shape, scaling, and alignment. We would predict that the production of gradient variation in tonal alignment is mirrored by limited perception of linguistic contrast based on such gradient patterns.