It has long been clear that syntax determines certain aspects of prosody, and that prosody should therefore be part of the grammar influencing how a parser arrives at the syntactic analysis of an utterance (Chomsky, 1955, II-2fn). However, it has remained unclear how to bring prosody into computational models of syntactic parsing. The few models that have incorporated any substantial prosodic information do not do so on the basis of a generative model of how syntax structurally conditions prosody. Instead, they tend to treat prosodic information as another class of bottom-up cues and mainly focus on English, e.g., Shriberg et al. (2000); Kahn et al. (2005); Huang and Harper (2010); Pate and Goldwater (2013). Here, we report on generalizations about the Samoan syntax-prosody interface uncovered by original fieldwork. We use these generalizations to motivate grammatical rules stating how syntactic structure conditions the insertion of tonal elements, and we show how the syntax/prosody interface in Samoan could be computed in a comprehension model using these rules.1
The challenge for defining a prosodically-informed comprehension model is that there is a multitude of interacting factors that condition the appearance and realization of prosodic events in the speech signal, e.g., see Yu (2014, Appendix B, p. 777). Tonal events are only a subset of prosodic events, but the factors that have been proposed to condition tonal events are already numerous and diverse. In addition to syntactic structure, these include lexical representation, e.g., lexical accent in Swedish, phonological grammar (Nespor & Vogel, 1986; E. Selkirk, 2003), e.g., the rising pitch accent associated with predictable primary stress in Egyptian Arabic (Hellmuth, 2009, 2006), inflectional morphology, e.g., tonal marking of genitive case in Igbo ‘associative’ constructions (Hyman, 2011), and pragmatics, e.g., the English contrastive topic rise-fall-rise contour (Jackendoff, 1972, Büring, 2003, Constant, 2014, i.a.). To complicate matters further, a given tonal event might reliably appear in a particular kind of syntactic environment—sometimes. Whether it might appear could depend on its sensitivity to phonological factors such as speech rate (Hayes & Lahiri, 1991; Fougeron & Jun, 1998) which might make the tonal event difficult to detect or even absent; its presence and phonetic realization might also be variable between speakers due to individual differences that aren’t yet well-understood, e.g., Clifton Jr. et al. (2002); Ferreira and Karimi (2015); Speer and Foltz (2015).
Thus, the speech signal (and the prosodic information contained within it) that both the analyst and listener are confronted with is the result of the interaction of this multitude of conditioning factors. From this output, how can we factor out the contribution of syntax to conditioning prosodic events? And if we are able to do that factorization and define a production model from the syntactic grammar to a prosodified utterance, how can we then define a comprehension model based on that production model? This paper answers these two questions. To isolate the contribution of syntax or any other factor in intonational fieldwork, we systematically vary one factor while holding others constant, just as in Bruce’s (1977) landmark study on word accent in Stockholm Swedish. Following this strategy, we show that in Samoan, syntax appears to be the primary conditioning factor on the placement of high edge tones. This makes defining the foundations of a production model for Samoan straightforward (as opposed to, say, English, where it is much less apparent how to decouple the contribution of syntax to conditioning prosodic events). Based on the fieldwork, we stipulate spellout rules that insert high edge tones and adjoin them in the syntactic tree in exactly and only the structural configurations where high edge tones reliably occur. But defining a corresponding comprehension model is not as simple as running the production model in reverse. Intuitively, the problem is that in the comprehension direction, the phonological grammar does not deliver well-formed trees to the parser, only a string. How, then, do we get from a string to a tree?
Nevertheless, we show here that we can still compute the syntax-prosody interface in a comprehension model even if the prosodic grammar does not derive hierarchical structures separate from the syntactic grammar (a property it shares with prosodic grammars in ‘direct reference’ theories of the interface, e.g., Kaisse, 1985; Odden, 1987; Pak, 2008; see Elordieta, 2008 for a review).
The structure of the remainder of the paper is as follows: After reporting methods of data collection and analysis in Section 1.1, we first show that while the placement of high edge tones in Samoan may at first seem unsystematic, at least some of its positions are very reliably predicted by syntactic structure. While absolutive DPs have been assumed to be unmarked in Samoan, Yu (2011, 2017) noticed that they are preceded by a high edge tone. This paper confirms that this correlation is very reliable and provides evidence that it does not vary with prosodic length, speech rate, register, or focus (Section 2). Considering the syntax more carefully in Section 3, we show how this case marking can be added to the proposals of Collins (2016, 2015, 2014). Collins argues, following Legate (2008), that the Samoan absolutive is actually either nominative or accusative, and that we can define the case marking of these positions as part of the morphophonological spellout. Then we extend the account to some additional constructions (Section 4) and show how the syntax and interface proposals extend easily to these (Section 5). We observe some further complications in the data that we do not yet understand (Section 6), and then briefly consider how, in spite of variability that is not yet understood, a parsing model can use the relatively invariable case marking rules (Section 7). We conclude briefly with the broader lessons of this case study (Section 8).
Prosodic data and analyses used for this paper are available as on-line supplementary material at the following link: http://www.krisyu.org/blog/supp-material-invariability-samoan-interface.html.
Data were collected in the Los Angeles area in one- to two-hour sessions from September 2007 to December 2014 with 1 main consultant, aged 19 when we started working with him. He was born and raised in Upolu, Samoa and moved to the Los Angeles area in 2003. Data were also elicited and recorded from 4 consultants in Apia, Samoa in November 2011, and an additional female consultant in her 50s in the Los Angeles area in January 2012. The additional consultant in Los Angeles had been in the United States for 27 years, but regularly spent an extended part of the year in Samoa. The consultants in Samoa included 3 men, aged 21 to 23, and 1 woman aged 46, from the capital city of Apia and other areas of Upolu. Data were also elicited and recorded in Auckland, New Zealand in July 2015 from 3 additional female speakers, 2 of whom are analyzed here. One (f03), aged 48, grew up in Apia and had been in New Zealand since 2009; the other (f05) was aged 19, grew up in Savai’i and had been in New Zealand since age 10.2 All consultants spoke Samoan regularly or primarily in daily life and were literate in Samoan, but also spoke English as a second language with some fluency. English was used as the contact language. Elicitation items were presented individually on slides on a computer screen, and they were elicited in randomized order. The consultant was asked to read each sentence at least twice. Unless otherwise stated, sentences were elicited out-of-the-blue.
All recordings in Los Angeles and Samoa were made directly to a computer through a head-mounted microphone (Shure SM10A); the signal ran through a Shure X2u pre-amplifier and A-D device. Recordings in Auckland, New Zealand were made with a Shure SM10A microphone to a Marantz PMD661 MKII recorder. All recordings were made at a sampling rate of 22,050 Hz with 16-bit precision. Recording sessions in Los Angeles were made in either a sound-attenuated booth or a quiet room, while recordings in Samoa and Auckland were made in a quiet room.
All sound files were segmented and annotated using Praat (Boersma & Weenink, 2012). Utterances were segmented by word and syllable and transcribed intonationally by the first author. However, our main strategy for detection of high edge tones (H- tones) in fundamental frequency (F0) contours was to rely on phonetic comparisons of F0 contours within minimal sets (Yu, 2014); see, for example, Yu (2017) and Figure 3 in Clemens and Coon (2016) for additional examples of comparisons of this type. What this means is that we did not rely on intonational transcriptions of individual utterances to tally up where H- tones were present or absent in each utterance (except in Section 6, which is exploratory work comparing counts of multiple kinds of tonal events). Instead, we determined how some factor (e.g., speech rate) conditioned the presence of an H- by comparing F0 contours between utterances varying only for that factor (e.g., slow vs. fast speech rate), much like Bruce (1977). This analysis based on comparing F0 contours is advantageous because it is transparent and reproducible; it helps control for allophonic variation in the realization of H- tones, which may make H- tones difficult to detect; it prevents the transcriber from imposing any subjective biases in transcription; and it releases the transcriber from making difficult judgment calls for transcriptional labels. But this phonetic approach is only possible when enough is known about the basic units of the intonational system and what conditions them so that the analyst can design structured elicitations investigating these basic units. And initial discovery of these basic units is facilitated by the challenge of labeling them in transcription. That is to say, the phonetic approach emphasized here doesn’t replace intonational transcription, but complements it.
F0 extraction was performed using Praat’s autocorrelation algorithm, as implemented in VoiceSauce (Shue et al., 2011), software for automatic voice quality analysis, with the floor and ceiling values for candidate F0 values set to 40 Hz and 300 Hz, respectively, and default settings for other parameters.3 For the F0 contours plotted throughout the paper, F0 values were averaged over each of 10 time slices uniformly dividing each syllable for each utterance, e.g., the first F0 value was the average F0 over the first tenth of the syllable. Converting the time scale from absolute time in seconds to time in syllables allowed trends in the shape of F0 contours to be captured without variability conditioned on speech rate. All further data processing and analysis was performed in R (R Core Team, 2014). For the most part, this consisted of averaging F0 contours across sentences and/or across speakers. All plots were created using the ggplot2 package (Wickham, 2009). Gray ribbons flanking lines in any plot of F0 contours show ±1SE.
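The time normalization and averaging described above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the analysis reported here. The function names `time_normalize` and `mean_and_se`, the input format (parallel lists of sample times and F0 values, with syllable boundaries taken from the segmentation), and the treatment of slices that contain no voiced samples are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def time_normalize(times, f0, syl_start, syl_end, n_slices=10):
    """Average F0 within each of n_slices equal-width slices of one syllable.

    times, f0: parallel lists of sample times (s) and F0 values (Hz);
    an F0 value of None marks an unvoiced frame (an assumption of this sketch).
    Returns n_slices values; a slice with no voiced samples is None.
    """
    width = (syl_end - syl_start) / n_slices
    slices = [[] for _ in range(n_slices)]
    for t, f in zip(times, f0):
        if syl_start <= t < syl_end and f is not None:
            idx = min(int((t - syl_start) / width), n_slices - 1)
            slices[idx].append(f)
    return [mean(s) if s else None for s in slices]


def mean_and_se(contours):
    """Pointwise mean and standard error across equal-length contours."""
    summary = []
    for values in zip(*contours):
        vals = [v for v in values if v is not None]
        if not vals:
            summary.append((None, None))
            continue
        se = stdev(vals) / sqrt(len(vals)) if len(vals) > 1 else 0.0
        summary.append((mean(vals), se))
    return summary
```

Because every syllable is reduced to the same number of points, contours from utterances of different durations can be aligned index by index before averaging, which is what makes the comparison independent of speech rate.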
Samoan is a Polynesian language with an ergative/absolutive case-system. The sentences in (1) exemplify properties of this kind of case-system (see Deal, 2015 for an overview of ergativity): The subject of a transitive clause, e.g., le malini ‘the marine’ in (1a), is marked with a distinct case—the ‘ergative.’ The subject of an intransitive clause, e.g., le malini in (1b), and the object of a transitive clause, e.g., le mamanu ‘the design’ in (1a), both appear unmarked and receive ‘absolutive’ case (Chung, 1978, p. 54–56; Ochs, 1982, p. 649), though as we will discuss below, an alternative analysis is offered by Collins (2016, 2014), following Legate (2008). Samoan primarily has VSO word order in transitive clauses, as exemplified in (1a), which also shows that the transitive subject is marked by the ergative case marker e. The intransitive clause (1b) demonstrates that the prepositional element [i] is a marker of oblique case. This preposition marks stative agents (Chung, 1978, p. 29), and also indirect objects, locatives, temporal expressions, sources, and goals (Mosel & Hovdhaugen, 1992, p. 144).4
The following sections first review evidence for tonal marking of absolutive case in Samoan (Section 2.1) and then present new evidence that the appearance of a high edge tone preceding absolutive arguments is insensitive to prosodic length (Section 2.1 and Section 2.2), speech rate (Section 2.3), and speech register (Section 2.4).
Yu (2011, 2017); Yu and Özyıldız (2016) showed that absolutive case in Samoan is not unmarked and does in fact have a phonological correspondent in spellout. As shown in (2), revised from (1), a high tone—which we notate as ‘H-’ and gloss as ABS—appears at the right edge of the phonological material immediately preceding the absolutive argument: Before the object le mamanu ‘the design’ in the transitive clause (2a), and before the subject le malini ‘the marine’ in the intransitive clause (2b).
The notation ‘H-’ comes from conventions for the intonational transcription of tonal events developed in autosegmental-metrical theory (Pierrehumbert, 1980; Beckman & Pierrehumbert, 1986; Beckman & Elam, 1997; Ladd, 2008). The ‘H’ stands for a high F0 target and the ‘–’ is a diacritic we use merely to indicate that the high tone is an edge tone associated to a word edge, rather than a pitch accent associated to a stressed syllable. Other morphosyntactic structures in addition to absolutive arguments also reliably surface with an H-, as we will discuss in detail in Section 4. By using the ‘–’ diacritic, we do not mean to imply that an H- is a prosodic boundary tone, associated to some prosodic constituent in a prosodic hierarchy; we simply mean to say, descriptively, that the tone appears at edges.6 Evidence that H- tones are edge tones and not pitch accents is given in Section 4.
The evidence Yu (2011, 2017) used to argue that an H- always appears before an absolutive argument came from directly comparing F0 contours between minimally different syntactic structures elicited in fieldwork (see Figures 4 and 5 for examples of this kind of comparison). We emphasize that this evidence came from comparing F0 contours rather than comparing intonational transcriptions (the same is true for all the evidence introduced in this paper, except for in Section 6). Yu (2011, 2017) showed that an H- appeared before the absolutive argument irrespective of diverse syntactic and semantic properties of the argument: Before subjects of intransitive clauses, objects of transitive predicates, proper names, pronouns, nominalized verbs, and regardless of specificity or number. Moreover, the presence of the absolutive H- was insensitive to argument order—e.g., in verb-initial ditransitives, the position of the H- tracks the left edge of the absolutive argument, regardless of the order of subject and objects—and the absolutive H- was absent before absolutive arguments that weren’t overt—e.g., in pro drop of absolutives and extraction of absolutives out of relative clauses.
In addition, H- tones were not observed before bare NPs in environments where bare NPs are independently expected not to be case marked (pseudo-noun incorporation constructions, which have surface VOS order) or where ergative and oblique case marking are also banned, e.g., on arguments in fronted predicates, see Yu and Özyıldız (2016, Section 3.4) for details. Moreover, although Calhoun (2017) shows that no H- appears before post-verbal absolutive arguments under [naɁo] ‘only’ and argues that this data is problematic for positing an absolutive H-, Yu and Özyıldız (2016, Section 3.4.1) and Yu (2017) show that no case markers can co-occur with [naɁo], whether segmental or the H-.
Finally, the presence of the absolutive H- was not sensitive to different focus conditions elicited in question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives (for more detail on the stimulus set, see Appendix A.2). An H- always appeared before the absolutive argument, and never before the ergative argument or oblique object (with some rare, non-systematic exceptions; see Section 6.1)—whether an argument was given, new, or under contrastive focus in the answers to the questions. This result is consistent with Calhoun’s (2015) results from intonational transcriptions for sentences, which also showed no evidence that the H- preceding the absolutive was sensitive to discourse structure. Utterances in that study were elicited under broad focus (‘What happened earlier’), question focus on the agent or direct object, and contrastive focus on the agent or direct object.
The phonetic realization of the absolutive high edge tone is shown in the context of entire utterances in Figure 1 and over a single word in Figure 2. Figure 1a displays an annotated F0 contour for (2a), while Figure 1b displays an annotated F0 contour for (2b). There are three different kinds of tonal events labeled in these figures: LH* (a rising pitch accent), H- (a high edge tone), and L-L% (an utterance final fall),7 which we discuss further in the context of Figure 2. We remind the reader that only the data in Section 6 comes from intonational transcription, while the rest of the data introduced in this paper comes directly from the F0 contours. Nevertheless, it is still useful to discuss the tonal events in terms of intonational labels to describe general observations about their phonetic realization. By convention, we place the label for an LH* pitch accent over the primary stressed syllable it is associated to in all intonational transcription displays. We also segment the ergative and oblique case markers together with the last syllable of the preceding word, e.g., [ŋa e], [ni i] in the annotation of the F0 contours, because it is very difficult to develop consistent criteria for deciding on where one vowel ends and another begins.8 There are two sites that illustrate the realization of the H- in Figure 1a and b: (a) The final syllable of the verbs ([lalaŋa] ‘weave’ and [ŋalue] ‘work’)—an H- keeps the F0 contour high at the right edge of [ŋalue] in Figure 1b but the F0 contour falls over the last syllable of [lalaŋa] in Figure 1a; and (b), the final syllable of [malini] shows an H- keeping F0 high in Figure 1a, preceding the object, but not in Figure 1b, preceding the oblique PP, where F0 falls over the last syllable of [malini].
For a more detailed explication of LH* and H- tonal events, we turn to Figure 2. Figure 2a shows a representative F0 contour over malini when it is the subject of the intransitive clause in (2b) and followed by an oblique PP: No H- appears at the right edge of malini. In contrast, Figure 2b shows a representative F0 contour over malini when it is the subject of the transitive clause in (2a) and immediately followed by the object le mamanu: An absolutive H- appears at the right edge of malini. We emphasize that malini is not the absolutive argument in either of the figures; rather, the H- that appears on malini in Figure 2b marks the absolutive argument coming up immediately after malini, which is not shown.9
To describe the realization of the H-, we first need to explain the rising tonal events in both F0 contours, which we transcribe as ‘LH*,’ following Orfitelli and Yu (2009); Zuraw et al. (2014), where the ‘*’ is a diacritic from autosegmental-metrical theory that indicates pitch accenthood, and ‘L’ stands for a low pitch target.10 This is a pitch accent associated to the penultimate syllable, which receives primary stress. The basic footing pattern in Samoan, as observed in monomorphemes, consists of a moraic trochee at the right edge of the word (Zuraw et al., 2014). Primary stress is on the final vowel if it is long, e.g., la(ˈvaː) ‘energized,’ and otherwise on the penultimate vowel, e.g., ma(ˈlini) ‘marine.’ Thus, ma(ˈlini) has a rising pitch accent associated to the penultimate syllable, where the low F0 valley appears around the onset of the stressed mora, and the high F0 peak appears at or slightly later than the offset of the stressed mora (see also Orfitelli & Yu, 2009; Zuraw et al., 2014; Calhoun, 2015 for more on pitch accent realization). If the immediately following tonal event is another pitch accent, e.g., on mamanu in (2b), then the F0 contour over malini falls after the high F0 peak over the last syllable towards the L of this next pitch accent, as in Figure 2a. If, however, an H- is present, then the F0 contour continues to rise over the last syllable of malini, as in Figure 2b. Yu (2017) also shows that this high F0 continues into the beginning of the absolutive argument, and the persistence of high F0 into the absolutive argument can also be seen in Figures 4 and 5b.
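The stress generalization stated above (final vowel if long, otherwise penultimate) determines where the LH* pitch accent docks, and can be written out as a small function. This is only an illustrative sketch in Python: the name `primary_stress_index` and the representation of a word as a list of vowel lengths are hypothetical, the treatment of a word with a single short vowel is an assumption not addressed in the text, and the sketch covers monomorphemes only.

```python
def primary_stress_index(vowel_lengths):
    """Return the index of the vowel that bears primary stress.

    vowel_lengths: one entry per vowel in the word, "long" or "short".
    Rule from the text: the final vowel if it is long, otherwise the
    penultimate vowel.
    """
    if not vowel_lengths:
        raise ValueError("word has no vowels")
    # Assumption of this sketch: a word with one vowel is stressed on it.
    if vowel_lengths[-1] == "long" or len(vowel_lengths) == 1:
        return len(vowel_lengths) - 1
    return len(vowel_lengths) - 2


# ma(ˈlini): three short vowels, stress on the penultimate (index 1)
malini = primary_stress_index(["short", "short", "short"])
# la(ˈvaː): final long vowel, stress on the final (index 1)
lavaa = primary_stress_index(["short", "long"])
```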
In the remainder of this section, we provide additional empirical evidence that the syntax completely determines the presence of the high tone as an absolutive case marker. We show that the presence of the high tone is insensitive to prosodic length (Section 2.2), speech rate (Section 2.3), and speech register (Section 2.4). This sets up our initial picture of the syntax/prosody interface in Samoan in Section 3, for which we make the methodological abstraction that the moment that a parser detects a high tone, it can conclude that an absolutive argument is about to occur, i.e., we don’t consider multiple triggers of high tones yet (these include coordination and fronting). This is a good first step towards tackling the Samoan syntax-prosody interface, but we introduce evidence in Sections 4–6 to support complications to this picture that we adjust for in our analysis of the interface: Adjustments that reveal Samoan intonation to have some of the kinds of variability seen in other languages like English, though perhaps to a lesser extent.
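The methodological abstraction just described, in which a detected high edge tone is taken to signal an upcoming absolutive argument, can be illustrated with a toy scan over a tone-annotated string. The sketch below is in Python and is deliberately simplistic: the function name `flag_absolutives`, the flat list-of-tokens input, and the schematic tokens (`Verb`, `e X`, `Y`) are assumptions made for the example, and it ignores the other triggers of high tones (coordination and fronting) that are set aside at this stage.

```python
H_TONE = "H-"


def flag_absolutives(tokens):
    """Pair each non-tonal token with True if it immediately follows an H-.

    Under the simplifying assumption that every H- is an absolutive case
    marker, a True flag means "this constituent is absolutive".
    """
    flagged = []
    expect_abs = False
    for tok in tokens:
        if tok == H_TONE:
            expect_abs = True      # the next constituent is predicted to be absolutive
            continue
        flagged.append((tok, expect_abs))
        expect_abs = False
    return flagged


# Schematic transitive VSO clause: Verb [e X]erg H- [Y]abs
transitive = flag_absolutives(["Verb", "e X", "H-", "Y"])
# Schematic intransitive clause: Verb H- [X]abs [i Y]obl
intransitive = flag_absolutives(["Verb", "H-", "X", "i Y"])
```

In the transitive example only `Y` is flagged, and in the intransitive example only `X` is flagged, matching the positions of the absolutive arguments in the schematic structures.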
If, in addition to syntax, prosody also played a role in determining the presence of the high tone as an absolutive case marker, i.e., if the high edge tone were a consequence of prosodic phrasing choices, then we would expect it to be sensitive to factors known to influence prosodic phrasing (other than syntactic constituency). A large body of work has suggested that prosodic restrictions that regulate size and eurythmy play a role in determining prosodic phrasing decisions, e.g., Nespor and Vogel (1986); Ghini (1993b, 1993a); Fodor (1998); E. Selkirk (2000); Prieto (2005). One general principle that has been discussed in the literature states that prosodic phrasing favors structures where sister prosodic constituents are roughly equal in prosodic size or weight, e.g., (Fodor, 1998, p. 304). A number of related optimality-theoretic constraints formulated in terms of the size of prosodic constituents (taking prosodic constituents more deeply embedded to be relatively smaller in size than those higher in the prosodic tree) have been proposed to drive prosodic phrasing choices that appear to mismatch with syntactic constituency. For instance, Myrberg (2013) accounts for variability in the prosodic phrasing of clauses with embedded structures in Stockholm Swedish by showing how a markedness constraint EQUALSISTERS (sister nodes in prosodic structure are instantiations of the same prosodic category) might underlie the well-formedness of prosodic phrasing choices that mismatch with syntactic constituency; see also related work in Irish (Elfner, 2012, 2015; Bennett et al., 2016).
If the presence of the absolutive high were conditioned on prosodic phrasing choices, we would expect to see variability in its presence, as well as variability in the presence of a high tone elsewhere in an utterance, depending on prosodic length/size. This section shows that we do not see such variability in the tonal marking of the Samoan absolutive.
The first piece of evidence comes from sentences with extremely long DPs. In the sentences discussed in this section, shown in (3) and Table 1, the DPs X and Y are 17–28 syllables long. The sentences have the same basic syntactic structure as those in (2); they just have much longer DPs. In addition to potentially increasing the probability of a prosodic break between the two DPs or anywhere else in the utterance, having extremely long DPs also makes the phonetic realization of the H- much easier to see in F0 contours than it is with short DPs. This is because the downtrend in F0 lowers the F0 range before the site of the H- much more over the course of the long DPs under discussion here than over the short DPs in Figure 1.
|SENTENCE STRUCTURE||DEFAULT WORD ORDER (VSO/V-S-PP)||SCRAMBLED WORD ORDER (VOS/V-PP-S)|
|transitive||Verb [e X]erg [Y]abs||Verb [X]abs [e Y]erg|
|intransitive||Verb [X]abs [i Y]obl||Verb [i X]obl [Y]abs|
Keeping the DPs constant, we manipulated (a) TRANSITIVITY to be either transitive (with the transitive verb [laŋona]) or intransitive (with the intransitive verb [manoŋi]), and (b) the WORD ORDER to be default (VSO/V-S-PP) or scrambled (VOS/V-PP-S). These manipulations are summarized in Table 1.
If the appearance of a high tone were being governed by prosodic restrictions on eurythmy to break the sentence into roughly equal halves, we might expect a high edge tone to appear between the two DPs in the sentence, regardless of word order or transitivity. However, Figure 3 shows that this is not the case in representative F0 tracks from a single speaker who uttered the sentences without discernable silent pauses.11 There are many peaks in the F0 contour from LH* pitch accents over content words, but we annotate the F0 contour only at the site between the two DPs to highlight what is happening at this point (see on-line supplementary material for more detailed annotations of these F0 contours; link given at the beginning of Section 1.1).12 We found a sentence-medial H- for the VSO transitive condition (Figure 3a), as well as for the V-PP-S intransitive condition (Figure 3d). However, no sentence-medial H- between the two DPs occurred in the other two conditions, so the generalization for the distribution of the H- cannot be that it occurs before the second post-verbal argument. Rather, an H- appeared between DPs only when the second DP was an absolutive argument.13
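The generalization reached here, that an H- appears between the two DPs only when the second DP is absolutive, can be checked against the four conditions of Table 1 with a toy linearization. The Python sketch below is not the spellout rule system developed later in the paper (which adjoins tones in the syntactic tree); it is a flat-string simplification, and the names `spell_out` and `has_medial_h` and the pair-based encoding of arguments are hypothetical.

```python
CASE_MARKERS = {"erg": "e", "obl": "i"}  # segmental case markers from the text
H_TONE = "H-"


def spell_out(verb, arguments):
    """Linearize a verb-initial clause with case marking.

    arguments: list of (form, case) pairs, with case in {"erg", "abs", "obl"}.
    An H- is inserted immediately before each absolutive argument.
    """
    output = [verb]
    for form, case in arguments:
        if case == "abs":
            output.append(H_TONE)
            output.append(form)
        else:
            output.append(f"{CASE_MARKERS[case]} {form}")
    return output


def has_medial_h(output):
    """True if an H- follows the first post-verbal argument.

    Positions 0 and 1 hold the verb and whatever directly follows it, so an
    H- at position 2 or later sits between the two arguments (this assumes
    exactly two post-verbal arguments, as in Table 1).
    """
    return H_TONE in output[2:]


conditions = {
    "transitive VSO": spell_out("Verb", [("X", "erg"), ("Y", "abs")]),
    "transitive VOS": spell_out("Verb", [("X", "abs"), ("Y", "erg")]),
    "intransitive V-S-PP": spell_out("Verb", [("X", "abs"), ("Y", "obl")]),
    "intransitive V-PP-S": spell_out("Verb", [("X", "obl"), ("Y", "abs")]),
}
medial = {name: has_medial_h(out) for name, out in conditions.items()}
```

Under this sketch, a sentence-medial H- is produced only for the transitive VSO and intransitive V-PP-S conditions, which is the pattern reported for Figure 3a and Figure 3d.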
In this section, we present additional evidence showing that the absolutive high isn’t a consequence of prosodic phrasing choices conditioned on prosodic length. This evidence comes from production data where arguments in ditransitive sentences were systematically lengthened. As exemplified in (4),14 we increased the prosodic length of the ergative argument by adding adjectival or locative phrases. We also increased the prosodic length of the absolutive and oblique arguments in precisely the same way, only ever lengthening one argument in each utterance. We recorded this data set with 6 speakers (4 speakers in Samoa and 2 in Los Angeles). In (4) below, the ergative argument is enclosed in brackets, and material added for prosodic lengthening is bold-faced.
If the absolutive high were a prosodic boundary tone associated with a prosodic constituent, we would expect variation in its placement and appearance. This would be a consequence of expected variation in the prosodic phrasing of the ditransitive structures conditioned on prosodic length of the arguments. For instance, in Connemara Irish, Elfner (2012, Section 4.3) finds variation in the prosodic phrasing choices of VSO sentences—and thus, the appearance and positioning of tones reflecting these phrasing choices—depending on whether the arguments are single words (bare nouns), or nouns modified by adjectives. Elfner (2012) attributes this variation to interaction between prosodic markedness constraints, which derives different preferences for prosodic phrasing choices depending on argument size (single word vs. noun-adjective). However, comparing F0 contours within our Samoan ditransitive data set (see on-line supplementary material for F0 data), we found that a high edge tone appeared before (and only before) the absolutive argument, regardless of the length manipulations, as summarized in Table 2.
|Modification||Modified argument||Sentence structure schematic|
|Unlengthened||ERG||na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x]obl|
|Short AP||ERG||na momoli [x xx́x xx́x]ergH- [x xx́x]abs [x xx́x]obl|
|Short PP||ERG||na momoli [x xx́x x x́x]ergH- [x xx́x]abs [x xx́x]obl|
|Long AP||ERG||na momoli [x xx́x x́x-xx́x]ergH- [x xx́x]abs [x xx́x]obl|
|Long PP||ERG||na momoli [x xx́x x x́x x x xx́x]ergH- [x xx́x]abs [x xx́x]obl|
|Short AP||ABS||na momoli [x xx́x]ergH- [x xx́x xx́x]abs [x xx́x]obl|
|Short PP||ABS||na momoli [x xx́x]ergH- [x xx́x x x́x]abs [x xx́x]obl|
|Short AP||OBL||na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x xx́x]obl|
|Short PP||OBL||na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x x x́x]obl|
Having presented evidence that the presence of the absolutive high is insensitive to prosodic length/size, we now provide evidence to show that it is also insensitive to speech rate. As a baseline for comparison, consider the classic example of sensitivity of prosodic phrasing to speech rate in this example from Calcutta Bengali (Hayes & Lahiri, 1991 [54a]), where parentheses delimit ‘phonological phrases.’15 (Another example is Fougeron & Jun, 1998 on French).
In Calcutta Bengali, phonological phrases are produced with rising pitch contours, with an L* pitch accent at the left edge and a high edge tone at the right edge. Therefore, because phrasing in Calcutta Bengali is acutely sensitive to speech rate, so too is the placement of the L* and high edge tones: The loss of phonological phrase boundaries in faster speech entails the loss of L* and high edge tones. We will see, however, that the presence and placement of H- tones do not vary with speech rate in the Samoan data set presented in this section (although, unsurprisingly, the phonetic realization of H- tones is sensitive to speech rate).
We elicited simple transitive and intransitive sentences, varying the number of syllables between the absolutive high and neighboring primary stress (to observe the effect of tonal crowding on the realization of the H- for another study; tonal crowding occurs when there is close spacing between neighboring tonal events), and asked our primary consultant to read them at a comfortable pace, and then a fast pace, and a slow pace (see on-line supplementary materials for more information on speech rate under these different conditions).16 A sample minimal pair in the data set—a transitive sentence and its intransitive counterpart—is shown in (6). For a full description of the elicited sentences, see Appendix A.1. One thing to note about the sentences is that since they were also designed to test the effect of tonal crowding on the realization of the H-, there were a number of sentences where the absolutive argument was initially stressed and/or vowel-initial. In such sentences, it appears that there is a compromise between the conflicting demands of realizing the high target of the H- and the low target of the immediately following LH* pitch accent on the stressed syllable, so that F0 contours on the last syllable immediately preceding the absolutive (i.e., in the third syllable, S3) can be seen to fall slightly in Figure 4, which compares F0 contours between absolutive and ergative subjects and objects. The H- is nevertheless present and positioned before the absolutive as expected.
Each transitive sentence like (6a) had a minimally different intransitive counterpart like (6b). This allowed us to compare F0 tracks over the subject when it was followed by the absolutive (6a) to when it was followed by an oblique (6b). These comparisons are shown for the subject in Figure 4a, b, c for the three different speech rates; Figure 4d compares F0 tracks over the object when it is absolutive vs. oblique under the fast speech rate. Figure 4a, b, c show F0 contours for utterances where the subject was [malini], and Figure 4d shows F0 contours for utterances where the object was [liona].
There are two sites where we expected an absolutive high to appear: (i) Immediately preceding an absolutive subject and into the left edge of the subject in the first syllable (S1) in Figure 4a, b, c, and (ii) immediately preceding the absolutive object, at the right edge of the subject (in the third syllable, S3) in Figure 4a, b, c, as well as at the left edge of the absolutive object (in S1) in Figure 4d. One distinguishing property of the absolutive high’s F0 contour is clearly consistent across speech rates: The persistence of high F0 into the first syllable of the absolutive. This is apparent in syllable 1 (S1) for the absolutive subject for all three speech rates: In Figure 4a, b, c—the solid black line (the intransitive F0 contour) stays well above the dotted grey line (the transitive F0 contour). Speech rate does induce allophonic variability in the realization of the absolutive high, though. In the slow and normal speech rates, there is clearly a continued rise and maintenance of high F0 in the F0 contour into the third syllable in transitive sentences, when the subject is followed by an absolutive object. In the fast speech rate, though, the F0 height in the third syllable (S3) is similar for the ergative and absolutive subjects, so the phonetic difference when the absolutive H- is present or not before the object is smaller. Still, even in this fast speech rate, Figure 4d shows that the high F0 from the absolutive H- persists into the first syllable (S1) of the absolutive object so that the shape of the F0 curve when the object is absolutive is clearly distinct from the F0 curve when the object is oblique.
In summary, the absolutive H- did not disappear as speech rate increased—in this sense, the presence of the absolutive high is not sensitive to speech rate, although (unsurprisingly) the particular phonetic realization of the absolutive high is. The insensitivity of the presence and placement of the Samoan absolutive H- to speech rate thus contrasts with the sensitivity of the presence of L* and high edge tones in Calcutta Bengali to speech rate.
The last factor that we’ll show does not influence the presence of the absolutive H- is ‘register.’ Samoan is well-known for having two distinct registers: tautala lelei ‘good language,’ used in literary contexts and Westernized institutional contexts such as church and school, as well as with foreigners; and tautala leaga ‘bad language,’ used in traditional ceremonies and meetings, as well as between family members and between friends (Shore, 1977, 1980; Duranti, 1981, p. 165–168; Ochs, 1988, p. 196; Duranti, 1990, p. 4–5; Mosel & Hovdhaugen, 1992, p. 7–11; Mayer, 2001).17 One of the most striking contrasts between the two registers is in the segmental phonology. The following mergers occur from tautala lelei to tautala leaga (Mosel & Hovdhaugen, 1992, p. 9):
|(7)||Mergers from tautala lelei to tautala leaga|
|a.||/t/ and /k/ → /k/|
|b.||/n/ and /ŋ/ → /ŋ/|
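The mergers in (7) can be illustrated with a minimal Python sketch. The function name `to_tautala_leaga` is hypothetical, and the sketch assumes a broad transcription with one character per segment; it ignores stress marks, loanword exceptions, and any other register differences.

```python
# Hypothetical helper illustrating the two mergers in (7).
# Assumption: one character per segment in the input transcription.
MERGERS = str.maketrans({"t": "k", "n": "ŋ"})


def to_tautala_leaga(form: str) -> str:
    """Map a tautala lelei transcription to its tautala leaga counterpart."""
    return form.translate(MERGERS)


print(to_tautala_leaga("liona"))    # lioŋa
print(to_tautala_leaga("tautala"))  # kaukala
```

Segments already in the merged category (/k/, /ŋ/) pass through unchanged, which is why the mapping is not reversible.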
Consideration of the syntax-prosody interface in tautala leaga is important for two reasons. First, although almost all linguistic research on Samoan has been in tautala lelei, “as much as 90% of casual speech and most traditional oration actually take place using more colloquial forms of Samoan” (i.e., in tautala leaga) (Mayer, 2001, p. 58). Second, the segmental ergative case marker e has been reported to be rarely used in tautala leaga (Mosel & Hovdhaugen, 1992, p. 9; see also Mayer, 2001). Mayer (2001) also reports that genitive case markers are often dropped in tautala leaga (Duranti, 1981; Ochs, 1988), although the literature does not indicate whether the oblique particle i is also typically dropped. In contexts where segmental case markers are dropped, the presence of a tone marking absolutive case would not only be informative about morphosyntactic structure, but would also serve to disambiguate between possible parses.
Consider the tautala leaga minimal pair in (8). The two sentences are string-identical, but if there were a high tone before [le malie], ‘the shark’ in (8a), in contrast to a high tone before [le liona], ‘the lion’ in (8b), then the position of the high tone would disambiguate between VSO and VOS word order.
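The disambiguation logic can be sketched in Python as follows. This is an illustrative sketch only: the function `word_order`, the placeholder verb token `"V"`, and the flat token representation are assumptions, as is the premise that the clause is transitive, has no segmental case markers, and contains exactly one H- placed immediately before the absolutive (object) argument.

```python
def word_order(tokens):
    """Return 'VSO' or 'VOS' for a [verb, arg, arg] string with one 'H-' token.

    Assumption: the single H- immediately precedes the absolutive argument,
    which in a transitive clause is the object.
    """
    if tokens.count("H-") != 1:
        raise ValueError("expected exactly one H- token")
    position = tokens.index("H-")
    if position == 1:   # H- right after the verb: the first argument is the object
        return "VOS"
    if position == 2:   # H- after the first argument: the second argument is the object
        return "VSO"
    raise ValueError("H- is not in a pre-argument position")


# "V" is a placeholder for the verb; the argument strings come from (8).
print(word_order(["V", "le malie", "H-", "le liona"]))
print(word_order(["V", "H-", "le malie", "le liona"]))
```

Without the H- token the two inputs would be the same string, which is the ambiguity the tone resolves.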
We present initial evidence that the absolutive high is present in tautala leaga from two data sets. In the first data set, sentences in tautala leaga were elicited from our primary consultant in Los Angeles. Twenty-four minimal pairs were constructed from two transitive verbs ([laˈŋoŋa] ‘hear,’ [iˈloa] ‘know’), two intransitive verbs ([maˈŋoŋi] ‘be smelly/fragrant,’ [laˈvea] ‘be injured by’), and four different animal NPs: [liˈoŋa] ‘lion,’ [koˈloa] ‘duck,’ [iˈsumu] ‘rat,’18 and [maˈlie] ‘shark.’ Within each minimal pair, the only variable we manipulated was WORD ORDER: VSO vs. VOS, see (9) and (10). This consultant found both word orders licit out-of-the-blue. No segmental case markers were present for ergative or oblique case; therefore, each string was ambiguous as to whether the subject was the first or second argument. However, for the purposes of elicitation, the case markers were indicated in parentheses. Each of the 48 sentences (in randomized order) was uttered twice, for a total of 96 utterances.
As shown in Figure 5, the absence of segmental case markers had no effect on the presence of the absolutive H-: The H- appears in the third syllable on the right edge of the verb (Figure 5a) and in the third syllable on the right edge of the first argument (Figure 5b) when they are immediately followed by an absolutive argument. Like in the F0 contours from other data sets in the paper, the absolutive H- is also still clearly discernable on the F0 contour over the first syllable of the absolutive argument (Figure 5b).
The second data set we elicited in tautala leaga is described in detail in Appendix A.3. This consisted of two consultants’ most preferred responses to a variety of questions eliciting different focus conditions in the tautala lelei data set described in Appendix A.2, elicited in tautala leaga for Speakers f03 and f05. Briefly, the tautala lelei data set included question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives. As discussed later in Section 6 and shown in Tables 5 and 6, a high tone still invariably occurred before absolutives in the tautala leaga data set. It should be noted that f03 explicitly stated she was dropping the ergative e in the tautala leaga recordings, but f05 did not say that. Therefore, it’s possible that some trace of the ergative e, however reduced, might have been present in f05’s speech (so that prosody wasn’t the only means of detecting case)—this is something we leave to future fine-grained phonetic analysis to check.
In this section, we presented a preliminary view of the Samoan syntax-prosody interface (to be revised) where the syntax determines the presence of the high tone as an absolutive case marker (and only as an absolutive case marker), so that the moment that a parser detects a high tone, it can conclude that an absolutive argument is about to occur. In the following section, we set up a syntactic perspective to define the absolutive high in the syntax/prosody interface.
To define the syntax/prosody interface, we tentatively adopt the analysis that has been proposed by Collins (2016, 2015, 2014). While Massam (2001) and others have assumed that Samoan has absolutive case marking, Collins (2014) argues that Samoan is actually a language of the type Legate (2008) classifies as ‘ABS = DEF,’ that is, a language where the marking that has been called ‘absolutive’ is actually the default case marking for nominative and accusative.19 While Collins and others originally assumed the default case marking in Samoan was null, Yu (2011, 2017) showed that Samoan reliably presents the high tone H- in these positions.
|(11)||a.||the structure for (2a) on page 5||b.||the structure for (2b) on page 5|
The structures shown in (11) indicate a derivation of Samoan verb initial ordering by fronting the VP to a functional head F below T after the arguments have been raised out of it, and head movement moves T na to C, (following Collins 2016, ). Phrasal movements are shown coindexed, but the head movement is shown leaving a bare trace t. And notice that the case markers are depicted as adjoined to their arguments; we assume that this happens during spellout. While many details of the spellout mechanism remain unknown, one way to compute this spellout in recognition and production is sketched in Section 7 and Appendix B.
Collins’ main argument against assuming that the intransitive subject S and the transitive object P are both marked by a single absolutive case marking mechanism is that in nominalized clauses, S and P behave differently: S must be genitively marked (with /o/ or /a/), while P can have the same marking as in finite clauses. Collins assumed the marking of P was null in both finite and nominalized clauses, but Yu (2017) shows that in both finite and nominalized clauses, when P lacks a segmentally explicit case marker, it is invariably marked with a preceding H- (compare Collins 2014, ):
Since this H- marking in nominalizations is possible for the transitive object P but not for the intransitive subject S, we adopt Collins’ view that the gloss ABS preceding [le manini] ‘the fish’ is really the marking of ACC. So now we have this answer to the question in the title of this section: What is the ‘absolutive high’? According to the syntactic analysis adopted here, it is a (perhaps slightly misleading) descriptive gloss of what we now recognize to be the default, syncretic marking of nominative NOM and accusative ACC. We will continue to use ‘absolutive’ descriptively, even though, from this perspective (and remembering footnote 19), the syncretism of NOM and ACC marking may mislead some linguists into thinking that Samoan has a single mechanism of absolutive case assignment; in nonfinite embedded contexts we see that distinct mechanisms must be responsible for the case marking of S and P.20
Having now situated the absolutive H- in the syntax-prosody interface, in this section we expand the range of empirical data we consider to include multiple triggers for high edge tones. In Section 2.1, we briefly noted that sentence-medial H- tones in Samoan occur not just before absolutives, but also in other syntactic environments. In this section, we introduce these other H- tones to set up our integration of them into the syntax/prosody interface in Section 5.
The sentence (13) below exemplifies multiple triggers for H- tones:
Figure 6 shows the F0 contour for an utterance of the sentence (13) by our primary consultant, which depicts many of the multiple triggers for H- tones. We do not provide a minimal comparison for Figure 6 without H- tones here, but one reflex of the sequence of H- tones in the utterance that is plainly visible is that the topline (the line connecting the peaks in the F0 contour) stays high throughout the utterance, around 180 Hz, rather than declining (compare to Figure 3). The first trigger for an H- in the utterance is coordination (Orfitelli & Yu, 2009): An H- precedes the conjunction [ma] (glossed as CONJ) inside the fronted DP o le malini mamalu ma Mala, ‘the glorified marine and Mala’. The second is the fronted (non-pronominal) DP argument (glossed as FRONT): An H- appears at the right edge of the fronted argument o le malini mamalu ma Mala, right before the predicate (Orfitelli & Yu, 2009; Calhoun, 2015). The absolutive H- appears at the right edge of the transitive verb [laŋona], immediately preceding an absolutive argument. The last H- we introduce here delineates members of a list (glossed as LIST) (Orfitelli & Yu, 2009).
It is noteworthy that the two final H- tones indicated in Figure 6 are followed by (fluent) pauses. As a rule of thumb, (fluent) pauses have been used to diagnose strong prosodic junctures, i.e., intonational phrase boundaries, see e.g., E. O. Selkirk (1978/1981, p. 135), Pierrehumbert (1980, p. 19), Nespor and Vogel (1986, p. 188), Krivokapić (2007, p. 163), and S. Jun and Fletcher (2014, p. 501–502). This raises the issue that the syntactically conditioned H- tones expected in these configurations may be co-occurring or may have been ‘overridden’ by a different kind of high edge tone, one that demarcates a prosodic domain, see e.g., S.-A. Jun (1996, p. 38), Khan (2008, p. 119), Hyman and Monaka (2011). If so, then an alternative transcription of the high edge tones followed by pauses that we have transcribed with ‘H-’ in Figure 6 might be ‘H%,’ as ‘%’ is a diacritic standardly used for indicating association to an ‘intonational phrase’ boundary in autosegmental-metrical theory. Calhoun (2017) also found many examples of high edge tones followed by pauses. We discuss high edge tones followed by pauses further in Section 6 and Yu (2017) and leave them aside for now.
We give a simpler example of the H- that appears in fronting in (14), with a representative F0 contour for (14a) shown in Figure 7. The point of interest here is the F0 contour over [malini] at the end of the fronted predicate [o le malini] ‘TOP the marine.’ Compare this to the F0 contours over [malini] in Figure 2: The F0 contour over [malini] in Figure 7 looks like Figure 2b, which has an H-.
We show another example of the H- in coordination in Figure 8, where the utterances contain no pauses (in contrast to Figure 6). Here, the point of interest is the F0 contour over the string [le malini ma Malu/mamalu], which may mean either [le malini mamalu] ‘the glorified marine’ (15a) when [mamalu] is an adjectival modifier, or [le malini ma Malu] ‘the marine and Malu’ (15b), when [Malu] is coordinated.21 Figure 8a shows that F0 begins to sharply fall on [ni] before the adjective [mamalu] (although there is some rise into [ni] from peak delay), while Figure 8b shows that high F0 persists into the final syllable [ni] of [malini] when the conjunction [ma] follows. Zoomed in, the contrast between F0 contours over [malini] in Figure 8a and b looks just like the contrast displayed over the F0 contours for [malini] in Figure 2a and b, respectively.
The coordination H- also appears in disjunctions before the disjunctive coordinators [poʔo] or [peː], which are described in Mosel and Hovdhaugen (1992, p. 153, 681)22 and in verbal coordination (see Yu, 2017 for an example of verbal coordination). The evidence for this comes from the basic coordination data set described in Appendix A.4. An important caveat, though, is that initial stress in the disjunctive coordinators [ˈpoʔo] or [ˈpeː] makes it very difficult to tell if rising F0 preceding the coordinator can be attributed to a high edge tone, or if rising F0 might only be due to the rise to the initial pitch accent on the disjunctive coordinator. Further fine-grained phonetic work is needed to tease this apart.
There are two things to note about these other high tones that are relevant for computing the syntax-prosody interface. First, these high tones are not optionally produced—rather, like the absolutive high, the current evidence shows that they always appear.23 While we have not done the systematic manipulations with lengthening for these high tones that we reported for the absolutive high in Section 2.2, we have not noticed that the high tones disappear when prosodic length decreases, e.g., the coordination high appears even if there are only two syllables in the first coordinate and one in the second.24 Second, whether the source is from coordination, fronting, or the absolutive, the H- tones are all aligned to the edge of the word. Thus, upon detecting a high edge tone, a parser must consider all these different sources as possible alternatives.
The evidence for edge alignment of the H- tones comes from the prepenultimate stress data set (see Appendix A.5) and from another similar data set discussed in Yu (2017). We provide a brief overview of the evidence here. A standard way to tease apart whether a tone is a pitch accent or an edge tone is to vary the position of stress and the number of syllables in words, and to observe if the alignment of the tone correlates with stress position (the signature of a pitch accent) or with word length (the signature of an edge tone) (S. Jun & Fletcher, 2014). But the penultimate mora is the furthest mora from the right edge of a prosodic word that can bear primary stress in native Samoan words25 (Zuraw et al., 2014). Thus, it is not possible to sufficiently separate the position of primary stress from the right word edge in Samoan to check whether H- tones track with stress or word edges. We therefore performed a Bach test (Halle, 1978, p. 301), using nonce forms with nonnative stress patterns, by asking our speakers in Auckland to code-switch in English names with antepenult stress (Melanie, Romeo) alongside names with native stress patterns. Code-switching between Samoan and English is a common everyday occurrence for our speakers. We observed that the high tone still appeared at the right edge of the target words, even with antepenult stress. Moreover, all high tones exhibit similar phonetic properties in that they spread rightward from where they initially begin to rise, like in Figures 4 and 5b.
In summary, with the complication of additional H- tones besides the absolutive H-, we now have a more elaborated view of the interface (though still to be revised) than the initial view presented in Section 2. When the parser detects a high tone, the source of the high tone is known to be morphosyntactic, but the particular structural source of the high tone could be from fronting, coordination, or from the absolutive.
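The ambiguity faced by the parser at this stage can be sketched in Python. The function `candidate_sources` and the category labels are hypothetical; the narrowing by the following category is a simplification based only on the positions described in this section (the coordination H- precedes the coordinator, the fronting H- precedes the predicate, and the absolutive H- precedes an argument), and the list H- is left out.

```python
# Hypothetical sketch: the structural sources a parser might entertain
# on detecting a sentence-medial H-.
ALL_SOURCES = ["absolutive", "coordination", "fronting"]

# Assumption: the category of the material right after the H- narrows the options.
BY_FOLLOWING_CATEGORY = {
    "argument": ["absolutive"],
    "coordinator": ["coordination"],
    "predicate": ["fronting"],
}


def candidate_sources(next_category=None):
    """Return the possible sources of a detected H-.

    With no lookahead, every source remains a live alternative.
    """
    if next_category is None:
        return list(ALL_SOURCES)
    return list(BY_FOLLOWING_CATEGORY.get(next_category, []))


print(candidate_sources())               # all three alternatives
print(candidate_sources("coordinator"))  # narrowed by the following word
```

An incremental parser would hold the full set until the following material arrives, which is the no-lookahead case here.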
Section 3 proposed that spellout introduces H- as the spellout of NOM and ACC. We now turn to some of the additional constructions mentioned in the preceding section.
For coordination, we can either assume that the high tone is lexically associated with the coordinators, or we can again introduce it postsyntactically, in spellout. Consider the following examples:
If we assume that the marking is done in spellout, then the coordinator just needs to trigger the insertion. Note that case marking will apply to these structures as well, inserting the absolutive H-, yielding a structure that we can assume to be roughly like this:
Here we follow Zhang (2009) in assuming that the coordinator ‘inherits’ the category of its arguments, D in this example (additional detail in Appendix B).
As illustrated by the first tone indicated in (19), fronted arguments are H- marked:
The spellout rule we need here simply inserts a high as a reflex of the syntactic configuration causing the material to be fronted. Case marking applies in this example too, inserting the absolutive case marker H- before le mamanu ‘the design,’ so we can obtain a structure like this:
In (20), our spellout rule has adjoined the H- to C, but if it turned out that we had evidence for an alternate structure, e.g., right-adjoining the H- to the fronted DP, we could revise the spellout rule accordingly.
With these additional spellout rules, other prosodic events besides the absolutive H- are determined by the syntax. See Section 7 and Appendix B for a sketch of one way to execute these proposals.
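As a rough illustration of the spellout rules discussed in this section, the following Python sketch inserts H- into a flat sequence of annotated units. It is not the mechanism of Section 7 or Appendix B: the tags `ABS`, `COORD`, and `FRONT_END`, the function `spell_out`, and the flat (non-tree) representation are all assumptions made for illustration, and `"V"` is a placeholder for a verb.

```python
# Hypothetical tags on units:
#   "ABS"       - argument receiving default (NOM/ACC) case
#   "COORD"     - coordinator such as [ma]
#   "FRONT_END" - last unit of a fronted constituent


def spell_out(units):
    """Insert H- tones into a sequence of (form, tags) pairs.

    Assumption: each rule applies independently; interactions between
    tones landing at the same edge are not modeled.
    """
    output = []
    for form, tags in units:
        if "ABS" in tags or "COORD" in tags:
            output.append("H-")   # tone precedes the absolutive argument or coordinator
        output.append(form)
        if "FRONT_END" in tags:
            output.append("H-")   # tone follows the fronted constituent
    return output


example = [("V", ()), ("le malini", ("ABS",)), ("ma", ("COORD",)), ("Malu", ())]
print(spell_out(example))
# ['V', 'H-', 'le malini', 'H-', 'ma', 'Malu']
```

Whether the fronting H- is better placed after the fronted constituent or before the following head is the structural question raised for (20); in this flat sketch the two options give the same string.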
In this section, we introduce a final complication to our picture of the syntax-prosody interface in Samoan: Interactions between prosodic phrasing and the presence of high edge tones. We show using counts from tonal transcriptions from four data sets that sometimes a low (L-) edge tone appears where we would have expected a high tone, and that the frequency of this occurring seems to be sensitive to the morphosyntactic source of the high tone. We also show that there is some noise in the morphosyntactic-high tone correlation. Occasionally, for instance, the absolutive high or some other morphosyntactically conditioned high may be missing, and occasionally, a high tone might appear in an environment we have not yet specified. The four data sets we included in the analysis here are the following: (1) The tautala lelei ‘good language’ data set (introduced in Section 2.4), (2) the tautala leaga ‘bad language’ data set (introduced in Section 2.4), (3) the basic coordination data set (introduced in Section 6.3), and (4) the prepenultimate stress data set (introduced in Section 4 and also discussed in Section 6.4).
All of these data sets are small and biased towards particular constructions, and the number of repetitions of a particular item varied slightly, so the frequencies we found for various tones should not be taken to be representative for Samoan in general. We can, however, still see that a sentence-medial low edge tone can sometimes appear—both in places where we rarely see high edge tones and in places where we expect morphosyntactically conditioned high edge tones—and that the frequency of low edge tones appearing before absolutive seems to be very low.
A careful study of the sentence-medial low edge tones awaits future work, including whether a distinction should be made between L- tones that are followed by pauses and those that are not (and if so, what kind of distinction should be made).26 But we make some preliminary observations about L- tones here; for further discussion of low edge tones (and high edge tones) followed by pauses, see Calhoun (2017) and Yu (2017). Sentence-medial low edge tones often occur with a pause (see Tables 3, 4, 5, 6, 7, 8, 9), and always occur with pitch reset, in the sense that the pitch restarts with a high peak, even when there are unstressed elements immediately following the L tone (annotated as ‘reset’ in Figure 9). This can be seen in the F0 tracks for the example sentences in (21), one of which includes a pause (Figure 9a) and one of which does not (Figure 9b). These figures also show that the pitch accent immediately preceding the L- is suppressed, something that also happens at the end of interrogatives (Orfitelli & Yu, 2009); the F0 contour up to the L- in Figure 9a in particular sounds like the F0 contour of an interrogative. Since low edge tones often occur with a pause, it may be the case that they mark strong prosodic junctures, as we discussed for high edge tones followed by pauses in Figure 6 in Section 4 on page 20. So an alternate transcription for L- tones might be ‘L%.’
|Structure||Sites||Null||H-||H-, pause||L-||L-?||L-, pause|
|Structure||Sites||Null||H-||H-, pause||H-?||L-||L-, pause||L-?|
|Structure||Sites||Null||H-||H-, pause||H-?||L-||L-, pause||L-?|
As described in Section 2.4, this data set included question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives (see Appendix A.2 and Yu, 2017 for more details). Speaker f03’s data set included only inanimate objects (due to time constraints) and so is smaller than speaker f05’s data, which also included animate objects. The frequency of tonal events for different morphosyntactic structures is given in Table 3 for Speaker f03 and in Table 4 for Speaker f05. The syntactic environments include those where a site for a syntactically conditioned H- is expected (absolutive, fronting), as well as environments where edge tones were transcribed but no syntactically-conditioned H- is expected (immediately preceding ergative or oblique nominals). In all tables of frequencies given in this paper, the listing of tonal events is exhaustive, i.e., there are no edge tones we transcribed that we do not include in the tables. A tone label with a “?” means that there was some evidence for that particular tone, but we were not certain that it was present. A tone label like “L-, pause” means that the tone was followed by a period of silence. Tonal events indicated for “oblique” structures are tonal events that occurred immediately preceding oblique PPs.
For speaker f03, there were 60 sites for absolutives, and a high tone appeared in 59 of them (98%); a low tone appeared once: In a VOS response to corrective focus on the object. Immediately preceding the ergative, an L tone followed by an audible pause occurred 11 times in fronted object constructions and 7 times in VOS sentences in a range of discourse contexts: Broad focus, wh-focus on the VP, subject, and corrective focus on the subject and object. Immediately preceding the oblique, an L tone with a pause occurred under wh-VP, wh-object, corrective-object, and corrective-subject focus conditions, all in VSO order. In fronting, an L- tone appeared twice in fronted subject responses to wh-subject focus.
For speaker f05, there were two cases of L- tones before an absolutive: Once for corrective focus on the subject for a VSO response, and once for wh-focus on the subject for a fronted subject response. Preceding an ergative, an L- tone with a pause occurred 3 times for fronted object responses, for wh-focus on the VP and on the object, and for corrective focus on the object. An L- tone with a pause occurred twice before obliques, for wh-focus on the object, with VSO responses. In fronting, an L- tone appeared 15 times (2/3 of these with pauses), occurring in both fronted subject and object responses to wh- or corrective focus on the object or subject.
This data set was introduced in Section 2.4 and is described in detail in Appendix A.3. Recall that it consisted of f03 and f05’s preferred responses to the various discourse contexts, elicited in tautala leaga. The frequency of tonal events for different morphosyntactic structures is given in Table 5 for Speaker f03 and in Table 6 for Speaker f05.
For speaker f03, there were two H- tones before an ergative, in two repetitions of a fronted object response to corrective focus on the object. Three different tonal events occurred before the oblique in VSO responses to corrective focus on the object.
For speaker f05, no tonal events occurred before ergatives or obliques, and H- tones always appeared for absolutives and in fronting.
This data set, which included a range of nominal and verbal phrase conjunction and disjunction, was already introduced in Section 4 and is described in detail in Appendix A.4. The frequency of tonal events for different morphosyntactic structures for Speaker f03 is given in Table 7.
An L- tone with a pause occurred before the ergative in two repetitions of a verbal disjunction. L- tones occurred in conjunctions and disjunctions of DPs in transitive and intransitive sentences and also in verbal conjunction and disjunction. Before obliques, an L- tone or suspected L- tone sometimes occurred in both conjunction and disjunction of DPs.
This data set was already introduced in Section 4 and is described in detail in Appendix A.5. Recall that this data set manipulated stress position in a target word by including English proper names and probed the interaction of different morphosyntactic high tones with stress position. The frequency of tonal events for different morphosyntactic structures is given in Table 8 for Speaker f03 and in Table 9 for Speaker f05.
For speaker f03, this data set was noisy because she had some trouble with the English name Gabrielle. All unexpected tonal events for the absolutive happened in utterances with this name. An L- tone with a pause occurred in disjunction for two repetitions of DP disjunction. Before obliques, an L- tone with a pause occurred in sentences with ditransitives and conjunctions of absolutive subjects, and an H- tone with a pause occurred in two repetitions of a ditransitive. Although it’s possible that some of the pauses were disfluent rather than fluent, we didn’t discard any of the utterances because there was no evidence besides pauses (such as speech repairs) that a disfluency might have occurred.
For speaker f05, L- tones only occurred in disjunctions in the data set. The coordination H- was missing in two repetitions of a particular ditransitive (but not other repetitions).
There are a few generalizations we draw from these exploratory data sets to revise our picture of the Samoan syntax-prosody interface. While the invariance in morphosyntactic conditioning of the high tone by the absolutive, coordination, and fronting was largely upheld by the frequency counts, there are two classes of exceptions that complicate the picture.
First, there were sporadic instances where the absolutive high and coordination high did not appear, and there were also sporadic instances where a high tone appeared before the ergative or obliques. Presently, we can point to no systematic factor underlying these exceptions: In general, we would certainly expect such exceptions, due to disfluencies and speech errors or other production planning factors. Until we better understand what factors may be driving these exceptions, we can treat them as noise and model these exceptions by having a probability distribution placing some probability mass on no tone or a high edge tone occurring in all the morphosyntactic structures we have discussed.
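One way to picture this noise model is the Python sketch below. It is only a sketch under stated assumptions: the function names, the two-tone inventory (`"null"`, `"H-"`), and the default `noise` value of 0.05 are placeholders chosen for illustration and are not estimates from the data sets reported here.

```python
import random


def edge_tone_distribution(expected, noise=0.05, tones=("null", "H-")):
    """Put 1 - noise on the expected tone and share `noise` among the others.

    Assumption: `noise` is a free parameter, not a value fitted to data.
    """
    if expected not in tones:
        raise ValueError("expected tone must be in the tone inventory")
    others = [tone for tone in tones if tone != expected]
    distribution = {tone: noise / len(others) for tone in others}
    distribution[expected] = 1.0 - noise
    return distribution


def sample_edge_tone(distribution, rng):
    """Draw one tone label from the distribution."""
    labels = list(distribution)
    weights = [distribution[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


rng = random.Random(0)
before_absolutive = edge_tone_distribution("H-")    # H- expected, null is noise
before_ergative = edge_tone_distribution("null")    # null expected, H- is noise
print(before_absolutive, sample_edge_tone(before_absolutive, rng))
print(before_ergative, sample_edge_tone(before_ergative, rng))
```

Passing a larger `tones` tuple (for instance one that includes `"L-"`) would spread the noise mass over more outcomes without changing the rest of the sketch.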
Second, there were occurrences of an L-/L% that ‘overrode’ the H- tone. This happened only twice for absolutives, out of a total of 467 times (0.4%), but it happened more frequently in coordination (15/185, 8%) and in fronting (16/134, 12%). An L- also sometimes occurred before ergatives (25/237, 11%) and obliques (34/314, 11%), where we would not expect an H-. Tentatively we note that there may be some systematic conditioning of discourse structure in play for the appearance of the L-; it mostly appeared under wh- or corrective focus; in part this was because fronted object or subject constructions were mostly accepted by consultants only under wh- or corrective focus. In addition, word order may play a role in conditioning the presence of the L-: In f03’s tautala lelei data set, an L- followed by an audible pause occurred before the ergative 11 times in fronted object constructions and 7 times in VOS sentences, and in no other constructions.
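The proportions of L- tones per structure can be recomputed from the counts quoted above with a short Python sketch. The dictionary name `L_TONE_COUNTS` and the function `low_tone_rate` are hypothetical; the counts themselves are the ones given in the text.

```python
# (L- occurrences, total sites) per structure, as quoted in the text.
L_TONE_COUNTS = {
    "absolutive": (2, 467),
    "coordination": (15, 185),
    "fronting": (16, 134),
    "ergative": (25, 237),
    "oblique": (34, 314),
}


def low_tone_rate(structure):
    """Return the percentage of sites for `structure` that carried an L-."""
    low, sites = L_TONE_COUNTS[structure]
    return 100.0 * low / sites


for structure, (low, sites) in L_TONE_COUNTS.items():
    print(f"{structure}: {low}/{sites} = {low_tone_rate(structure):.1f}%")
```

Because the data sets are small and biased towards particular constructions, these percentages describe the samples only and should not be read as estimates for Samoan in general.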
Finally, for both H- and L- tones, another observation we made was that sometimes they were followed by a silent pause, e.g., the L- (L%) in Figure 9a. At the present time, we do not yet have enough data to understand if there are systematic differences between the distribution of edge tones followed by an audible pause and those that are not, or how multiple edge tones from a variety of grammatical sources might interact when appearing at a single edge, or if a single edge tone might simultaneously come from multiple grammatical sources. Calhoun (2017) makes a valuable contribution here, showing that the appearance of both low and high sentence-medial edge tones is quite common, though variable, in sentences with ‘exclusive’ [naʔo] constructions and ‘equative’ copular constructions. While the majority of her figures of representative intonational transcriptions show edge tones followed by (often quite long) silent pauses, the transcriptional count data given doesn’t distinguish between whether these edge tones are followed by pauses or not.
As mentioned in Section 4, in intonational phonology, the presence of a pause is often taken to be grounds to distinguish between types of edge tones, such as between an intonational phrase and prosodic categories lower in the prosodic hierarchy. It would be interesting if the optional edge tones that occur in Calhoun’s (2017) constructions before the predicate and between arguments in equatives are typically followed by pauses, since the H- tones we’ve discussed here for absolutives, coordination, and fronting invariably appear and are typically not followed by pauses. Such a systematic distinction might suggest that variably appearing edge tones typically followed by pauses have a different grammatical source, e.g., prosodic grammar, than those that are not, e.g., syntactic grammar, and that, accordingly, we would want to handle them differently in production and comprehension models.
There is no reason not to have an interface model that includes both syntactically determined tones inserted in spellout and prosodically determined ones. The emphasis in much of the work on the syntax-phonology interface is on the relation between syntactic and prosodic constituency, and this may sometimes make it seem like this relation is the whole of the syntax-phonology interface. But that is not the case, as stated in the opening sentence of Selkirk’s statement of the ‘Match theory’ of syntax-prosody mapping: “The topic of the syntax-phonology interface is broad, encompassing different submodules of grammar and interactions of these. This chapter addresses one fundamental aspect of the syntax-phonology interface in detail: The relation between syntactic constituency and the prosodic constituent domains for sentence-level phonological and phonetic phenomena. Two further core aspects, which rely on an understanding of the first, are not examined here – the phonological realization (spell-out) of the morphosyntactic feature bundles of morphemes and lexical items that form part of syntactic representation and the linearization of syntactic representation which produces the surface word order of the sentence as actually pronounced” (E. Selkirk, 2011, p. 435).
We have come to the end of the presentation of our empirical data bearing on the syntax-prosody interface in Samoan. We first presented evidence showing that the presence of high edge tones in the structural configuration of ‘absolutive’ case is insensitive to extra-syntactic factors (Section 2). Then, we introduced coordination and fronted expressions as additional configurations triggering high edge tones (Section 4). With this final section, we have pointed out occasional exceptions to the expected distribution of syntactically-conditioned H- tones, and we’ve hypothesized that some high and low edge tones might be prosodically rather than syntactically conditioned, see Yu (2017) for further discussion. Section 7.1 notes that the class of parsing models we are considering can be extended to be probabilistic to handle variability in the appearance of syntactically conditioned edge tones discussed here (see Appendix B for further detail). We also discussed the possibility of an additional class of variably occurring edge tones in Samoan which may be conditioned on prosodic domains. The model defined in this paper factors out the syntactically determined portion of the interface in Samoan, and we leave extending the model to handle prosodically-conditioned edge tones to future work.
We have already laid the foundations for showing how the syntax/prosody interface in Samoan could be computed in Section 3 and Section 5. In those sections, we informally described relevant aspects of Samoan syntactic grammar and tone-marking spellout rules sensitive to syntactic structure that place H- tones exactly and only in the positions where they reliably occurred in our fieldwork (see Appendix B for definitions of these rules). Those rules define a way to compute the syntax/prosody interface in Samoan that fits with our empirical data in a production model. In this section, we tackle the challenge of defining a comprehension model on the basis of the defined production model. The basic temporal flow of the comprehension model is easy to write down; we simply write the flow of the production model in reverse, as shown in (22). Given a syntactic grammar GS, a prosodic grammar GPr (i.e., our tone-marking spellout rules), and a phonological grammar GPh, we would have something like the following:
(22) The comprehension model as the production model in reverse:
However, we can see in this puzzling diagram the problem alluded to in Section 1: Defining a comprehension model from production-oriented grammatical components is not straightforward. Phonology GPh in reverse does not define prosodically marked syntactic trees, or syntactic trees of any kind. And the prosody GPr presented here comprises just some simple rules for inserting tones into syntactic trees, so how can we ‘reverse’ those rules? How could GPr ‘uninsert’ the tones from just the places that the syntax allows, when we have not gotten to the syntax yet? This section presents a simple solution to this problem. Without adding any new components to the grammar, we can transparently define how the prosodically marked sequences delivered by phonology can be properly parsed.
The hierarchical syntactic structures displayed in the preceding sections indicate discontinuous ‘movement’ relations of various kinds, and the ‘spellout’ mechanisms proposed add high edge tones to certain syntactic structures as part of the specification of linear, pronounced forms. The basic claims we need for our production model are these:
(P1) The posited syntactic structures in Samoan can be computed by a certain kind of ‘minimalist grammar’ (Stabler, 2010). This claim is defended in Section 7.1, just below.
(P2) The posited post-syntactic tone-marking spellout can be computed by a certain simple kind of ‘regular tree transduction.’ This claim is defended in Section 7.2.
Given these claims, the following mathematical results allow us to solve the problem of computing the syntax/prosody interface in the comprehension direction:
In the absence of direct psycholinguistic evidence bearing on the status of syntax vs. spellout, this last idea about how structure and spellout are computed—the idea that they are computed simultaneously, rather than in sequence—seems the simplest and most plausible.
An important advantage of (P1), (P2) is that a large number of equivalent and near-equivalent approaches have been identified, often with constructive proofs that provide recipes for converting from the minimalist grammar approach into any of the ‘mildly context sensitive’ alternatives that are relevantly ‘equivalent’ (Stabler, 2010). Furthermore, parsing and generation algorithms associated with any of those alternatives can be used to compute the same string mappings and structural relations that our particular proposal identifies. So our strategy for solving the problem of computing the syntax/prosody interface in the comprehension direction will be to defend (P1) and (P2) here, with brief discussion of parsing consequences, and with some further details provided in Appendix B.
In this section, we do not aim to present a complete minimalist grammar (MG) for Samoan, but just to defend the view that minimalist grammars have the mechanisms required to define syntactic structures like those shown in previous sections for ‘absolutive’ case, coordination and fronted expressions. Our characterization of the relevant aspects of Samoan is facilitated by the fact that we follow the proposals of Collins mentioned above.27 Collins proposes that the basic Samoan VSO order is derived by VP fronting, after the arguments have raised out of the VP, as indicated in the syntactic structures shown above. He suggests that after v selects VP, v has an EPP feature which triggers the raising of all arguments. That is, the EPP feature of v should trigger the raising of the object, if there is an object, and not crash if there is no object. Since the number of arguments of any verb in the lexicon is bounded, an EPP feature of that kind could be added to MGs without fundamentally changing their computational properties, but the same structures can be built by assigning each argument a different feature and triggering the movements of the phrases with each feature.28 When T is merged, an EPP feature on T can then trigger the fronting of VP to its specifier. And then when C is merged, the head of T moves to C.
Consider again the syntactic structure (11a), repeated below on the left. It can be calculated by merging the lexical items as shown in the derivation on the right. In that 10-step derivation, the features of the lexical items determine the external merge steps indicated by • and the two internal merge (i.e., movement) steps indicated by ⚬, corresponding to the coindexed trace t(0) and DP(0) and the coindexed trace t(1) and VP(1), in the tree on the left (details are provided in Appendix B).
So the basic mechanisms that Collins uses to get basic clause structures are either immediately available in the MG framework or easily emulated and added. The reason that it is interesting that MGs can encode these analyses is not that it tells us anything new about Samoan syntax per se, but rather that it guarantees certain computational properties, including the proven existence of a range of parsing strategies adequate to compute all and only the structures allowed by the grammar (Harkema, 2000, 2001; Stabler, 2013). Those algorithms are efficient and also easily extend to select analyses which are ‘most probable’ in various senses (Hunter & Dyer, 2013). Thus, they can encode the probabilistic modeling to handle some of the variability that we described in Section 6: variability in the appearance of syntactically determined high edge tones that we treat as ‘noise.’
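To make the feature-driven character of such derivations concrete, the following Python sketch implements the two structure-building steps, merge and move, over feature lists, in the style of a common chain-based formulation of MGs. It is an illustration under simplifying assumptions, not the grammar of Appendix B: the names `Expr`, `lex`, `merge`, and `move`, the two English lexical items, and the features `d`, `k`, and `t` are all hypothetical, and head movement and the EPP mechanism discussed above are omitted.

```python
from dataclasses import dataclass
from typing import Tuple

# A chain pairs a pronounced string with its still-unchecked features.
Chain = Tuple[str, Tuple[str, ...]]

@dataclass(frozen=True)
class Expr:
    chains: Tuple[Chain, ...]   # chains[0] is the head; the rest are waiting movers
    lexical: bool = False       # True only for items taken straight from the lexicon

def lex(string, *features):
    """Build a (hypothetical) lexical item."""
    return Expr(((string, tuple(features)),), lexical=True)

def join(*parts):
    return " ".join(p for p in parts if p)

def merge(e1, e2):
    """Check a selector =x on e1 against the category x on e2."""
    (s1, f1), movers1 = e1.chains[0], e1.chains[1:]
    (s2, f2), movers2 = e2.chains[0], e2.chains[1:]
    if not (f1 and f2 and f1[0] == "=" + f2[0]):
        raise ValueError("selector and category do not match")
    if f2[1:]:
        # e2 still has licensee features: it is not pronounced yet, it waits as a mover.
        return Expr(((s1, f1[1:]),) + movers1 + ((s2, f2[1:]),) + movers2)
    if e1.lexical:
        # A lexical selector takes its complement on the right.
        return Expr(((join(s1, s2), f1[1:]),) + movers1 + movers2)
    # A derived selector takes its specifier on the left.
    return Expr(((join(s2, s1), f1[1:]),) + movers1 + movers2)

def move(e):
    """Check a licensor +f on the head against -f on exactly one waiting mover."""
    (s, f), movers = e.chains[0], e.chains[1:]
    if not (f and f[0].startswith("+")):
        raise ValueError("head has no licensor feature")
    wanted = "-" + f[0][1:]
    hits = [i for i, (_, mf) in enumerate(movers) if mf[0] == wanted]
    if len(hits) != 1:
        raise ValueError("need exactly one matching mover")
    i = hits[0]
    ms, mf = movers[i]
    rest = movers[:i] + movers[i + 1:]
    if mf[1:]:
        # The mover has further licensee features and keeps waiting.
        return Expr(((s, f[1:]), (ms, mf[1:])) + rest)
    # The mover lands in specifier position, to the left of the head string.
    return Expr(((join(ms, s), f[1:]),) + rest)

# Toy two-step derivation with hypothetical English items.
dp = lex("Mary", "d", "-k")
verb = lex("arrived", "=d", "+k", "t")
result = move(merge(verb, dp))
print(result.chains)  # (('Mary arrived', ('t',)),)
```

In this toy derivation, `merge` checks `=d` against `d` and leaves the argument waiting because it still carries `-k`; `move` then checks `+k` against `-k` and pronounces the argument to the left of the head. Every step is licensed purely by the first unchecked features of the expressions involved, which is what makes the space of derivations easy to describe with finite state means.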
MGs have also been extended to handle a range of coordinate structures (Torr & Stabler, 2016), respecting the ‘coordinate structure constraint’ (CSC) and ‘across the board’ extractions. Roughly, constituents that differ in what has been extracted from them cannot coordinate, and this is easily enforced in MGs by reflecting the relevant properties of extracted elements in the category (i.e., the features) of the coordinated elements. Some examples of coordination are considered in Section 4, and a wider range is discussed by Collins (2016, Section 6). Collins observes that the CSC correctly predicts the degraded status of structures that coordinate unergative and unaccusative predicates in Samoan, exactly as in the English gloss:
While the properties of these constructions are not fully understood in either English or Samoan, it appears that CSC applies similarly. And for present concerns, the only relevant question is how to specify that the coordinators are H- marked. Appendix B shows how MGs can also handle fronted expressions.
The reason for drawing attention to the derivation shown in (23) on the right is that it is especially simple, in the sense that it is defined by a simple finite state mechanism (Michaelis, 1998; Kobele et al., 2007). Not only is this particular tree simple in that sense, but the derivation trees are guaranteed to be finite state definable no matter how the minimalist grammar needs to be elaborated to get the whole Samoan language. This sets the stage for a simple approach to spellout.
The preceding section showed that the syntactic structures proposed in this paper are all MG-definable. What about the tone-marking spellout rules? Formal grammars and parsing algorithms are usually defined over a lexicon. In linguistic theory and in applications, the lexicon is often taken to correspond to pronounced words or morphemes; derivations concatenate the pronounced elements. Many grammars in the minimalist tradition depart quite dramatically from that perspective though. Not only do they allow phonologically empty lexical items of various categories, phonologically vacuous (‘covert’) movements, etc., but also processes that distance the basic formatives of the syntax quite significantly from what is actually heard or spoken. In this recent tradition, the syntax is stated over feature structures that are significantly more abstract than the ‘pronounced words’ of traditional approaches. A wide range of theoretical traditions is advancing this kind of idea—that not only does phonology modify pronounced sequences in regular ways, but also, ‘distributed morphology,’ ‘exoskeletal morphology,’ etc. rearrange elements to allow more abstract syntactic formatives. The proposal in this paper falls into this very broad tradition. The proposal is that Samoan structural ‘case marking,’ and the similar marking of fronted and coordinate expressions, is not the concatenation of special lexical items, but is ‘postsyntactic,’ a kind of pronounced reflex of structural configurations.29 Because the tone-marking rules are sensitive to syntactic structure, they cannot apply before any assumptions about syntactic structure, but the idea that, in performance models, they apply after parsing the syntax is unappealing, and, it turns out, unnecessary. They can apply simultaneously.
To show this, we first establish that the case-marking and other tone-insertions needed for this approach are themselves simple in the sense of being ‘regular,’ that is, finite state definable on trees, in a precise sense that matters for the computation of the syntax/prosody interface. In the trees just above in (23), notice that when the leaves of the tree on the left are pronounced in order, we have the example (2a) on page 5, without the case marking—this is the standard spellout rule. Because the derivations are finite state definable, we can use another finite state mechanism to, in effect, climb up the tree and insert the case markers wherever a case marking configuration occurs. So for example, to get tautala lelei case marking we insert the elements shown here:
Spelling this out, the case markers are pronounced in the correct, structurally determined positions.
In previous sections we have seen H- insertion in the structural configuration of ‘absolutive’ (which we are taking to be nom, acc) case assignment in (11), in coordination (18), and in fronted expressions (20). The previous section argues that these constructions can be defined by minimalist grammars. Now we add the observation that all of the tone placement rules needed for these constructions are simple in the precise sense of being, quite easily, finite state definable. Minimalist grammars have very simple derivations, so we can specify the case-marking and other tone-insertions as a simple reflex of the structures of minimalist derivations.
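The idea that tone insertion is a simple reflex of local structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the transducer defined in Appendix B: trees are nested `(label, children)` pairs, the labels `DP:erg` and `DP:abs` and the placeholder words `VERB`, `SUBJ`, and `OBJ` are hypothetical, and the configuration that triggers marking is read off a single node label rather than off the features of two sister subtrees.

```python
# Hypothetical marker table: which constant is inserted at the left edge
# of which configuration. "e" stands for the segmental ergative marker of
# tautala lelei and "H-" for the high edge tone.
MARKERS = {"DP:erg": "e", "DP:abs": "H-"}

def spell_out(tree):
    """Bottom-up pass: pronounce the leaves in order, inserting a marker
    wherever a marking configuration (here, a node label) is encountered."""
    if isinstance(tree, str):          # a leaf is a pronounced word
        return [tree]
    label, children = tree
    tokens = [tok for child in children for tok in spell_out(child)]
    marker = MARKERS.get(label)
    return [marker] + tokens if marker else tokens

# A toy verb-initial clause with an ergative and an absolutive argument.
clause = ("TP", [("VP", ["VERB"]),
                 ("DP:erg", ["SUBJ"]),
                 ("DP:abs", ["OBJ"])])

print(spell_out(clause))  # ['VERB', 'e', 'SUBJ', 'H-', 'OBJ']
```

The function consults only the label of the node it is currently visiting and inserts only constants, so it needs no memory beyond a finite table. That is the sense in which the sketch is finite state on trees.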
Finally, it is fairly easy to see now that the case marking and structure calculation can be done simultaneously. This solves the puzzle of (22), showing how we can straightforwardly define a comprehension model for Samoan from the production-oriented grammatical components that we have already defined. Observe that (i) the structural configurations in which these changes apply are defined ‘locally,’ by the features of the two subtrees that appear at the point where the marking is to take place; and (ii) the marks which are inserted are simple constants, not full phrases of any kind, and not copies of other structures, or any such thing. Because of property (i), we can define a ‘parsing grammar’ that combines the syntax and spellout, distinguishing the categories relevant for case marking, so that in this combined system, a rule applies to insert the specified constants mentioned in property (ii). In this sense, the post-syntactic process can be ‘composed into’ the syntax in a way that yields another grammar that is, in computational respects, a grammar of the same kind, a minimalist grammar. This situation holds not only for the superficial syntax sketched here, but for any minimalist grammar, no matter how complicated, and any post-syntactic process with properties (i–ii). Therefore, we need not suppose that a parser considers segmental material only, subject to a following filter based on prosody. Rather, prosodic and segmental cues can be considered together, as soon as they are perceived. And since the result is a standard MG, some of the variability can be handled with a probabilistic model, as mentioned at the end of Section 7.1.
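The composition step can be illustrated with a deliberately small stand-in. The Python sketch below uses a toy context-free grammar rather than a minimalist grammar, so it shows only the shape of the argument: prepending constants to the rules of the marked categories yields a grammar of the same kind, and a single recognizer then consumes words and markers together. The rule table `SYNTAX`, the categories, the placeholder words, and the functions `compose` and `recognizes` are all hypothetical names introduced for this illustration.

```python
from functools import lru_cache

# Toy syntax without any markers (hypothetical categories and words).
SYNTAX = {
    "S": [["V", "DP:erg", "DP:abs"], ["V", "DP:abs", "DP:erg"], ["V", "DP:abs"]],
    "DP:erg": [["N"]],
    "DP:abs": [["N"]],
}
LEXICON = {"V": {"VERB"}, "N": {"SUBJ", "OBJ"}}

# Toy spellout: a constant inserted at the left edge of a marked category.
MARKERS = {"DP:erg": "e", "DP:abs": "H-"}

def compose(rules, markers):
    """Fold spellout into the syntax: the result is again a rule table of
    the same kind, now mentioning the inserted constants as terminals."""
    return {cat: [([markers[cat]] if cat in markers else []) + rhs for rhs in rhss]
            for cat, rhss in rules.items()}

def recognizes(grammar, lexicon, start, tokens):
    """Top-down recognizer with memoization (the toy grammar has no left recursion)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        # All positions j such that `symbol` derives tokens[i:j].
        if symbol in grammar:
            found = set()
            for rhs in grammar[symbol]:
                frontier = {i}
                for child in rhs:
                    frontier = {j for k in frontier for j in ends(child, k)}
                found |= frontier
            return frozenset(found)
        if i < len(tokens) and (tokens[i] == symbol or tokens[i] in lexicon.get(symbol, ())):
            return frozenset({i + 1})
        return frozenset()

    return len(tokens) in ends(start, 0)

PARSING_GRAMMAR = compose(SYNTAX, MARKERS)

print(recognizes(PARSING_GRAMMAR, LEXICON, "S", ["VERB", "e", "SUBJ", "H-", "OBJ"]))  # True
print(recognizes(PARSING_GRAMMAR, LEXICON, "S", ["VERB", "SUBJ", "OBJ"]))            # False
```

In the composed grammar the tone is just another terminal, so the recognizer uses it at the moment it is encountered instead of filtering analyses afterwards. The categorical rejection of the unmarked string is an artifact of the toy; a probabilistic version would instead assign it a low score.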
Taking the syntax/prosody interface in Samoan as a case study, we have identified some syntactically determined aspects of prosody and shown how these can inform the syntactic parser. Since the syntax and prosody can be folded together as sketched in the previous section and described in more detail in Appendix B, a minimalist parser can directly parse the tone-marked surface forms. That is possible because, even though we describe syntax followed by spellout, the combination of these two is still a minimalist language, and so a minimalist parser suffices to do the analysis. With this approach, a large range of minimalist parsing models can use prosody as soon as it is heard. If we could not combine syntax and prosody in this way, then any theory of sentence comprehension would have to explain not just how the sequence of morphemes is analyzed by the syntax, but also how that sequence of morphemes is computed properly from the surface forms, and how prosodic reflexes of syntactic structure are provided at that interface. But we are not in that situation. We can factor performance into syntax and prosody in order to state generalizations in each domain most perspicuously, and in the performance model they do not need to be temporally separate in any sense. Thus, in this paper, we have answered the two questions we started with: (i) How to factor out the contribution of syntax to conditioning prosodic events, when presented only with the resulting output from the interaction of a multitude of conditioning factors, and (ii) given a production model from the syntactic grammar to a prosodified utterance, how to possibly define a comprehension model based on that production model.
Given the marked scarcity of computational models of syntactic parsing that incorporate prosodic information in any substantial way, what has allowed our success here? First, we saw from initial fieldwork in Samoan that it appeared to have prosodic events primarily conditioned by syntax, and we pursued further empirical study to clarify the facts enough to ground a first sketch of a production model. The overarching strategy that this exemplifies is to start with empirical case studies where syntax is clearly the primary determining factor for prosody, with an eye towards using our understanding of these to bootstrap work on cases where the syntax-prosody relation is less clear. Second, we explicitly defined an empirically grounded production-oriented grammar for computing the interface in minimalist grammar and took advantage of relevant mathematical results to then define a comprehension model based on the production-oriented grammar. The overarching strategy that this exemplifies is: (i) To define a computational model of the interface, which forces us to explicitly, precisely, and comprehensively state the tentative assumptions adopted, and (ii) to choose classes of computational models with mathematical properties that make it possible to test and compare hypotheses about fundamental properties of different components of the interface and their relations. We briefly explicate the two overarching strategic principles below to conclude.
The crucial property of the Samoan syntax/prosody interface that makes it a good first case study is that it provides clear cases of prosodic events that are under the control of the syntax. We have shown that the prosodic events studied here do not disappear in short or long constituents, changes of speech rate, or changes of speech register; even the dramatic change from tautala lelei to tautala leaga preserves H- marking. Nevertheless, with small probabilities, the tonal events can surface in different ways or fail to surface, and so we have offered up a probabilistic parsing model to handle these exceptions until we better understand all the factors in play. Even with the evidenced nondeterminacy, we have seen that certain tonal events in Samoan are nevertheless very good signals of syntactic structure, and we have described fairly well-understood and flexible methods for modeling these in efficient parsing mechanisms.
It is unlikely that the primacy of syntactic conditioning in the Samoan syntax/prosody interface is anomalous in natural language. There are two ways to locate other such cases. One is for us to continue to expand our range of knowledge about the syntax/prosody interface cross-linguistically in prosodic fieldwork. As a case in point, a striking recent addition to the catalogue of syntactically determined prosody in natural language comes from the Dogon languages of Mali. In the Dogon language of Tommo So, the word for ‘cat,’ gamma, bears an HH tone sequence in isolation; gamma bears the same HH sequence in ‘three cats’ and ‘the cat.’ But in the nominal phrases ‘black cat,’ ‘one cat,’ ‘Sana’s cat,’ gamma surfaces with an LL sequence (Heath & McPherson, 2013; McPherson & Heath, 2016). Heath and McPherson (2013) and McPherson and Heath (2016) discovered that the tone sequence with which gamma and other words surface is completely predictable and insensitive to prosodic factors; for instance, the tone sequences over nominal phrases are completely determined by the syntactic category of ‘controller’ words within the nominal phrase that c-command the other words in the nominal phrase.
The second strategy for uncovering clear cases of prosodic events that are under the control of the syntax is to examine well-studied phenomena and reconsider the assumptions under which they have been analyzed. Since theories of the syntax/prosody interface must make assumptions about syntax and phonology, in addition to what information is passed between them, their ability to fit the data rests on all of these assumptions together. Thus, re-examining assumptions about any component of an interface model can reveal that what has appeared to be poorly understood variability in prosodic events is perhaps in fact a regular consequence of previously unrecognized factors. As an example, Hirsch and Wagner (2015) found that they could reconcile a conflicting pattern of results for the prosodification of prepositional phrase attachment in English, e.g., Tap the frog with the flower on the hat, with a syntactic analysis. Snedeker and Trueswell (2003) found that speakers prosodically disambiguated only when disambiguation was needed for the visual scene, but Kraljic and Brennan (2005) found that speakers prosodically disambiguated even if there was disambiguating referential context and they were unaware of the ambiguity. Hirsch and Wagner (2015) found they could account for the conflicting results by noticing syntactic differences between the two sets of experimental stimuli: Snedeker and Trueswell’s (2003) stimuli contrasted left vs. list bracketing, while Kraljic and Brennan’s (2005) stimuli contrasted left- vs. right-bracketing. Another example of work re-examining syntactic assumptions in an interface model is Ahn (2016a), which shows how hierarchical syntactic structure might regularly condition apparent exceptions to the nuclear stress rule in English.
Strategic principle 1 lays the empirical groundwork to motivate production models of the interface for syntactic parsing. The composability of MGs and related finite state systems mentioned in Section 7 and Appendix B makes them an advantageous choice for defining and comparing production models. On the one hand, we can define each component of the interface separately and thus factorize the interacting influences on prosody. On the other hand, these different components can also then be folded in together for comprehension models. Although MGs and finite state interfaces can accommodate a wide range of proposals, they have empirical consequences, some of which are contested. For instance, MGs don’t allow an unbounded number of elements to move to the front of a clause, e.g., MGs cannot express multiple wh-movement with no finite bounds. If we were to compose in a finite state prosodic grammar as part of the interface computation, this would also be restricted. If prosodic events were conditioned on the number of brackets there are at a boundary (and a potentially unbounded number of them), e.g., Wagner (2005, 2010), there would be no way to encode prosody in the grammar: Regular grammars cannot express unlimited sensitivity to the number of brackets. Encoding syntactic and phonological generalizations in MG and related formalisms also forces us to be completely clear about what our account of the empirical facts is, including points we aren’t yet sure of, where we must adopt tentative, likely simplified assumptions as a starting point. That is, formalizing our accounts computationally lays all our assumptions bare, and is not an endpoint following fieldwork, but an intermediate step in the iterative process cycling between proposing and refining testable hypotheses.
Our current understanding of the conditioning of high edge tones led us to define the computation of the interface in terms of post-syntactic spellout rules that place H- tones in particular syntactic configurations. Stating these rules, and the syntactic grammar they are sensitive to, clarified what we were claiming the ‘absolutive high’ actually is, as we explained in Section 3. But if new data revealed a more general syntactic configuration underlying the H- tones in Samoan, e.g., if every adjoined phrase were marked with an H-, that could also be represented in a minimalist grammar, composed into the syntax; the proposal here does not hang on a particular, precise account of the syntactic conditioning of H-. It could also be informative to formalize interface theories that refer to syntactic phases, e.g., see Ahn (2016b); Dobashi (2004); Kratzer and Selkirk (2007); Dobashi (2009); Cheng and Downing (2016).
If it turns out that ‘information structure’ determines the placement of some H- tones in Samoan, and if the relevant principles are encoded in syntactic structure (e.g., Cheng & Downing, 2009; Kavari et al., 2012), then we would need to assess how they could be implemented. Given a precise syntactic account of the view of information structure described in Calhoun (2017), and assuming that the generalizations stated there fit the data, we would assess if we could encode her proposed generalization: That H- tones occur at the edges of incomplete information units because phonological phrases map onto theme and rheme units. In that case, we would also want to revise the prosodic grammar involved in the computation of the Samoan interface so that it derives prosodic trees, rather than comprising just some simple rules for inserting tones into syntactic trees. As we speculated in Section 6.5, it may be that there are prosodically- as well as syntactically-conditioned H- tones, in which case we would want our prosodic grammar to include the tone insertion rules, as well as derive prosodic trees.
These are all examples of how empirical work might drive revisions of the computational models. But computational modeling can also drive empirical work, based on what it tells us about broad classes of assumptions about the interface. The proposal for the computation of the Samoan interface in this paper, for instance, tells us something about the broad class of interface theories in which the prosodic grammar does not derive prosodic trees. As we have discussed, it is not obvious that a reasonable comprehension model could be defined for such prosodic grammars. In this paper, however, we have shown that such grammars are in fact compatible with a large range of minimalist parsing models. Thus, the lack of a reasonable comprehension model would not be grounds for rejecting interface theories that exclude prosodic hierarchical structure. Our proposal also shows that we can straightforwardly compute the syntax/prosody interface, even if we assume high edge tones which might overlap significantly in phonetic realization actually arise from different sources, rather than a single unified source in the grammar.
The fine-grained model of the interface we have proposed here—with tones placed by individual rules that refer to specific morphosyntactic constructions—is quite different from other theories of the syntax-phonology interface.30 As stated in Kaisse and Zwicky (1987, p. 7), both theories that have proposed ‘direct reference’ to syntax and those that have proposed ‘indirect reference’ to syntax have agreed that phonological rules refer to cross-categorical relationships rather than specific syntactic categories. For example, Odden (1987) describes five rules of shortening for Kimatuumbi in NPs, VPs, PPs, APs, and PossPs, and then unifies these by saying that shortening applies to the head of a phrase. But it is not clear that a model that hides syntactic structure—whether by restricting what aspects of syntax are visible, or by the introduction of mediating prosodic structure—could fit our current empirical evidence in Samoan better than the fine-grained model we have proposed. We do not find high tones in Samoan for all heads or all phrases—for instance, we would need to explain the asymmetry in the presence of the high tone for absolutives vs. the absence of high tones for ergatives. A fine-grained account that fits the data well sets a challenge—the attempts at deeper or more unified accounts should aim to fit the data as well.
1We do not aim to define a performance model here, but we take a more modest strategy, starting from our understanding of the language structure and asking: how could this be computed? What kinds of mechanisms could compute the structure that competent speakers apparently produce and recognize in the language? Answers to these questions can rule out some mechanisms as inadequate to the task. Arguably, we need answers to these preliminary questions before we can seriously tackle questions about what algorithms are cognitively and neurally realized in human language use.
2The work here all concerns Samoan as spoken in Samoa, and not Samoan as spoken in American Samoa. Mosel and Hovdhaugen (1992, p. 8) wrote: “Today we find a very marked difference in intonation between the two variants [from Samoa vs. American Samoa].”
4As noted by Mosel and Hovdhaugen (1992, p. 144), some linguists make a distinction between [i] and [Ɂi] oblique case markers in Samoan, while others do not. We do not make the distinction here, as we have not discerned this distinction in working with our consultants.
5The following abbreviations are used in this paper: ABS absolutive; CONJ conjunction; COORD coordination; DET determiner; DIR directional particle; DISJ disjunction; ERG ergative; GEN genitive; INA verbal suffix -a/ina; NEG negation; OBL oblique; PERF perfective; PRES present; SG singular; TOP topic marker. Also, F0 and f0 are used for fundamental frequency.
6We leave open here whether an H- in Samoan might sometimes have the status of a prosodic boundary tone, or whether there may be some high edge tones that are syntactically determined and others that are conditioned by prosodic domains. See Sections 4 and 6 and Yu (2017) for further discussion of these issues. The perspective we take for this paper, as a starting point, is to show that current evidence suggests that at least some high edge tones in Samoan are syntactically determined and to define a model to handle these.
7See Calhoun (2015) for examples of F0 contours of declaratives that do not end in final falls.
8But note that the presence and placement of the absolutive H- isn’t some epiphenomenon of our segmentation choice for the segmental case markers; Section 2.4 clearly shows that the same distribution of H- tones occurs in the absence of segmental case markers.
11Two speakers produced the sentences like this, without any pauses. Some speakers sometimes produced a low edge tone and a pause between the two DPs, which complicates our conception of the interface. This complication is handled in Section 6 and Section 7, but we abstract away from it in the current section.
12The remarkably wide pitch excursion in the predicate at the onset of the utterances could be due to utterance preplanning. Utterance-initial F0 and F0 at the first pitch peak in an utterance have been shown to increase with utterance length; see Liberman and Pierrehumbert (1984); Prieto et al. (1996, 2009) for discussion. Because of this initial pitch excursion and the short length of the predicate, it is very difficult to be confident about determining whether an H- is present in the predicate before an immediately following absolutive argument, so we don’t consider that issue for these utterances.
13As an anonymous reviewer points out, based on this data set only, an alternative account of the distribution of the H- would be to say that ergatives and obliques are marked by trailing H- tones. However, this cannot be the case. Yu (2017) shows that in a data set of ditransitive sentences varying argument order, an H- always and only appeared before the absolutive argument.
14In the midst of fieldwork, we discovered that nunua ‘dolphin’, which we found in a Samoan wordbook, was either an extremely rare word or possibly a typo for mumua. Although one of our older consultants accepted it, for most consultants it may have been effectively a nonce word. Because nunua appeared in every sentence in this ditransitive data set, every sentence was equally affected by whatever effects its presence may have caused, so the results described here cannot be attributed to something about nunua.
16Perhaps because we asked the consultant to read at a slow pace last (so the materials were very familiar to him at that point), and because he might have interpreted this instruction as a request to sound lethargic, his pitch range is smallest in the slow rate condition among the three rates.
17We have found that a number of our consultants are not just open to, but eager to work with us in tautala leaga; in fact, some of our consultants speak in Samoan regularly only in tautala leaga. Consultants found it very natural to read materials written in tautala lelei and produce tautala leaga speech.
18Milner (1993, p. 88) lists this as [ʔisumu], but our consultant did not pronounce the glottal stop in these utterances spoken in tautala leaga.
19Collins’ perspective serves well here because it is relatively well worked out and defended, but our basic claims about the syntax/prosody interface are compatible with various alternative views about case in Samoan and related languages (Chung, 1978; Bittner & Hale, 1996; Massam, 2006, 2012; Koopman, 2012; Tollan, 2015, and others).
20As noted in the previous section, while the ergative case marking is the segmental /e/ in tautala lelei, in tautala leaga, that case marker is usually dropped. Collins (2014, Section 3.4) suggests that in tautala leaga, the dropping of the ergative /e/ in matrix clauses may indicate not just a phonological change but rather an alternation between ergative and nominative. However, with the hypothesis that nominative is realized as H-, this idea is not supported by the data in Yu (2017), since no H- is found in those contexts. Furthermore, recall from Section 2.4 above that the genitive case markers may also be dropped in this casual register, and /t/, /n/ are replaced by /k/, /ŋ/. For the moment, we simply adopt the Collins proposal without his additional proposed explanation of the missing ergative /e/ in tautala leaga, leaving the resolution of that issue to future work.
22Mosel and Hovdhaugen (1992) report the disjunct as [po], and [poʔo] as a contraction of pe and ‘o.
25and most loanwords as well. No loanwords that are presented in Zuraw et al. (2014) have pre-penultimate primary stress. But in work not reported in Zuraw et al. (2014), we found one example where our primary consultant volunteered antepenultimate stress (faithful to the source) as a possible stress pattern for a loanword: [t͡saiámit͡sa] ~ [t͡sàiamít͡sa] ‘diameter.’
26Just because a low edge tone doesn’t occur with a pause doesn’t necessarily mean that it isn’t still marking the same prosodic domain type as a low edge that does occur with a pause. Conventions for intonational transcription in the autosegmental-metrical tradition make a distinction between perceived size of juncture (‘break index tier’) and tonal events marking junctures (‘tone tier’), e.g., Beckman and Elam (1997). So a tonal event could be transcribed as marking a strong prosodic boundary in the tone tier, while the perceived juncture was marked as being rushed in the break index tier, e.g. missing the typically expected slowdown or pause.
27Although we adopt Collins’ perspective to explore the prosody/syntax interface, recall the alternatives mentioned in footnote 19 and references cited there. The reader is invited to consider whether the prosody/syntax interface defined in this section and in Appendix B could be adjusted to fit those alternative syntactic theories of Samoan case.
28The emulation of Collins’ account with features that trigger movements of each argument is certainly less elegant and is not proposed as an alternative hypothesis about Samoan, but used only to show that the raising of arguments is within the powers of MGs that have already been studied.
29There are various motivations for ‘post-syntactic’ processes like these (Marantz, 1991; Bobaljik, 2008)—e.g., the material inserted is not independently meaningful and does not interact with any aspects of the syntax that affect meaning. But these issues are not important here, since even if our proposal that tone-marking is done by a post-syntactic ‘spellout’ were rejected in favor of a more traditional account, so that the Samoan case markers were lexical items, those case markers would still be sometimes empty, and they would sometimes be homophonous with each other and with other lexical items. So the basic elements of the syntax—bundles of abstract features associated with each of these elements—would still be significantly more ‘abstract’ than what is heard.
30Perhaps a theory that comes closest to ours is that of Steedman (2000). Steedman considers, at least briefly, the possibility of putting pitch accents into the categorial specification of heads, which is quite similar to what is being done with the features in minimalist grammars. But in Samoan we find, in the absolutive and in other constructions, particular featurally-defined structures that trigger H- marking regardless of ‘information structure’ status, e.g., as noted earlier in Section 2, the absolutive H- is insensitive to focus conditions elicited in question-answer pairs.
We gratefully acknowledge our primary consultants in Los Angeles, John Fruean and Kare’l Lokeni, and thank Gladys Fuimaono and Peone Fuimaono for coordination of fieldwork in Apia, Samoa, and Jason Brown for coordination of fieldwork in Auckland. We thank Rajesh Bhatt, Mara Breen, Seth Cable, Sandy Chung, James Collins, Lisa Selkirk, Ellen Woolford, Kie Zuraw, three anonymous reviewers, and audiences at Experimental and Theoretical Approaches to Prosody 3, Workshop on the Effects of Constituency on Sentence Phonology, and the Yale University Department of Linguistics for their suggestions and comments, which have greatly improved this work. This work was funded by the Department of Linguistics at University of Maryland College Park and the Department of Linguistics at University of Massachusetts Amherst.
The authors have no competing interests to declare.
Ahn, B. (2016a). The role of syntax in the nuclear stress rule. Proceedings of Speech Prosody. Boston, MA, 8: 203–206, DOI: https://doi.org/10.21437/SpeechProsody.2016-42
Beckman, M. and Pierrehumbert, J. (1986). Intonational structure in Japanese and English. Phonology Yearbook 3: 255–309, DOI: https://doi.org/10.1017/S095267570000066X
Bennett, R., Elfner, E. and McCloskey, J. (2016). Lightest to the right: An apparently anomalous displacement in Irish. Linguistic Inquiry 47(2): 169–234, DOI: https://doi.org/10.1162/LING_a_00209
Boersma, P. and Weenink, D. (2012). Praat: doing phonetics by computer (version 5.3.18) [computer program], http://www.praat.org
Büring, D. (2003). On D-trees, beans, and B-accents. Linguistics and Philosophy 26(5): 511–545, DOI: https://doi.org/10.1023/A:1025887707652
Cheng, L. L.-S. and Downing, L. J. (2009). Where’s the topic in Zulu? The Linguistic Review 26: 207–238, DOI: https://doi.org/10.1515/tlir.2009.008
Cheng, L. L.-S. and Downing, L. J. (2016). Phasal syntax = cyclic phonology? Syntax 19(2): 156–191, DOI: https://doi.org/10.1111/synt.12120
Clemens, L. and Coon, J. (2016). Keough, M. ed. Prosodic constituency of verb-initial clauses in Ch’ol. Proceedings of the 21st Workshop on Structure and Constituency in Languages of the Americas.
Clifton, C. Jr., Carlson, K. and Frazier, L. (2002). Informative prosodic boundaries. Language and Speech 45(2): 87–114, DOI: https://doi.org/10.1177/00238309020450020101
Collins, J. N. (2016). Samoan predicate initial order and object positions. Natural Language and Linguistic Theory 35(1): 1–59, DOI: https://doi.org/10.1007/s11049-016-9340-1
Dobashi, Y. (2009). Multiple spell-out, assembly problem, and syntax-phonology mapping In: Grijzenhout, J. and Kabak, B. eds. Interface explorations: Phonological domains: Universals and deviations. Berlin, Germany: Mouton de Gruyter, pp. 195–220, DOI: https://doi.org/10.1515/9783110219234.2.195
Elfner, E. (2015). Recursion in prosodic phrasing: evidence from Connemara Irish. Natural Language & Linguistic Theory 33: 1169–1208, DOI: https://doi.org/10.1007/s11049-014-9281-5
Ferreira, F. and Karimi, H. (2015). Prosody, performance, and cognitive skill: Evidence from individual differences In: Frazier, L. and Gibson, E. eds. Explicit and implicit prosody in sentence processing. Switzerland: Springer, pp. 119–132.
Fodor, J. D. (1998). Learning to parse? Journal of Psycholinguistic Research 27(2): 285–319, DOI: https://doi.org/10.1023/A:1023258301588
Fougeron, C. and Jun, S. (1998). Rate effects on French intonation: prosodic organization and phonetic realization. Journal of Phonetics 26(1): 45–69, DOI: https://doi.org/10.1006/jpho.1997.0062
Harkema, H. (2000). A recognizer for minimalist grammars. Sixth International Workshop on Parsing Technologies, IWPT’00, http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/iwpt2000/after.html
Hayes, B. and Lahiri, A. (1991). Bengali intonational phonology. Natural Language & Linguistic Theory 9: 47–96, DOI: https://doi.org/10.1007/BF00133326
Heath, J. and McPherson, L. (2013). Tonosyntax and reference restriction in Dogon NPs. Language 89(2): 265–295, DOI: https://doi.org/10.1353/lan.2013.0020
Hellmuth, S. (2009). The (absence of) prosodic reflexes of given/new information status in Egyptian Arabic In: Owens, J. and Elgibali, A. eds. Information structure in spoken Arabic. Oxford: Routledge, pp. 165–188.
Hyman, L. M. and Monaka, K. C. (2011). Tonal and non-tonal intonation in Shekgalagari In: Frota, S., Elordieta, G. and Prieto, P. eds. Prosodic categories: production, perception and comprehension. Dordrecht: Springer, pp. 267–289, DOI: https://doi.org/10.1007/978-94-007-0137-3_12
Jun, S. and Fletcher, J. (2014). Methodology of studying intonation: from data collection to data analysis In: Jun, S.-A. ed. Prosodic typology II: The phonology and phonetics of intonation and phrasing. Oxford, England: Oxford University Press, pp. 493–519, DOI: https://doi.org/10.1093/acprof:oso/9780199567300.003.0016
Kahn, J. G., Lease, M., Charniak, E., Johnson, M. and Ostendorf, M. (2005). Effective use of prosody in parsing conversational speech. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing: 233–240, DOI: https://doi.org/10.3115/1220575.1220605
Kaisse, E. M. and Zwicky, A. M. (1987). Introduction: Syntactic influences on phonological rules. Phonology Yearbook 4: 3–11, DOI: https://doi.org/10.1017/S0952675700000749
Khan, S. (2014). The intonational phonology of Bangladeshi Standard Bengali In: Jun, S.-A. ed. Prosodic typology II: the phonology and phonetics of intonation and phrasing. Oxford, England: Oxford University Press, pp. 81–117, DOI: https://doi.org/10.1093/acprof:oso/9780199567300.003.0004
Koopman, H. (2012). Samoan ergativity as double passivization In: Functional heads: The cartography of syntactic structures, vol. 7. Oxford: Oxford University Press, pp. 168–180, DOI: https://doi.org/10.1093/acprof:oso/9780199746736.003.0013
Kraljic, T. and Brennan, S. E. (2005). Prosodic disambiguation of syntactic structure: For the speaker or the addressee? Cognitive Psychology 50: 194–231, DOI: https://doi.org/10.1016/j.cogpsych.2004.08.002
Kratzer, A. and Selkirk, E. (2007). Phase theory and prosodic spellout: the case of verbs. The Linguistic Review 24: 93–135, DOI: https://doi.org/10.1515/TLR.2007.005
Krivokapić, J. (2007). Prosodic planning: Effects of phrasal length and complexity on pause duration. Journal of Phonetics 35: 162–179, DOI: https://doi.org/10.1016/j.wocn.2006.04.001
Ladd, D. R. (2008). Intonational phonology. 2nd ed. Cambridge University Press, DOI: https://doi.org/10.1017/CBO9780511808814
Legate, J. A. (2008). Morphological and abstract case. Linguistic Inquiry 39(1): 55–101, DOI: https://doi.org/10.1162/ling.2008.39.1.55
Massam, D. (2001). Pseudo noun incorporation in Niuean. Natural Language & Linguistic Theory 19(1): 153–197, DOI: https://doi.org/10.1023/A:1006465130442
Massam, D. (2006). Neither absolutive nor ergative is nominative or accusative In: Johns, A., Massam, D. and Ndayiragije, J. eds. Ergativity: Emerging issues. Dordrecht: Springer, pp. 26–46, DOI: https://doi.org/10.1007/1-4020-4188-8_2
McPherson, L. and Heath, J. (2016). Phrasal grammatical tone in the Dogon languages: The role of constraint interaction. Natural Language & Linguistic Theory 34: 593–639, DOI: https://doi.org/10.1007/s11049-015-9309-5
Myrberg, S. (2013). Sisterhood in prosodic branching. Phonology 30(1): 73–124, DOI: https://doi.org/10.1017/S0952675713000043
Ochs, E. (1982). Ergativity and word order in Samoan child language. Language 58(3): 646–671, DOI: https://doi.org/10.2307/413852
Odden, D. (1987). Kimatuumbi phrasal phonology. Phonology Yearbook 4: 13–36, DOI: https://doi.org/10.1017/S0952675700000750
Prieto, P. (2005). Syntactic and eurhythmic constraints on phrasing decisions in Catalan. Studia Linguistica 59(2/3): 194–222, DOI: https://doi.org/10.1111/j.1467-9582.2005.00126.x
Prieto, P., D’Imperio, M., Elordieta, G., Frota, S. and Vigário, M. (2009). Evidence for ‘soft’ preplanning in tonal production: Initial scaling in Romance. Proceedings of the Speech Prosody 2009 conference: 803–806.
Prieto, P., Shih, C. and Nibert, H. (1996). Pitch downtrend in Spanish. Journal of Phonetics 24(4): 445–473, DOI: https://doi.org/10.1006/jpho.1996.0024
R Core Team (2014). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from: http://www.R-project.org/
Selkirk, E. (2000). The interaction of constraints on prosodic phrasing In: Horne, M. ed. Prosody: theory and experiment. Dordrecht: Kluwer Academic Publishers, pp. 231–261, DOI: https://doi.org/10.1007/978-94-015-9413-4_9
Selkirk, E. (2011). The syntax-phonology interface In: Goldsmith, J., Riggle, J. and Yu, A. C. L. eds. The handbook of phonological theory. Wiley-Blackwell, pp. 435–484, DOI: https://doi.org/10.1002/9781444343069.ch14
Shriberg, E., Stolcke, A., Hakkani-Tür, D. and Tür, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication 32: 127–154, DOI: https://doi.org/10.1016/S0167-6393(00)00028-5
Snedeker, J. and Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language 48: 103–130, DOI: https://doi.org/10.1016/S0749-596X(02)00519-3
Speer, S. R. and Foltz, A. (2015). The implicit prosody of corrective contrast primes appropriately intonated probes In: Frazier, L. and Gibson, E. eds. Explicit and implicit prosody in sentence processing. Switzerland: Springer, pp. 263–285, DOI: https://doi.org/10.1007/978-3-319-12961-7_14
Stabler, E. P. (2013). Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3): 611–633, DOI: https://doi.org/10.1111/tops.12031
Steedman, M. (2000). Information structure and the syntax-phonology interface. Linguistic Inquiry 31(4): 649–689, DOI: https://doi.org/10.1162/002438900554505
Wagner, M. (2010). Prosody and recursion in coordinate structures and beyond. Natural Language & Linguistic Theory 28(1): 183–237, DOI: https://doi.org/10.1007/s11049-009-9086-0
Yu, K. M. (2011). Lima, S., Mullin, K. and Smith, B. eds. The sound of ergativity: Morphosyntax-prosody mapping in Samoan. Proceedings of the 39th Annual Meeting of the North East Linguistic Society. Amherst, MA: Graduate Student Linguistic Association, 2: 825–838.
Zhang, N. N. (2009). Coordination in syntax. NY: Cambridge University Press, DOI: https://doi.org/10.1017/CBO9780511770746
Zuraw, K., Yu, K. M. and Orfitelli, R. (2014). The word prosody of Samoan. Phonology 31: 1–58, DOI: https://doi.org/10.1017/S095267571400013X