1 Introduction

It has long been clear that syntax determines certain aspects of prosody, and that prosody should therefore be part of the grammar influencing how a parser arrives at the syntactic analysis of an utterance (Chomsky, 1955, II-2fn). However, it has remained unclear how to bring prosody into computational models of syntactic parsing. The few models that have incorporated any substantial prosodic information do not do so on the basis of a generative model of how syntax structurally conditions prosody. Instead, they tend to treat prosodic information as another class of bottom-up cues and mainly focus on English, e.g., Shriberg et al. (2000); Kahn et al. (2005); Huang and Harper (2010); Pate and Goldwater (2013). Here, we report on generalizations about the Samoan syntax-prosody interface uncovered by original fieldwork. We use these generalizations to motivate grammatical rules stating how syntactic structure conditions the insertion of tonal elements, and we show how the syntax/prosody interface in Samoan could be computed in a comprehension model using these rules.1

The challenge for defining a prosodically-informed comprehension model is that there is a multitude of interacting factors that condition the appearance and realization of prosodic events in the speech signal, e.g., see Yu (2014, Appendix B, p. 777). Tonal events are only a subset of prosodic events, but the factors that have been proposed to condition tonal events are already numerous and diverse. In addition to syntactic structure, these include lexical representation, e.g., lexical accent in Swedish, phonological grammar (Nespor & Vogel, 1986; E. Selkirk, 2003), e.g., the rising pitch accent associated with predictable primary stress in Egyptian Arabic (Hellmuth, 2009, 2006), inflectional morphology, e.g., tonal marking of genitive case in Igbo ‘associative’ constructions (Hyman, 2011), and pragmatics, e.g., the English contrastive topic rise-fall-rise contour (Jackendoff, 1972, Büring, 2003, Constant, 2014, i.a.). To complicate matters further, a given tonal event might reliably appear in a particular kind of syntactic environment—sometimes. Whether it might appear could depend on its sensitivity to phonological factors such as speech rate (Hayes & Lahiri, 1991; Fougeron & Jun, 1998) which might make the tonal event difficult to detect or even absent; its presence and phonetic realization might also be variable between speakers due to individual differences that aren’t yet well-understood, e.g., Clifton Jr. et al. (2002); Ferreira and Karimi (2015); Speer and Foltz (2015).

Thus, the speech signal (and the prosodic information contained within it) that both the analyst and listener are confronted with is the result of the interaction of this multitude of conditioning factors. From this output, how can we factor out the contribution of syntax to conditioning prosodic events? And if we are able to do that factorization and define a production model from the syntactic grammar to a prosodified utterance, how can we then define a comprehension model based on that production model? This paper answers these two questions. To isolate the contribution of syntax or any other factor in intonational fieldwork, we systematically vary one factor while holding others constant, just like in Bruce’s (1977) landmark study on word accent in Stockholm Swedish. Following this strategy, we show that in Samoan, syntax appears to be the primary conditioning factor on the placement of high edge tones. This makes defining the foundations of a production model for Samoan straightforward (as opposed to say, English, where it is much less apparent how to decouple the contribution of syntax to conditioning prosodic events). Based on the fieldwork, we stipulate spellout rules that insert high edge tones and adjoin them in the syntactic tree in exactly and only the structural configurations where high edge tones reliably occur. But defining a corresponding comprehension model is not as simple as running the production model in reverse. Intuitively, the problem is that in the comprehension direction, the phonological grammar does not deliver well-formed trees to the parser—only a string. How then, do we get from a string to a tree? 
Nevertheless, we show here that we can still compute the syntax-prosody interface in a comprehension model even if the prosodic grammar does not derive hierarchical structures separate from the syntactic grammar (a property it shares with prosodic grammars in ‘direct reference’ theories of the interface, e.g., Kaisse, 1985; Odden, 1987; Pak, 2008; see Elordieta, 2008 for a review).

The structure of the remainder of the paper is as follows: After reporting methods of data collection and analysis in Section 1.1, we first show that while the placement of high edge tones in Samoan may at first seem unsystematic, at least some of its positions are very reliably predicted by syntactic structure. While absolutive DPs have been assumed to be unmarked in Samoan, Yu (2011, 2017) noticed that they are preceded by a high edge tone. This paper confirms that this correlation is very reliable and provides evidence that it does not vary with prosodic length, speech rate, register, or focus (Section 2). Considering the syntax more carefully in Section 3, we show how this case marking can be added to the proposals of Collins (2016, 2015, 2014). Collins argues, following Legate (2008), that the Samoan absolutive is actually either nominative or accusative, and that we can define the case marking of these positions as part of the morphophonological spellout. Then we extend the account to some additional constructions (Section 4) and show how the syntax and interface proposals extend easily to these (Section 5). We observe some further complications in the data that we do not yet understand (Section 6), and then briefly consider how, in spite of variability that is not yet understood, a parsing model can use the relatively invariable case marking rules (Section 7). We conclude briefly with the broader lessons of this case study (Section 8).

1.1 Materials and methods

Prosodic data and analyses used for this paper are available as on-line supplementary material at the following link: http://www.krisyu.org/blog/supp-material-invariability-samoan-interface.html.

1.1.1 Consultants and elicitation

Data were collected in the Los Angeles area in one- to two-hour sessions from September 2007 to December 2014 with 1 main consultant, aged 19 when we started working with him. He was born and raised in Upolu, Samoa and moved to the Los Angeles area in 2003. Data were also elicited and recorded from 4 consultants in Apia, Samoa in November 2011, and an additional female consultant in her 50s in the Los Angeles area in January 2012. The additional consultant in Los Angeles had been in the United States for 27 years, but regularly spent an extended part of the year in Samoa. The consultants in Samoa included 3 men, aged 21 to 23, and 1 woman aged 46, from the capital city of Apia and other areas of Upolu. Data were also elicited and recorded in Auckland, New Zealand in July 2015 from 3 additional female speakers, 2 of whom are analyzed here. One (f03), aged 48, grew up in Apia and had been in New Zealand since 2009; the other (f05) was aged 19, grew up in Savai’i and had been in New Zealand since age 10.2 All consultants spoke Samoan regularly or primarily in daily life and were literate in Samoan, but also spoke English as a second language with some fluency. English was used as the contact language. Elicitation items were presented individually on slides on a computer screen, and they were elicited in randomized order. The consultant was asked to read each sentence at least twice. Unless otherwise stated, sentences were elicited out-of-the-blue.

1.1.2 Recordings

All recordings in Los Angeles and Samoa were made directly to a computer through a head-mounted microphone (Shure SM10A); the signal ran through a Shure X2u pre-amplifier and A-D device. Recordings in Auckland, New Zealand were made with a Shure SM10A microphone to a Marantz PMD661 MKII recorder. All recordings were made at a sampling rate of 22,050 Hz with 16-bit precision. Recording sessions in Los Angeles were made in either a sound-attenuated booth or a quiet room, while recordings in Samoa and Auckland were made in a quiet room.

1.1.3 Analysis

All sound files were segmented and annotated using Praat (Boersma & Weenink, 2012). Utterances were segmented by word and syllable and transcribed intonationally by the first author. However, our main strategy for detection of high edge tones (H- tones) in fundamental frequency (F0) contours was to rely on phonetic comparisons of F0 contours within minimal sets (Yu, 2014); see, for example, Yu (2017) and Figure 3 in Clemens and Coon (2016) for additional examples of comparisons of this type. What this means is that we did not rely on intonational transcriptions of individual utterances to tally up where H- tones were present or absent in each utterance (except in Section 6, which is exploratory work comparing counts of multiple kinds of tonal events). Instead, we determined how some factor (e.g., speech rate) conditioned the presence of an H- by comparing F0 contours between utterances varying only for that factor (e.g., slow vs. fast speech rate), much like Bruce (1977). This analysis based on comparing F0 contours is advantageous because it is transparent and reproducible; it helps control for allophonic variation in the realization of H- tones which may make H- tones difficult to detect; it prevents the transcriber from imposing any subjective biases in transcription, and it releases the transcriber from making difficult judgment calls for transcriptional labels. But this phonetic approach is only possible when enough is known about the basic units of the intonational system and what conditions them so that the analyst can design structured elicitations investigating these basic units. And, initial discovery of these basic units is facilitated by the challenge of labeling them in transcription. That is to say, the phonetic approach emphasized here doesn’t replace intonational transcription, but complements it.

F0 extraction was performed using Praat’s autocorrelation algorithm, as implemented in VoiceSauce (Shue et al., 2011), software for automatic voice quality analysis, with the floor and ceiling values for candidate F0 values set to 40 Hz and 300 Hz, respectively, and default settings for other parameters.3 For the F0 contours plotted throughout the paper, F0 values were averaged over each of 10 time slices uniformly dividing each syllable for each utterance, e.g., the first F0 value was the average F0 over the first tenth of the syllable. Converting the time scale from absolute time in seconds to time in syllables allowed trends in the shape of F0 contours to be captured without variability conditioned on speech rate. All further data processing and analysis was performed in R (R Core Team, 2014). For the most part, this consisted of averaging F0 contours across sentences and/or across speakers. All plots were created using the ggplot2 package (Wickham, 2009). Gray ribbons flanking lines in any plot of F0 contours show ±1SE.
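The time-normalization step described above can be illustrated with a minimal sketch. The Python function below is not the VoiceSauce or R code used for this paper; the name `time_normalize`, its arguments, and the convention of marking unvoiced frames as `None` are assumptions made purely for illustration. It averages the F0 samples that fall within each of 10 equal-width slices of a syllable, so that a contour is indexed by position within the syllable rather than by absolute time.

```python
from statistics import mean


def time_normalize(times, f0, syl_start, syl_end, n_slices=10):
    """Average F0 within n_slices equal-width time slices of one syllable.

    times, f0 : parallel lists of frame times and F0 values (Hz).
                Times may be in any unit, as long as syl_start and
                syl_end use the same unit. Unvoiced frames are assumed
                to be given as None and are skipped.
    Returns a list of n_slices values; a slice with no voiced frames
    yields None.
    """
    width = (syl_end - syl_start) / n_slices
    bins = [[] for _ in range(n_slices)]
    for t, value in zip(times, f0):
        # Ignore unvoiced frames and frames outside the syllable.
        if value is None or not (syl_start <= t < syl_end):
            continue
        # Guard against rounding that would push a frame past the last slice.
        idx = min(int((t - syl_start) / width), n_slices - 1)
        bins[idx].append(value)
    return [mean(b) if b else None for b in bins]


# Hypothetical syllable lasting 100 ms, with one F0 frame every 5 ms.
frame_times = [5 * i for i in range(20)]   # 0, 5, ..., 95 (ms)
frame_f0 = [100 + i for i in range(20)]    # steadily rising F0 (Hz)
contour = time_normalize(frame_times, frame_f0, syl_start=0, syl_end=100)
```

Because each syllable always contributes the same number of values, contours from utterances produced at different speech rates can be averaged point by point under this scheme.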

2 Syntax-prosody 1: The invariable absolutive high

Samoan is a Polynesian language with an ergative/absolutive case-system. The sentences in (1) exemplify properties of this kind of case-system (see Deal, 2015 for an overview of ergativity): The subject of a transitive clause, e.g., le malini ‘the marine’ in (1a), is marked with a distinct case—the ‘ergative.’ The subject of an intransitive clause, e.g., le malini in (1b), and the object of a transitive clause, e.g., le mamanu ‘the design’ in (1a), both appear unmarked and receive ‘absolutive’ case (Chung, 1978, p. 54–56; Ochs, 1982, p. 649), though as we will discuss below, an alternative analysis is offered by Collins (2016, 2014), following Legate (2008). Samoan primarily has VSO word order in transitive clauses, as exemplified in (1a), which also shows that the transitive subject is marked by the ergative case marker e. The intransitive clause (1b) demonstrates that the prepositional element [i] is a marker of oblique case. This preposition marks stative agents (Chung, 1978, p. 29), and also indirect objects, locatives, temporal expressions, sources, and goals (Mosel & Hovdhaugen, 1992, p. 144).4

    (1) Ergative-absolutive patterns in transitive and intransitive clauses5

        a. Transitive clause
           na    lalaŋa  *(e)  le      malini  le      mamanu.
           PAST  weave   ERG   DET.SG  marine  DET.SG  design
           ‘The marine wove the design.’

        b. Intransitive clause
           na    ŋalue  le      malini  (i   le      mamanu).
           PAST  work   DET.SG  marine  OBL  DET.SG  design
           ‘The marine worked (on the design).’

The following sections first review evidence for tonal marking of absolutive case in Samoan (Section 2.1) and then present new evidence that the appearance of a high edge tone preceding absolutive arguments is insensitive to prosodic length (Section 2.2), speech rate (Section 2.3), and speech register (Section 2.4).

2.1 Review of evidence for tonal marking of absolutive case

Yu (2011, 2017); Yu and Özyıldız (2016) showed that absolutive case in Samoan is not unmarked and does in fact have a phonological correspondent in spellout. As shown in (2), revised from (1), a high tone—which we notate as ‘H-’ and gloss as ABS—appears at the right edge of the phonological material immediately preceding the absolutive argument: Before the object le mamanu ‘the design’ in the transitive clause (2a), and before the subject le malini ‘the marine’ in the intransitive clause (2b).

    (2) Revision of (1): A high edge tone (H-) precedes absolutive arguments

        a. Transitive clause
           na    lalaŋa  *(e)  le      malini  H-   le      mamanu.
           PAST  weave   ERG   DET.SG  marine  ABS  DET.SG  design
           ‘The marine wove the design.’

        b. Intransitive clause
           na    ŋalue  H-   le      malini  (i   le      mamanu).
           PAST  work   ABS  DET.SG  marine  OBL  DET.SG  design
           ‘The marine worked (on the design).’

The notation ‘H-’ comes from conventions for the intonational transcription of tonal events developed in autosegmental-metrical theory (Pierrehumbert, 1980; Beckman & Pierrehumbert, 1986; Beckman & Elam, 1997; Ladd, 2008). The ‘H’ stands for a high F0 target and the ‘–’ is a diacritic we use merely to indicate that the high tone is an edge tone associated to a word edge, rather than a pitch accent associated to a stressed syllable. Other morphosyntactic structures in addition to absolutive arguments also reliably surface with an H-, as we will discuss in detail in Section 4. By using the ‘–’ diacritic, we do not mean to imply that an H- is a prosodic boundary tone, associated to some prosodic constituent in a prosodic hierarchy; we simply mean to say, descriptively, that the tone appears at edges.6 Evidence that H- tones are edge tones and not pitch accents is given in Section 4.

The evidence Yu (2011, 2017) used to argue that an H- always appears before an absolutive argument came from directly comparing F0 contours between minimally different syntactic structures elicited in fieldwork (see Figures 4 and 5 for examples of this kind of comparison). We emphasize that this evidence came from comparing F0 contours rather than comparing intonational transcriptions (the same is true for all the evidence introduced in this paper, except for in Section 6). Yu (2011, 2017) showed that an H- appeared before the absolutive argument irrespective of diverse syntactic and semantic properties of the argument: Before subjects of intransitive clauses, objects of transitive predicates, proper names, pronouns, nominalized verbs, and regardless of specificity or number. Moreover, the presence of the absolutive H- was insensitive to argument order—e.g., in verb-initial ditransitives, the position of the H- tracks the left edge of the absolutive argument, regardless of the order of subject and objects—and the absolutive H- was absent before absolutive arguments that weren’t overt—e.g., in pro drop of absolutives and extraction of absolutives out of relative clauses.

In addition, H- tones were not observed before bare NPs in environments where bare NPs are independently expected not to be case marked (pseudo-noun incorporation constructions, which have surface VOS order) or where ergative and oblique case marking are also banned, e.g., on arguments in fronted predicates, see Yu and Özyıldız (2016, Section 3.4) for details. Moreover, although Calhoun (2017) shows that no H- appears before post-verbal absolutive arguments under [naɁo] ‘only’ and argues that this data is problematic for positing an absolutive H-, Yu and Özyıldız (2016, Section 3.4.1) and Yu (2017) show that no case markers can co-occur with [naɁo], whether segmental or the H-.

Finally, the presence of the absolutive H- was not sensitive to different focus conditions elicited in question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives (for more detail on the stimulus set, see Appendix A.2). An H- always appeared before the absolutive argument, and never before the ergative argument or oblique object (with some rare, non-systematic exceptions; see Section 6.1)—whether an argument was given, new, or under contrastive focus in the answers to the questions. This result is consistent with Calhoun’s (2015) results from intonational transcriptions for sentences, which also showed no evidence that the H- preceding the absolutive was sensitive to discourse structure. Utterances in that study were elicited under broad focus (‘What happened earlier’), question focus on the agent or direct object, and contrastive focus on the agent or direct object.

The phonetic realization of the absolutive high edge tone is shown in the context of entire utterances in Figure 1 and over a single word in Figure 2. Figure 1a displays an annotated F0 contour for (2a), while Figure 1b displays an annotated F0 contour for (2b). There are three different kinds of tonal events labeled in these figures: LH* (a rising pitch accent), H- (a high edge tone), and L-L% (an utterance final fall),7 which we discuss further in the context of Figure 2. We remind the reader that only the data in Section 6 comes from intonational transcription, while the rest of the data introduced in this paper comes directly from the F0 contours. Nevertheless, it is still useful to discuss the tonal events in terms of intonational labels to describe general observations about their phonetic realization. By convention, we place the label for an LH* pitch accent over the primary stressed syllable it is associated to in all intonational transcription displays. We also segment the ergative and oblique case markers together with the last syllable of the preceding word, e.g., [ŋa e], [ni i] in the annotation of the F0 contours, because it is very difficult to develop consistent criteria for deciding on where one vowel ends and another begins.8 There are two sites that illustrate the realization of the H- in Figure 1a and b: (a) The final syllable of the verbs ([lalaŋa] ‘weave’ and [ŋalue] ‘work’)—an H- keeps the F0 contour high at the right edge of [ŋalue] in Figure 1b but the F0 contour falls over the last syllable of [lalaŋa] in Figure 1a; and (b), the final syllable of [malini] shows an H- keeping F0 high in Figure 1a, preceding the object, but not in Figure 1b, preceding the oblique PP, where F0 falls over the last syllable of [malini].

Figure 1 

F0 contours in basic VS(O) declaratives. Pitch accent rises (LH*) occur over primary stressed syllables. An H- occurs before the absolutive object [le mamanu] ‘the design’ in Figure 1a and before the absolutive subject [le malini] ‘the marine’ in Figure 1b.

Figure 2 

Phonetic realization of the absolutive high edge tone, contrasting: (a) When sentence-medial malini ‘marine’ is followed by an oblique PP, so H- is absent, see example (2b), vs. (b) when malini is followed by an absolutive argument, so H- is present, see example (2a). In both figures, malini receives a LH* pitch accent associated to the stressed penultimate syllable.

For a more detailed explication of LH* and H- tonal events, we turn to Figure 2. Figure 2a shows a representative F0 contour over malini when it is the subject of the intransitive clause in (2b) and followed by an oblique PP: No H- appears at the right edge of malini. In contrast, Figure 2b shows a representative F0 contour over malini when it is the subject of the transitive clause in (2a) and immediately followed by the object le mamanu: An absolutive H- appears at the right edge of malini. We emphasize that malini is not the absolutive argument in either of the figures; rather, the H- that appears on malini in Figure 2b marks the absolutive argument coming up immediately after malini, which is not shown.9

To describe the realization of the H-, we first need to explain the rising tonal events in both F0 contours which we transcribe as ‘LH*,’ following Orfitelli and Yu (2009); Zuraw et al. (2014), where the ‘*’ is a diacritic from autosegmental-metrical theory that indicates pitch accenthood, and ‘L’ stands for a low pitch target.10 This is a pitch accent associated to the penultimate syllable, which receives primary stress. The basic footing pattern in Samoan, as observed in monomorphemes, consists of a moraic trochee at the right edge of the word (Zuraw et al., 2014). Primary stress is on the final vowel if it is long, e.g., la(ˈvaː) ‘energized,’ and otherwise on the penultimate vowel, e.g., ma(ˈlini) ‘marine.’ Thus, ma(ˈlini) has a rising pitch accent associated to the penultimate syllable, where the low F0 valley appears around the onset of the stressed mora, and the high F0 peak appears at or slightly later than the offset of the stressed mora (see also Orfitelli & Yu, 2009; Zuraw et al., 2014; Calhoun, 2015 for more on pitch accent realization). If the immediately following tonal event is another pitch accent, e.g., on mamanu in (2b), then the F0 contour over malini falls after the high F0 peak over the last syllable towards the L of this next pitch accent, as in Figure 2a. If however, an H- is present, then the F0 contour continues to rise over the last syllable of malini, as in Figure 2b. Yu (2017) also shows that this high F0 continues into the beginning of the absolutive argument, and the persistence of high F0 into the absolutive argument can also be seen in Figures 4 and 5b.
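The stress generalization stated above (final vowel if long, otherwise penultimate) can be written as a small procedure. The sketch below is only an illustration under simplifying assumptions: the input is already syllabified, long vowels are marked with ‘ː’, and vowel sequences, morphologically complex words, and secondary stress are set aside. The function name `primary_stress_index` and the treatment of one-syllable inputs are our own assumptions, not part of the analyses cited.

```python
LONG_MARK = "ː"  # length diacritic assumed to mark long vowels


def primary_stress_index(syllables):
    """Return the index of the syllable bearing primary stress.

    Implements only the generalization for monomorphemes: stress the
    final syllable if its vowel is long, otherwise the penultimate one.
    A one-syllable input is stressed on its only syllable (an assumption
    made so that the function is total).
    """
    if not syllables:
        raise ValueError("expected at least one syllable")
    last = len(syllables) - 1
    if last == 0 or LONG_MARK in syllables[last]:
        return last
    return last - 1


# The LH* pitch accent is associated to the syllable at this index.
print(primary_stress_index(["ma", "li", "ni"]))  # malini: index 1 (penult)
print(primary_stress_index(["la", "vaː"]))       # lavaː: index 1 (final, long)
```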

In the remainder of this section, we provide additional empirical evidence that the syntax completely determines the presence of the high tone as an absolutive case marker. We show that the presence of the high tone is insensitive to prosodic length (Section 2.2), speech rate (Section 2.3), and speech register (Section 2.4). This sets up our initial picture of the syntax/prosody interface in Samoan in Section 3, for which we make the methodological abstraction that the moment that a parser detects a high tone, it can conclude that an absolutive argument is about to occur, i.e., we don’t consider multiple triggers of high tones yet (these include coordination and fronting). This is a good first step towards tackling the Samoan syntax-prosody interface, but we introduce evidence in Sections 4–6 to support complications to this picture that we adjust for in our analysis of the interface: Adjustments that reveal Samoan intonation to have some of the kinds of variability seen in other languages like English, though perhaps to a lesser extent.
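To make the methodological abstraction concrete, the following toy sketch treats a detected H- exactly like the segmental case markers e and i: each marker licenses a prediction about the case of the material that follows it. This is not the comprehension model developed later in the paper. It is a flat left-to-right scan with no syntactic structure, it would wrongly split a DP that contains an internal oblique i, and the names `MARKERS` and `predict_cases` are hypothetical.

```python
# Case signalled by each marker, following the glosses in (2).
MARKERS = {"e": "ERG", "i": "OBL", "H-": "ABS"}


def predict_cases(tokens):
    """Group the tokens after each case marker under the predicted case.

    Returns a list of (case, words) pairs in order of occurrence.
    Tokens before the first marker (tense marker and verb) are ignored.
    """
    spans = []
    current = None
    for tok in tokens:
        if tok in MARKERS:
            # A marker, tonal or segmental, opens a new case-marked span.
            current = (MARKERS[tok], [])
            spans.append(current)
        elif current is not None:
            current[1].append(tok)
    return spans


transitive = "na lalaŋa e le malini H- le mamanu".split()     # cf. (2a)
intransitive = "na ŋalue H- le malini i le mamanu".split()    # cf. (2b)
print(predict_cases(transitive))
print(predict_cases(intransitive))
```

Under this abstraction the H- is the only trigger for an absolutive prediction; the additional triggers of high tones mentioned above (coordination and fronting) are deliberately left out.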

2.2 Evidence for insensitivity of the absolutive high to prosodic length

If, in addition to syntax, prosody also played a role in determining the presence of the high tone as an absolutive case marker, i.e., if the high edge tone were a consequence of prosodic phrasing choices, then we would expect it to be sensitive to factors known to influence prosodic phrasing (other than syntactic constituency). A large body of work has suggested that prosodic restrictions that regulate size and eurythmy play a role in determining prosodic phrasing decisions, e.g., Nespor and Vogel (1986); Ghini (1993b, 1993a); Fodor (1998); E. Selkirk (2000); Prieto (2005). One general principle that has been discussed in the literature states that prosodic phrasing favors structures where sister prosodic constituents are roughly equal in prosodic size or weight, e.g., (Fodor, 1998, p. 304). A number of related optimality-theoretic constraints formulated in terms of the size of prosodic constituents (taking prosodic constituents more deeply embedded to be relatively smaller in size than those higher in the prosodic tree) have been proposed to drive prosodic phrasing choices that appear to mismatch with syntactic constituency. For instance, Myrberg (2013) accounts for variability in the prosodic phrasing of clauses with embedded structures in Stockholm Swedish by showing how a markedness constraint EQUALSISTERS (sister nodes in prosodic structure are instantiations of the same prosodic category) might underlie the well-formedness of prosodic phrasing choices that mismatch with syntactic constituency; see also related work in Irish (Elfner, 2012, 2015; Bennett et al., 2016).

If the presence of the absolutive high were conditioned on prosodic phrasing choices, we would expect to see variability in its presence, as well as variability in the presence of a high tone elsewhere in an utterance, depending on prosodic length/size. This section shows that we do not see such variability in the tonal marking of the Samoan absolutive.

2.2.1 Extremely long DPs

The first piece of evidence comes from sentences with extremely long DPs. In the sentences discussed in this section, shown in (3) and Table 1, the DPs X, Y are 17–28 syllables long. The sentences have the same basic syntactic structure as those in (2); they just have much longer DPs. In addition to potentially increasing the probability of a prosodic break between the two DPs or anywhere else in the utterance, having extremely long DPs also makes the phonetic realization of the H- much easier to see in F0 contours than it is for short DPs. This is because of the large drop in F0 range due to a downtrend in F0 before the site of the H-, which is much larger over the course of the long DPs under discussion here than over the short DPs in Figure 1.

Table 1

Structured elicitation for extremely long DPs: A fully crossed 2 × 2 design for WORD ORDER (default, scrambled) × TRANSITIVITY (transitive, intransitive). The segmental strings X and Y are given in (3).

SENTENCE STRUCTURE    WORD ORDER
                      default                 scrambled
transitive            Verb [e X]erg [Y]abs    Verb [X]abs [e Y]erg
intransitive          Verb [X]abs [i Y]obl    Verb [i X]obl [Y]abs

    (3) Template for constructing verb-initial sentences with extremely long DPs (case markers not shown here, but are shown in Table 1)

        na ‘PAST’ Verb [X] [Y], where:
        Verb ∈ {laŋona ‘hear,’ manoŋi ‘be smelly’}

        X =
        le      liona  a    le      milionea     leaŋa  mai  ierusalema  i    luŋa  o    le      mamanu
        DET.SG  lion   GEN  DET.SG  millionaire  bad    DIR  Jerusalem   OBL  on    GEN  DET.SG  design
        ‘the lion of the bad millionaire from Jerusalem on the design’

        Y =
        le      manu-lele   a    le      malini  maːnaia  mai  Apia
        DET.SG  animal-fly  GEN  DET.SG  marine  nice     DIR  Apia
        ‘the bird of the nice marine from Apia’

Keeping the DPs constant, we manipulated (a) TRANSITIVITY to be either transitive (with the transitive verb [laŋona]) or intransitive (with the intransitive verb [manoŋi]), and (b) the WORD ORDER to be default (VSO/V-S-PP) or scrambled (VOS/V-PP-S). These manipulations are summarized in Table 1.

If the appearance of a high tone were being governed by prosodic restrictions on eurythmy to break the sentence into roughly equal halves, we might expect a high edge tone to appear between the two DPs in the sentence, regardless of word order or transitivity. However, Figure 3 shows that this is not the case in representative F0 tracks from a single speaker who uttered the sentences without discernable silent pauses.11 There are many peaks in the F0 contour from LH* pitch accents over content words, but we annotate the F0 contour only at the site between the two DPs to highlight what is happening at this point (see on-line supplementary material for more detailed annotations of these F0 contours; link given at the beginning of Section 1.1).12 We found a sentence-medial H- for the VSO transitive condition (Figure 3a), as well as for the V-PP-S intransitive condition (Figure 3d). However, no sentence-medial H- between the two DPs occurred in the other two conditions, so the generalization for the distribution of the H- cannot be that it occurs before the second post-verbal argument. Rather, an H- appeared between DPs only when the second DP was an absolutive argument.13
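The pattern across the four cells of the design can be summarized in a short sketch. The case frames below are read directly off Table 1; the dictionary `DESIGN` and the function `medial_h_predicted` are hypothetical names used only for illustration. The function encodes the syntactic generalization that an H- appears between the two DPs only when the second DP is absolutive, in contrast to a eurythmy-based account, which would lead us to expect a medial high tone in every cell.

```python
# Case of the first and second post-verbal DP in each cell of Table 1.
DESIGN = {
    ("transitive", "default"): ("erg", "abs"),      # Verb [e X]erg [Y]abs
    ("transitive", "scrambled"): ("abs", "erg"),    # Verb [X]abs [e Y]erg
    ("intransitive", "default"): ("abs", "obl"),    # Verb [X]abs [i Y]obl
    ("intransitive", "scrambled"): ("obl", "abs"),  # Verb [i X]obl [Y]abs
}


def medial_h_predicted(transitivity, word_order):
    """True if an H- is predicted between the two DPs in this cell.

    Syntactic generalization: the H- precedes the absolutive argument,
    so a medial H- is expected only when the second DP is absolutive.
    """
    second_dp_case = DESIGN[(transitivity, word_order)][1]
    return second_dp_case == "abs"


for cell in DESIGN:
    print(cell, medial_h_predicted(*cell))
```

The two cells for which this returns `True` are the two conditions in which a sentence-medial H- was observed.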

Figure 3 

F0 contours for extremely long DPs in (3). An absolutive H- occurs on the right edge of the first DP: (a) In a transitive clause with VSO order, and (d) in an intransitive clause with V-PP-S order. No H- occurs on the right edge of the first DP in (b), which is a transitive clause with VOS order, or in (c), which is an intransitive clause with V-S-PP order.

2.2.2 Lengthened arguments in ditransitive sentences

In this section, we present additional evidence showing that the absolutive high isn’t a consequence of prosodic phrasing choices conditioned on prosodic length. This evidence comes from production data where arguments in ditransitive sentences were systematically lengthened. As exemplified in (4),14 we increased the prosodic length of the ergative argument by adding adjectival or locative phrases. We also increased the prosodic length of the absolutive and oblique arguments in precisely the same way, only ever lengthening one argument in each utterance. We recorded this data set with 6 speakers (4 speakers in Samoa and 2 in Los Angeles). In (4) below, the ergative argument is enclosed in brackets, and material added for prosodic lengthening is bold-faced.

    (4) Examples of systematic lengthening of ergative argument with modifiers

        a. Unlengthened
           na    momoli  [e   le      liona]  le      nunua      i    le      toloa
           PAST  take    ERG  DET.SG  lion    DET.SG  “dolphin”  OBL  DET.SG  duck
           ‘The lion took the dolphin to the duck.’

        b. Modified with short adjectival phrase
           na    momoli  [e   le      liona  leaŋa]  le      nunua      i    le      toloa
           PAST  take    ERG  DET.SG  lion   bad     DET.SG  “dolphin”  OBL  DET.SG  duck
           ‘The bad lion took the dolphin to the duck.’

        c. Modified with short prepositional phrase
           na    momoli  [e   le      liona  i    lalo]  le      nunua      i    le      toloa
           PAST  take    ERG  DET.SG  lion   OBL  below  DET.SG  “dolphin”  OBL  DET.SG  duck
           ‘The lion downstairs took the “dolphin” to the duck.’

        d. Modified with long adjectival phrase
           na    momoli  [e   le      liona  lanu-moana]  le      nunua      i    le      toloa
           PAST  take    ERG  DET.SG  lion   color-sea    DET.SG  “dolphin”  OBL  DET.SG  duck
           ‘The blue lion took the dolphin to the duck.’

        e. Modified with long prepositional phrase
           na    momoli  [e   le      liona  i    luma   o    le      mamanu]  le      nunua      i    le      toloa
           PAST  take    ERG  DET.SG  lion   OBL  above  GEN  DET.SG  design   DET.SG  “dolphin”  OBL  DET.SG  duck
           ‘The lion on top of the design took the dolphin to the duck.’

If the absolutive high were a prosodic boundary tone associated with a prosodic constituent, we would expect variation in its placement and appearance. This would be a consequence of expected variation in the prosodic phrasing of the ditransitive structures conditioned on prosodic length of the arguments. For instance, in Connemara Irish, Elfner (2012, Section 4.3) finds variation in the prosodic phrasing choices of VSO sentences—and thus, the appearance and positioning of tones reflecting these phrasing choices—depending on whether the arguments are single words (bare nouns), or nouns modified by adjectives. Elfner (2012) attributes this variation to interaction between prosodic markedness constraints, which derives different preferences for prosodic phrasing choices depending on argument size (single word vs. noun-adjective). However, comparing F0 contours within our Samoan ditransitive data set (see on-line supplementary material for F0 data), we found that a high edge tone appeared before (and only before) the absolutive argument, regardless of the length manipulations, as summarized in Table 2.

Table 2

Summary of distribution of H- in ditransitives as an argument is lengthened under modification. Each ‘x’ indicates a syllable, and acute accents indicate primary stress. All sentence structures tested in which the modified argument was ergative are shown. Only the first two are shown for when the modified argument was absolutive or oblique, as the remaining not shown are the same as for when the modified argument was ergative, mutatis mutandis.

Modification  Modified argument  Sentence structure schematic

Unlengthened  ERG                na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x]obl
Short AP      ERG                na momoli [x xx́x xx́x]ergH- [x xx́x]abs [x xx́x]obl
Short PP      ERG                na momoli [x xx́x x x́x]ergH- [x xx́x]abs [x xx́x]obl
Long AP       ERG                na momoli [x xx́x x́x-xx́x]ergH- [x xx́x]abs [x xx́x]obl
Long PP       ERG                na momoli [x xx́x x x́x x x xx́x]ergH- [x xx́x]abs [x xx́x]obl

Short AP      ABS                na momoli [x xx́x]ergH- [x xx́x xx́x]abs [x xx́x]obl
Short PP      ABS                na momoli [x xx́x]ergH- [x xx́x x x́x]abs [x xx́x]obl

Short AP      OBL                na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x xx́x]obl
Short PP      OBL                na momoli [x xx́x]ergH- [x xx́x]abs [x xx́x x x́x]obl

2.3 Evidence for insensitivity of the absolutive high to speech rate

Having presented evidence that the presence of the absolutive high is insensitive to prosodic length/size, we now show that it is also insensitive to speech rate. As a baseline for comparison, consider the classic case of prosodic phrasing that is sensitive to speech rate, illustrated in (5) from Calcutta Bengali (Hayes & Lahiri, 1991 [54a]), where parentheses delimit ‘phonological phrases.’15 (Another example is Fougeron & Jun, 1998 on French.)

(5) Sensitivity of prosodic phrasing to speech rate in Calcutta Bengali

  a. (ɔmor)  (čador)  (tara-ke)  (díečhe)    deliberate speech
     Armor   scarf    Tara-obj   gave
     ‘Armor gave a scarf to Tara’

  b. (ɔmor čador) (tara-ke) (díečhe)    faster speech

  c. (ɔmor) (čador tara-ke) (díečhe)    faster speech

  d. (ɔmor čador tara-ke) (díečhe)    very rapid speech

In Calcutta Bengali, phonological phrases are produced with rising pitch contours, with an L* pitch accent at the left edge and a high edge tone at the right edge. Therefore, the fact that phrasing in Calcutta Bengali is acutely sensitive to speech rate means that so too is the placement of the L* and high edge tones: The loss of phonological phrase boundaries in faster speech entails the loss of L* and high edge tones. We will see, however, that the presence and placement of H- tones do not vary with speech rate in the Samoan data set presented in this section (although, unsurprisingly, the phonetic realization of H- tones is sensitive to speech rate).
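The dependence of tone count on phrasing described above can be made concrete with a minimal sketch. It takes the description at face value (one L* and one high edge tone per phonological phrase) and uses the phrasings in (5). The function names and the bare `H` label for the high edge tone are our own illustrative assumptions, not notation from Hayes and Lahiri (1991).

```python
# Phrasings from (5); each inner list is one phonological phrase.
PHRASINGS = {
    "deliberate": [["ɔmor"], ["čador"], ["tara-ke"], ["díečhe"]],
    "faster_b": [["ɔmor", "čador"], ["tara-ke"], ["díečhe"]],
    "faster_c": [["ɔmor"], ["čador", "tara-ke"], ["díečhe"]],
    "very_rapid": [["ɔmor", "čador", "tara-ke"], ["díečhe"]],
}


def annotate(phrasing):
    """Mark each phrase with L* at its left edge and H at its right edge.

    Simplification: the tones are attached to the phrase edges only,
    not to particular syllables.
    """
    return " ".join("(L* " + " ".join(phrase) + " H)" for phrase in phrasing)


def count_edge_tones(phrasing):
    """One high edge tone (and one L*) per phonological phrase."""
    return len(phrasing)


if __name__ == "__main__":
    for rate, phrasing in PHRASINGS.items():
        print(rate, count_edge_tones(phrasing), annotate(phrasing))
```

Under these assumptions, merging phrases at faster rates necessarily removes tones, which is the pattern that the Samoan H- does not show.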

We elicited simple transitive and intransitive sentences and asked our primary consultant to read them at a comfortable pace, then at a fast pace, and then at a slow pace (see on-line supplementary materials for more information on speech rate under these different conditions).16 Across sentences, we varied the number of syllables between the absolutive high and the neighboring primary stress; this was done for another study, to observe the effect of tonal crowding (close spacing between neighboring tonal events) on the realization of the H-. A sample minimal pair in the data set, a transitive sentence and its intransitive counterpart, is shown in (6). For a full description of the elicited sentences, see Appendix A.1. Because the sentences were also designed to test the effect of tonal crowding, there were a number of sentences where the absolutive argument was initially stressed and/or vowel-initial. In such sentences, there appears to be a compromise between the conflicting demands of realizing the high target of the H- and the low target of the immediately following LH* pitch accent on the stressed syllable: F0 contours on the last syllable immediately preceding the absolutive (i.e., in the third syllable, S3) can be seen to fall slightly in Figure 4, which compares F0 contours between absolutive and ergative subjects and objects. The H- is nevertheless present and positioned before the absolutive as expected.

Figure 4 

Comparison of mean F0 contours over 3-syllable subjects and objects under slow, normal, and fast speech rates for sentences like (6). Vertical lines delimit syllable boundaries between the first (S1), second (S2), and third (S3) syllables of the subject or object. The absolutive high remains present under different speech rate conditions for the 3-syllable absolutive subject [malini] ‘marine’ (a, b, c), and the 3-syllable absolutive object [liona] ‘lion’ (d).

(6) A sample minimal pair from the speech rate data set

  a. Transitive clause
     na    laŋona  e    malini  H-   le      liona  i    le      aoauli.
     PAST  hear    ERG  marine  ABS  DET.SG  lion   OBL  DET.SG  afternoon
     ‘The marines heard the lion in the afternoon.’

  b. Intransitive clause
     na    manoŋi  H-   malini  i    le      liona  i    le      aoauli.
     PAST  smelly  ABS  marine  OBL  DET.SG  lion   OBL  DET.SG  afternoon
     ‘The marines were smelly to the lion in the afternoon.’

Each transitive sentence like (6a) had a minimally different intransitive counterpart like (6b). This allowed us to compare F0 tracks over the subject when it was followed by the absolutive (6a) with F0 tracks over the subject when it was followed by an oblique (6b). These comparisons are shown for the subject in Figure 4a, b, c for the three different speech rates; Figure 4d compares F0 tracks over the object when it is absolutive vs. oblique under the fast speech rate. Figures 4a, b, c show F0 contours for utterances where the subject was [malini], and Figure 4d shows F0 contours for utterances where the object was [liona].

There are two sites where we expected an absolutive high to appear: (i) Immediately preceding an absolutive subject and into the left edge of the subject in the first syllable (S1) in Figure 4a, b, c, and (ii) immediately preceding the absolutive object, at the right edge of the subject (in the third syllable, S3) in Figure 4a, b, c, as well as at the left edge of the absolutive object (in S1) in Figure 4d. One distinguishing property of the absolutive high’s F0 contour is clearly consistent across speech rates: The persistence of high F0 into the first syllable of the absolutive. This is apparent in syllable 1 (S1) for the absolutive subject for all three speech rates: In Figure 4a, b, c, the solid black line (the intransitive F0 contour) stays well above the dotted grey line (the transitive F0 contour). Speech rate does induce allophonic variability in the realization of the absolutive high, though. In the slow and normal speech rates, there is clearly a continued rise and maintenance of high F0 into the third syllable in transitive sentences, when the subject is followed by an absolutive object. In the fast speech rate, though, the F0 height in the third syllable (S3) is similar for the ergative and absolutive subjects, so the phonetic difference between the presence and absence of the absolutive H- before the object is smaller. Still, even in this fast speech rate, Figure 4d shows that the high F0 from the absolutive H- persists into the first syllable (S1) of the absolutive object, so that the shape of the F0 curve when the object is absolutive is clearly distinct from the F0 curve when the object is oblique.

In summary, the absolutive H- did not disappear as speech rate increased—in this sense, the presence of the absolutive high is not sensitive to speech rate, although (unsurprisingly) the particular phonetic realization of the absolutive high is. The insensitivity of the presence and placement of the Samoan absolutive H- to speech rate thus contrasts with the sensitivity of the presence of L* and high edge tones in Calcutta Bengali to speech rate.

2.4 Evidence for insensitivity of the absolutive high to register

The last factor that we’ll show does not influence the presence of the absolutive H- is ‘register.’ Samoan is well-known for having two distinct registers: tautala lelei ‘good language,’ used in literary contexts and Westernized institutional contexts such as church and school, as well as with foreigners, and tautala leaga ‘bad language,’ used in traditional ceremonies and meetings, as well as between family members and between friends (Shore, 1977, 1980; Duranti, 1981, p. 165–168; Ochs, 1988, p. 196; Duranti, 1990, p. 4–5; Mosel & Hovdhaugen, 1992, p. 7–11; Mayer, 2001).17 One of the most striking contrasts between the two registers is in the segmental phonology. The following mergers occur from tautala lelei to tautala leaga (Mosel & Hovdhaugen, 1992, p. 9):

(7) Mergers from tautala lelei to tautala leaga
  a. /t/ and /k/ → /k/
  b. /n/ and /ŋ/ → /ŋ/
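As an illustration of the mergers in (7), the following minimal sketch maps a tautala lelei form in the broad transcription used here onto its tautala leaga counterpart. The function name is hypothetical, and the sketch covers only the two segmental mergers in (7); it makes no claim about other differences between the registers.

```python
# The two mergers in (7): /t/ -> /k/ and /n/ -> /ŋ/.
MERGERS = str.maketrans({"t": "k", "n": "ŋ"})


def to_tautala_leaga(form: str) -> str:
    """Apply the segmental mergers in (7) to a transcribed form."""
    return form.translate(MERGERS)


if __name__ == "__main__":
    for lelei in ["na laŋona", "liona", "toloa", "manoŋi"]:
        print(lelei, "->", to_tautala_leaga(lelei))
```

Segments that are already /k/ or /ŋ/ are left unchanged, so the mapping is many-to-one, as expected for a merger.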

Consideration of the syntax-prosody interface in tautala leaga is important for two reasons. First, although almost all linguistic research on Samoan has been in tautala lelei, “as much as 90% of casual speech and most traditional oration actually take place using more colloquial forms of Samoan” (i.e., in tautala leaga) (Mayer, 2001, p. 58). Second, the segmental ergative case marker e has been reported to be rarely used in tautala leaga (Mosel & Hovdhaugen, 1992, p. 9), see also Mayer (2001). Mayer (2001) also reports that genitive case markers are often dropped in tautala leaga (Duranti, 1981; Ochs, 1988), although the literature does not indicate whether the oblique particle i is also typically dropped or not. In contexts where segmental case markers are dropped, the presence of a tone marking absolutive case would not only be informative about morphosyntactic structure, but would also serve to disambiguate between possible parses.

Consider the tautala leaga minimal pair in (8). The two sentences are string-identical, but if there were a high tone before [le malie] ‘the shark’ in (8a), in contrast to a high tone before [le lioŋa] ‘the lion’ in (8b), then the position of the high tone would disambiguate between VSO and VOS word order.

(8) Transitive sentence minimal pair in tautala leaga

  a. VSO word order
     ŋa    laŋoŋa       le      lioŋa  H-   le      malie.
     PAST  hear    ERG  DET.SG  lion   ABS  DET.SG  shark
     ‘The lion heard the shark.’

  b. VOS word order
     ŋa    laŋoŋa  H-   le      lioŋa       le      malie.
     PAST  hear    ABS  DET.SG  lion   ERG  DET.SG  shark
     ‘The lion was heard by the shark.’

We present initial evidence that the absolutive high is present in tautala leaga from two data sets. In the first data set, sentences in tautala leaga were elicited from our primary consultant in Los Angeles. We constructed twenty-four minimal pairs from two transitive verbs ([laˈŋoŋa] ‘hear,’ [iˈloa] ‘know’), two intransitive verbs ([maˈŋoŋi] ‘be smelly/fragrant,’ [laˈvea] ‘be injured by’), and four different animal NPs: [liˈoŋa] ‘lion,’ [koˈloa] ‘duck,’ [iˈsumu] ‘rat,’18 and [maˈlie] ‘shark.’ Within each minimal pair, the only variable we manipulated was WORD ORDER: VSO vs. VOS, see (9) and (10). This consultant found both word orders licit out-of-the-blue. No segmental case markers were present for ergative or oblique case; therefore, each string was ambiguous for whether the subject was the first or second argument. However, for the purposes of elicitation, the case markers were indicated in parentheses. Each of the 48 sentences (in randomized order) was uttered twice, for a total of 96 utterances.

(9) Transitive sentence minimal pair in tautala leaga

  a. ŋa    laŋoŋa  le      isumu  H-   le      lioŋa
     PAST  hear    DET.SG  rat    ABS  DET.SG  lion
     ‘The rat heard the lion.’

  b. ŋa    laŋoŋa  H-   le      lioŋa  le      isumu
     PAST  hear    ABS  DET.SG  lion   DET.SG  rat
     ‘The lion was heard by the rat.’

(10) Intransitive sentence minimal pair in tautala leaga

  a. ŋa    maŋoŋi  H-   le      koloa  le      malie
     PAST  smelly  ABS  DET.SG  duck   DET.SG  shark
     ‘The duck smelled to the shark.’

  b. ŋa    maŋoŋi  le      malie  H-   le      koloa
     PAST  smelly  DET.SG  shark  ABS  DET.SG  duck
     ‘The shark was subjected to the fragrance of the smelly duck.’ (roughly)
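The item counts reported for this data set (24 minimal pairs, 48 sentences, 96 utterances) can be reproduced with a short enumeration. The sketch below is only one way the counts could arise: it assumes that each of the four verbs was combined with each unordered pair of the four nouns, which this section does not state, and it writes the strings without segmental case markers or tones. All names are illustrative.

```python
from itertools import combinations

TRANSITIVE = ["laŋoŋa", "iloa"]
INTRANSITIVE = ["maŋoŋi", "lavea"]
NOUNS = ["lioŋa", "koloa", "isumu", "malie"]
REPETITIONS = 2  # each sentence was uttered twice


def minimal_pairs():
    """Assumed design: every verb with every unordered pair of nouns.

    Each pair holds the two argument orders for one verb and noun pair.
    """
    pairs = []
    for verb in TRANSITIVE + INTRANSITIVE:
        for noun_1, noun_2 in combinations(NOUNS, 2):
            order_1 = f"ŋa {verb} le {noun_1} le {noun_2}"
            order_2 = f"ŋa {verb} le {noun_2} le {noun_1}"
            pairs.append((order_1, order_2))
    return pairs


pairs = minimal_pairs()
sentences = [sentence for pair in pairs for sentence in pair]
utterances = len(sentences) * REPETITIONS

if __name__ == "__main__":
    print(len(pairs), len(sentences), utterances)
```

Four verbs times six noun pairs gives the 24 minimal pairs; two word orders per pair and two repetitions per sentence give the remaining totals.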

As shown in Figure 5, the absence of segmental case markers had no effect on the presence of the absolutive H-: The H- appears in the third syllable on the right edge of the verb (Figure 5a) and in the third syllable on the right edge of the first argument (Figure 5b) when they are immediately followed by an absolutive argument. As in the F0 contours from other data sets in the paper, the absolutive H- is also still clearly discernible on the F0 contour over the first syllable of the absolutive argument (Figure 5b).

Figure 5 

F0 contours when no segmental case markers are present for sentences like (9) and (10). An absolutive H- appears when an absolutive argument immediately follows: (a) The verbs ([laˈŋoŋa] ‘hear,’ [maˈŋoŋi] ‘smelly’), and (b) the first argument in the sentence ([liˈoŋa] ‘lion,’ [iˈsumu] ‘rat’). The jump in the F0 contours over syllable 2 in (b) is due to segmental perturbation of the F0 contour by the voiceless fricative [s] in [isumu].

The second data set we elicited in tautala leaga is described in detail in Appendix A.3. It consisted of the most preferred responses of two consultants, Speakers f03 and f05, to a variety of questions eliciting different focus conditions; these responses came from the tautala lelei data set described in Appendix A.2 and were elicited again in tautala leaga. Briefly, the tautala lelei data set included question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives. As discussed later in Section 6 and shown in Tables 5 and 6, a high tone still invariably occurred before absolutives in the tautala leaga data set. It should be noted that f03 explicitly stated she was dropping the ergative e in the tautala leaga recordings, but f05 did not say that. Therefore, it’s possible that some trace of the ergative e, however reduced, might have been present in f05’s speech (so that prosody wasn’t the only means of detecting case); we leave this to future fine-grained phonetic analysis to check.

In this section, we presented a preliminary view of the Samoan syntax-prosody interface (to be revised) where the syntax determines the presence of the high tone as an absolutive case marker (and only as an absolutive case marker), so that the moment that a parser detects a high tone, it can conclude that an absolutive argument is about to occur. In the following section, we set up a syntactic perspective to define the absolutive high in the syntax/prosody interface.
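The preliminary view summarized above can be sketched as a small procedure that uses the position of H- to resolve the string-identical transitive pair in (8). This is a toy illustration rather than the comprehension model developed later in the paper: it assumes transitive clauses whose two arguments each begin with the determiner le, and it takes H- as an already-detected token in the input. The function names are hypothetical.

```python
def split_arguments(tokens):
    """Group post-verbal tokens into arguments that start with 'le'.

    Each argument records whether an H- immediately preceded it, which
    under the preliminary view marks it as absolutive.
    """
    arguments = []
    high_pending = False
    for token in tokens:
        if token == "H-":
            high_pending = True
        elif token == "le":
            arguments.append({"words": ["le"], "absolutive": high_pending})
            high_pending = False
        elif arguments:
            arguments[-1]["words"].append(token)
    return arguments


def word_order(utterance):
    """Return 'VSO', 'VOS', or 'ambiguous' for a transitive clause."""
    tokens = utterance.split()
    arguments = split_arguments(tokens[2:])  # skip tense marker and verb
    if len(arguments) != 2:
        return "ambiguous"
    first, second = arguments
    if second["absolutive"] and not first["absolutive"]:
        return "VSO"  # the absolutive object comes second
    if first["absolutive"] and not second["absolutive"]:
        return "VOS"  # the absolutive object comes first
    return "ambiguous"


if __name__ == "__main__":
    print(word_order("ŋa laŋoŋa le lioŋa H- le malie"))
    print(word_order("ŋa laŋoŋa H- le lioŋa le malie"))
    print(word_order("ŋa laŋoŋa le lioŋa le malie"))
```

Without the H- token the two parses cannot be told apart from the string alone, which is the ambiguity that the tone resolves.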

3 Syntax and spellout 1: What the ‘absolutive high’ really is

To define the syntax/prosody interface, we tentatively adopt the analysis that has been proposed by Collins (2016, 2015, 2014). While Massam (2001) and others have assumed that Samoan has absolutive case marking, Collins (2014) argues that Samoan is actually a language of the type Legate (2008) classifies as ‘ABS = DEF,’ that is, a language where the marking that has been called ‘absolutive’ is actually the default case marking for nominative and accusative.19 While Collins and others originally assumed the default case marking in Samoan was null, Yu (2011, 2017) showed that Samoan reliably presents the high tone H- in these positions.

(11) a. The structure for (2a)
     b. The structure for (2b)

The structures shown in (11) indicate a derivation of Samoan verb-initial ordering by fronting the VP to a functional head F below T after the arguments have been raised out of it, with head movement moving T na to C (following Collins 2016, [66]). Phrasal movements are shown coindexed, but the head movement is shown leaving a bare trace t. Notice also that the case markers are depicted as adjoined to their arguments; we assume that this happens during spellout. While many details of the spellout mechanism remain unknown, one way to compute this spellout in recognition and production is sketched in Section 7 and Appendix B.

Collins’ main argument against assuming that the intransitive subject S and the transitive object P are both marked by a single absolutive case marking mechanism is that in nominalized clauses, S and P behave differently: S must be genitively marked (with /o/ or /a/), while P can have the same marking as in finite clauses. Collins assumed the marking of P was null in both finite and nominalized clauses, but Yu (2017) shows that in both finite and nominalized clauses, when P lacks a segmentally explicit case marker, it is invariably marked with a preceding H- (compare Collins 2014, [20]):

(12) e     iloa-atu  e    le      malini  H-   [le     momoli-ina   e    le      liona
     PRES  spot      ERG  DET.SG  marine  ABS  DET.SG  deliver-INA  ERG  DET.SG  lion

     H-   le      manini]  i    le      ala.
     ABS  DET.SG  fish     OBL  DET.SG  street
     ‘The marine spots the delivering of the fish by the lion in the street.’

Since this H- marking in nominalizations is possible for the transitive object P but not for the intransitive subject S, we adopt Collins’ view that the gloss ABS preceding [le manini] ‘the fish’ is really the marking of ACC. So now we have this answer to the question in the title of this section: What is the ‘absolutive high’? According to the syntactic analysis adopted here, it is a (perhaps slightly misleading) descriptive gloss of what we now recognize to be the default, syncretic marking of nominative NOM and accusative ACC. We will continue to use ‘absolutive’ descriptively, even though, from this perspective (and remembering footnote 19), the syncretism of NOM and ACC marking may mislead some linguists into thinking that Samoan has a single mechanism of absolutive case assignment; in nonfinite embedded contexts, we see that distinct mechanisms must be responsible for the case marking of S and P.20

4 Syntax-prosody 2: Multiple triggers for high tones

Having now situated the absolutive H- in the syntax-prosody interface, in this section we expand the range of empirical data we consider to include multiple triggers for high edge tones. In Section 2.1, we briefly noted that sentence-medial H- tones in Samoan occur not just before absolutives, but also in other syntactic environments. In this section, we introduce these other H- tones to set up our integration of them into the syntax/prosody interface in Section 5.

The sentence (13) below exemplifies multiple triggers for H- tones:

(13) ʔo   le      malini  mamalu     H-     ma    Mala  H-     na    laŋona
     TOP  DET.SG  marine  glorified  COORD  CONJ  Mala  FRONT  PAST  hear

     H-   le      liona,  H-    le      manini  H-     ma    Nonu.
     ABS  DET.SG  lion    LIST  DET.SG  fish    COORD  CONJ  Nonu
     ‘The glorified marine and Mala heard the lion, the fish, and Nonu.’

Figure 6 shows the F0 contour for an utterance of the sentence (13) by our primary consultant, which depicts many of the multiple triggers for H- tones. We do not provide a minimal comparison for Figure 6 without H- tones here, but one reflex of the sequence of H- tones in the utterance that is plainly visible is that the topline (the line connecting the peaks in the F0 contour) stays high throughout the utterance, around 180 Hz, rather than declining (compare to Figure 3). The first trigger for an H- in the utterance is coordination (Orfitelli & Yu, 2009): An H- precedes the conjunction [ma] (glossed as CONJ) inside the fronted DP o le malini mamalu ma Mala, ‘the glorified marine and Mala’. The second is the fronted (non-pronominal) DP argument (glossed as FRONT): An H- appears at the right edge of the fronted argument o le malini mamalu ma Mala, right before the predicate (Orfitelli & Yu, 2009; Calhoun, 2015). The absolutive H- appears at the right edge of the transitive verb [laŋona], immediately preceding an absolutive argument. The last H- we introduce here delineates members of a list (glossed as LIST) (Orfitelli & Yu, 2009).

Figure 6 

An F0 contour of (13) demonstrating a multitude of syntactically-conditioned high edge tones in Samoan. The discontinuity in the F0 contour immediately after the fronted DP is due to glottalization preceding [na] ‘PAST.’ The gaps in the annotation indicate silence; an alternate transcription for the H- tones followed by silence would be ‘H%,’ which would indicate high edge tones marking a strong prosodic juncture that may be co-occurring or that may have ‘overridden’ the H- (see body text). While the F0 contours for both coordination highs in this utterance appear to fall slightly after peaking, the fall is not at all perceptually salient. In this particular utterance, there is a lot of lengthening (indicating a slowdown in articulatory gestures) where many of the H- tones occur, and there are even pauses. This is by no means usually the case, see Figures 7 and 8.

Figure 7 

F0 contour showing H- at the boundary between a fronted subject and the predicate for (14a), after [ʔo le malini].

Figure 8 

A comparison of F0 contours for the minimal pair in (15), contrasting the absence of an H- preceding the modifier [mamalu] in (a) with the presence of the coordination H- before the conjunction [ma] in (b). The point of interest for the comparison is the F0 contour over [malini].

It is noteworthy that the two final H- tones indicated in Figure 6 are followed by (fluent) pauses. As a rule of thumb, (fluent) pauses have been used to diagnose strong prosodic junctures, i.e., intonational phrase boundaries, see e.g., E. O. Selkirk (1978/1981, p. 135), Pierrehumbert (1980, p. 19), Nespor and Vogel (1986, p. 188), Krivokapić (2007, p. 163), and S. Jun and Fletcher (2014, p. 501–502). This raises the issue that the syntactically conditioned H- tones expected in these configurations may be co-occurring or may have been ‘overridden’ by a different kind of high edge tone, one that demarcates a prosodic domain, see e.g., S.-A. Jun (1996, p. 38), Khan (2008, p. 119), Hyman and Monaka (2011). If so, then an alternative transcription of the high edge tones followed by pauses that we have transcribed with ‘H-’ in Figure 6 might be ‘H%,’ as ‘%’ is a diacritic standardly used for indicating association to an ‘intonational phrase’ boundary in autosegmental-metrical theory. Calhoun (2017) also found many examples of high edge tones followed by pauses. We discuss high edge tones followed by pauses further in Section 6 and Yu (2017) and leave them aside for now.

We give a simpler example of the H- that appears in fronting in (14), with a representative F0 contour for (14a) shown in Figure 7. The point of interest here is the F0 contour over [malini] at the end of the fronted argument [ʔo le malini] ‘TOP the marine.’ Compare this to the F0 contours over [malini] in Figure 2: The F0 contour over [malini] in Figure 7 looks like Figure 2b, which has an H-.

(14) H- after fronted arguments

  a. ʔo   le      malini  H-     na    lalaŋa-ina  le      mamanu.
     TOP  DET.SG  marine  FRONT  PAST  weave-INA   DET.SG  design
     ‘The marine wove the design.’

  b. ʔo   le      malini  H-     na    ŋalue  (i    le      mamanu).
     TOP  DET.SG  marine  FRONT  PAST  work   (OBL  DET.SG  design)
     ‘The marine worked on the design.’

We show another example of the H- in coordination in Figure 8, where the utterances contain no pauses (in contrast to Figure 6). Here, the point of interest is the F0 contour over the string [le malini ma Malu/mamalu], which may mean either [le malini mamalu] ‘the glorified marine’ (15a) when [mamalu] is an adjectival modifier, or [le malini ma Malu] ‘the marine and Malu’ (15b), when [Malu] is coordinated.21 Figure 8a shows that F0 begins to sharply fall on [ni] before the adjective [mamalu] (although there is some rise into [ni] from peak delay), while Figure 8b shows that high F0 persists into the final syllable [ni] of [malini] when the conjunction [ma] follows. Zoomed in, the contrast between F0 contours over [malini] in Figure 8a and b looks just like the contrast displayed over the F0 contours for [malini] in Figure 2a and b, respectively.

(15) Minimal comparison illustrating coordination H-

  a. na    ŋalue  H-   le      malini  mamalu     i    le      mamanu.
     PAST  work   ABS  DET.SG  marine  glorified  OBL  DET.SG  design
     ‘The glorified marine worked on the design.’

  b. na    ŋalue  H-   le      malini  H-     ma    Malu  i    le      mamanu.
     PAST  work   ABS  DET.SG  marine  COORD  CONJ  Malu  OBL  DET.SG  design
     ‘The marine and Malu worked on the design.’

The coordination H- also appears in disjunctions before the disjunctive coordinators [poʔo] or [peː], which are described in Mosel and Hovdhaugen (1992, p. 153, 681)22 and in verbal coordination (see Yu, 2017 for an example of verbal coordination). The evidence for this comes from the basic coordination data set described in Appendix A.4. An important caveat, though, is that initial stress in the disjunctive coordinators [ˈpoʔo] or [ˈpeː] makes it very difficult to tell if rising F0 preceding the coordinator can be attributed to a high edge tone, or if rising F0 might only be due to the rise to the initial pitch accent on the disjunctive coordinator. Further fine-grained phonetic work is needed to tease this apart.

There are two things to note about these other high tones that are relevant for computing the syntax-prosody interface. First, these high tones are not optionally produced—rather, like the absolutive high, the current evidence shows that they always appear.23 While we have not done the systematic manipulations with lengthening for these high tones that we reported for the absolutive high in Section 2.2, we have not noticed that the high tones disappear when prosodic length decreases, e.g., the coordination high appears even if there are only two syllables in the first coordinate and one in the second.24 Second, whether the source is from coordination, fronting, or the absolutive, the H- tones are all aligned to the edge of the word. Thus, upon detecting a high edge tone, a parser must consider all these different sources as possible alternatives.

The evidence for edge alignment of the H- tones comes from the prepenultimate stress data set (see Appendix A.5) and from another similar data set discussed in Yu (2017). We provide a brief overview of the evidence here. A standard way to tease apart whether a tone is a pitch accent or an edge tone is to vary the position of stress and the number of syllables in words, and to observe whether the alignment of the tone correlates with stress position (the signature of a pitch accent) or with word length (the signature of an edge tone) (S. Jun & Fletcher, 2014). But the penultimate mora is the furthest mora from the right edge of a prosodic word on which native Samoan words25 can bear primary stress (Zuraw et al., 2014). Thus, it is not possible to sufficiently separate the position of primary stress from the right word edge in Samoan to check whether H- tones track with stress or word edges. We therefore performed a Bach test (Halle, 1978, p. 301), using nonce forms with nonnative stress patterns, by asking our speakers in Auckland to code-switch in English names with antepenult stress (Melanie, Romeo) alongside names with native stress patterns. Code-switching between Samoan and English is a common everyday occurrence for our speakers. We observed that the high tone still appeared at the right edge of the target words, even with antepenult stress. Moreover, all high tones exhibit similar phonetic properties in that they spread rightward from where they initially begin to rise, as in Figures 4 and 5b.

In summary, with the complication of additional H- tones besides the absolutive H-, we now have a more elaborated view of the interface (though still to be revised) than the initial view presented in Section 2. When the parser detects a high tone, the source of the high tone is known to be morphosyntactic, but the particular structural source of the high tone could be from fronting, coordination, or from the absolutive.

5 Syntax and spellout 2: Multiple triggers in the interface

Section 3 proposed that spellout introduces H- as the spellout of NOM and ACC. We now turn to some of the additional constructions mentioned in the preceding section.

5.1 The coordinators are H- marked

For coordination, we can either assume that the high tone is lexically associated with the coordinators, or we can again introduce it postsyntactically, in spellout. Consider the following examples:

    (16) na    ŋalue  le      malini  H-     ma    Malu.
         PAST  work   DET.SG  marine  COORD  CONJ  Malu
         ‘The marine and Malu worked.’

    (17) na    ŋalue  le      malini  H-     po’o  Malu.
         PAST  work   DET.SG  marine  COORD  DISJ  Malu
         ‘The marine or Malu worked.’

If we assume that the marking is done in spellout, then the coordinator just needs to trigger the insertion. Note that case marking will apply to these structures as well, inserting the absolutive H-, yielding a structure that we can assume to be roughly like this:

    1. (18)

Here we follow Zhang (2009) in assuming that the coordinator ‘inherits’ the category of its arguments, D in this example (additional detail in Appendix B).
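As a rough illustration of the spellout option, the following sketch inserts an H- immediately before each coordinator in a glossed token sequence. It is a string-level simplification: the rule in the text is stated over syntactic structure, and the names `COORDINATOR_GLOSSES` and `mark_coordinators` are hypothetical, introduced only for this example.

```python
COORDINATOR_GLOSSES = {"CONJ", "DISJ"}  # glosses used in (16) and (17)

def mark_coordinators(tokens):
    """Insert an H- edge tone immediately before each coordinator.

    tokens: list of (word, gloss) pairs without tonal marking.
    """
    marked = []
    for word, gloss in tokens:
        if gloss in COORDINATOR_GLOSSES:
            marked.append(("H-", "COORD"))  # the inserted tonal constant
        marked.append((word, gloss))
    return marked

# Example (16) without its tonal marking.
example_16 = [("na", "PAST"), ("ŋalue", "work"), ("le", "DET.SG"),
              ("malini", "marine"), ("ma", "CONJ"), ("Malu", "Malu")]

print(" ".join(word for word, _ in mark_coordinators(example_16)))
```

Applied to the unmarked words of (16), the sketch places the H- between malini and ma, as in the transcription given in (16).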

5.2 Fronted arguments are H- marked

As illustrated by the first tone indicated in (19), fronted arguments are H- marked:

    (19) ʔo   le      malini  H-     na    lalaŋa-ina  H-   le      mamanu  i    le      asoː
         TOP  DET.SG  marine  FRONT  PAST  weave-INA   ABS  DET.SG  design  OBL  DET.SG  day
         ‘It was the marine that wove the design today.’

The spellout rule we need here simply inserts a high as a reflex of the syntactic configuration causing the material to be fronted. Case marking applies in this example too, inserting the absolutive case marker H- before le mamanu ‘the design,’ so we can obtain a structure like this:

    1. (20)

In (20), our spellout rule has adjoined the H- to C, but if it turned out that we had evidence for an alternate structure, e.g., right-adjoining the H- to the fronted DP, we could revise the spellout rule accordingly.

With these additional spellout rules, other prosodic events besides the absolutive H- are determined by the syntax. See Section 7 and Appendix B for a sketch of one way to execute these proposals.

6 Syntax-prosody 3: Variability in high edge tones

In this section, we introduce a final complication to our picture of the syntax-prosody interface in Samoan: Interactions between prosodic phrasing and the presence of high edge tones. We show using counts from tonal transcriptions from four data sets that sometimes a low (L-) edge tone appears where we would have expected a high tone, and that the frequency of this occurring seems to be sensitive to the morphosyntactic source of the high tone. We also show that there is some noise in the morphosyntactic-high tone correlation. Occasionally, for instance, the absolutive high or some other morphosyntactically conditioned high may be missing, and occasionally, a high tone might appear in an environment we have not yet specified. The four data sets we included in the analysis here are the following: (1) The tautala lelei ‘good language’ data set (introduced in Section 2.4), (2) the tautala leaga ‘bad language’ data set (introduced in Section 2.4), (3) the basic coordination data set (introduced in Section 6.3), and (4) the prepenultimate stress data set (introduced in Section 4 and also discussed in Section 6.4).

All of these data sets are small and biased towards particular constructions, and the number of repetitions of a particular item varied slightly, so the frequencies we found for various tones should not be taken to be representative of Samoan in general. We can, however, still see that a sentence-medial low edge tone can sometimes appear, both in places where we rarely see high edge tones and in places where we expect morphosyntactically conditioned high edge tones, and that the frequency of low edge tones appearing before absolutives seems to be very low.

A careful study of the sentence-medial low edge tones awaits future work, including whether a distinction should be made between L- tones that are followed by pauses and those that are not (and if so, what kind of distinction should be made).26 But we make some preliminary observations about L- tones here; for further discussion of low edge tones (and high edge tones) followed by pauses, see Calhoun (2017) and Yu (2017). Sentence-medial low edge tones often occur with a pause (see Tables 3–9), and always occur with pitch reset, in the sense that the pitch restarts with a high peak, even when there are unstressed elements immediately following the L- tone (annotated as ‘reset’ in Figure 9). This can be seen in the F0 tracks for the example sentences in (21), one of which includes a pause (Figure 9a) and one of which does not (Figure 9b). These figures also show that the pitch accent immediately preceding the L- is suppressed, something that also happens at the end of interrogatives (Orfitelli & Yu, 2009); the F0 contour up to the L- in Figure 9a in particular sounds like the F0 contour of an interrogative. Since low edge tones often occur with a pause, it may be the case that they mark strong prosodic junctures, as we discussed for high edge tones followed by pauses in Figure 6 in Section 4 on page 20. So an alternate transcription for L- tones might be ‘L%.’

Table 3

Counts and percentages for tone labels for different syntactic structures for speaker f03 in the tautala lelei data set.

Structure   Sites  Null      H-        H-, pause  L-  L-?  L-, pause
Absolutive  60     0         58 (97%)  1          1   0    0
Ergative    34     16 (47%)  0         0          0   0    18 (53%)
Oblique     32     20 (63%)  0         1          1   1    9 (28%)
Fronting    30     0         28 (93%)  0          1   0    1

Table 4

Counts and percentages for tone labels for different syntactic structures for speaker f05 in the tautala lelei data set.

Structure   Sites  Null      H-        H-?  L-      L-, pause
Absolutive  89     0         87 (98%)  0    2 (2%)  0
Ergative    40     36 (90%)  0         1    0       3 (8%)
Oblique     56     54 (96%)  0         0    0       2 (4%)
Fronting    53     0         38 (72%)  0    5 (9%)  10 (19%)

Table 5

Counts and percentages for tone labels for different syntactic structures for speaker f03 in the tautala leaga data set.

Structure   Sites  Null      H-         L-  L-, pause
Absolutive  19     0         19 (100%)  0   0
Ergative    8      6 (75%)   2 (25%)    0   0
Oblique     19     16 (84%)  1          1   1
Fronting    14     0         14 (100%)  0   0

Table 6

Counts and percentages for tone labels for different syntactic structures for speaker f05 in the tautala leaga data set.

Structure   Sites  Null       H-         L-  L-, pause
Absolutive  23     0          23 (100%)  0   0
Ergative    6      6 (100%)   0          0   0
Oblique     13     13 (100%)  0          0   0
Fronting    14     0          14 (100%)  0   0

Table 7

Counts and percentages for tone labels for different syntactic structures for speaker f03 in the basic coordination data set.

Structure    Sites  Null      H-        H-, pause  H-?  L-       L-, pause  L-?
Absolutive   85     0         83 (98%)  1          1    0        0          0
Ergative     48     46 (96%)  0         0          0    0        2 (4%)     0
Conjunction  39     0         33 (85%)  1          0    4 (10%)  1          0
Disjunction  42     0         38 (90%)  0          0    3 (7%)   1          0
Oblique      44     33 (75%)  0         0          0    1        4 (9%)     6 (14%)

Table 8

Counts and percentages for tone labels for different syntactic structures for speaker f03 in the prepenultimate stress data set.

Structure    Sites  Null       H-        H-, pause  H-?     L-      L-, pause  L-?
Absolutive   83     1          75 (90%)  1          6 (7%)  0       0          0
Ergative     46     46 (100%)  0         0          0       0       0          0
Conjunction  26     0          24 (92%)  0          2 (8%)  0       0          0
Disjunction  22     0          18 (82%)  0          2 (9%)  2 (9%)  0          0
Oblique      59     45 (76%)   0         2          0       1       7 (12%)    2

Table 9

Counts and percentages for tone labels for different syntactic structures for speaker f05 in the prepenultimate stress data set.

Structure    Sites  Null       H-          H-?  L-
Absolutive   108    0          108 (100%)  0    0
Ergative     55     55 (100%)  0           0    0
Conjunction  27     2 (7%)     25 (93%)    0    0
Disjunction  29     0          24 (83%)    1    4 (14%)
Fronting     23     0          23 (100%)   0    0
Oblique      91     91 (100%)  0           0    0

Figure 9

Representative examples of sentence-medial low edge tones before ergatives and obliques (sites where we do not expect an H-). (a) F0 contour showing L- with a pause before the ergative for sentence (21a). (b) F0 contour showing L- at the boundary between an absolutive subject and an oblique PP object for sentence (21b). In both F0 contours, the annotation ‘reset’ indicates where the F0 contour resets with a high peak immediately following the L-, even over typically unstressed elements. An alternate transcription for the low edge tones might be ‘L%,’ since these tones typically are followed by pauses and might demarcate prosodic domains (see discussion of high edge tones followed by pauses in Figure 6 in Section 4).

    (21) Example sentences from the tautala lelei data set, with sentence-medial low edge tones
         a. o    le      mamanu  na    lalaŋa-ina  L-/L%  e    le      malini  i    le      asoː
            TOP  DET.SG  design  PAST  weave-INA          ERG  DET.SG  marine  OBL  DET.SG  day
            ‘It was the design that the marine wove today.’
         b. na    malaŋa   le      malini  L-/L%  i    le      moana  i    le      asoː
            PAST  journey  DET.SG  marine         OBL  DET.SG  sea    OBL  DET.SG  day
            ‘The marine journeyed to the sea today.’

6.1 The tautala lelei ‘good language’ data set

As described in Section 2.4, this data set included question-answer pairs over a range of focus conditions (broad focus, wh subject focus, corrective subject focus, wh object/PP focus, corrective object/PP focus) and answer types (VSO, VOS, fronted subject, fronted object), for both transitives and intransitives (see Appendix A.2 and Yu, 2017 for more details). Speaker f03’s data set included only inanimate objects (due to time constraints) and so is smaller than speaker f05’s data, which also included animate objects. The frequency of tonal events for different morphosyntactic structures is given in Table 3 for Speaker f03 and in Table 4 for Speaker f05. The syntactic environments include those where a site for a syntactically conditioned H- is expected (absolutive, fronting), as well as environments where edge tones were transcribed but no syntactically-conditioned H- is expected (immediately preceding ergative or oblique nominals). In all tables of frequencies given in this paper, the listing of tonal events is exhaustive, i.e., there are no edge tones we transcribed that we do not include in the tables. A tone label with a “?” means that there was some evidence for that particular tone, but we were not certain that it was present. A tone label like “L-, pause” means that the tone was followed by a period of silence. Tonal events indicated for “oblique” structures are tonal events that occurred immediately preceding oblique PPs.
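The labeling conventions just described can be made explicit with a small sketch that splits a tone label from the tables into its three components. The `EdgeTone` type and `parse_label` function are hypothetical names used only for illustration; they are not part of the transcription procedure itself.

```python
from typing import NamedTuple

class EdgeTone(NamedTuple):
    tone: str        # "H-", "L-", or "Null"
    uncertain: bool  # True if the label carried a "?"
    pause: bool      # True if the label carried ", pause"

def parse_label(label: str) -> EdgeTone:
    """Decompose a table column label such as 'L-, pause' or 'H-?'."""
    suffix = ", pause"
    pause = label.endswith(suffix)
    core = label[: -len(suffix)] if pause else label
    uncertain = core.endswith("?")
    return EdgeTone(core.rstrip("?"), uncertain, pause)

for label in ["Null", "H-", "H-?", "H-, pause", "L-", "L-?", "L-, pause"]:
    print(label, "->", parse_label(label))
```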

For speaker f03, there were 60 sites for absolutives, and a high tone appeared in 59 of them (98%); a low tone appeared once, in a VOS response to corrective focus on the object. Immediately preceding the ergative, an L- tone followed by an audible pause occurred 11 times in fronted object constructions and 7 times in VOS sentences, in a range of discourse contexts: broad focus, wh-focus on the VP or the subject, and corrective focus on the subject or the object. Immediately preceding the oblique, an L- tone with a pause occurred under wh-VP, wh-object, corrective-object, and corrective-subject focus conditions, all in VSO order. In fronting, an L- tone appeared twice in fronted subject responses to wh-subject focus.

For speaker f05, there were two cases of L- tones before an absolutive: once for corrective focus on the subject in a VSO response, and once for wh-focus on the subject in a fronted subject response. Preceding an ergative, an L- tone with a pause occurred 3 times for fronted object responses, for wh-focus on the VP and on the object, and for corrective focus on the object. An L- tone with a pause occurred twice before obliques, for wh-focus on the object, with VSO responses. In fronting, an L- tone appeared 15 times (2/3 of these with pauses), occurring in both fronted subject and object responses to wh- or corrective focus on the object or subject.

6.2 The tautala leaga ‘bad language’ data set

This data set was introduced in Section 2.4 and is described in detail in Appendix A.3. Recall that it consisted of f03 and f05’s preferred responses to the various discourse contexts, elicited in tautala leaga. The frequency of tonal events for different morphosyntactic structures is given in Table 5 for Speaker f03 and in Table 6 for Speaker f05.

For speaker f03, there were two H- tones before an ergative, in two repetitions of a fronted object response to corrective focus on the object. Three different tonal events occurred before the oblique in VSO responses to corrective focus on the object.

For speaker f05, no tonal events occurred before ergatives or obliques, and H- tones always appeared for absolutives and in fronting.

6.3 The basic coordination data set

This data set, which included a range of nominal and verbal phrase conjunction and disjunction, was already introduced in Section 4 and is described in detail in Appendix A.4. The frequency of tonal events for different morphosyntactic structures for Speaker f03 is given in Table 7.

An L- tone with a pause occurred before the ergative in two repetitions of a verbal disjunction. L- tones occurred in conjunctions and disjunctions of DPs in transitive and intransitive sentences and also in verbal conjunction and disjunction. Before obliques, an L- tone or suspected L- tone sometimes occurred in both conjunction and disjunction of DPs.

6.4 The prepenultimate stress data set

This data set was already introduced in Section 4 and is described in detail in Appendix A.5. Recall that this data set manipulated stress position in a target word by including English proper names and probed the interaction of different morphosyntactic high tones with stress position. The frequency of tonal events for different morphosyntactic structures is given in Table 8 for Speaker f03 and in Table 9 for Speaker f05.

For speaker f03, this data set was noisy because she had some trouble with the English name Gabrielle. All unexpected tonal events for the absolutive happened in utterances with this name. An L- tone with a pause occurred in two repetitions of a DP disjunction. Before obliques, an L- tone with a pause occurred in sentences with ditransitives and conjunctions of absolutive subjects, and an H- tone with a pause occurred in two repetitions of a ditransitive. Although it’s possible that some of the pauses were disfluent rather than fluent, we didn’t discard any of the utterances because there was no evidence besides pauses (such as speech repairs) that a disfluency might have occurred.

For speaker f05, L- tones only occurred in disjunctions in the data set. The coordination H- was missing in two repetitions of a particular ditransitive (but not other repetitions).

6.5 Summary and discussion

There are a few generalizations we draw from these exploratory data sets to revise our picture of the Samoan syntax-prosody interface. While the invariance in morphosyntactic conditioning of the high tone by the absolutive, coordination, and fronting was largely upheld by the frequency counts, there are two classes of exceptions that complicate the picture.

First, there were sporadic instances where the absolutive high and coordination high did not appear, and there were also sporadic instances where a high tone appeared before the ergative or obliques. Presently, we can point to no systematic factor underlying these exceptions: In general, we would certainly expect such exceptions, due to disfluencies and speech errors or other production planning factors. Until we better understand what factors may be driving these exceptions, we can treat them as noise and model these exceptions by having a probability distribution placing some probability mass on no tone or a high edge tone occurring in all the morphosyntactic structures we have discussed.
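One simple way to obtain such a distribution is additive smoothing over the transcribed counts. The sketch below uses the ergative row of Table 6 (speaker f05, tautala leaga), where only ‘Null’ was observed. The smoothing method, the constant `alpha`, and the function name are illustrative assumptions; they are not the probabilistic model referred to in Section 7.1 and Appendix B.

```python
from fractions import Fraction

def smoothed_distribution(counts, alpha=Fraction(1, 2)):
    """Return P(label) under additive smoothing.

    counts: dict mapping tone label -> observed count at one kind of site.
    alpha:  pseudo-count added to every label (an illustrative choice).
    """
    total = sum(counts.values()) + alpha * len(counts)
    return {label: (n + alpha) / total for label, n in counts.items()}

# Ergative row of Table 6: 6 sites, none with an edge tone.
ergative_f05 = {"Null": 6, "H-": 0, "L-": 0, "L-, pause": 0}
dist = smoothed_distribution(ergative_f05)

for label, p in dist.items():
    print(f"{label:10s} {float(p):.4f}")
```

With these settings most of the probability mass stays on ‘Null’, but an H- before the ergative is no longer assigned probability zero, which is the behavior needed to absorb the sporadic exceptions.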

Second, there were occurrences of an L-/L% that ‘overrode’ the H- tone. This happened only twice for absolutives, out of a total of 467 sites (0.4%), but it happened more frequently in coordination (15/185, 8%) and in fronting (16/134, 12%). An L- also sometimes occurred before ergatives (25/237, 11%) and obliques (34/314, 11%), where we would not expect an H-. Tentatively, we note that discourse structure may systematically condition the appearance of the L-: It mostly appeared under wh- or corrective focus, in part because fronted object or subject constructions were mostly accepted by consultants only under wh- or corrective focus. In addition, word order may play a role in conditioning the presence of the L-: In f03’s tautala lelei data set, an L- followed by an audible pause occurred before the ergative 11 times in fronted object constructions and 7 times in VOS sentences, and in no other constructions.
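The proportions cited in this paragraph follow directly from the stated counts, as the short check below shows. The counts are copied from the text; the dictionary and function names are illustrative only.

```python
# (L- count, total sites) as stated in the text.
l_tone_counts = {
    "absolutive": (2, 467),
    "coordination": (15, 185),
    "fronting": (16, 134),
    "ergative": (25, 237),
    "oblique": (34, 314),
}

def percent(count, sites):
    """Proportion of sites with an L-, as a percentage."""
    return 100.0 * count / sites

for structure, (count, sites) in l_tone_counts.items():
    print(f"{structure:12s} {count:3d}/{sites:3d} = {percent(count, sites):.1f}%")
```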

Finally, for both H- and L- tones, another observation we made was that sometimes they were followed by a silent pause, e.g., the L- (L%) in Figure 9a. At the present time, we do not yet have enough data to understand if there are systematic differences between the distribution of edge tones followed by an audible pause and those that are not, or how multiple edge tones from a variety of grammatical sources might interact when appearing at a single edge, or if a single edge tone might simultaneously come from multiple grammatical sources. Calhoun (2017) makes a valuable contribution here, showing that the appearance of both low and high sentence-medial edge tones is quite common, though variable, in sentences with ‘exclusive’ [naʔo] constructions and ‘equative’ copular constructions. While the majority of her figures of representative intonational transcriptions show edge tones followed by (often quite long) silent pauses, the transcriptional count data given do not distinguish whether these edge tones are followed by pauses.

As mentioned in Section 4, in intonational phonology, the presence of a pause is often taken to be grounds to distinguish between types of edge tones, such as between an intonational phrase and prosodic categories lower in the prosodic hierarchy. It would be interesting if the optional edge tones that occur in Calhoun’s (2017) constructions before the predicate and between arguments in equatives are typically followed by pauses, since the H- tones we’ve discussed here for absolutives, coordination, and fronting invariably appear and are typically not followed by pauses. Such a systematic distinction might suggest that variably appearing edge tones typically followed by pauses have a different grammatical source, e.g., prosodic grammar, than those that are not, e.g., syntactic grammar, and that, accordingly, we would want to handle them differently in production and comprehension models.

There is no reason not to have an interface model that includes both syntactically determined tones inserted in spellout as well as prosodically determined ones. The emphasis in much of the work on the syntax-phonology interface is on the relation between syntactic and prosodic constituency, and this may sometimes make it seem like this relation is the whole of the syntax-phonology interface. But that is not the case, as stated in the opening sentence of Selkirk’s statement of the ‘Match theory’ of syntax-prosody mapping: “The topic of the syntax-phonology interface is broad, encompassing different submodules of grammar and interactions of these. This chapter addresses one fundamental aspect of the syntax-phonology interface in detail: The relation between syntactic constituency and the prosodic constituent domains for sentence-level phonological and phonetic phenomena. Two further core aspects, which rely on an understanding of the first, are not examined here – the phonological realization (spell-out) of the morphosyntactic feature bundles of morphemes and lexical items that form part of syntactic representation and the linearization of syntactic representation which produces the surface word order of the sentence as actually pronounced” (E. Selkirk, 2011, p. 435).

We have come to the end of the presentation of our empirical data bearing on the syntax-prosody interface in Samoan. We first presented evidence showing that the presence of high edge tones in the structural configuration of ‘absolutive’ case is insensitive to extra-syntactic factors (Section 2). Then, we introduced coordination and fronted expressions as additional configurations triggering high edge tones (Section 4). With this final section, we have pointed out occasional exceptions to the expected distribution of syntactically-conditioned H- tones, and we’ve hypothesized that some high and low edge tones might be prosodically rather than syntactically conditioned; see Yu (2017) for further discussion. Section 7.1 notes that the class of parsing models we are considering can be extended to be probabilistic to handle variability in the appearance of syntactically conditioned edge tones discussed here (see Appendix B for further detail). We also discussed the possibility of an additional class of variably occurring edge tones in Samoan which may be conditioned on prosodic domains. The model defined in this paper factors out the syntactically determined portion of the interface in Samoan, and we leave extending the model to handle prosodically-conditioned edge tones to future work.

7 Syntax, spellout, variability and parsing

We have already laid the foundations for showing how the syntax/prosody interface in Samoan could be computed in Section 3 and Section 5. In those sections, we informally described relevant aspects of Samoan syntactic grammar and tone-marking spellout rules sensitive to syntactic structure that place H- tones exactly and only in the positions where they reliably occurred in our fieldwork (see Appendix B for definitions of these rules). Those rules define a way to compute the syntax/prosody interface in Samoan that fits with our empirical data in a production model. In this section, we tackle the challenge of defining a comprehension model on the basis of the defined production model. The basic temporal flow of the comprehension model is easy to write down; we simply write the flow of the production model in reverse, as shown in (22). Given a syntactic grammar GS, a prosodic grammar GPr (i.e., our tone-marking spellout rules), and a phonological grammar GPh, we would have something like the following:

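(22) The comprehension model as the production model in reverse:
  (Production)     syntax GS → prosody GPr → phonology GPh
  (Comprehension)  phonology GPh → prosody GPr → syntax GS, with each component run in reverse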

However, we can see in this puzzling diagram the problem alluded to in Section 1: Defining a comprehension model from production-oriented grammatical components is not straightforward. Phonology GPh in reverse does not define prosodically marked syntactic trees, or syntactic trees of any kind. And the prosody GPr presented here comprises just some simple rules for inserting tones into syntactic trees, so how can we ‘reverse’ those rules? How could GPr ‘uninsert’ the tones from just the places that the syntax allows, when we have not gotten to the syntax yet? This section presents a simple solution to this problem. Without adding any new components to the grammar, we can transparently define how the prosodically marked sequences delivered by phonology can be properly parsed.

The hierarchical syntactic structures displayed in the preceding sections indicate discontinuous ‘movement’ relations of various kinds, and the ‘spellout’ mechanisms proposed add high edge tones to certain syntactic structures as part of the specification of linear, pronounced forms. The basic claims we need for our production model are these:

(P1) The posited syntactic structures in Samoan can be computed by a certain kind of ‘minimalist grammar’ (Stabler, 2010). This claim is defended in Section 7.1, just below.
(P2) The posited post-syntactic tone-marking spellout can be computed by a certain simple kind of ‘regular tree transduction.’ This claim is defended in Section 7.2.

Given these claims, the following mathematical results allow us to solve the problem of computing the syntax/prosody interface in the comprehension direction:

  • It follows from (P1) that a wide range of parsers are adequate to compute the syntactic structures displayed in previous sections (Harkema, 2001; Stabler, 2013). That is, the fact that minimalist grammars are sufficiently expressive to define all these syntactic structures guarantees that these structures can be parsed.
  • The posited spellout transductions can be ‘composed with’ any minimalist grammar, in a sense explained in Section 7.2. That is, both syntax and spellout can be computed at once by any standard minimalist parser.

In the absence of direct psycholinguistic evidence bearing on the status of syntax vs. spellout, this last idea about how structure and spellout are computed—the idea that they are computed simultaneously, rather than in sequence—seems the simplest and most plausible.

An important advantage of (P1), (P2) is that a large number of equivalent and near-equivalent approaches have been identified, often with constructive proofs that provide recipes for converting from the minimalist grammar approach into any of the ‘mildly context sensitive’ alternatives that are relevantly ‘equivalent’ (Stabler, 2010). Furthermore, parsing and generation algorithms associated with any of those alternatives can be used to compute the same string mappings and structural relations that our particular proposal identifies. So our strategy for solving the problem of computing the syntax/prosody interface in the comprehension direction will be to defend (P1) and (P2) here, with brief discussion of parsing consequences, and with some further details provided in Appendix B.

7.1 Minimalist grammar and parsing for Samoan syntax

In this section, we do not aim to present a complete minimalist grammar (MG) for Samoan, but just to defend the view that minimalist grammars have the mechanisms required to define syntactic structures like those shown in previous sections for ‘absolutive’ case, coordination and fronted expressions. Our characterization of the relevant aspects of Samoan is facilitated by the fact that we follow the proposals of Collins mentioned above.27 Collins proposes that the basic Samoan VSO order is derived by VP fronting, after the arguments have raised out of the VP, as indicated in the syntactic structures shown above. He suggests that after v selects VP, v has an EPP feature which triggers the raising of all arguments. That is, the EPP feature of v should trigger the raising of the object, if there is an object, and not crash if there is no object. Since the number of arguments of any verb in the lexicon is bounded, an EPP feature of that kind could be added to MGs without fundamentally changing their computational properties, but the same structures can be built by assigning each argument a different feature and triggering the movements of the phrases with each feature.28 When T is merged, an EPP feature on T can then trigger the fronting of VP to its specifier. And then when C is merged, the head of T moves to C.

Consider again the syntactic structure (11a), repeated below on the left. It can be calculated by merging the lexical items as shown in the derivation on the right. In that 10-step derivation, the features of the lexical items determine the external merge steps indicated by • and the two internal merge (i.e., movement) steps indicated by ⚬, corresponding to the coindexed trace t(0) and DP(0) and the coindexed trace t(1) and VP(1), in the tree on the left (details are provided in Appendix B).

    (23) Deriving (2a) on page 5, with structure (11a) on page 18, before case marking:

So the basic mechanisms that Collins uses to get basic clause structures are either immediately available in the MG framework or easily emulated and added. The reason that it is interesting that MGs can encode these analyses is not because that tells us anything new about Samoan syntax per se, but rather because that guarantees certain computational properties, including the proven existence of a range of parsing strategies adequate to compute all and only the structures allowed by the grammar (Harkema, 2000, 2001; Stabler, 2013). Those algorithms are efficient and also easily extend to select analyses which are ‘most probable’ in various senses (Hunter & Dyer, 2013). Thus, they can encode the probabilistic modeling needed to handle some of the variability that we described in Section 6, namely the variability in the appearance of syntactically determined high edge tones that we treat as ‘noise.’
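To make the merge and move steps concrete, here is a minimal sketch of a chain-based minimalist grammar fragment. Everything in it is a toy assumption rather than the grammar of Appendix B: the feature names (`-k`, `-f`, and so on) are hypothetical, the lexicon covers a single intransitive clause, and the tense particle na is treated as selecting TP directly instead of being derived by head movement of T to C.

```python
from dataclasses import dataclass, field

@dataclass
class Expr:
    """An MG expression: a head chain plus chains still waiting to move."""
    string: str                                  # pronounced material of the head chain
    feats: list                                  # unchecked features of the head chain
    movers: list = field(default_factory=list)   # [(string, feats), ...]
    lexical: bool = True                         # False once the item has merged

def cat(*parts):
    """Concatenate non-empty strings with single spaces."""
    return " ".join(p for p in parts if p)

def merge(a, b):
    """External merge: a's selector =x checks b's category x."""
    assert a.feats[0] == "=" + b.feats[0], "selector/category mismatch"
    rest = b.feats[1:]
    movers = a.movers + b.movers
    if rest:  # b still has licensee features, so it will move later
        return Expr(a.string, a.feats[1:], movers + [(b.string, rest)], False)
    # A complement goes to the right of a lexical head; a specifier goes left.
    s = cat(a.string, b.string) if a.lexical else cat(b.string, a.string)
    return Expr(s, a.feats[1:], movers, False)

def move(a):
    """Internal merge: a's licensor +f attracts the unique mover with -f."""
    f = a.feats[0]
    assert f.startswith("+"), "head does not begin with a licensor"
    hits = [m for m in a.movers if m[1][0] == "-" + f[1:]]
    assert len(hits) == 1, "exactly one matching mover is required"
    s, mfeats = hits[0]
    others = [m for m in a.movers if m is not hits[0]]
    if mfeats[1:]:  # more licensees remain: keep it as a mover
        return Expr(a.string, a.feats[1:], others + [(s, mfeats[1:])], False)
    return Expr(cat(s, a.string), a.feats[1:], others, False)

# Toy lexicon (all feature names hypothetical).
le     = Expr("le",     ["=N", "D", "-k"])
malini = Expr("malini", ["N"])
ngalue = Expr("ŋalue",  ["=D", "V", "-f"])
v      = Expr("",       ["=V", "+k", "v"])
T      = Expr("",       ["=v", "+f", "T"])
na     = Expr("na",     ["=T", "C"])

dp = merge(le, malini)     # le malini
vp = merge(ngalue, dp)     # DP becomes a mover (-k)
vP = move(merge(v, vp))    # VP becomes a mover (-f); DP raises out of VP
tp = move(merge(T, vP))    # VP fronts to the specifier of T
cp = merge(na, tp)
print(cp.string, cp.feats)
```

The toy derivation uses five external merge steps and two movement steps, raising the argument out of VP before fronting the VP, which is the order of operations described for the verb-initial structures above.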

MGs have also been extended to handle a range of coordinate structures (Torr & Stabler, 2016), respecting the ‘coordinate structure constraint’ (CSC) and ‘across the board’ extractions. Roughly, constituents that differ in what has been extracted from them cannot coordinate, and this is easily enforced in MGs by reflecting the relevant properties of extracted elements in the category (i.e., the features) of the coordinated elements. Some examples of coordination are considered in Section 4, and a wider range is discussed by Collins (2016, Section 6). Collins observes that the CSC correctly predicts the degraded status of structures that coordinate unergative and unaccusative predicates in Samoan, exactly as in the English gloss:

    (24) *?{sā | na | ‘ua}   siva   ma    taunu’u  (mai)  Simi
         PAST1/PAST2/PERF    dance  CONJ  arrive   DIR    Simi
         *?‘Simi danced and arrived’

While the properties of these constructions are not fully understood in either English or Samoan, it appears that CSC applies similarly. And for present concerns, the only relevant question is how to specify that the coordinators are H- marked. Appendix B shows how MGs can also handle fronted expressions.

The reason for drawing attention to the derivation shown in (23) on the right is that it is especially simple, in the sense that it is defined by a simple finite state mechanism (Michaelis, 1998; Kobele et al., 2007). Not only is this particular tree simple in that sense, but the derivation trees are guaranteed to be finite state definable no matter how the minimalist grammar needs to be elaborated to get the whole Samoan language. This sets the stage for a simple approach to spellout.

7.2 Spellout, variability, and parsing for Samoan

The preceding section showed that the syntactic structures proposed in this paper are all MG-definable. What about the tone-marking spellout rules? Formal grammars and parsing algorithms are usually defined over a lexicon. In linguistic theory and in applications, the lexicon is often taken to correspond to pronounced words or morphemes; derivations concatenate the pronounced elements. Many grammars in the minimalist tradition depart quite dramatically from that perspective though. Not only do they allow phonologically empty lexical items of various categories, phonologically vacuous (‘covert’) movements, etc., but also processes that distance the basic formatives of the syntax quite significantly from what is actually heard or spoken. In this recent tradition, the syntax is stated over feature structures that are significantly more abstract than the ‘pronounced words’ of traditional approaches. A wide range of theoretical traditions is advancing this kind of idea—that not only does phonology modify pronounced sequences in regular ways, but also, ‘distributed morphology,’ ‘exoskeletal morphology,’ etc. rearrange elements to allow more abstract syntactic formatives. The proposal in this paper falls into this very broad tradition. The proposal is that Samoan structural ‘case marking,’ and the similar marking of fronted and coordinate expressions, is not the concatenation of special lexical items, but is ‘postsyntactic,’ a kind of pronounced reflex of structural configurations.29 Because the tone-marking rules are sensitive to syntactic structure, they cannot apply before any assumptions about syntactic structure, but the idea that, in performance models, they apply after parsing the syntax is unappealing, and, it turns out, unnecessary. They can apply simultaneously.

To show this, we first establish that the case-marking and other tone-insertions needed for this approach are themselves simple in the sense of being ‘regular,’ that is, finite state definable on trees, in a precise sense that matters for the computation of the syntax/prosody interface. In the trees just above in (23), notice that when the leaves of the tree on the left are pronounced in order, we have the example (2a) on page 5, without the case marking—this is the standard spellout rule. Because the derivations are finite state definable, we can use another finite state mechanism to, in effect, climb up the tree and insert the case markers wherever a case marking configuration occurs. So for example, to get tautala lelei case marking we insert the elements shown here:

(25) [Tree showing the inserted case-marking elements; figure not recoverable from the source]

Spelling this out, the case markers are pronounced in the correct, structurally determined positions.
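The idea of climbing the tree and inserting a marker wherever a marking configuration occurs can be illustrated with a minimal sketch. It is not the implementation described in Appendix B. The category labels (`DP[abs]`, `DP[erg]`), the placeholder words (`VERB`, `ERG`, `SUBJ`, `OBJ`), and the choice to place the tone immediately before the marked phrase are all simplifying assumptions made for illustration. The point is only that the pass consults a finite set of labels at each local step and inserts a constant.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

H_TONE = "H-"
# Assumption: a toy label standing in for the case-marking configuration.
MARKED = {"DP[abs]"}


@dataclass(frozen=True)
class Node:
    label: str                         # toy category label
    word: Optional[str] = None         # pronounced form, for leaves only
    children: Tuple["Node", ...] = ()


def spell_out(node: Node) -> Tuple[str, List[str]]:
    """Bottom-up pass returning (state, words).

    The state is drawn from a finite set of labels, and the only
    operation besides concatenation is inserting the constant H_TONE.
    """
    if not node.children:
        return node.label, ([node.word] if node.word is not None else [])
    words: List[str] = []
    for child in node.children:
        state, child_words = spell_out(child)
        if state in MARKED:            # local check on the daughter's state
            words.append(H_TONE)
        words.extend(child_words)
    return node.label, words


# Placeholder tree: a verb, an ergative phrase, and an absolutive phrase.
tree = Node("VP", children=(
    Node("V", word="VERB"),
    Node("DP[erg]", children=(Node("K", word="ERG"), Node("N", word="SUBJ"))),
    Node("DP[abs]", children=(Node("N", word="OBJ"),)),
))

_, surface = spell_out(tree)
print(surface)  # ['VERB', 'ERG', 'SUBJ', 'H-', 'OBJ']
```

Without the check against `MARKED`, the same pass simply reads off the leaves in order, which corresponds to the standard spellout rule mentioned above.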

In previous sections we have seen H- insertion in the structural configuration of ‘absolutive’ (which we are taking to be nom, acc) case assignment in (11), in coordination (18), and in fronted expressions (20). The previous section argues that these constructions can be defined by minimalist grammars. Now we add the observation that all of the tone placement rules needed for these constructions are simple in the precise sense of being, quite easily, finite state definable. Minimalist grammars have very simple derivations, so we can specify the case-marking and other tone-insertions as a simple reflex of the structures of minimalist derivations.

Finally, it is fairly easy to see now that the case marking and structure calculation can be done simultaneously. This solves the puzzle of (22), showing how we can straightforwardly define a comprehension model for Samoan from the production-oriented grammatical components that we have already defined. Observe that (i) the structural configurations in which these changes apply are defined ‘locally,’ by the features of the two subtrees that appear at the point where the marking is to take place; and (ii) the marks that are inserted are simple constants, not full phrases of any kind and not copies of other structures. Because of property (i), we can define a ‘parsing grammar’ that combines the syntax and spellout and distinguishes the categories relevant for case marking, so that in this combined system a rule applies to insert the constants mentioned in property (ii). In this sense, the post-syntactic process can be ‘composed into’ the syntax in a way that yields another grammar that is, in computational respects, a grammar of the same kind, a minimalist grammar. This holds not only for the superficial syntax sketched here but for any minimalist grammar, no matter how complicated, and any post-syntactic process with properties (i–ii). Therefore, we need not suppose that a parser considers segmental material only, subject to a following filter based on prosody. Rather, prosodic and segmental cues can be considered together, as soon as they are perceived. And since the result is a standard MG, some of the variability can be handled with a probabilistic model, as mentioned at the end of Section 7.1.
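The composition step can be sketched in a deliberately simplified setting. The example below uses a small context-free grammar as a stand-in for a minimalist grammar, which is an assumption made only to keep the sketch short; the rule format, the category names, and the `compose` and `generate` helpers are likewise hypothetical. What it shows is that folding a local, constant-inserting spellout rule into the rules yields a grammar of the same kind, which then generates the tone-marked strings directly.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]

H_TONE = "H-"


def compose(grammar: Grammar, marked: Set[str]) -> Grammar:
    """Fold the insertion rule into the grammar itself.

    Every right-hand-side occurrence of a marked category gets the
    constant H_TONE placed before it. The result is again a grammar
    in the same rule format.
    """
    composed: Grammar = {}
    for lhs, rules in grammar.items():
        composed[lhs] = []
        for rhs in rules:
            new_rhs: List[str] = []
            for symbol in rhs:
                if symbol in marked:
                    new_rhs.append(H_TONE)
                new_rhs.append(symbol)
            composed[lhs].append(tuple(new_rhs))
    return composed


def generate(grammar: Grammar, symbol: str) -> List[Tuple[str, ...]]:
    """Enumerate all strings derivable from symbol (non-recursive toy grammars only)."""
    if symbol not in grammar:          # terminal or inserted constant
        return [(symbol,)]
    results: List[Tuple[str, ...]] = []
    for rhs in grammar[symbol]:
        parts = [generate(grammar, s) for s in rhs]
        for combo in product(*parts):
            results.append(tuple(w for part in combo for w in part))
    return results


# Placeholder syntax: category and word names are illustrative only.
syntax: Grammar = {
    "VP": [("V", "DP_erg", "DP_abs")],
    "V": [("VERB",)],
    "DP_erg": [("ERG", "SUBJ")],
    "DP_abs": [("OBJ",)],
}

parsing_grammar = compose(syntax, {"DP_abs"})
print(generate(parsing_grammar, "VP"))
# [('VERB', 'ERG', 'SUBJ', 'H-', 'OBJ')]
```

Because `parsing_grammar` has the same format as `syntax`, any parser written for that format can process the tone-marked string without a separate prosodic filter. The claim in the text is the analogous one for minimalist grammars.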

8 Conclusion

Taking the syntax/prosody interface in Samoan as a case study, we have identified some syntactically determined aspects of prosody and shown how these can inform the syntactic parser. Since the syntax and prosody can be folded together as sketched in the previous section and described in more detail in Appendix B, a minimalist parser can directly parse the tone-marked surface forms. That is possible because, even though we describe syntax followed by spellout, the combination of these two is still a minimalist language, and so a minimalist parser suffices to do the analysis. With this approach, a large range of minimalist parsing models can use prosody as soon as it is heard. If we could not combine syntax and prosody in this way, then any theory of sentence comprehension would have to explain not just how the sequence of morphemes is analyzed by the syntax, but also how that sequence of morphemes is computed properly from the surface forms, and how prosodic reflexes of syntactic structure are provided at that interface. But we are not in that situation. We can factor performance into syntax and prosody in order to state generalizations in each domain most perspicuously, and in the performance model they do not need to be temporally separate in any sense. Thus, in this paper, we have answered the two questions we started with: (i) How to factor out the contribution of syntax to conditioning prosodic events, when presented only with the resulting output from the interaction of a multitude of conditioning factors, and (ii) given a production model from the syntactic grammar to a prosodified utterance, how to possibly define a comprehension model based on that production model.

Given the marked scarcity of computational models of syntactic parsing that incorporate prosodic information in any substantial way, what has allowed our success here? First, we saw from initial fieldwork in Samoan that it appeared to have prosodic events primarily conditioned by syntax, and we pursued further empirical study to clarify the facts enough to ground a first sketch of a production model. The overarching strategy that this exemplifies is to start with empirical case studies where syntax is clearly the primary determining factor for prosody, with an eye towards using our understanding of these to bootstrap work on cases where the syntax-prosody relation is less clear. Second, we explicitly defined an empirically grounded production-oriented grammar for computing the interface in minimalist grammar and took advantage of relevant mathematical results to then define a comprehension model based on the production-oriented grammar. The overarching strategy that this exemplifies is: (i) To define a computational model of the interface, which forces us to explicitly, precisely, and comprehensively state the tentative assumptions adopted, and (ii) to choose classes of computational models with mathematical properties that make it possible to test and compare hypotheses about fundamental properties of different components of the interface and their relations. We briefly explicate the two overarching strategic principles below to conclude.

8.1 Strategic principle 1: Finding cases where prosodic events are primarily under syntactic control

The crucial property of the Samoan syntax/prosody interface that makes it a good first case study is that it provides clear cases of prosodic events that are under the control of the syntax. We have shown that the prosodic events studied here do not disappear in short or long constituents, changes of speech rate, or changes of speech register; even the dramatic change from tautala lelei to tautala leaga preserves H- marking. Nevertheless, with small probabilities, the tonal events can surface in different ways or fail to surface, and so we have offered up a probabilistic parsing model to handle these exceptions until we better understand all the factors in play. Even with the evidenced nondeterminacy, we have seen that certain tonal events in Samoan are nevertheless very good signals of syntactic structure, and we have described fairly well-understood and flexible methods for modeling these in efficient parsing mechanisms.

It is unlikely that the primacy of syntactic conditioning in the Samoan syntax/prosody interface is anomalous in natural language. There are two ways to locate other such cases. One is for us to continue to expand our range of knowledge about the syntax/prosody interface cross-linguistically in prosodic fieldwork. As a case in point, a striking recent addition to the catalogue of syntactically determined prosody in natural language comes from the Dogon languages of Mali. In the Dogon language Tommo So, the word for ‘cat,’ gamma, bears an HH tone sequence in isolation; gamma bears the same HH sequence in ‘three cats’ and ‘the cat.’ But in the nominal phrases ‘black cat,’ ‘one cat,’ and ‘Sana’s cat,’ gamma surfaces with an LL sequence (Heath & McPherson, 2013; McPherson & Heath, 2016). Heath and McPherson (2013) and McPherson and Heath (2016) discovered that the tone sequence that gamma and other words surface with is completely predictable and insensitive to prosodic factors; for instance, the tone sequences over nominal phrases are completely determined by the syntactic category of ‘controller’ words within the nominal phrase that c-command the other words in the nominal phrase.

The second strategy for uncovering clear cases of prosodic events that are under the control of the syntax is to examine well-studied phenomena and reconsider the assumptions under which they have been analyzed. Since theories of the syntax/prosody interface must make assumptions about syntax and phonology, in addition to what information is passed between them, their ability to fit the data rests on all of these assumptions together. Thus, re-examining assumptions about any component of an interface model can reveal that what has appeared to be poorly understood variability in prosodic events is perhaps in fact a regular consequence of previously unrecognized factors. As an example, Hirsch and Wagner (2015) found that they could reconcile a conflicting pattern of results for the prosodification of prepositional phrase attachment in English, e.g., Tap the frog with the flower on the hat, with a syntactic analysis. Snedeker and Trueswell (2003) found that speakers prosodically disambiguated only when disambiguation was needed for the visual scene, but Kraljic and Brennan (2005) found that speakers prosodically disambiguated even if there was disambiguating referential context and they were unaware of the ambiguity. Hirsch and Wagner (2015) found they could account for the conflicting results by noticing syntactic differences between the two sets of experimental stimuli: Snedeker and Trueswell’s (2003) stimuli contrasted left vs. list bracketing, while Kraljic and Brennan’s (2005) stimuli contrasted left- vs. right-bracketing. Another example of work re-examining syntactic assumptions in an interface model is Ahn (2016a), which shows how hierarchical syntactic structure might regularly condition apparent exceptions to the nuclear stress rule in English.

8.2 Strategic principle 2: Testing hypotheses with computational models of the interface

Strategic principle 1 lays the empirical groundwork to motivate production models of the interface for syntactic parsing. The composability of MGs and related finite state systems mentioned in Section 7 and Appendix B makes them an advantageous choice for defining and comparing production models. On the one hand, we can define each component of the interface separately and thus factorize the interacting influences on prosody. On the other hand, these different components can also then be folded in together for comprehension models. Although MGs and finite state interfaces can accommodate a wide range of proposals, they have empirical consequences, some of which are contested. For instance, MGs don’t allow an unbounded number of elements to move to the front of a clause, e.g., MGs cannot express multiple wh-movement with no finite bounds. If we were to compose in a finite state prosodic grammar as part of the interface computation, this would also be restricted. If prosodic events were conditioned on the number of brackets there are at a boundary (and a potentially unbounded number of them), e.g., Wagner (2005, 2010), there would be no way to encode prosody in the grammar: Regular grammars cannot express unlimited sensitivity to the number of brackets. Encoding syntactic and phonological generalizations in MG and related formalisms also forces us to be completely clear about what our account of the empirical facts is, including points we aren’t yet sure of, where we must adopt tentative, likely simplified assumptions as a starting point. That is, formalizing our accounts computationally lays all our assumptions bare, and is not an endpoint following fieldwork, but an intermediate step in the iterative process cycling between proposing and refining testable hypotheses.
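The limitation on bracket counting mentioned above can be made concrete with a small sketch. The function below is a hypothetical illustration, not part of any proposal discussed here: it models a finite state device whose only memory of the closing brackets at a boundary is a counter that saturates at a fixed bound (`max_count`, an assumed parameter). Any two boundaries with more brackets than the bound end in the same state, so no rule conditioned on that state can treat them differently.

```python
def bracket_state(boundary: str, max_count: int) -> int:
    """Count closing brackets at a boundary with a finite number of states.

    The states are 0..max_count. Once the count reaches max_count it stays
    there, which is what having finitely many states amounts to here.
    """
    state = 0
    for ch in boundary:
        if ch == ")":
            state = min(state + 1, max_count)
    return state


# With a bound of 2, boundaries of depth 1 and 2 are distinguished ...
print(bracket_state(")", 2), bracket_state("))", 2))        # 1 2
# ... but depths 3 and 4 collapse into the same state.
print(bracket_state(")))", 2), bracket_state("))))", 2))    # 2 2
```

Raising `max_count` moves the point at which distinctions collapse but never removes it, which is why sensitivity to an unbounded number of brackets falls outside what a regular device can encode.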

Our current understanding of the conditioning of high edge tones led us to define the computation of the interface in terms of post-syntactic spellout rules that place H- tones in particular syntactic configurations. Stating these rules, and the syntactic grammar they are sensitive to, clarified what we were claiming the ‘absolutive high’ actually is, as we explained in Section 3. But if new data revealed a more general syntactic configuration underlying the H- tones in Samoan, e.g., if every adjoined phrase were marked with an H-, that could also be represented in a minimalist grammar, composed into the syntax; the proposal here does not hang on a particular, precise account of the syntactic conditioning of H-. It could also be informative to formalize interface theories that refer to syntactic phases, e.g., see Ahn (2016b); Dobashi (2004); Kratzer and Selkirk (2007); Dobashi (2009); Cheng and Downing (2016).

If it turns out that ‘information structure’ determines the placement of some H- tones in Samoan, and if these principles are encoded in syntactic structure (e.g., Cheng & Downing, 2009; Kavari et al., 2012), we would need to assess how that encoding could be implemented. Given a precise syntactic account of the view of information structure described in Calhoun (2017), and assuming that the generalizations stated there fit the data, we would assess whether we could encode her proposed generalization that H- tones occur at the edges of incomplete information units because phonological phrases map onto theme and rheme units. In that case, we would also want to revise the prosodic grammar involved in the computation of the Samoan interface so that it derives prosodic trees, rather than comprising just some simple rules for inserting tones into syntactic trees. As we speculated in Section 6.5, it may be that there are prosodically- as well as syntactically-conditioned H- tones, in which case we would want our prosodic grammar to include the tone insertion rules as well as to derive prosodic trees.

These are all examples of how empirical work might drive revisions of the computational models. But computational modeling can also drive empirical work, based on what it tells us about broad classes of assumptions about the interface. The proposal for the computation of the Samoan interface in this paper, for instance, tells us something about the broad class of interface theories in which the prosodic grammar does not derive prosodic trees. As we have discussed, it is not obvious that a reasonable comprehension model could be defined for such prosodic grammars. In this paper, however, we have shown that such grammars are in fact compatible with a large range of minimalist parsing models. Thus, the lack of a reasonable comprehension model would not be grounds for rejecting interface theories that exclude prosodic hierarchical structure. Our proposal also shows that we can straightforwardly compute the syntax/prosody interface even if we assume that high edge tones, which might overlap significantly in phonetic realization, actually arise from different sources rather than from a single unified source in the grammar.

The fine-grained model of the interface we have proposed here—with tones placed by individual rules that refer to specific morphosyntactic constructions—is quite different from other theories of the syntax-phonology interface.30 As stated in Kaisse and Zwicky (1987, p. 7), both theories that have proposed ‘direct reference’ to syntax and those that have proposed ‘indirect reference’ to syntax have agreed that phonological rules refer to cross-categorical relationships rather than specific syntactic categories. For example, Odden (1987) describes five rules of shortening for Kimatuumbi in NPs, VPs, PPs, APs, and PossPs, and then unifies these by saying that shortening applies to the head of a phrase. But it is not clear that a model that hides syntactic structure—whether by restricting what aspects of syntax are visible, or by the introduction of mediating prosodic structure—could fit our current empirical evidence in Samoan better than the fine-grained model we have proposed. We do not find high tones in Samoan for all heads or all phrases—for instance, we would need to explain the asymmetry in the presence of the high tone for absolutives vs. the absence of high tones for ergatives. A fine-grained account that fits the data well sets a challenge—the attempts at deeper or more unified accounts should aim to fit the data as well.

Additional File

The additional file for this article can be found as follows:

Appendix

The appendices describe elicited data sets and parser implementation details. DOI: https://doi.org/10.5334/labphon.113.s1