1. Introduction
This paper explores the prosodic disambiguation of the understudied contrast between Connected Clauses (1-a) and clefted Relative Clauses (1-b), and its implications for current models of prosody and its interfaces with syntax and semantics.
- (1)
- a.
- It was the gardener that painted the door (that was locked). Connected Clause
- b.
- It was the gardener that painted the door (that called). clefted Relative Clause
While the sentences above are string-identical (up until the bracketed material), they differ significantly in both structure and informational content. If you experienced some difficulty parsing the sentence in (1-b), you are not alone: Guo et al. (2024a, 2025) demonstrated that reading clefted Relative Clauses is more challenging than reading Connected Clauses (1-a), as indicated by longer reading times and lower acceptability ratings. However, a simple contextual manipulation can partially mitigate such garden-path effects and aid in disambiguating the two readings (constituents in focus are marked in italics in examples [2-a] and [2-b]):
- (2)
- a.
- Connected Clause condition:
- Q: Who painted the door?
- A: It was [DP the gardener] [CP that painted the door].
- cf.: who painted the door was the gardener.
- b.
- Relative Clause condition:
- Q: Who called?
- A: It was [DP the gardener [CP that painted the door]] (that called).
- cf.: who called was the gardener that painted the door.
The prototypical cleft sentence in (2-a) involves focus on the clefted Determiner Phrase (DP) the gardener, which introduces new information. The following Complementizer Phrase (CP) that painted the door introduces background information and can optionally be omitted, although its content is always implied. In line with previous literature, we refer to this type of Complementizer Phrase as a Connected Clause. The pseudo-cleft (who painted the door was the gardener) clarifies the relevant reading and shows that the Determiner Phrase and the Complementizer Phrase form independent constituents. The example in (2-b), in contrast, involves focalization of a complex Determiner Phrase containing a nested Restrictive Relative Clause (the gardener that painted the door). A mismatch between the content of the context question and the Complementizer Phrase ensures that the Relative Clause reading is obtained, thereby excluding a Connected Clause reading. The pseudo-cleft once again clarifies the constituent structure (who called was the gardener that painted the door).
While Relative Clauses and Clefts have both been investigated in the prosody literature, to the best of our knowledge, there has been no systematic investigation into the prosody of clefted Relatives.1 The prosody of Relative Clauses has been central to the discussion of the syntax-prosody interface and syntax-prosody mismatches (Shattuck-Hufnagel & Turk, 1996; Turk & Shattuck-Hufnagel, 2014; Wagner, 2005, 2010; Bennett & Elfner, 2019; Elfner, 2018; Grillo et al., 2025). Cleft constructions introduce syntactically focused elements and are of primary interest for the literature on focus and prominence and the independence of prosodic and syntactic marking of focus (Arnhold, 2021). Investigation into clefted Relatives, which sit at the intersection of these two central domains of research in prosody, can therefore inform theories of both phrasing and prominence. Furthermore, since clefted Relatives generate garden-path effects in the absence of prosody, the question of whether and how prosody disambiguates them from string-identical Connected Clauses is important for theories of prosody and psycholinguistics.
To address this gap in the literature, we carried out a planned production study (Experiment 1, Section 2) and an auditory comprehension study (Experiment 2, Section 3). Anticipating some of our results, we see that: i. speakers use both temporal and tonal cues (and, to a lesser extent, amplitude cues) to disambiguate the two structures; and ii. listeners are sensitive to these prosodic cues and use them to disambiguate between the two structures.
Before presenting the two experiments, in Section 1.1 we provide some additional details on the syntactic and semantic properties of these constructions, briefly introduce existing work on the prosody of Clefts, and summarize experimental results on the processing of this ambiguity. We then locate our discussion within the broader context of the literature on phrasing (Section 1.2) and prominence (Section 1.3). The section on phrasing provides a brief summary of different accounts of syntax/prosody mismatches involving Relative Clauses and argues that, due to independent grammatical restrictions on extraposition, the prosody of clefted Relatives should align with their syntax. The section on prominence argues that different theories of the interface between semantics and prosody make different predictions about the prosodic realization of focus in clefted Relatives. We return to both issues in Section 4, where we also discuss the results of both experiments within the broader context of the role of prosody in triggering and easing garden-path effects.
1.1. Properties of Connected Clauses and clefted Relative Clauses
This section provides an overview of the syntactic, semantic, and prosodic properties of Connected Clauses and clefted Relative Clauses (clefted Relatives from now on), as well as a summary of recent results showing that clefted Relatives induce garden-path effects in the absence of prosody and additional context supporting the more complex presupposition structure of Relative Clauses (e.g., the Question sentence in [2-b]). Examples (2-a) and (2-b) are repeated here as (3) and (4) respectively, with their syntactic and Focus structures marked in brackets.
Example (3) shows one prototypical it-cleft structure in English which is cued by a question. Declerck (1988), Higgins (1973), and Percus (1997) categorize such it-cleft sentences as specificational, because they specify a value (the clefted element, i.e., the gardener) for a variable (the non-clefted element, i.e., the person who painted the door). The non-clefted parts are often presupposed (e.g., Prince, 1978 and Delin, 1992), yet here we state them explicitly in the contextual question to avoid any ambiguity. Thus, in (3), the Determiner Phrase the gardener provides new information to the discourse and is in focus.2 More specifically, it carries Identificational Focus that exhaustively identifies a target (Kiss, 1998; Gussenhoven, 2007). The Complementizer Phrase that painted the door is instead given, serving as the background (e.g., Percus, 1997; Collins, 1991, 2006).
- (3)
- Connected Clause condition:
- Q: Who painted the door?
- A: It was [DP the gardener]Focus [CP that painted the door].
When preceded by a different question as in (4), the same string expresses a different meaning. Here, the whole Determiner Phrase (including the Complementizer Phrase) the gardener that painted the door specifies the identity of the person that called, thus providing new information and carrying identificational focus.
- (4)
- Clefted Relative Clause condition:
- Q: Who called?
- A: It was [DP the gardener [CP that painted the door]]Focus (that called).
As indicated through bracketing and in the syntactic analyses in (5), these interpretive differences map onto well-established structural distinctions: The Complementizer Phrase that painted the door in (3) is a Connected Clause, but a restrictive Relative Clause in (4). While several analyses of it-cleft structures have been proposed (Akmajian, 1970; Higgins, 1973; Percus, 1997; Hedberg, 2000; Reeve, 2010), they all agree that the Complementizer Phrase in a typical it-cleft (i.e., Connected Clause) occupies a higher syntactic position than that in restrictive Relative Clauses, either because of extraposition or base-generation in a higher projection. A detailed review of the different syntactic analyses of clefts (which the interested reader can find in Reeve, 2010) is beyond the scope of this paper, as it would not affect our arguments and predictions. In contrast, in (4) the Relative Clause is nested within the Determiner Phrase it modifies (i.e., the gardener). The actual Connected Clause that called in this sentence occupies the same position as the Connected Clause shown in (5-a) and carries the relevant presupposition, but is reduced (Declerck, 1983b) or truncated (Hedberg, 2000) to better illustrate the ambiguity. For illustrative purposes, here we adopt the structure proposed by Hedberg (2000) to exemplify the attachment contrast.3
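- (5)
- a.
- [Tree diagram: Connected Clause structure, cf. (3)]
- b.
- [Tree diagram: clefted Relative Clause structure, cf. (4)]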
Despite clefts’ prominence in the theoretical syntax and semantics/pragmatics literature (Akmajian, 1970; Higgins, 1973; Prince, 1978; Atlas & Levinson, 1981; Declerck, 1983a, 1984; Hedberg, 1990; Delin, 1992, 1995; Percus, 1997; Davidse, 2000; Hedberg, 2000; Lambrecht, 2001; Collins, 1991; Sharvit, 2003; Reeve, 2010; Den Dikken, 2013; Hedberg, 2013), experimental work on the prosodic and processing properties of clefts is much more limited, and work comparing the two structures of interest here is scarcer still.
In terms of the prosody of prototypical it-clefts, the shared intuition in the theoretical literature is that the clefted element that carries new information is obligatorily accented, while the non-clefted part that contains old information is typically unaccented (Prince, 1978; Delin, 1992, 1995; Gussenhoven, 2007). A few corpus-based studies have analyzed the prosody of prototypical it-clefts, but they either included too few items to draw reliable generalizations (Hedberg & Fadden, 2007, American English) or focused solely on the prosody of prototypical it-clefts like (3) and/or the pragmatic function of prosody (Collins, 1991, 2006; Herment & Leonarduzzi, 2012; Bourgoin et al., 2021, British English). As a result, not only is systematic experimental work on prototypical it-clefts lacking, but the prosodic patterns associated with structurally complex clefted phrases (e.g., clefted Relatives) remain almost completely unexamined.
Arnhold (2021) highlighted this lack of experimental work on the prosody of cleft structures and performed carefully controlled production and perception experiments, testing whether syntactically marked focus (it-clefts) differs prosodically from unmarked syntax (simple subject-verb-object structures) under different focus conditions (narrow vs. broad). While this work did not investigate the prosody of clefted Relatives, it is relevant here because it provides a detailed analysis of the prosodic correlates of standard it-clefts (i.e., our Connected Clauses).
It-clefts in Arnhold’s production study always had a narrow focus on the subject (- Who was buying an island? - It was the royal who was buying an island), considering that “clefts are not well-formed in broad focus” (Arnhold, 2021). Results reveal a similar prosodic pattern for narrow-focused elements in both marked and unmarked syntax (- Who was buying an island? - The royal was buying an island), in contrast to the broad-focus condition (- What happened? - The royal was buying an island).
Because the main goal of Arnhold’s (2021) study was to examine the interaction of syntactic and prosodic marking of focus, the analyses focused more on comparing the prosody of narrow vs. broad focus conditions, rather than providing detailed acoustic analyses for clefts with a narrow focus. Nevertheless, the perception study included clefts with a broad focus as auditory stimuli (- What happened? - It was the royal who was buying an island), which were briefly described as having a “nearly identical” intonation pattern to sentences with unmarked syntax and broad focus.4
In sum, while the prosody literature has examined the properties of standard it-clefts, including Arnhold’s detailed comparison of the prosody of it-clefts and focus marking in unmarked syntax, no work to date has carefully examined whether and how prosody disambiguates Connected Clauses and clefted Relatives.
Similarly, to our knowledge, this ambiguity has also been ignored in the psycholinguistics literature. Guo et al. (2024a, 2024b) were the first to investigate the processing of this type of sentence. In two silent reading studies, we showed that disambiguating towards a clefted Relative reading led to lower acceptability ratings and longer reaction and reading times than disambiguating towards a Connected Clause reading. The results suggest that, similar to other instances of ambiguity involving restrictive modifiers, clefted Relatives elicit garden-path effects in both online and offline processing in the absence of explicit prosody and a supporting context.5
In other words, clefted Relatives are less preferred, consistent with the parser’s general tendency to avoid Relative Clauses. This parsing preference is expected because while clefted Relatives are adjuncts, Connected Clauses are “quasi-arguments” of clefts. Even when reduced, Connected Clauses are always part of clefts’ semantics, and possibly also their syntax. Connected Clauses also carry fewer unsupported presuppositions than clefted Relatives, which further explains why they are easier to parse in the absence of a supporting context. Analyzing the prosodic disambiguation of these structures, therefore, adds to the broader issue of whether and how nested garden paths are prosodically disambiguated and to what extent prosodic disambiguation reduces the processing load of restrictive modifiers and allows listeners to sidestep the garden-path effect observed in the absence of (explicit) prosody.
1.2. Clefted Relatives and Phrasing: Nested garden paths at the syntax-prosody interface
In the previous section, we showed that Connected Clauses and clefted Relative Clauses, though string-identical, exhibit clear structural and interpretive differences. We also saw that, in the absence of explicit prosodic cues or contextual information, clefted Relatives give rise to garden-path effects (Guo et al., 2024a, 2025).6 A central aim of this paper is to determine whether this ambiguity can be resolved prosodically, and if so, how. Before turning to empirical investigation in Section 2, we situate this question within the broader literature on the interface between prosody and syntax (this section) and between prosody and semantics (next section). While no prior work has directly examined the prosody of clefted Relatives or how they are distinguished from Connected Clauses, existing research will allow us to formulate predictions about both prosodic phrasing and prominence, grounded in their structural and interpretive differences.
At the level of prosodic phrasing, although Relative Clauses form a single syntactic constituent with the nominals they modify, they are well known to tolerate prosodic separation, resulting in syntax-prosody mismatches (Shattuck-Hufnagel & Turk, 1996). In this section, we aim to show that clefted Relatives constitute an exceptional case due to independent grammatical factors, and we predict that the prosodic phrasing of Connected Clauses and clefted Relatives should align with their respective syntactic structures.
One strong argument for the existence of a separate prosodic level of representation is that syntactic structures and prosodic phrasing are not always aligned (e.g., Shattuck-Hufnagel & Turk, 1996 and Turk & Shattuck-Hufnagel, 2014). Syntax-prosody mismatches suggest that prosodic phrasing is determined by the interaction of what we will refer to as faithfulness and balance constraints which operate over and shape hierarchically organized prosodic representations. Faithfulness constraints ensure that structural information is faithfully encoded at the syntax-prosody interface; balance constraints govern the optimal size of prosodic phrases. Syntax-prosody mismatches can arise from a conflict between these two types of constraints.7
One example of faithfulness constraints is the Edge-Alignment constraint (Selkirk, 1986, 1996), which requires alignment of the left/right edge of a syntactic phrase with the left/right edge of its corresponding prosodic constituent. This constraint further interacts with the Wrap-XP constraint (Truckenbrodt, 1995, 1999), which demands that each syntactic phrase be contained in a phonological phrase, thereby preventing the division of syntactic phrases into multiple phonological phrases. An alternative approach, Match Theory (Selkirk, 2009, 2011), involves a set of Match constraints that do not specifically refer to left or right edge alignment and, instead, emphasize the correspondence between syntactic and prosodic constituency at clause, phrase, and word levels.
Regardless of the specific account of this mapping relation, following faithfulness constraints, a single syntactic phrase should typically align with a single prosodic phrase, while distinct constituents are mapped to separate prosodic phrases, as evidenced by prosodic boundaries. Example (6) shows one such instance described in Büring (2013), with syntactic structures indicated by square brackets and prosodic phrasing by round brackets. In (6-a), fancy books forms a single complex Determiner Phrase. There is a smaller prosodic boundary between the two words compared to (6-b), where Nancy and books are two separate Determiner Phrases with a clear prosodic break in between.
- (6)
- a.
- (I like) [fancy books] and bebop
- … (fancy books) …
- b.
- (I like) [Nancy], [books], and bebop
- … (Nancy) (books) …
Phrasing, however, is not determined solely by faithfulness principles. It is also shaped by constraints which require prosodic phrases to be neither too long nor too short. In phonological theory, constraints on the size of prosodic phrases have been formalized as well-formedness requirements. For instance, Selkirk (2000) proposed the Binary Minimum constraint, which requires a major (phonological) phrase to be minimally binary in prosodic length, and the Binary Maximum constraint, which restricts a major phrase to have “at most two minor/accentual phrases” (Selkirk, 2000, p. 244). The principle of Uniformity, proposed by Ghini (1993), similarly favors rhythmic balance across prosodic constituents, which can be achieved by modulating the number of phonological words and/or by modulating speech tempo (we return to Ghini’s insight in the Discussion). Constraints on the size of prosodic phrases have also been invoked to explain phrasing patterns observed in both comprehension and production in the psycholinguistics literature (e.g., Gee & Grosjean, 1983; Fodor, 1998; Watson & Gibson, 2004; Fodor, 2002; Augurzky, 2005 and Cooper & Paccia-Cooper, 1980). For example, in the comprehension literature, Frazier and Fodor (1978) proposed that sentences are parsed into units of similar length; similarly, in the literature on production, Ghini (1993) and Gee and Grosjean (1983) observed that speakers tend to produce balanced prosodic phrases and, like Frazier and Fodor, suggested that “a string is ideally parsed into same length units” (Ghini, 1993, p. 56) in speech comprehension. These so-called eurhythmic or balance constraints are motivated by insights from both phonological theories and cognitive/physiological considerations.
A different take on size effects in phrasing is put forward by Watson (2002) and Watson and Gibson (2004), who suggest that the placement of intonational boundaries is partly governed by planning and recovery mechanisms in production. In this model, while phrasing is primarily determined by the syntactic and semantic properties of a sentence, speakers are more likely to produce an intonational boundary before and after large constituents (see also Breen et al., 2011 for a systematic comparison of this proposal with balance accounts).8
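To make the size constraints above concrete, the following minimal Python sketch counts violations of the Binary Minimum and Binary Maximum constraints and computes a simple imbalance score. It is an illustration under stated assumptions, not an implementation of Selkirk's (2000) or Ghini's (1993) proposals: a major phrase is represented as a plain list of minor/accentual phrases, the labels "a" to "d" are placeholders rather than analyzed material, and using the variance of phrase sizes as a stand-in for Uniformity is our own simplification.

```python
from statistics import pvariance


def binarity_violations(major_phrases):
    """Count major phrases that violate Binary Minimum or Binary Maximum.

    Each major phrase is a list of minor/accentual phrases.
    BinMin is violated by fewer than two minor phrases,
    BinMax by more than two.
    """
    too_short = sum(1 for phrase in major_phrases if len(phrase) < 2)
    too_long = sum(1 for phrase in major_phrases if len(phrase) > 2)
    return {"BinMin": too_short, "BinMax": too_long}


def length_imbalance(major_phrases):
    """Population variance of phrase sizes (assumed proxy for Uniformity).

    A value of 0 means that all major phrases are the same size.
    """
    return pvariance([len(phrase) for phrase in major_phrases])


# Placeholder minor phrases "a".."d"; hypothetical, for illustration only.
balanced = [["a", "b"], ["c", "d"]]
unbalanced = [["a"], ["b", "c", "d"]]

balanced_violations = binarity_violations(balanced)
unbalanced_violations = binarity_violations(unbalanced)
balanced_score = length_imbalance(balanced)
unbalanced_score = length_imbalance(unbalanced)
```

Under these assumptions, the balanced parse incurs no violations and an imbalance score of 0, whereas the unbalanced parse violates each binarity constraint once.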
Independent of the source of size constraints, the effects of faithfulness and balance constraints do not always align. One classic example of such tension, first described in Chomsky (1965), is illustrated in (7-a). The string the cat that caught the rat that ate the cheese contains multiple nested clauses that form a single syntactic constituent, as in (7-b). However, this sentence is typically produced with a flatter prosodic structure, shown in (7-c), i.e., with phrases of similar sizes and prosodic breaks between each head noun (cat/rat) and the corresponding Relative Clauses.9 In what follows, we use square brackets ([]) to represent syntactic structure, and double vertical bars (||) to indicate prosodic phrasing.
- (7)
- a.
- This is the cat that caught the rat that ate the cheese.
- b.
- [This is [the cat [that caught [the rat [that ate the cheese]]]]]. Syntactic Structure
- c.
- This is the cat || that caught the rat || that ate the cheese. Prosodic Structure
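The contrast between (7-b) and (7-c) can be made explicit with a short Python sketch that reads the two notations used here: square brackets for syntactic structure and double vertical bars for prosodic phrasing. The function names are hypothetical and the parsing is deliberately naive (whitespace-separated words, no labels on brackets); the sketch only illustrates that the syntactic structure is deeply nested while the prosodic structure is flat and balanced.

```python
def bracket_depths(bracketed):
    """Return (word, depth) pairs for a square-bracketed string."""
    depth = 0
    pairs = []
    token = ""
    for ch in bracketed.rstrip("."):
        if ch in "[] ":
            # A bracket or space ends the current word, if any.
            if token:
                pairs.append((token, depth))
                token = ""
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
        else:
            token += ch
    if token:
        pairs.append((token, depth))
    return pairs


def prosodic_phrases(phrased):
    """Split a string on '||' boundaries into lists of words."""
    return [chunk.split() for chunk in phrased.rstrip(".").split("||")]


syntax = "[This is [the cat [that caught [the rat [that ate the cheese]]]]]."
prosody = "This is the cat || that caught the rat || that ate the cheese."

depths = bracket_depths(syntax)
phrases = prosodic_phrases(prosody)

max_depth = max(depth for _, depth in depths)      # 5 levels of embedding
phrase_sizes = [len(phrase) for phrase in phrases]  # [4, 4, 4]
```

The same twelve words are thus grouped into five levels of syntactic embedding but into three prosodic phrases of four words each.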
For one family of approaches (e.g., Nespor & Vogel, 1986; Selkirk, 2000; Truckenbrodt, 1995 and Turk & Shattuck-Hufnagel, 2014), these asymmetries show that sentences have a prosodic structure that is derived from their syntactic structure but can be non-isomorphic to it, and that this prosodic structure directly determines speech prosody. The properties of the prosodic structure itself result from the complex interaction of faithfulness and balance principles. Separate phrasing for Relative Clauses, for instance, illustrates a case where balance is ranked above faithfulness.
Wagner (2005, 2010) proposed an alternative account of such syntax-prosody mismatches involving Relative Clauses, which essentially amounts to questioning whether they constitute a genuine case of mismatch. Wagner points out that the separate phrasing in (7-c) is compatible with an alternative syntactic analysis in which the Relative Clause is not nested within the phrase it modifies but is instead extraposed to a higher syntactic position, as shown in (8). Extraposition involves movement of the Relative Clause out of the DP it modifies and to a higher position in the syntactic structure, creating two separate syntactic constituents which match the separate phrasing in (7-c). That extraposition is a grammatical option for these sentences is demonstrated by examples like (9), which shows that temporal adverbs (e.g., yesterday) can intervene between a head noun (cat) and the Relative Clause (that caught the rat). The interpolation of temporal adverbs (which belong to the main clause and not to the Relative Clause) forces extraposition of the Relative Clause (Hulsey & Sauerland, 2006) and serves to demonstrate the availability of this alternative structure.
- (8)
- [[This is the cat] [that caught the rat] [that ate the cheese]].
- (9)
- I saw the cat yesterday that caught the rat on Monday that ate the cheese on Sunday.
One important consequence of Wagner’s account is that, since extraposition is not always a grammatical option, separate phrasing for Relative Clauses should be contextually restricted, i.e., such syntax-prosody mismatches should only be observed when extraposition is licensed. Support for this account comes from Relative Clauses whose head is an idiom chunk (Wagner, 2005), for instance (10-a). It is generally assumed that extraposition is not a grammatical option for this kind of Relative Clause, as illustrated in (10-b): Inserting an adverb (last year) between the head noun and the Relative forces extraposition (as shown in example [9]) and leads to degraded acceptability (Hulsey & Sauerland, 2006). Under Wagner’s (2005) account, the ungrammaticality of extraposition in these environments explains why syntax-prosody mismatches such as (10-c) are not observed.
- (10)
- a.
- Mary praised the headway that John made.
- b.
- *Mary praised the headway last year that John made.
- c.
- ?*This was entirely due to the advantage || that he took of the headway || that she had made before.
In sum, Wagner’s account of syntax-prosody mismatches involving Relative Clauses makes a specific, testable prediction which singles it out from other accounts: If separate phrasing is the reflection of an alternative, extraposed structure, it should only be observed in environments which license extraposition.
In the remainder of this section, we argue that clefted Relatives constitute another environment where extraposition is banned and that studying the prosody of clefted Relatives and Connected Clauses provides a novel testing ground for Wagner’s account. To begin with Connected Clauses: Not only do these structures allow extraposition, but several syntactic analyses argue that they involve obligatory extraposition (e.g., Reeve, 2010). That Connected Clauses allow extraposition is visible from examples such as (11): Inserting the temporal modifier last night between the clefted noun and the Connected Clause forces extraposition and yields a perfectly grammatical structure.
- (11)
- Q: Who called?
- A: It was [the gardener] last night [CC that called].
Applying the same test to clefted Relatives in (12), however, gives very different results, which provides a first indication that clefted Relatives resist extraposition.
- (12)
- Q: Who called?
- A: #It was [the gardener] last night [RC that painted the door] ([CC that called]).
Even more strikingly, extraposition of a clefted Relative over an intervening Connected Clause is completely ungrammatical, as shown in (13):10
- (13)
- Q: Who called?
- A: *It was [the gardener] [CC that called] [RC that painted the door].
These examples suggest that, unlike Connected Clauses, clefted Relatives resist syntactic extraposition and thus, syntactically, they can only be nested within the head nouns they modify. Under Wagner’s account, the ban on extraposition leads us to expect no syntax-prosody mismatches in this environment: A clefted Relative should be prosodified as a single phrase with its host, as shown in (14-a), and the separate phrasing in (14-b) should be unattested.
- (14)
- a.
- (Q: Who called?)
- A: It was the gardener that painted the door || that called.
- b.
- (Q: Who called?)
- A: *It was the gardener || that painted the door || that called.
Empirically validating the predicted prosodic phrasing in (14-a) can thus provide a novel argument for Wagner’s extraposition account, supporting the claim that the alignment between prosody and syntax may be more systematic than previously thought, at least in the domain of Relative Clauses.
While the source of the ban on extraposition in clefted Relatives is not immediately relevant here (but see Poschmann & Wagner, 2016 for an overview of factors affecting the availability of extraposition), it is tempting to speculate that it may also be related to the different attachment sites of Connected Clauses and clefted Relatives. Extraposing a clefted Relative would produce a structure indistinguishable from a Connected Clause. A rational speaker, as defined by Clifton et al. (2002, 2006) and Frazier et al. (2006), would therefore avoid such ambiguity by employing prosody and syntax in an internally consistent and rational way. This is, in essence, the account proposed by Grillo et al. (2025) to explain the prosody of nested garden-path sentences, a class of constructions to which our contrast of interest belongs. A number of recent findings on the prosodic disambiguation of nested garden paths (Grillo & Turco, 2016; Grillo et al., 2018, 2023, 2025) consistently show that nested structures tend to be prosodically grouped with their syntactic hosts in such environments. Our predictions for clefted Relatives, therefore, are in line with observations from this broader literature. For clarity and continuity, we postpone a thorough discussion of this point until the Discussion.
In sum, this section has argued that clefted Relatives offer a novel testing ground for theories of syntax-prosody mapping, particularly Wagner’s extraposition account of phrasing mismatches. Before empirically evaluating these predictions, the next section revisits the differences in information structure between clefted Relatives and Connected Clauses, demonstrating how this contrast can further inform ongoing debates concerning the interface between prosody and semantics.
1.3. Clefted Relatives and Prominence: The semantics-prosody mapping
As discussed in Section 1.1, Connected Clauses and clefted Relatives differ in their syntactic and information structures. In ‘standard’ clefts (Example [3]), only the clefted DP is in focus, while the Connected Clause introduces backgrounded material and can be optionally omitted. In contrast, because clefted Relatives are syntactically nested within the DP they modify, the entire DP (including the Relative Clause) receives focus, and any background information is introduced by an optionally realized (but always presupposed) Connected Clause (Example [4]). This contrast makes clefted Relatives a compelling test case for theories of the prosody-semantics interface. While different approaches make similar predictions for Connected Clauses, their predictions diverge on prominence assignment within clefted Relatives, providing a unique opportunity to evaluate competing accounts. Before addressing this question, we first review the broader literature on the prosodic realization of focus to situate clefted Relatives within the relevant theoretical context. We then examine different families of accounts of the semantics-prosody mapping and consider their respective predictions for the prominence patterns of our target structures.
It is commonly observed that the meaning of a sentence varies with the context in which it occurs and that these differences in meaning are often expressed acoustically. For example, (15) illustrates that the same string of words (John ate pizza) may be realized with prosodies corresponding to the different information that each answer provides. In (15-a), the word pizza provides new information, which answers the question and most naturally receives prosodic prominence. Conversely, John carries new information when answering the question in (15-b) and would typically be acoustically emphasized in that context.
- (15)
- a.
- Q: What did John eat?
- A: John ate [pizza]Focus.
- b.
- Q: Who ate pizza?
- A: [John]Focus ate pizza.
Semantically, this difference in acoustic realization can be captured with Rooth’s (1985) theory of alternative semantics: Focus marks the presence of alternatives relevant to the interpretation of a sentence. A focused constituent introduces a set of contextual alternatives, which are evaluated against the current question under discussion or contrastive expectations. This helps explain why focused elements are typically accented—they signal new, contrastive, or informative content—whereas backgrounded or given material remains unaccented and prosodically reduced (see also Hoeks et al., 2023; Rooth, 1985, 1992; Gussenhoven, 2007; Féry & Krifka, 2008; Breen et al., 2010).
While the meanings and pronunciations of (15-b) and (15-a) differ, the prosodic prominences associated with the focused information in each retain some similarities. For example, such prominence is typically characterized by longer duration, higher F0, and higher intensity on focused elements relative to given elements, at least for some languages (including American and British English), and despite some across-experiment differences in the consistency of these features (Cooper et al., 1985; Eady & Cooper, 1986; Xu & Xu, 2005; Breen et al., 2010; Arnhold, 2021; Lee et al., 2015). Xu (2019) further summarized the most consistent correlates of focus in English as i. post-focal compression of F0, and ii. longer duration and higher F0 on the focused word. Taking (15) as an example, we should expect to find a longer duration and higher F0 for John in (15-b) than (15-a) (and vice versa for pizza), and a lower F0 for ate in (15-b) than (15-a).
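The directional predictions just described for (15) can be spelled out in a small Python sketch. It encodes only the ordinal pattern summarized above (longer duration and higher F0 on the focused word, F0 compression after it) and derives, for each word, the expected difference between two focus conditions. The numeric levels are illustrative assumptions rather than measured values, pre-focal words are assumed to be unaffected, and the function names are hypothetical.

```python
# Ordinal levels per role: illustrative assumptions, not measurements.
F0_LEVEL = {"pre-focal": 0, "focused": 1, "post-focal": -1}
DURATION_LEVEL = {"pre-focal": 0, "focused": 1, "post-focal": 0}


def role(position, focus_index):
    """Classify a word position relative to the focused word."""
    if position == focus_index:
        return "focused"
    return "pre-focal" if position < focus_index else "post-focal"


def direction(delta, up, down):
    """Turn a signed difference into a directional label."""
    if delta > 0:
        return up
    if delta < 0:
        return down
    return "same"


def compare_conditions(words, focus_a, focus_b):
    """Predict how each word in condition B differs from condition A."""
    predictions = {}
    for i, word in enumerate(words):
        role_a, role_b = role(i, focus_a), role(i, focus_b)
        predictions[word] = {
            "duration": direction(
                DURATION_LEVEL[role_b] - DURATION_LEVEL[role_a],
                "longer", "shorter"),
            "f0": direction(
                F0_LEVEL[role_b] - F0_LEVEL[role_a],
                "higher", "lower"),
        }
    return predictions


# Condition A = (15-a), focus on "pizza"; condition B = (15-b), focus on "John".
predictions = compare_conditions(["John", "ate", "pizza"], focus_a=2, focus_b=0)
```

With these assumptions, John is predicted to be longer and higher in F0 in (15-b) than in (15-a), pizza shorter and lower, and ate lower in F0 with no change in duration.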
However, at the post-lexical level, whether this semantics-prosody mapping is direct (Cooper et al., 1985; Eady & Cooper, 1986; Lieberman, 1963; Pell, 2001; Xu & Xu, 2005; Xu et al., 2015) or indirect (e.g., Ladd, 2008; Pierrehumbert, 1980; Gussenhoven, 2004 and Wagner, 2005) is a contentious matter. From the perspective of direct approaches, aspects of meaning in a sentence, such as emotion, focus, or grouping, are directly linked to a set of acoustic features. In contrast, for indirect approaches, meaning is first mapped onto linguistic representations (syntactic and/or prosodic), over which different phonological rules are applied and which ultimately determine the acoustic properties of utterances. Whether that intermediate level is syntactic or metrical (as in the prosodic hierarchy of intonational phonology models) is beyond the scope of this paper; here we are primarily concerned with testing the predictions of direct and indirect accounts.
One issue with this literature is that the same term is sometimes used to refer to different concepts, making it difficult to compare and validate the theories’ predictions. The use of “focus” is one such example. While both accounts consider focus an important factor that affects sentence prosody, indirect approaches generally use the term as we have defined it above, i.e., as marking the presence of alternatives relevant to the interpretation of the sentence. Direct approaches, on the other hand, are somewhat internally inconsistent. Early direct-approach studies such as Cooper et al. (1985) and Eady & Cooper (1986) (and, implicitly, Pell, 2001) use the term (main) focus to indicate “primary stress” or prominence in a sentence, which relates to the prosodic aspect. Xu (2005, 2019) and Xu et al. (2015) instead treat focus as a communicative function. In particular, Xu defines focus as the part of the information that speakers intentionally emphasize to convey its importance to the listener, while newness “is associated only with memory retrieval” (Xu, 2019, p. 330). However, this definition does not clarify how speakers determine which part of the information is important or, crucially, how to predict which elements are focused in a sentence. To address this issue, in the following discussion, we will reserve the word Focus for the semantic aspect associated with a word/phrase’s contribution to information structure, and use prominence to refer to the acoustic properties of focused elements.
Under this definition, we expect the two accounts to make aligned predictions for sentences like (15-a) or (15-b), based on the prosodic pattern described above: The word carrying focus has more prominent acoustic features than words that are given or carry a lower informational load. This is because, in cases like (15-a) and (15-b) where focus falls on a single word, the focused domain is too small to determine whether information structure directly determines the prominence of that word or whether this is mediated by intermediate phonological representations (e.g., a +accent feature associated with focused elements in a phonological representation). This explains why the two approaches are often hard to differentiate: Many production studies used such simple structures as items, for which their predictions align. Even more sophisticated experimental work, such as Breen et al. (2010), fails to clearly distinguish between the two accounts, despite offering strong evidence on the prosodic correlates of focus.11
Distinguishing the two accounts, therefore, is not a matter of measurement precision but of constructing linguistic contrasts for which the two accounts indeed make different predictions. We suggest that this can be achieved by leveraging structural nesting, as in (16). Here, the DP pizza with anchovies contains nested material, and in this context, each part of it carries new information and is in focus. Therefore, direct accounts should predict overall higher prominence for all pitch-accentable words across the focused phrase (i.e., pizza and anchovies). Indirect accounts, in contrast, make specific and localized predictions about accent assignment, which, at least in this “all new” context, are relatively independent of the informational content of a word and are determined by rules that apply to syntactic and phonological representations (e.g., the Nuclear Stress Rule, Chomsky & Halle, 1968; Zubizarreta, 2016). Specifically, the Nuclear Stress Rule would predict higher prominence on the most deeply nested word, anchovies, than on pizza.12
- (16)
- Q: What does John like?
- A: John likes [DP pizza [PP with anchovies]].
Similar reasoning can also be applied to our target structures, repeated in (17-a) and (17-b). Here, both families of accounts predict greater prominence on the clefted element in (17-a), which is in focus and carries new information, than on the Connected Clause, which is not in focus and carries given/old information. However, their predictions diverge for clefted Relatives in (17-b). Since the entire DP carries focus, we expect direct accounts to predict higher prominence for all pitch-accentable words (gardener, painted, and door). As they contribute equally to conveying new information, there should be no difference in the prominence levels among elements in the Relative Clause. In contrast, indirect accounts more specifically predict the location of the nuclear stress to fall on the most deeply nested word in the structure, i.e., door in (17-b), resulting in a prominence asymmetry even when all elements in the Relative Clause are in focus.
- (17)
- a.
- Q: Who painted the door?
- A: It was [DP the gardener]Focus [CC that painted the door].
- b.
- Q: Who called?
- A: It was [DP the gardener [RC that painted the door]]Focus ([CC that called]).
One key reason why we chose to test the two accounts using Connected Clauses and clefted Relatives is that their focus structures are immediately accessible. An additional reason is that in (16), the DP is globally unambiguous.13 Lack of ambiguity prevents us from asking a number of questions we can afford when testing clefted Relatives, including the contribution of prosody to the disambiguation of this understudied contrast, in addition to the questions about phrasing and the syntax-prosody interface discussed in Section 1.2.
1.4. The present study
In this study, we will address the broader questions of whether and how the contrast between clefted Relatives and Connected Clauses is prosodically disambiguated in production (Experiment 1), and whether listeners are sensitive to these prosodic differences in comprehension (Experiment 2). Based on our impressionistic judgments, previous results on similar structural contrasts (e.g., Grillo & Turco, 2016; Grillo et al., 2018; Grillo et al., 2025; Poschmann & Wagner, 2016), and the assumption that different syntactic and semantic properties of sentences are reflected in their prosody, we expect Connected Clauses and clefted Relatives to be produced with different prosodic patterns. Similarly, we expect prosody to strongly ameliorate comprehension of clefted Relatives, a structure which we have independently shown to pose severe problems for the parser in silent reading (Guo et al., 2024a, 2025). These questions, however, can only be reasonably addressed through careful experimental work.
In addition, as clarified in the previous sections, the structural and interpretive contrast between the two structures allows us to address the following, more specific theoretical questions about the interface between prosody and syntax/semantics:
Q1. Are the respective phrasing properties of clefted Relatives and Connected Clauses consistent with the predictions of Wagner’s (2005, 2010) extraposition account and Grillo et al.’s (2025) Rational Speaker account? These accounts predict that clefted Relatives and their hosts will be produced as a single prosodic phrase and that Connected Clauses, on the other hand, should be prosodified separately. If our production experiment showed that speakers consistently produce separate phrasing for clefted Relatives and their head nouns, this would provide strong evidence against these accounts.
Q2. Do the prominence properties of these structures align better with the predictions of direct or indirect accounts of the semantics-prosody mapping? The two families of accounts make similar predictions for Connected Clauses (i.e., higher prominence on the clefted Noun and deaccenting on the Connected Clause itself). Their predictions differ for clefted Relatives, i.e., only indirect accounts predict localized higher prominence on the final, most embedded Noun of the clefted Relatives.
2. Experiment 1: Planned production
2.1. Participants
Eight native British English speakers originating from different regions of the UK participated in the experiment in a soundproof booth at the University of York. One participant was excluded for producing too many errors and hesitations, leaving seven participants for analysis (age range = 28–36, mean = 32.3, SD = 3.5, women = 4). Participants gave their informed consent and were paid for their participation. Each participant took part in two sessions that were separated by at least one week, and each session focused on a single critical structure. This study was approved by the Ethics Committee of the Department of Language and Linguistic Science, University of York.
2.2. Materials
Each condition included 24 Question-Answer pairs as in (18-a) and (18-b). Each target sentence was structured as It was + the NP1 + that + was + Verb + the NP2 (NP stands for Noun Phrase). The questions in the Connected Clause condition directly asked about the agent of the Verb Phrase in the answer, while the questions in the clefted Relative condition were kept constant as in (18-b). This design ensured grammatical disambiguation of the two readings: Matching/mismatching the content of the question and the Complementizer Phrase forced either a Connected Clause reading (match) or a clefted Relative reading (mismatch) of the target sentence.14 Target sentences were prosodically controlled across items: We kept the number of syllables and the position of lexical stress constant within each region. NP1 was always trisyllabic with lexical stress on the first syllable, denoting a profession or property of people; the Verb was transitive and disyllabic with lexical stress on the first syllable, and it formed a phrase with NP2, which was monosyllabic. All sentences were in the past tense.
- (18)
- a.
- Connected Clause condition:
- Q: Who was working the shift?
- A: It was [DP the editorNP1] [CP that was workingVerb the shiftNP2].
- b.
- Clefted Relative Clause condition:
- Q: Which one of them was identified?
- A: It was [DP the editorNP1 [CP that was workingVerb the shiftNP2]].
Experimental items were interspersed with 48 filler sentences that varied in their syntactic structures, but approximately matched experimental items in length. Twelve fillers were also preceded by context questions to make half of all items form Question-Answer pairs.
2.3. Procedure
Before the experiment, participants were asked to fill in a questionnaire about their language background and other demographic information. They were then instructed by a researcher to silently scan the entire (question and) sentence before reading aloud, and then to produce the question (if any) and sentence naturally and fluently at normal speed. When presented with a Question-Answer pair, participants were instructed to imagine themselves in a conversation: asking the Question, as if seeking information, and responding with the Answer, as if informing someone unaware of the answer. Items were automatically presented on a computer screen, and participants were recorded with a headset microphone connected to the PC using the software ProRec 2.4 (Mark Huckvale, University College London), at a sampling rate of 44.1 kHz and 32-bit resolution.
Experimental stimuli were initially divided into two lists, each containing only one structure type (i.e., either Connected Clause or clefted Relative) to ensure that participants saw only one condition per session. This was a within-subjects design: All participants completed both conditions across two sessions, with the order of conditions counterbalanced across participants. All items were pseudo-randomized within each list. To control for potential sequence effects, two additional lists were created by reversing the item order of the originals, leading to a total of four lists. Experimental items were separated by at least one filler item in between.
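The list construction described above (pseudo-randomized order, at least one filler between experimental items, and a reversed copy of each list) can be illustrated with a short Python sketch. This is not the procedure actually used to build the lists; the function name `build_list`, the item labels, and the seed are hypothetical, and the gap-filling strategy is only one of several ways to satisfy the constraint.

```python
import random


def build_list(experimental, fillers, seed=0):
    """Return one presentation order in which no two experimental
    items are adjacent (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    exp = list(experimental)
    fil = list(fillers)
    rng.shuffle(exp)
    rng.shuffle(fil)

    # One gap before the first item, one after the last, and one
    # between each pair of consecutive experimental items.
    n_gaps = len(exp) + 1
    gaps = [0] * n_gaps
    for i in range(1, n_gaps - 1):
        gaps[i] = 1  # every interior gap gets at least one filler
    spare = len(fil) - sum(gaps)
    if spare < 0:
        raise ValueError("not enough fillers to separate all items")
    for _ in range(spare):
        gaps[rng.randrange(n_gaps)] += 1  # scatter the remaining fillers

    order = []
    next_filler = 0
    for i, n_fillers in enumerate(gaps):
        order.extend(fil[next_filler:next_filler + n_fillers])
        next_filler += n_fillers
        if i < len(exp):
            order.append(exp[i])
    return order


# 24 experimental items and 48 fillers, as in each session.
exp_items = [f"exp{i:02d}" for i in range(24)]
filler_items = [f"fil{i:02d}" for i in range(48)]

list_a = build_list(exp_items, filler_items, seed=1)
list_b = list(reversed(list_a))  # reversed-order counterpart
```

Reversing a list preserves the separation constraint, which is why a single constrained order per structure is enough to derive its counterpart.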
Every session started with four practice items, followed by 24 experimental items interspersed with 48 fillers, leading to a total of 76 items for each participant in each session. The entire experiment lasted approximately 40 minutes.
2.4. Data processing and analysis
Before data analysis, segmentation was performed automatically using the Montreal Forced Aligner (McAuliffe et al., 2017). Duration, F0, and intensity were automatically detected by means of scripts run in Praat software (Boersma, 2001). The results of the automatic procedure were checked and manually corrected (blinded to the condition the sentence belonged to) in case of errors.15
To address the broader question of whether and how speakers prosodically disambiguate Connected Clauses and clefted Relatives, we analyzed the acoustic properties of productions, following Eady & Cooper (1986), Breen et al. (2010) and Arnhold (2021). Firstly, to examine cues for early disambiguation, we calculated the temporal alignment of the F0 maximum within the accented syllable of NP1 (i.e., the first syllable), hereinafter referred to as peak alignment. This measurement is expressed as a percentage of the syllable’s duration and is calculated as follows:
$$\text{Peak alignment} = \frac{\text{Time}_{\text{F0 maximum}} - \text{Time}_{\text{1st syllable onset}}}{\text{Time}_{\text{1st syllable offset}} - \text{Time}_{\text{1st syllable onset}}} \times 100$$
Values above 100 indicate that the F0 peak on NP1 occurred after the end of the first syllable, whereas values between 0 and 100 reflect peaks that fall within the syllable.
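As an illustration only, the peak alignment measure can be written as a small Python function. This is a sketch of the formula, not the Praat script used for extraction; the function name and the example times (in seconds) are hypothetical.

```python
def peak_alignment(t_f0_max, t_syll_onset, t_syll_offset):
    """F0 peak position as a percentage of the first syllable's duration."""
    duration = t_syll_offset - t_syll_onset
    if duration <= 0:
        raise ValueError("syllable offset must follow its onset")
    return (t_f0_max - t_syll_onset) / duration * 100


# Hypothetical time stamps in seconds, not taken from the recordings.
within = peak_alignment(0.30, 0.20, 0.40)  # peak midway through the syllable
late = peak_alignment(0.45, 0.20, 0.40)    # peak after the syllable offset
```

In this example the first value falls between 0 and 100 (peak inside the syllable) and the second exceeds 100 (peak realized after the syllable has ended).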
On the specific issue of phrasing differences between the two structures raised in Question 1, if clefted Relatives are prosodified with their hosts, as predicted by Wagner (2005) and Grillo et al. (2025), then we expect to observe a stronger prosodic boundary between NP1 editor and the Complementizer that in Connected Clauses than in clefted Relatives. Comparing pre-boundary lengthening between NP1 and the Complementizer for the two structures should allow us to diagnose the strength of this boundary. Thus, the following measurement was extracted:
Raw duration of the three syllables of NP1, in milliseconds (ms)
To further examine the general prosodic differences between the two structures, we extracted and analyzed the following three measurements for every pitch-accentable word (i.e., NP1 editor, Verb working, and NP2 shift). Also, to address Question 2, we focused on the comparison between NP2 vs. Verb in the two structures, where we expect the two families of accounts to make different predictions.
Raw duration of words, in ms
F0 range (F0maximum – F0minimum), in semitones (st)
Mean intensity, in decibels (dB)
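For readers less familiar with the semitone scale, the following Python sketch shows one common way to express an F0 range in semitones. It assumes F0 samples in Hz and that unvoiced frames are coded as 0 or None; both are assumptions of this sketch, as are the function names, and the extraction scripts used in the study are not reproduced here.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12 * math.log2(f0_hz / ref_hz)


def f0_range_st(f0_values_hz):
    """F0 maximum minus F0 minimum over a word, in semitones."""
    voiced = [f for f in f0_values_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("no voiced frames in this word")
    # A difference of two semitone values equals 12 * log2 of the
    # ratio in Hz, so the reference frequency cancels out.
    return hz_to_semitones(max(voiced)) - hz_to_semitones(min(voiced))


# Hypothetical F0 track for one word (Hz); 0 marks an unvoiced frame.
example_range = f0_range_st([0, 100.0, 150.0, 200.0, 0])
```

Because the range is a difference between two semitone values, the choice of reference frequency does not affect the result.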
Additionally, we conducted a post hoc analysis comparing the F0 peak at NP2 (shift) and NP1 (editor) across the two conditions. This measurement (hereinafter referred to as F0 scaling) was intended to indicate, among other acoustic cues, whether the focus on NP2 was contrastive or non-contrastive, and was calculated as follows (in semitones):
$$\text{F0 scaling} = 12 \times \log_2\left(\frac{\text{F0 Max}_{\text{NP2}}}{\text{F0 Max}_{\text{NP1}}}\right)$$
If NP2 carries contrastive focus, we expect a higher F0 peak on NP2 than on NP1, resulting in a positive F0 scaling. If the focus on NP2 is non-contrastive, the F0 peak on NP2 should be lower than on NP1, leading to a negative F0 scaling. A value of zero would indicate equal F0 peaks on the two NPs.
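The sign conventions of the F0 scaling measure can be checked with a minimal Python sketch. The function names and the peak values in Hz are hypothetical and serve only to illustrate the formula.

```python
import math


def f0_scaling_st(f0_max_np2_hz, f0_max_np1_hz):
    """Height of the NP2 peak relative to the NP1 peak, in semitones."""
    return 12 * math.log2(f0_max_np2_hz / f0_max_np1_hz)


def describe(scaling_st):
    """Spell out what the sign of the measure means."""
    if scaling_st > 0:
        return "NP2 peak higher than NP1 peak"
    if scaling_st < 0:
        return "NP2 peak lower than NP1 peak"
    return "equal peaks"


# Hypothetical peak values in Hz.
downstepped = f0_scaling_st(150.0, 200.0)  # NP2 lower than NP1
upstepped = f0_scaling_st(400.0, 200.0)    # NP2 one octave above NP1
level = f0_scaling_st(200.0, 200.0)
```

An octave difference corresponds to 12 semitones, which gives a convenient sanity check on the scale.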
We analyzed acoustic properties in R (v. 4.3.1) using RStudio (v. 2023.6.2.561). Linear mixed-effects regression models fitted with the lme4 package (Bates et al., 2015) were used for these measurements, with the maximal random effects structure that allowed for model convergence (Barr et al., 2013). ANOVA comparisons were used to examine the contribution of fixed effects to the models, and the emmeans package (Lenth, 2023) was used for post hoc analyses of simple structural effects at each region where interactions were observed. Similar procedures were applied to the other analyses discussed later in this paper.
For models relating to the broader question and Question 1, in which Structure is the only independent variable, sum coding was applied for factor contrasts of Structure: Connected Clauses and Relative Clauses were set to –0.5 and 0.5, respectively. For models testing Question 2, in which both Structure and Region are independent factors, we set the fixed effects to Region (NP2 vs. NP1 vs. Verb; the element in italics indicates the reference level), Structure (CC vs. RC), and the interaction between them.
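To make the interpretation of the –0.5/0.5 coding concrete, the following Python sketch fits an ordinary least-squares line to a balanced toy data set. It is deliberately simpler than the mixed-effects models reported here (no random effects), the duration values are hypothetical, and the function name is an assumption of this sketch. With balanced data and this coding, the slope equals the RC mean minus the CC mean, and the intercept equals the grand mean.

```python
from statistics import mean

# Contrast codes as described in the text.
CODES = {"CC": -0.5, "RC": 0.5}


def fit_simple_regression(structures, values):
    """Ordinary least squares with a single coded predictor."""
    x = [CODES[s] for s in structures]
    x_bar, y_bar = mean(x), mean(values)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, values))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical durations in ms (two observations per structure).
structures = ["CC", "CC", "RC", "RC"]
durations = [205.0, 195.0, 165.0, 157.0]
intercept, slope = fit_simple_regression(structures, durations)
```

A negative slope under this coding therefore means smaller values in Relative Clauses than in Connected Clauses, which is how the negative β estimates for Structure should be read.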
2.5. Results
To illustrate the general pattern of the recorded utterances, one pair of examples from Experiment 1 is shown in Figures 1 and 2, corresponding to examples of pitch contours of the Connected Clause and Relative Clause recordings respectively. Time-normalized contours of the two structures are provided in Figure 1 in Appendix A.
2.5.1. F0 peak alignment at NP1
The temporal alignment of the F0 peak relative to the duration of the first syllable of NP1 across the two structures is illustrated in Figure 3. The analysis showed a tendency toward delayed peak alignment at NP1 in Connected Clauses compared to clefted Relatives, which did not reach significance (β = –11.42, SE = 5.69, t(6.01) = –2.01, p = .09). A post hoc analysis found that one participant showed the opposite pattern, while the remaining six speakers showed a significantly delayed peak at NP1 in Connected Clauses compared to clefted Relatives (β = –16.63, SE = 2.98, t(13.27) = –5.58, p < .001).
Box plot of the temporal alignment of the F0 peak relative to the first syllable of NP1 (editor) in Connected Clauses and clefted Relatives. The box spans the interquartile range (25th to 75th percentile), with the median marked inside. Whiskers represent the most extreme values within 1.5 times the interquartile range. Jittered points represent individual observations.
2.5.2. Raw duration of syllables in NP1
Differences in raw durations of the three syllables in NP1 (e.g., editor) are shown in Figure 4 and Table 1. We observed significant structural effects in the same direction at the second (β = –8.18, SE = 2.42, t(301.07) = –3.38, p < .001) and the third (β = –38.89, SE = 13.49, t(5.94) = –2.88, p = .028) syllables which were shorter in Relative Clauses than Connected Clauses. No significant differences between structures were found at the first syllable in NP1 (p = .40). The overall pattern suggests a similar duration for the first syllable in the two structures, but shorter second and third syllables in Relative Clauses.
Average duration in ms of syllables in NP1 (editor) by structure (with Standard Errors in parentheses).
| Structure | Syllable 1 | Syllable 2 | Syllable 3 |
| CC | 197 (6.95) | 91.8 (3.38) | 200 (19.3) |
| RC | 195 (7.93) | 83.5 (4.17) | 161 (9.38) |
2.5.3. Raw Duration of words
Localized effects of Structure were found for word duration as presented in Table 2 and Figure 5. The model comparison showed a significant interaction between Region and Structure (AIC = 11196, χ2(2) = 86.23, p < .001). Crucially, the difference in structural effects at the Verb was significantly smaller than at NP2 (β = –36.89, SE = 8.90, t(952.09) = –4.15, p < .001), indicating localized effects as predicted by the indirect account. This is further supported by post hoc analyses which revealed that NP2 was produced significantly longer in Relative Clauses than in Connected Clauses (β = 51.7, p < .001), while the pattern was opposite for NP1, which was longer in Connected Clauses (β = –36.9, p = .002). No differences in the duration of the Verb were observed (p = .11).
Average duration in ms of NP1 (editor), Verb (working), and NP2 (shift) by structure (with Standard Errors in parentheses).
| Structure | NP1 | Verb | NP2 |
| CC | 474 (19.4) | 305 (6.8) | 354 (12.8) |
| RC | 438 (18.3) | 320 (10.8) | 405 (21.9) |
2.5.4. F0 Range
For F0 range, we observed a similar localized pattern to duration, illustrated in Table 3 and Figure 6. Models revealed a significant interaction effect between Structure and Region (AIC = 4344.7, χ2(2) = 42.93, p < .001). Importantly, the difference in the F0 range between the two structures was significantly smaller at Verb than at NP2 (β = –2.26, SE = 0.43, t(869.95) = –5.29, p < .001). This is supported by post hoc analysis where F0 range at NP2 was significantly larger for Relative Clauses than Connected Clauses (β = 2.14, p < .001). No significant differences were found between the two structures at NP1 (p = .36) or the Verb (p = .77).
Average F0 range in st of NP1 (editor), Verb (working), and NP2 (shift) by structure (with Standard Errors in parentheses).
| Structure | NP1 | Verb | NP2 |
| CC | 5.68 (0.99) | 2.74 (0.33) | 1.89 (0.28) |
| RC | 5.11 (0.75) | 2.64 (0.16) | 4.12 (0.74) |
2.5.5. Mean intensity
Similarly, localized effects in intensity were observed as shown by Table 4 and Figure 7. Models showed significant interactions between Structure and Region (AIC = 5239.7, χ2(2) = 10.51, p = .005). However, we did not find significantly different structural effects at the Verb compared to at NP2 (p = .13). The post hoc analysis showed a significant effect of Structure at all three regions in the same direction: Relative Clauses were produced with higher intensity than Connected ones at NP1 (β = 3.07, p = .017), Verb (β = 3.92, p = .011) and NP2 (β = 4.60, p = .007).
Average intensity in dB of NP1 (editor), Verb (working), and NP2 (shift) by structure (with Standard Errors in parentheses).
| Structure | NP1 | Verb | NP2 |
| CC | 61.7 (1.43) | 57.8 (1.13) | 52.8 (0.97) |
| RC | 64.8 (1.51) | 61.8 (1.40) | 57.4 (1.37) |
2.5.6. F0 scaling
Figure 8 illustrates the F0 scaling between NP2 (e.g., shift) and NP1 (e.g., editor) for both Connected Clauses and clefted Relatives. We observed a significant effect of Structure: Relative Clauses had significantly higher F0 scaling than Connected Clauses (β = 3.82, SE = 1.56, t(5.92) = 2.47, p = .049). This suggests that the difference in F0 peak between NP2 and NP1 was larger in Connected Clauses than in clefted Relatives. Crucially, both conditions showed an average negative F0 scaling, meaning that NP2 had a lower F0 peak than NP1 even in Relative Clauses, suggesting a non-contrastive focus on NP2 in clefted Relatives.16
F0 scaling at NP2 (shift) vs. NP1 (editor) by structure in semitones. Error bars show Standard Errors of the mean. According to the formula used above, positive values indicate a higher F0 peak on NP2 than NP1; negative values indicate a lower peak on NP2 than NP1; zero indicates equal peak values for the two regions.
2.6. Interim summary and discussion
The production experiment established that speakers prosodically disambiguate Relative Clauses and Connected Clauses, evidenced by differences in duration, F0, and intensity across different regions, particularly at NP1 and NP2. Notice that we observed acoustic differences already at the first NP editor, i.e., before the ambiguous region, with: i. longer duration of the last syllable in Connected Clauses, compatible with pre-boundary lengthening in these structures; and ii. later peak alignment in Connected Clauses, which tends to be perceived by listeners as an indication of increased prominence (for discussion see e.g., Ladd & Morton, 1997 and Kohler & Gartenberg, 1991).
We will elaborate on these results in more detail in the Discussion after presenting the auditory comprehension Experiment, and only limit ourselves to highlighting two main findings here, related to the specific questions on phrasing and prominence raised in Section 1.4:
Phrasing: The durational properties of NP1 suggest that clefted Relatives are produced as a single prosodic phrase with the noun they modify, while Connected Clauses are prosodically separated from the clefted noun. This phrasing contrast aligns with a more general pattern observed with other varieties of nested garden paths (Grillo & Turco, 2016; Grillo et al., 2018, 2023), which show that phrasing is preserved in these environments. Given the ban on extraposition for clefted Relatives discussed in Section 1.2, these results can be taken to support Wagner’s (2005, 2010) extraposition account of syntax-prosody mismatches involving Relative Clauses. These results are also consistent with Grillo et al.’s (2025) account of phrasing in nested garden-path environments: Separating clefted Relatives from their hosts would lead to attachment to a higher syntactic position, which would make them essentially indistinguishable from the alternative Connected Clause structure (as seen in Section 1.1, syntactic analyses assume that Connected Clauses are either base-generated at a higher position or obligatorily extraposed). Separate phrasing, therefore, should be avoided by a Rational Speaker who employs syntax and prosody in an internally consistent, rational fashion (Clifton et al., 2002, 2006; Frazier et al., 2006).
Prominence: Speakers tended to exhibit the greatest prosodic difference between the two structures at NP2, with little difference at Verb. Such highly localized differences, particularly for duration and F0 range, suggest that in clefted Relatives the nuclear stress fell on NP2, which is the most deeply nested word. This is more in line with predictions of indirect accounts. F0 scaling patterns between NP1 and NP2 also suggest that the focus we observed on NP2 in clefted Relatives was not realized with acoustic cues typical of a (narrowly) contrastive focus (see Breen et al., 2010 for a detailed discussion).
3. Experiment 2: Auditory Perception
Experiment 1 demonstrated that speakers prosodically disambiguate Connected Clauses and clefted Relatives. Observing differences in production, however, is not enough to establish that those cues are perceptually relevant and used to disambiguate between alternative readings in comprehension. In this second study, we test whether listeners are sensitive to these acoustic differences and can use them to guide syntactic parsing and avoid the garden-path effects with clefted Relatives that were observed in the absence of prosody in Guo et al. (2023, 2024a).
3.1. Participants
Sixty-four native speakers of English (30 women) located in the US (age range = 20–50, mean = 33.8, SD = 8.1) were recruited via the online recruitment platform Prolific (www.prolific.com). We used recruitment filters to ensure that all participants had no language, vision, or hearing-related disorders. All participants gave their informed consent and were compensated for their participation.
3.2. Materials
This experiment contained 24 experimental stimuli, each comprising two parts: a short written context ending with a Wh-question (similar to the contextual questions in Experiment 1) and an audio sentence acting as an answer. One stimulus is shown in example (19). We employed a 2 × 2 within-subjects design crossing Context and Prosody: The written context and question were designed to elicit either a Connected Clause or Relative Clause interpretation of the target sentence. The target stimuli were sentences from Experiment 1, produced by a trained phonetician with either Connected Clause or clefted Relative prosody.
- (19)
- a.
- Connected Clause-leading Context: There was a severe accident last night at your printing company. As the site manager, you need to know all the details, but you can’t remember who was in at the time, so you asked your colleague:
- Who was working the shift?
- b.
- Clefted Relative Clause-leading Context: The section of the newspaper you recently joined has two editors, one of them has been on leave for the past two months and the other you met just as they were finishing up for the day. As you enter the office kitchen, you hear that one of them got a prize. You want to know who so you ask the colleague:
- Which one of them got a prize?
- c.
- Audio answer: It was the editor that was working the shift.
Specifically, the Connected Clause contexts (including questions), for instance (19-a), were manipulated such that the first noun phrase in the answer (e.g., the editor) was in focus, providing new information. This is consistent with Experiment 1, as shown by the example audio item with pitch contour in Figure 9. In contrast, to ensure a natural flow of context and target questions and sentences, the clefted Relatives context (19-b) introduced two alternative referents for the head of the clefted DP, each performing different actions (in all items we took care to avoid repetition of lexical content in the context and target sentences). While the whole DP was given (entailed by the context), it is still in focus. In the target sentence, the noun introduces a set of alternatives (e.g., editor) which are given in the context, and the Relative Clause provides a restriction over that set. This is a notable difference with the design of Experiment 1, in which the head noun (and the Relative Clause) was only introduced in the target sentence (recall that the context question was: Which one of them was identified?). Due to this difference in information structure, the prosody of the Relative Clause stimuli in Experiment 2 differed slightly from that observed in Experiment 1. Although modeled on the general prosodic patterns of Experiment 1, the speaker in Experiment 2 was instructed to produce the target sentences, keeping in mind the contextual cues provided to participants in the comprehension task. This resulted in a less pronounced tonal excursion at NP1 in Experiment 2 compared to Experiment 1, visible in the pitch contour example in Figure 10.
Experimental items were balanced for condition using a Latin Square design and were interspersed with 36 fillers, preceded by 3 practice items. In total, each participant completed 63 trials. To ensure participants’ attentiveness, half of the experimental items and half of the filler items were followed by comprehension questions. These questions targeted different parts of the context to discourage strategic reading. The proportion of Yes and No answers to the comprehension questions was balanced.
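As an illustration of how a Latin Square rotation distributes the 24 items over the four Context × Prosody cells, consider the following Python sketch. The number of lists (four, one per condition) and the rotation rule are assumptions of this sketch; the text does not specify how the lists were actually generated, and the condition labels and function name are hypothetical.

```python
from collections import Counter

# The four cells of the 2 x 2 design (labels are illustrative).
CONDITIONS = [
    ("CC context", "CC prosody"),
    ("CC context", "RC prosody"),
    ("RC context", "CC prosody"),
    ("RC context", "RC prosody"),
]


def latin_square_lists(n_items=24):
    """Rotate conditions over items so that each list contains every
    item exactly once and every condition equally often."""
    n_cond = len(CONDITIONS)
    return [
        [(item, CONDITIONS[(item + lst) % n_cond]) for item in range(n_items)]
        for lst in range(n_cond)
    ]


lists = latin_square_lists()
# Number of items per condition within the first list.
per_condition = Counter(cond for _, cond in lists[0])
```

Under this rotation, each participant hears six items per condition, and across the four lists every item occurs once in each condition.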
3.3. Procedure
This experiment followed the paradigm of Arnhold (2021) and was run on the Gorilla Experiment Builder (www.gorilla.sc). In each trial, after reading the context and question, participants listened to a recording whose prosody either matched or mismatched the given context, as shown in example (19). Next, they were asked to judge whether the audio sentence was an acceptable answer to the context and question by choosing Yes or No. Every judgment was followed by a graded confidence rating that asked about their certainty in that judgment, namely Not confident, Somewhat confident, or Very confident. Finally, participants answered a comprehension question, if one was present. The experiment took approximately 30 minutes to complete.
3.4. Data processing and analysis
To remove artifacts in the results, trials with a Reaction Time of less than 200 ms (too short to be a realistic response time) or longer than 10000 ms (unreasonably long for this task, potentially indicative of distraction) in the acceptability judgment task were excluded from data analysis. This led to a total of 52 trials being removed, accounting for 3.38% of the data. Statistical analysis focused on two measurements.
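The exclusion criterion can be expressed as a simple filter. The Python sketch below is illustrative only: the field name `rt_ms`, the function names, and the example trials are hypothetical, and the treatment of responses at exactly 200 ms or 10000 ms (kept here) follows the wording "less than" and "longer than" above.

```python
MIN_RT_MS = 200      # faster responses are treated as unrealistic
MAX_RT_MS = 10_000   # slower responses may indicate distraction


def keep_trial(rt_ms):
    """True if the reaction time lies within the accepted window."""
    return MIN_RT_MS <= rt_ms <= MAX_RT_MS


def filter_trials(trials):
    """Return the retained trials and the number of removed trials."""
    kept = [t for t in trials if keep_trial(t["rt_ms"])]
    return kept, len(trials) - len(kept)


# Hypothetical trials, not taken from the data set.
trials = [
    {"id": 1, "rt_ms": 150},     # too fast: removed
    {"id": 2, "rt_ms": 200},     # boundary value: kept
    {"id": 3, "rt_ms": 2350},    # kept
    {"id": 4, "rt_ms": 12000},   # too slow: removed
]
kept, n_removed = filter_trials(trials)
```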
Binary Acceptability. We analyzed the binary acceptability responses (Yes vs. No) using generalized logistic mixed-effects models (GLMER) in the lme4 package (Bates et al., 2015) with a binomial distribution. Fixed effects included Context (Connected Clause-leading vs. clefted Relative-leading), Prosody (Cooperative vs. Conflicting), and their interaction, with the maximal random effects structure that allowed for model convergence. The factor contrasts were set to (0.5, –0.5) for both factors, with the Connected Clause-leading Context and the Cooperative Prosody coded as –0.5.
Combined 6-point responses. Apart from binary acceptability responses, we also modeled the responses by combining the binary acceptability data with the graded confidence ratings. This combination yielded a 6-point scale, ranging from 1-Very confident unacceptable to 6-Very confident acceptable, which closely mimics a Likert scale. For statistical analysis, we employed cumulative link mixed-effects models (CLMM) in the ordinal package (Christensen, 2023), given the ordinal nature of the data. We used the same fixed and random effects structure as in the GLMER model.
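A possible mapping from the two responses to the 6-point scale is sketched below in Python. Only the two endpoints (1 and 6) are stated in the text; the ordering of the intermediate points, with lower confidence placed closer to the middle of the scale, is an assumption of this sketch, as are the function and variable names.

```python
# Confidence labels as used in the rating task.
CONFIDENCE_STEPS = {
    "Not confident": 0,
    "Somewhat confident": 1,
    "Very confident": 2,
}


def combined_rating(acceptable, confidence):
    """Map a Yes/No judgment plus a confidence label onto 1..6.

    Assumed ordering: 1 = very confident 'No', 3 = not confident 'No',
    4 = not confident 'Yes', 6 = very confident 'Yes'.
    """
    step = CONFIDENCE_STEPS[confidence]
    return 4 + step if acceptable else 3 - step


scale = {
    (answer, label): combined_rating(answer, label)
    for answer in (False, True)
    for label in CONFIDENCE_STEPS
}
```

This ordering keeps the scale monotonic in perceived acceptability, which is what an ordinal model such as a cumulative link model requires of its response variable.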
3.5. Results
Binary acceptability. As illustrated in Figure 11, the acceptability rating was significantly lower for the Conflicting than Cooperative prosody (β = –1.74, SE = 0.19, z = –9.03, p < .001) and also lower for clefted Relative-leading contexts than Connected Clause-leading ones (β = –1.00, SE = 0.33, z = –3.01, p = .003). Interestingly, we found a significant interaction between Prosody and Context on the acceptability score: The difference in acceptability between the two Prosody conditions was greater under the clefted Relative-leading Context than the Connected Clause one (β = –1.38, SE = 0.34, z = –4.09, p < .001). The interaction effect is also supported by post hoc analyses that showed no significant difference in acceptability between the two contexts when the Prosody was Cooperative (p = .44); with Conflicting prosody, on the other hand, acceptability was significantly lower for clefted Relative-leading contexts than Connected Clause-leading ones (β = –1.69, SE = 0.34, z = –4.92, p < .001).
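The relation between the interaction and the post hoc comparisons can be illustrated with simple arithmetic on the reported coefficients. This sketch assumes the ±0.5 coding from Section 3.4 and works on the fixed effects only; it is not the authors' post hoc procedure, and the function name is hypothetical.

```python
# Fixed-effect estimates (log-odds) as reported in the text for the binary model.
B_PROSODY = -1.74
B_CONTEXT = -1.00
B_INTERACTION = -1.38

# Codes from Section 3.4: Cooperative / Connected Clause-leading = -0.5,
# Conflicting / clefted Relative-leading = +0.5.
COOPERATIVE, CONFLICTING = -0.5, 0.5
CC_CONTEXT, RC_CONTEXT = -0.5, 0.5


def simple_effect(main, interaction, other_factor_code):
    """Effect of one factor at a fixed level of the other factor."""
    return main + interaction * other_factor_code


# Effect of Context (clefted Relative-leading minus Connected Clause-leading)
# within each Prosody condition.
context_at_conflicting = round(simple_effect(B_CONTEXT, B_INTERACTION, CONFLICTING), 2)
context_at_cooperative = round(simple_effect(B_CONTEXT, B_INTERACTION, COOPERATIVE), 2)

# Effect of Prosody (Conflicting minus Cooperative) within each Context.
prosody_at_rc = round(simple_effect(B_PROSODY, B_INTERACTION, RC_CONTEXT), 2)
prosody_at_cc = round(simple_effect(B_PROSODY, B_INTERACTION, CC_CONTEXT), 2)
```

Under these assumptions, the Context effect with Conflicting prosody comes out at –1.69, matching the post hoc estimate reported above, while the Context effect with Cooperative prosody is much smaller (–0.31), in line with the non-significant comparison. The Prosody effect is larger in the clefted Relative-leading Context (–2.43) than in the Connected Clause-leading Context (–1.05), which is the pattern the interaction describes.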
Combined 6-point responses. Figure 12 presents the distribution of the 6-point ratings across conditions. Overall, the pattern is consistent with the binary data. Prosody that conflicted with the Context received significantly lower ratings than Cooperative Prosody (β = –0.76, SE = 0.21, z = –3.62, p < .001), while no significant difference between Contexts was found (p = .77). As in the binary results, the 6-point responses showed a significant interaction between Prosody and Context (β = –1.70, SE = 0.33, z = –5.20, p < .001): The difference in ratings between Conflicting and Cooperative Prosody was greater in the clefted Relative-leading Context than in the Connected Clause-leading Context.
To summarize, Experiment 2 asked whether listeners are sensitive to the prosodic differences between Connected Clauses and clefted Relatives identified in Experiment 1. A main effect of Prosody showed that listeners use prosodic cues to disambiguate the two structures. Additionally, the interaction effect shows that listeners are more likely to judge the clefted Relative prosody as acceptable in the context of Connected Clauses than they are to accept the prosody of Connected Clauses in a context that licenses clefted Relatives. We comment on these results in the General Discussion.
4. General discussion
This study investigated the prosodic disambiguation of the understudied contrast between Connected Clauses and clefted Relative Clauses. These constructions are string-identical but syntactically and semantically distinct. Previous work (Guo et al., 2024a, 2025) demonstrates that clefted Relatives are very hard to parse in silent reading and give rise to classic garden-path effects (lower acceptability ratings and longer reading times).
This raised two overarching questions for production and comprehension, which we addressed in two experiments. The first broad question is whether speakers prosodically disambiguate these constructions and, if so, what the main prosodic differences are between the two. The second broad question is whether listeners are sensitive to the prosodic disambiguation and whether they can use it to overcome the garden-path effects observed in silent reading.
The results of Experiments 1 and 2 provide a positive answer to both questions. Experiment 1 showed that speakers use a variety of tonal and temporal cues (as well as intensity to a lesser extent) to distinguish the two constructions. Experiment 2 showed that listeners are sensitive to these prosodic cues and can successfully use them to identify the two constructions. As outlined in the introduction, the structural and interpretive differences between the two constructions raise non-trivial theoretical questions (see Section 1.4). In the remainder of this section, we summarize these results in more detail, focusing in turn on phrasing and prominence while examining their implications for theories of the syntax–prosody and semantics–prosody interfaces. We then discuss the implications of Experiment 2 for models of sentence processing.
As mentioned, production results from Experiment 1 provided a positive answer to the broad question of prosodic disambiguation: The different syntactic and semantic properties of the two structures were associated with distinct prosodic patterns across the ambiguous region, including differences in duration, F0 range, and intensity. In fact, we observed differences in syllable length and F0 peak alignment as early as NP1 (the first Noun Phrase), i.e., the region immediately preceding the ambiguous region, providing prosodic cues for early disambiguation.
Table 5 summarizes the general prosodic differences between Connected Clauses and clefted Relatives at different regions found in the production experiment. Taken together with the intonational patterns exemplified earlier, our results show that typical it-clefts were generally produced with high prominence on the clefted element editor and a rather flat intonation contour for the Connected Clause. These findings are consistent with the general assumption in the cleft literature discussed in Section 1.1. In contrast, in clefted Relatives, we observed higher prominence on both the head Noun editor and the final Noun shift. We will discuss these differences in turn, highlighting how they connect to our more specific questions in Section 1.4 about phrasing and prominence.
Summary of results on differences across regions in the two structures. Only significant differences were reported. The symbol “>” indicates “longer” for the measure raw duration, “bigger” for F0 range, and “higher” for intensity.
| | NP1 | Verb | NP2 |
| Raw duration | CC > RC | | CC < RC |
| F0 range | | | CC < RC |
| Mean intensity | CC < RC | CC < RC | CC < RC |
Regarding phrasing, the longer duration of the final syllable in NP1 in Connected Clauses, compared to Relative Clauses, aligns with phrase-final or pre-boundary lengthening at NP1, suggesting a separate prosodic phrasing between NP1 editor and the Complementizer that in Connected Clauses. In contrast, clefted Relatives were significantly less likely to be prosodically separated from their head noun, despite the fact that these constructions increase the size of their host DP and are harder to parse than Connected Clauses, as shown in Guo et al. (2024a, 2025).
Contrary to the syntax-prosody mismatches previously observed in some Relative Clauses, these patterns suggest that the prosodic phrasing of the two structures closely aligns with their respective syntax: The head Noun Phrase the editor and the Complementizer Phrase that was working the shift tend to be independent constituents in Connected Clauses, whereas they are more likely to form a single syntactic phrase in Relative Clauses. In other words, faithfulness constraints appear to be satisfied in these environments, even though this generates unbalanced prosodic phrases containing long modified DPs. This pattern is surprising under both phonological and processing accounts of balance constraints, which would push against long/heavy prosodic phrases. From a processing perspective on balance constraints, moreover, we might expect the higher complexity of clefted Relatives to encourage separate phrasing to ease syntactic processing (this is in essence the idea behind the Sausage Machine model and processing accounts of separate phrasing in multiple right-branched Relatives).
We suggest that the ban on extraposition discussed in Section 1.2 might provide a principled account for these results. As we have seen, extraposition is not an option for clefted Relatives. If separate phrasing is best understood as the result of extraposition, as in Wagner (2005, 2010), then the observed phrasing pattern can be readily explained: Separate phrasing will only be observed when extraposition is allowed, and single phrasing of Relatives and their host should be observed whenever this alternative syntax is not available.
Notice that the phrasing pattern observed here for clefted Relatives aligns with what was observed in a number of recent studies on the prosody of restrictive modifiers in other nested garden-path environments, including the Complement Clause vs. Relative Clause ambiguity (John told the woman that he was coping with the wait/to wait, Grillo et al., 2023, 2025), the main verb vs. reduced Relative ambiguity (the horse raced past the barn fell, Grillo et al., 2018), the goal vs. restrictive Prepositional Phrase ambiguity (put the horse in the barn on the truck, Snedeker & Trueswell, 2003 and Speer et al., 2011), and the Pseudo Relative vs. Relative Clause ambiguity in Italian (Gianni ha visto la ragazza che correva la maratona/John saw the girl (that was) running the marathon, Grillo & Turco, 2016). In each of these environments, nested modifiers are produced as a single prosodic phrase with their hosts, in line with (i.e., faithful to) their syntactic structure.
Grillo et al. (2025) suggest that this tendency can be accounted for using the Rational Speaker Hypothesis (Clifton et al., 2002, 2006; Frazier et al., 2006), which claims that speakers use prosody in an “internally consistent, rational, fashion, and that the listener assumes such rationality in interpretation” (Frazier et al., 2006, p. 246). Paraphrasing Frazier, Carlson, and Clifton, in this perspective, “if a speaker intends a structure where a constituent contains [the Relative Clause], she will not insert a prosodic boundary that separates the [Relative] from the rest of its constituent without good reason” (Frazier et al., 2006, p. 246). We thus expect that Relative Clauses in ambiguous environments will be produced as a single prosodic phrase with the Noun Phrase they modify. A prosodic break separating the nested material from its host would plausibly lead to mapping onto the incorrect parse, strengthening the garden-path effect. In sum, this account suggests that faithfulness constraints are upheld in the presence of ambiguity to avoid inducing a garden-path effect.
Going back to our contrast of interest, if speakers use prosody in an internally consistent and rational fashion, we would expect them to avoid producing a prosodic break between the Noun Phrase and the Relative Clause in the environment of clefts. Such a break would naturally map onto the alternative Connected Clause structure, which indeed shows separate prosodic phrasing for the Complementizer Phrase, in line with its higher attachment site.
Interestingly, we note that the ban on extraposition of clefted Relatives we discussed in Section 1.2 appears to apply more broadly to Relative Clause extraposition in the environment of other types of nested garden paths.17 Some examples are shown in (20). Similar to the case of clefted Relatives (20-a), each of these sentences involves linking an extraposed restrictive modifier to its host over some intervening material: the Complement Clause to go in (20-b), the goal Prepositional Phrase on the truck in (20-c), and the Main Verb fell in (20-d). In each case, extraposition seems to be heavily marked, if not completely unacceptable.
- (20)
- a.
- Q: Who called?
- A: *It was [the gardener] [CC that called] [RC that will paint the door].
- b.
- ??John told [a woman] [to go] [that he was talking to].
- c.
- ??John put [a horse]_i on the truck [with a long tail]_i.
- d.
- *A horse fell yesterday raced past the barn.
We follow Grillo et al.’s (2025) suggestion that this is at least in part due to a Rational Speaker’s attempt to avoid ambiguity. On the role of processing in extraposition, we agree with Wagner’s argument that “this relation to processing factors does not mean that extraposition does not form part of syntax proper, and the syntactic and semantic effects of extraposition observed in Hulsey & Sauerland (2006) clearly show that extraposition is not just a phonological [or processing] phenomenon” (Wagner, 2005, p. 127). The important result here is that syntax-prosody mismatches involving Relative Clauses appear to be more constrained than traditionally thought and that these constraints seem to be aligned with the degraded status of extraposition in the context of nested garden paths.
Having established that faithfulness constraints are satisfied in the environment of nested garden paths including clefted Relatives, one outstanding question is whether and how balance constraints affect the prosodic properties of nested structures in these environments. Grillo et al. (2025) suggest that balance constraints are not ignored in the domain of nested garden paths, but that they are still active and are satisfied as best as possible by modulating speech rate of the longer phrase. In other words, some kind of balance can be achieved by producing longer phrases with nested modifiers at a faster tempo than their prosodic sisters. This idea echoes Ghini’s (1993) Uniformity Principle:18
Principle of Uniformity (Ghini, 1993, p. 56): A string is ideally parsed into same length φs [phonological phrases]; the average weight of the φs depends on tempo: at an average rate of speech (moderato), a φ contains two phonological words; the number of Ws within a φ increases or decreases by one by speeding up or slowing down the rate of speech.
Support for Grillo et al.’s (2025) hypothesis comes from the durational properties of long DPs with nested Relative Clauses or other restrictive modifiers in the environment of nested garden paths, which are consistently produced at a faster tempo than more predictable and easier to parse string-identical phrases attached to a higher syntactic position (i.e., not nested).
If this account is on the right track, we would also expect clefted Relatives to display a faster tempo than Connected Clauses. However, this is hard to test given the particular design implemented here and the possible confounding factor of focus. In terms of design, to ensure grammatical disambiguation, we used Question-Answer pairs which involved repetition of the Complementizer Phrase in the Connected Clause condition but not in clefted Relatives. Direct comparison of the durational properties of the two structures is thus complicated by the well-known effect of repetition on duration (e.g., Fowler & Housum, 1987 and Fowler, 1988); that is, sentences with Connected Clauses should plausibly have been produced faster than normal because of repetition effects. In other words, disregarding any structural and interpretive effect, we should expect to observe shorter Connected Clauses than clefted Relatives solely because of repetition. This, however, is not quite what we observed: While the Complementizer Phrase is overall shorter in Connected Clauses than clefted Relatives, this effect is primarily driven by duration at NP2, where the nuclear accent falls in clefted Relatives (an effect which can be independently attributed to the NSR). Importantly, duration at the verb is not significantly different across conditions, despite this portion of the Complementizer Phrase also being repeated between question and answer. This raises the question of what might have compensated for the repetition effect at this region, canceling it. One obvious answer is that this is due to how the independent structural differences between the two structures modulate duration. If nested material is indeed produced at a faster tempo than material attached to a higher position, as suggested by Grillo et al. (2025), then it would make sense to expect the repetition effect to be matched (and canceled) by this structural effect. 
While more work is needed to further test this hypothesis, we take these results to be at least compatible with Grillo et al.’s (2025) predictions. Unfortunately, the very short baseline provided by the intro phrase (it was) makes it hard to compare the duration of each structure in relation to a baseline tempo, which would be the most informative value here. Grillo et al. (2025), in fact, would predict shorter relative duration for the embedded clause in the clefted Relative condition in (21-b) than in the Connected Clause (21-a):
- (21)
- a.
- John said that it was [the editor] [CC that was working the shift].
- b.
- John said that it was [the editor [RC that was working the shift]].
This prediction seems to be supported at an impressionistic level, but will have to be tested experimentally in future work.
In sum, more work is needed to establish the independent contribution of repetition, focus, and syntactic structure to the durational properties of clefted Relatives and Connected Clauses. Still, the observed differences in prosodic phrasing between clefted Relatives and Connected Clauses add to a growing body of literature showing i. that nested garden paths are prosodically disambiguated (contrary to previous assumptions, e.g., Fodor, 2002 and Wagner & Watson, 2010) and ii. that restrictive modifiers are not prosodically separated from their hosts in garden-path environments.
At the level of prominence, the localized effect of focus on sentence prosody suggests that the information status of a phrase does not directly determine its prominence, providing an answer to Question 2. Our results show that in Connected Clauses, the nuclear accent fell on the new element that is in focus (editor). As shown by the acoustic results, speakers increased the prominence of the accent on the clefted constituent in Connected Clauses through a delayed alignment of the peak, in line with studies on contrastive focus (as mentioned earlier; see also Kügler & Gollrad, 2015 and references therein). However, in clefted Relatives, speakers tended to place the nuclear stress on the last Noun Phrase shift, rather than distributing prominence evenly across the whole clefted constituent the editor that was working the shift, despite the whole phrase carrying new information and being in focus. This is specifically shown by a mismatch in the structural effects for NP2 vs. Verb across duration, pitch, and (to a somewhat lesser extent) intensity: A larger effect of structure on the measurements was observed for NP2 relative to the Verb. This localized prominence suggests that the assignment of nuclear stress is in accordance with the Nuclear Stress Rule, indicating the existence of mediating linguistic/phonological representations when mapping between sound and meaning.
This pattern aligns with the predictions of indirect accounts as discussed in Section 1.3, but is less straightforward to explain from a direct perspective, which would seem to predict a broader effect of focus over the entire complex DP. Therefore, the current study adds more evidence to a much larger pattern supporting the need for metrical structures/representations to understand the relation between sound and meaning (see, e.g., Ladd, 2008 and Pierrehumbert, 2017 for recent discussions). The argument that indirect accounts allow for more specific predictions about the localization of an effect echoes previous work by White & Turk (2010) on the effects of prosodic boundaries on duration. They showed that pre-boundary lengthening is influenced by domain-edge effects in ways predicted by phonological accounts but not by direct approaches. Linguistic principles governing focal accent assignment (see e.g., Zubizarreta, 2016 for a recent review) indeed make even more specific predictions about the localization of the effect, which we aim to test in future work.19
It is worth noting that the observed accent on the last Noun Phrase in clefted Relatives does not display the prototypical properties of contrastive focus. There are two reasons for this claim. Theoretically, the context questions eliciting the clefted Relative readings in our production study (i.e., Which one of them was identified?) are quite broad and are not associated with contrastive focus on a specific part of the answer (e.g., It was the editor that was working the shift). For instance, the question did not ask participants to identify a particular editor who was either working the shift or working the system (or working vs. leaving the shift). Instead, participants were expected to provide both editor and working the shift as new information that was not presupposed in the context questions. Empirically, if the accent on NP2 in clefted Relatives were contrastive focus, we would expect positive F0 scaling in the clefted Relative condition, which would indicate a higher F0 max on NP2 than on NP1. However, our F0 scaling results showed the opposite pattern: for both structures, F0 max was higher on NP1 than on NP2.
This nevertheless reflects a limitation of the current study, as our context questions in the production experiment did not strictly control for the focus structure of clefted Relatives. To address this open issue, we are planning a follow-up study that manipulates the context questions for the clefted Relative condition, providing more detailed and constrained contexts targeting different focus structures in the clefted Relatives. For instance, by changing the context, the semantic focus can fall on i. the whole head noun plus Relative Clause, ii. the Relative Clause itself, or iii. only the Verb or the last Noun Phrase. Manipulating where focus (in information structure) should fall in nesting structures will provide insights into how phrasing and prominence interact, adding to our understanding of the complex interaction of syntax, semantics, and prosody.
Anecdotally, the processing difficulties with Clefted Relative Clauses observed in Guo et al. (2024a, 2025) were also visible in the context of the production study, where all participants initially seemed to struggle to get the relevant reading. Although participants generally adapted quickly once they understood the sentence, sometimes they seemed inclined to produce a Connected Clause reading for a Relative Clause Question-Answer pair. The divergent prosodic patterns of clefted Relatives compared to Connected Clauses in Experiment 1 might offer an additional explanation for the processing difficulty associated with clefted Relatives. We would argue that these and other types of nested garden paths are hard to recover from in part because they require revision of the implicit prosodic representation, in line with Bader’s (1998) Prosodic Constraint on Reanalysis: Revising a syntactic structure is difficult if it necessitates a concomitant reanalysis of the associated prosodic structure (Bader, 1998, p. 8).20
However, the results of our auditory comprehension study (Experiment 2) suggest that such processing difficulty can be alleviated by prosodic disambiguation. Specifically, we found that listeners are sensitive to the prosodic differences between Connected Clauses and clefted Relatives observed in Experiment 1. This is supported by the main effect of Prosody, with overall higher acceptability judgments for Cooperative than Conflicting prosody across both structures. Crucially for us, clefted Relative Clauses are found to be as acceptable as Connected Clauses in the presence of Cooperative prosody, which suggests that prosodic disambiguation can cancel the garden-path effects observed in Guo et al. (2024a, 2025) with clefted Relatives.
Interestingly, while Cooperative prosody is preferred over Conflicting prosody across both types of structures, we also observed an interaction between Context and Prosody. This interaction shows that listeners were more likely to correctly reject utterances with the Connected Clause prosody in a context that elicited clefted Relatives and were more tolerant of a clefted Relative prosody in contexts eliciting a Connected Clause. One possible account of this interaction is that prosody may play a more crucial role in the disambiguation of clefted Relatives. This is possibly due to the higher processing difficulty associated with clefted Relatives compared to Connected Clauses. As a result, listeners may have stronger expectations for prosodic cues to reliably signal that the upcoming structure is a clefted Relative. When these expectations are not precisely met, they are more likely to reject the prosody. An alternative explanation is that the prosody of Connected Clauses is more marked and harder to accommodate than that of clefted Relatives, possibly because it involves a clear prosodic boundary between the head noun and the Relative Clause, which mismatches the syntactic structure, violating the assumption that speakers behave rationally when producing prosodic breaks.
While informative, the current results also have two main methodological limitations. First of all, it would be desirable to replicate the results of the Planned Production study in a more naturalistic setting. The main advantage of Planned Production is that it allows the experimenter to collect phonetic data in extremely well-controlled conditions. This is especially suitable when trying to elicit minimal pairs of complex ambiguous sentences, such as the nested garden paths tested here, which are unlikely to be encountered out of the lab.21 Nevertheless, it is well known that reading affects prosody in non-trivial ways (for more discussion on this, see e.g., Swerts et al., 1996 and de Ruiter, 2015). This limits our ability to generalize the observed results. To address this limitation, in the future, we plan to replicate Experiment 1 using a more natural game setting, where participants have to instruct each other about moving objects (e.g., it is [the square [RC that is on the circle]], with the Connected Clause that you should move being optionally omitted in production) or describing the position of such objects (e.g., it is [the square] [CC that is on the circle]).
Second, as Experiment 2 is an offline acceptability judgment study, although it provides us with important information on the role of prosody in the disambiguation of clefted Relatives, it cannot provide information about the timing of this disambiguation. Thus, it cannot address the question of the relative contribution of the different prosodic cues identified in our production study. In other words, we know that the prosodic differences we identified are perceptually relevant, but we need to use different online methods to assess how early listeners converge on the correct parse and which prosodic cues are more reliably picked up by listeners when resolving this type of ambiguity.
To address this second issue, we are planning a follow-up study using the visual-world paradigm in eye-tracking. Online methods can tell us not just whether explicit prosody disambiguates nested Relatives, but also how early this information becomes available to the parser. This is particularly important considering that our production results showed prosodic differences as early as the unambiguous Noun Phrase.22 For example, if listeners can use early prosodic cues such as pre-boundary lengthening and peak alignment at NP1 to disambiguate structures with different attachment heights, we would expect to see detectable differences in their eye movements to different visual targets shortly after hearing the first Noun, reflecting real-time structural interpretation.
Despite these limitations, the results of Experiment 2 provide a positive answer to the broad question regarding prosodic disambiguation and further corroborate the idea that prosody plays a central role in the disambiguation of nested garden paths. As such, these results should be seen as part of a larger project investigating the prosodic disambiguation of nested garden-path sentences.
5. Conclusions and outlook
Through two experiments, we investigated the prosody of the previously understudied structural contrast between string-identical Connected Clauses vs. Relative Clauses. Results from production revealed clear differences in prosodic patterns between the two structures in terms of duration, F0 range, and intensity, observable as early as the Noun Phrase preceding the ambiguous region. For both structures, we observed alignment of the prosodic structure with the underlying syntactic structure. Clefted nouns and Connected Clauses are produced as separate prosodic phrases (indexed by pre-boundary lengthening at the noun). Clefted Relatives, on the other hand, are consistently produced as a single prosodic phrase with the noun they modify. This phrasing pattern is particularly interesting given that Relative Clauses can be separated from their hosts in other non-nested environments. This suggests that syntax-prosody mismatches involving Relatives are much more constrained than generally assumed and that faithfulness constraints prevail over balance constraints in the environment of nested garden paths, as previously suggested in Grillo et al. (2025). These results can also be taken to provide a novel argument for Wagner’s (2005) extraposition account of apparent syntax-prosody mismatches involving Relative Clauses. In terms of prominence, a localized prosodic difference between the two structures was found at the last Noun Phrase (i.e., the most deeply nested word in clefted Relatives). Such localized effects are more in line with the predictions of indirect accounts of the mapping between semantics and prosody, as they are best captured by making reference to a metrical grid. In future work, we plan to replicate these results using a more natural game setting to elicit the target sentences while also comparing the prosody of clefted Relatives under different focus structures.
Our results from comprehension show that listeners exploit these prosodic differences to overcome the garden-path effects triggered by clefted Relatives in the absence of prosody. While these results show that listeners are sensitive to these prosodic distinctions, we still lack information about the timing of this disambiguation in comprehension. In future work, we plan to test this disambiguation in the visual-world paradigm to establish how early and on the basis of which prosodic cues listeners converge on the Relative Clause.
In conclusion, our findings present a case for investigating the prosodic properties of nested garden paths. These constructions provide new insights into the syntax-prosody mapping and, in the case of clefted Relatives, also into the role of intermediate levels of representation in the prosodic realization of information structure.
Appendix
A. Additional measurements for Experiment 1
Figure 1 shows time-normalized intonation contours of clefted Relative Clause and Connected Clause productions from Experiment 1, expressed in semitones relative to 100 Hz. F0 measurements were extracted using a Praat script that identified labeled intervals in the TextGrid and divided each interval into 20 equal-length sub-intervals, resulting in 21 pitch values per word. The extracted F0 values were then converted to semitones and visualized in R (v. 4.3.1). The contours reflect mean values across participants and items. The divergence between the two contours, especially at NP2 (e.g., scene), illustrates distinct prosodic patterns associated with the two structures.
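The two steps described here, time normalization and conversion to semitones, can be sketched as follows. This is a Python illustration rather than the Praat script used in the study. It assumes the 21 values are taken at the boundaries of the 20 sub-intervals, and the function names are hypothetical. The conversion uses the standard formula of 12 times the base-2 logarithm of the frequency ratio.

```python
import math

REFERENCE_HZ = 100.0  # reference frequency stated in the text


def hz_to_semitones(f0_hz, ref_hz=REFERENCE_HZ):
    """Convert an F0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def sample_times(start, end, n_subintervals=20):
    """Time points at the boundaries of n equal-length sub-intervals.

    Dividing a labeled interval into n sub-intervals yields n + 1
    measurement points (21 for n = 20), assuming values are taken at
    the sub-interval boundaries.
    """
    step = (end - start) / n_subintervals
    return [start + i * step for i in range(n_subintervals + 1)]


# Toy word interval from 0.50 s to 0.90 s (illustrative values only).
times = sample_times(0.50, 0.90)
octave_above_reference = hz_to_semitones(200.0)
```

Because every word contributes the same number of points regardless of its duration, contours from different speakers and items can be averaged point by point.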
Figure 2 illustrates the by-participant F0 Scaling patterns in the two structures.
By-participant mean F0 scaling (in st) for Connected Clause (CC) and clefted Relative Clause (RC) productions in Experiment 1. Positive values indicate a higher F0 peak on NP2 than NP1; negative values indicate a lower peak on NP2 than NP1; zero indicates equal peak values for the two regions.
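The sign convention in this caption can be illustrated with a short sketch. It assumes that F0 scaling is the NP2 peak minus the NP1 peak, expressed in semitones; the exact computation used in the study may differ, and the function names and toy values are hypothetical.

```python
import math


def f0_scaling_st(np1_peak_hz, np2_peak_hz):
    """F0 scaling in semitones: NP2 peak relative to NP1 peak.

    Positive when the NP2 peak is higher than the NP1 peak, negative
    when it is lower, and zero when the two peaks are equal.
    """
    return 12.0 * math.log2(np2_peak_hz / np1_peak_hz)


def mean_scaling(peak_pairs):
    """Mean F0 scaling over (NP1 peak, NP2 peak) pairs, e.g. for one participant."""
    values = [f0_scaling_st(np1, np2) for np1, np2 in peak_pairs]
    return sum(values) / len(values)


# Toy peaks in Hz (illustrative only): NP1 is higher than NP2 in both
# utterances, so the mean scaling is negative.
toy_mean = mean_scaling([(220.0, 200.0), (210.0, 190.0)])
```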
B. List of experimental items from Experiment 1
Response sentences from experimental items used in Experiment 1. Every sentence was preceded by a context question that elicits either a Connected Clause or a clefted Relative Clause reading of the sentence. For example, for item 1, the question in the Connected Clause condition is Who was leading the talk?, while the question in the clefted Relative Clause condition is Which one of them was identified?.
It was the manager that was leading the talk.
It was the monarchist that was rowing the boat.
It was the murderer that was locking the door.
It was the maniac that was awaiting the trial.
It was the auditor that was dealing the drug.
It was the humorist that was leaving the scene.
It was the governor that was running the gang.
It was the moralist that was guarding the truth.
It was the messenger that was walking the dog.
It was the modeler that was humming the tune.
It was the editor that was working the shift.
It was the royalist that was hunting the deer.
It was the bureaucrat that was reading the form.
It was the mortgager that was viewing the file.
It was the novelist that was riding the mare.
It was the officer that was writing the case.
It was the lecturer that was moving the desk.
It was the journalist that was launching the book.
It was the analyst that was heating the room.
It was the medalist that was hiding the score.
It was the carpenter that was making the tool.
It was the gardener that was loading the car.
It was the loyalist that was holding the sign.
It was the lumberer that was learning the trade.
C. List of experimental items from Experiment 2
Context questions and audio stimuli in written version from experimental items used in Experiment 2. Participants either read a context eliciting a Connected Clause answer (CC Context) or a context targeting a clefted Relative Clause answer (RC Context), and then listened to an audio stimulus producing the sentence with either a Connected Clause or clefted Relative Clause prosody.
-
CC Context: You just started working in a big company and you were confused as to who was in charge of your first meeting, so you asked an experienced colleague: Who was leading the talk?
RC Context: You just joined the company where your friend works, which has two managers. One is very vocal and deals with operations and the other very silent and tends to take a less prominent role. After a meeting, you want to know which one your friend had trouble with, so you ask: Which one of them did you have issues with?
Sentence: It was the manager that was leading the talk.
-
CC Context: The Reality TV show you were watching with your friend was chaotic, there was a variety of famous people doing weird things. You got distracted and lost track of who was doing what, so you asked your friend: Who was rowing the boat?
RC Context: You went down to the river bank with your friend and there were two famous monarchists, one was clapping non-stop while the other worked really hard. Back at home, your friend mentioned that one of them was in a TV show he happened to work on and you wanted to know who, so you asked: Which one of them was on the TV show?
Sentence: It was the monarchist that was rowing the boat.
-
CC Context: You and your friend were watching a horror film that he had already watched several times. There was a scene where the front door of the house was suddenly shut from outside accompanied by the sounds of keys. After a while you got really curious and wanted to know who did that, so you asked your friend: Who was locking the door?
RC Context: You and your friend were watching a horror film in which two murderers acted as a killing team. At one point one of them is holding the victim while the other is securing the house. You lost track of things and can’t remember which one of them was married to the policewoman, so you asked: Which one of them was married to the policewoman?
Sentence: It was the murderer that was locking the door.
-
CC Context: In the past few weeks, there had been several arrests of hardened criminals in your town. The local newspaper you read this morning said that a major criminal trial was starting today, but no details were given. Later on at the local courthouse, you forgot who was related to that particular case, so you asked your friend who works there: Who was awaiting the trial?
RC Context: There were two notorious maniacs in your town, one of them was already captured and another was still at large. You heard that one of them had died suddenly but you didn’t catch the details, so you asked your friend: Which one of them died?
Sentence: It was the maniac that was awaiting the trial.
-
CC Context: You were playing a card game at a company event and you didn’t realize until after the last round that the person who shuffled is not from your office group, so you ask an experienced colleague: Who was dealing the cards?
RC Context: You and your colleague were playing a card game at a company event and met two auditors from another group. One of them joined the game, the other one said he’d rather just watch. After the game, your friend told you that one of them used to live with him, so you asked: Which one of them did you use to live with?
Sentence: It was the auditor that was dealing the cards.
-
CC Context: You were watching a musical with your friend. There was a commotion in the seats around you that distracted you as one of the performers was escorted off stage, so you asked your friend: Who was leaving the scene?
RC Context: You were taking a walk with your friend when you saw two popular humorists from your town when all of a sudden there was a car accident. One of the humorists stayed put and called the police; the other one was rushing away. Later on you were wondering which one of them performed in the most recent musical, so you asked: Which one of them was in the musical we saw last week?
Sentence: It was the humorist that was leaving the scene.
-
CC Context: You and your friend have been watching a TV series about politics. Yesterday the trailer indicated that the secret leader of a criminal group would be revealed in today’s episode. However, you missed the show, so later that night you asked your friend: Who was running the gang?
RC Context: You and your friend have been watching a TV series about politics. In yesterday’s episode there were two governors having conflicts with each other. One of them was secretly leading a criminal group while the other focused on reforming criminal law. You missed the new episode today and when your friend mentioned a scene with FBI questioning, you asked your friend: Which one of them was interrogated?
Sentence: It was the governor that was running the gang.
-
CC Context: Your friend was covering a beauty pageant for the local newspaper and told you that there was an important guest counting the vote. You are curious, so you ask: Who was counting the vote?
RC Context: Today you went to a local election event with your friend. It was crowded and there were also two congressmen present. One of them was talking to the volunteers while another kept tallying the number of ballots. On the way back, you wanted to know which one of them had worked with your friend, so you asked: Which one of them did you work with?
Sentence: It was the congressman that was counting the vote.
-
CC Context: You are a newcomer to an advertising company and you don’t know many people there yet. One day, while you are hanging out with one of your colleagues who has worked there for a long time, you saw somebody from work passing by with a Golden Retriever, and out of curiosity, you asked your friend: Who was walking the dog?
RC Context: You were hanging out with your colleague from your big advertising company when two of the company’s messengers walked by. One of them was carrying a heavy bag and the other had a Golden Retriever. After they left, you remembered that years ago your friend said he dated one of them but you can’t remember clearly, so you asked: Which one of them did you use to date?
Sentence: It was the messenger that was walking the dog.
-
CC Context: You are attending a workshop with your friend. You heard someone produce the melody of your favourite song. However, without glasses you can’t see clearly, so you asked your friend: Who was humming the tune?
RC Context: You and your friend were attending a 3D printing workshop where there were two instructing modelers. During the practice session, one kept producing the same melody while the other was very quiet. After the workshop, you remembered that your friend said that one of them was his roommate. You wanted to know who, so you asked your friend: Which one did you use to live with?
Sentence: It was the modeler that was humming the tune.
-
CC Context: There was a severe incident last night at your printing company. As the site manager, you need to know all the details, but you can’t remember who was in at the time, so you asked your colleague: Who was working the shift?
RC Context: The section of the newspaper you recently joined has two editors, one of them has been on leave for the past two months and the other you met just as they were finishing-up for the day. As you enter the office kitchen, you hear that one of them got a prize. You want to know who so you ask the colleague: Who got a prize?
Sentence: It was the editor that was working the shift.
-
CC Context: You and your friend are watching a political debate on animal welfare on TV and you remember that your friend’s newspaper recently published compromising photographs of a number of the debaters. You remember one of the debaters was involved in hunting, but you can’t remember who, so you asked your friend: Who was hunting the deer?
RC Context: Last week you and your journalist friend watched a political debate with guests from the whole political spectrum, including two royalists. Both of them claimed to be animal activists but one of them was photographed hunting in the forest. Your friend mentioned today that their newspaper interviewed one of them and you wanted to know which one, so you asked: Who did your company interview?
Sentence: It was the royalist that was hunting the deer.
-
CC Context: Your friend submitted an application to the town hall two months ago. This type of application has to be read and discussed publicly, so it can take a long time before they are approved. Yesterday, your friend told you that the application was finally accepted. You are happy it’s finally sorted and curious about the process, so you ask: Who was reading the form?
RC Context: Today you accompanied your friend to a government office where you saw two bureaucrats behind the desk. One of them was chatting with an intern, while another was checking some documents. After you left, you want to know which one was dealing with your friend’s application, so you asked: Which one of them was in charge of your acceptance?
Sentence: It was the bureaucrat that was reading the form.
-
CC Context: As a lawyer, you are responsible for cases related to private lending issues. One day, coming back from your lunch break, you realize that a sensitive document that was on your desk had been taken by somebody for review. To ensure the confidentiality of the case, you asked one of your colleagues: Who was viewing the file?
RC Context: You went to your friend’s office today to have coffee with him and you saw two mortgagers having an argument: one of them was keeping busy going through a document while the other kept asking for attention and criticizing them. At the cafe, you asked your friend: Which one of them started the fight?
Sentence: It was the mortgager that was viewing the file.
-
CC Context: Your friend invited you and some of their colleagues to an afternoon at a country farm. After the introductions, someone excitedly rushed off to try out horseback riding, but soon after fell off. You ask your friend: Who was riding the mare?
RC Context: Your friend invited you and some colleagues to a social event at a horseback riding center. Amongst them were two famous novelists, one that spent the day sitting in the sun chatting with people and the other spent the day riding. During the dinner, you wanted to know which one your friend once admired, so you asked: Which one did you admire when you were at university?
Sentence: It was the novelist that was riding the mare.
-
CC Context: You are assigned to review an old lawsuit. You want to know as many details as possible, so you ask the managing partner: Who was writing the case?
RC Context: You have been robbed while visiting a friend’s town and he takes you to the police station. There are only two officers there. One of them helped fill in the paperwork while the other was making phone calls. After you left, your friend said that one of them was once in a reality TV show. So you ask: Which one of them was on TV?
Sentence: It was the officer that was writing the case.
-
CC Context: You fell asleep during a class. However, your dream was interrupted by some grating noise. You are really confused and sleepy, so you ask your classmate nearby: Who was moving the desk?
RC Context: You and your friend volunteered to organise an event in your department this afternoon. Two lecturers also came to help with the venue setup, one of them focused on wall decoration and the other helped rearrange the furniture. You wanted to know which one they used to have problems with, so you asked: Which one of them did you have issues with?
Sentence: It was the lecturer that was moving the desk.
-
CC Context: Your office recently had a job cut and you have to take over some advertising work in the publishing section. To your surprise, you found that the new edition you were assigned to work on has already been thoroughly advertised. You wanted to know who did such an excellent job, so you asked your colleague: Who was launching the book?
RC Context: Two journalists in your office were recently laid off. On their last day of work, you and your colleague saw one of them conducting online interviews, while the other one was finalising the advertising campaign for the new edition of a famous novel. On your way home, your colleague mentioned that a renowned company just offered a position to one of them. Out of curiosity, you ask: Who was offered the job?
Sentence: It was the journalist that was launching the book.
-
CC Context: It was freezing today so you were worried the office would be cold as well. However, after entering the office you found it was actually very warm, although the heater was off. You wonder who was so kind to warm up the office, so you ask your colleague who always arrives early: Who was heating the room?
RC Context: You just joined a research group on a new project. One day, one of the two analysts in the group kept adjusting the thermostat to ensure all were warm, while the other one was really hungry and focused on eating. After lunch, you want to know which one of them was in charge of the previous project, so you ask a colleague nearby: Who was the former project leader?
Sentence: It was the analyst that was heating the room.
-
CC Context: In a high school sports competition, you noticed that there was a sudden uproar near the scoreboard because it was blocked by someone you couldn’t see very well. After the crowd went away, you asked your friend next to you: Who was hiding the score?
RC Context: Your friend’s sports group proudly has two state medalists in basketball this year. When attending their informal competition today, you saw one of them surprisingly block the scoreboard for nearly one minute on purpose, while the other continued to dribble. After the game, you wanted to know which one of them won the competition last year, so you asked your friend: Who was the champion last year?
Sentence: It was the medalist that was hiding the score.
-
CC Context: You and your friend are shopping in a big furniture store. You went to the garden section and your friend to the woodworking section. When you reunited, your friend said that there was a special demonstration and give-away of hand-made instruments built on site which can be used for assembling furniture without too much effort. You didn’t see the process, so you ask him: Who was making the tool?
RC Context: Today you and your friend participated in a workshop on making carpentry instruments. The workshop was delivered by two carpenters. One of them gave verbal instructions, while the other demonstrated the physical process. When walking back home, you wanted to know which one had advised your friend on how to fix their chair, so you asked: Who advised you on how to repair your chair?
Sentence: It was the carpenter that was making the tool.
-
CC Context: Your friend is going on a road trip tonight so you came to their house to help them pack. When you arrived, you were surprised to see someone else carrying luggage to the already half-filled car. You wanted to know who was helping him, so you asked: Who was loading the car?
RC Context: While walking with your friend, you saw two gardeners busy in front of your neighbour’s house, one of which was taking boxes out of the house while the other was packing them into your neighbour’s jeep. Afterwards, your friend said that one of them previously happened to work in their home. You wanted to know which one so you asked: Which one worked for you?
Sentence: It was the gardener that was loading the car.
-
CC Context: Your friend is an activist and he convinced you to join him at a demonstration. There were lots of very different people marching with you and you were a bit confused by the different affiliations. Back home, you remembered a strange guy standing alone in the middle of the square with a big board and cryptic words on it, so you asked your friend: Who was holding the sign?
RC Context: While jogging this morning, you and your friend saw two loyalists on the street trying to publicise a new activity. One of them was carrying a big board with words on it, while the other was talking with a speaker. After getting back home, your friend mentioned that he once met one of them in a restaurant and had a chat. You want to know which one so you ask: Which one did you meet at a restaurant?
Sentence: It was the loyalist that was holding the sign.
-
CC Context: You are a new member of a landscaping company. This morning your colleague is giving you a tour of the company and you observed an old introductory manual on the table. You want to know who that belonged to, so you ask: Who was learning the trade?
RC Context: Today you and your colleague are visiting a landscaping company where you saw two lumberers near the field. One of them looked very experienced, while the other was still in the training phase. After you left the field, your colleague told you that he once lived next to one of them. You want to know more, so you ask: Which one used to be your neighbour?
Sentence: It was the lumberer that was learning the trade.
Acknowledgments
Preliminary versions of this work were presented at Speech Prosody 2024 in Leiden and LabPhon19 in Seoul. We thank conference organizers, reviewers, participants and especially our discussant, Michael Wagner, for their comments and suggestions. We also thank the Laboratory Phonology editors and reviewers for their constructive comments and suggestions for improvement. The usual disclaimers apply.
Funding information
This work was jointly funded by the University of York (Psycholinguistics PhD Grant 2022–2025) and the French Investissements d’Avenir-Labex EFL program (ANR-10-LABX-0083), contributing to the IdEx Université Paris Cité – ANR-18-IDEX-0001.
Competing interests
The authors have no competing interests to declare.
Author contributions
Conceptualization: Buhan Guo, Nino Grillo, Giuseppina Turco, and Andrea Santi. Experimental Design: B. Guo, N. Grillo, Sven Mattys, A. Santi, Shayne Sloggett and G. Turco. Formal analysis, data curation, visualization: B. Guo, G. Turco, S. Sloggett and N. Grillo. Investigation: B. Guo, N. Grillo and G. Turco. Writing – Original draft: B. Guo, N. Grillo and G. Turco. Writing – Review and Editing: A. Santi, S. Sloggett and S. Mattys. Supervision and funding acquisition: N. Grillo, S. Mattys and G. Turco.
Author note
A preliminary version of Experiments 1 and 2 was presented at Speech Prosody 2024 (Leiden, July 2–5) and published in the conference proceedings (Guo et al., 2024b).
Notes
- The only work we were able to identify on this topic is Schachter (1973), which contains a brief discussion on the prosodic differences between the two structures based exclusively on impressionistic judgments. [^]
- When we use the word “focus,” we refer specifically to semantic focus, not prosodic focus, unless otherwise stated. [^]
- The sentence in (4) should not be confused with other, only apparently similar, sub-types of clefts in which the Complementizer Phrase carries new information but is not itself clefted, as in informative-presupposition it-clefts in (i) (Prince, 1978, also dubbed new-presupposition it-clefts in Collins, 2006). Nor should it be confused with standard it-clefts where the non-clefted constituent contrasts with recently activated discourse information (as in (ii) discussed in Gussenhoven, 2007). These constructions are structurally and interpretively distinct from the clefted Relative Clauses used in our study: Their Complementizer Phrases are neither clefted nor restrictive, as shown by their compatibility with proper names. Importantly, our experimental materials were designed to preclude these alternative readings. For a detailed discussion of different sub-types of clefts/pseudo-clefts, see e.g., Prince (1978), Higgins (1973), Declerck (1984), Lambrecht (2001) and Collins (2006).
- (i)
- informative-presupposition it-cleft:
- It was just about 50 years ago that Henry Ford gave us the weekend.
- (ii)
- it-cleft with a reactivated clause:
- A: Does Helen know John?
- B: It is John (that) she dislikes. [^]
- While Arnhold’s study provides a detailed comparison of the prosody of it-clefts and focus in unmarked syntax, it remains unclear what prosodic patterns are associated with clefts carrying “broad focus.” This is because no acoustic analysis was performed for this type of cleft. We also note that more work might be needed to define the informational content of the answer in the broad context of questions like what happened. Our impression is that such clefts do not exactly provide an answer to the question, but rather serve as a preamble to an answer. For example, they seem appropriate in a situation in which we report who was on the phone, rather than what happened, e.g., Q: What happened? A: It was Charles/It was the royal who was buying an island. He says he won’t be coming to work today. – Oh!, did he ever? Under this interpretation, they are actually aligned with the clefted Relatives under discussion in this paper. [^]
- Throughout the paper we use nested garden paths to refer to a class of locally ambiguous sentences characterized by a contrast between a nested reading (exemplified by the clefted Relative currently under discussion) and an alternative interpretation in which the ambiguous string attaches higher in the syntactic structure, as in the case of our Connected Clauses. This family includes classic ambiguities such as the Main Verb/Reduced Relative Clause ambiguity (e.g., The horse raced past the barn [and] fell) and the Complement Clause/Relative Clause ambiguity (e.g., John told the woman that he was coping with the wait / to wait). On nested garden paths see e.g., Bever (1970); Frazier & Rayner (1982); Frazier & Clifton Jr. (1996); Altmann & Steedman (1988); Crain & Steedman (1985); Pickering & Van Gompel (2006); Sedivy (2002); Grodner et al. (2005); Grillo et al. (2025). [^]
- In fact, Guo et al. (2025) show that this garden-path effect is surprisingly persistent in the presence of supporting context. [^]
- There is ongoing debate about what specific domain the phonological processes take place in, whether directly on syntax or through the intermediate prosodic structure. Distinguishing these accounts is beyond the scope of this paper. See Elfner (2018) and Bennett and Elfner (2019) for recent overviews of the syntax–phonology interface. [^]
- The source of what we will keep referring to as balance or eurhythmic constraints, whether cognitive, physiological, or phonological, is beyond the scope of this paper. What matters here is that such constraints exist and systematically influence prosodic phrasing. [^]
- It has been argued that this is due to the burden these long phrases impose on processing and articulation. For discussion of this point, see Chomsky and Halle (1968, p. 372), Wagner (2005, p. 127), and the discussion below. Notice that the argument that these factors might modulate the likelihood of breaking down long syntactic constituents into smaller prosodic phrases does not necessarily imply that there are no underlying grammatical factors at play here. We return to this point in the Discussion section. [^]
- One might question whether the extraposed Relative Clause in (13) can still be considered a clefted Relative Clause, i.e., under a base-generation account of extraposition, one could argue that this Relative Clause was never clefted. This, however, is irrelevant for the point we are making, namely that extraposed relatives cannot modify a clefted Determiner Phrase. [^]
- Breen et al. (2010) manipulated the information structure of simple subject–verb–object sentences with unmarked syntax, varying between broad and narrow focus. Their production data showed that speakers consistently marked both the location and breadth of focus through increased intensity, longer duration, and higher mean and maximum F0. [^]
- See Ladd (2008), Arvaniti & Ladd (2009) and Pierrehumbert (2017) for additional arguments supporting the need for metrical structure/representations mediating sound and meaning. [^]
- It is obviously possible to construct ambiguous sentences with Prepositional Phrases (e.g., John ate pizza with the fork/with good appetite/with his friends/with a nice glass of wine), including globally ambiguous ones (e.g., John saw the man with the binoculars). Relative Clauses, however, are much more flexible in terms of length and internal structure than Prepositional Phrases. [^]
- See Poschmann & Wagner (2016) on the importance of using grammatical disambiguation when investigating the prosody of Relative Clauses. [^]
- Code and data are available in the OSF repository at https://osf.io/rz96p/. [^]
- See Figure 2 in Appendix A for by-participant F0 scaling patterns. As shown in the figure, despite some individual differences in the size of the structural effects, all participants consistently showed negative mean F0 scaling values. This indicates that the F0 peak on NP2 was consistently lower than that on NP1 across both conditions. [^]
- That this restriction is particularly strong in the context of clefts is perhaps not surprising under an extraposition analysis of Connected Clauses (e.g., Reeve, 2010). If Connected Clauses are indeed extraposed Relatives, as Reeve and others suggest, then extraposing clefted Relatives would map exactly onto the incorrect structure and would naturally be strongly avoided by a rational speaker. [^]
- Note that in Ghini’s proposal, balance constraints apply to abstract prosodic units and not their phonetic realization/physical duration. Changes in tempo affect the number of phonological words and thus, indirectly, ensure better balance between sister prosodic phrases. [^]
- One obvious limitation of this study, for example, is that the most embedded noun in our utterances is monosyllabic. This prevents us from checking whether the effect of prominence is localized to the stressed syllable. [^]
- On implicit prosody see also e.g., Fodor (2002) and Frazier & Gibson (2015). [^]
- We do not mean to imply that clefted Relatives cannot be encountered out of a lab setting. In fact, we believe they are perfectly ordinary sentences. What is difficult is to obtain recordings of string-identical Connected Clauses and clefted Relatives with phonetically and acoustically controlled characteristics produced outside of the lab. [^]
- See also Grillo & Turco (2016); Grillo et al. (2018, 2023) for similar results showing early prosodic disambiguation of other types of nested garden paths. [^]
References
Akmajian, A. (1970). Aspects of the grammar of focus in English. [Doctoral dissertation, Massachusetts Institute of Technology].
Altmann, G., & Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191–238.
Arnhold, A. (2021). Prosodic focus marking in clefts and syntactically unmarked equivalents: Prosody–syntax trade-off or additive effects? Journal of the Acoustical Society of America, 149(3), 1390–1399.
Arvaniti, A., & Ladd, D. R. (2009). Greek wh-questions and the phonology of intonation. Phonology, 26(1), 43–74. http://doi.org/10.1017/S0952675709001717
Atlas, J. D., & Levinson, S. C. (1981). It-clefts, informativeness and logical form: Radical pragmatics (revised standard version). In P. Cole (Ed.), Radical pragmatics (pp. 1–62). Academic Press.
Augurzky, P. (2005). Attaching relative clauses in German: The role of implicit and explicit prosody in sentence processing. [Doctoral dissertation, University of Leipzig].
Bader, M. (1998). Prosodic influences on reading syntactically ambiguous sentences. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 1–46). Springer. http://doi.org/10.1007/978-94-015-9070-9_1
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. http://doi.org/10.18637/jss.v067.i01
Bennett, R., & Elfner, E. (2019). The syntax–prosody interface. Annual Review of Linguistics, 5(1), 151–171. http://doi.org/10.1146/annurev-linguistics-011718-012503
Bever, T. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (Ed.), Cognition and the development of language (pp. 279–362). Wiley.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9), 341–345.
Bourgoin, C., O’Grady, G., & Davidse, K. (2021). Managing information flow through prosody in it-clefts. English Language & Linguistics, 25(3), 485–511.
Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic correlates of information structure. Language and Cognitive Processes, 25(7), 1044–1098. http://doi.org/10.1080/01690965.2010.504378
Breen, M., Watson, D., & Gibson, E. (2011). Intonational phrasing is constrained by meaning, not balance. Language and Cognitive Processes, 26(10), 1532–1562.
Büring, D. (2013). Syntax, information structure, and prosody. In M. den Dikken (Ed.), The Cambridge handbook of generative syntax (pp. 860–896). Cambridge University Press.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. Harper & Row.
Christensen, R. H. B. (2023). Ordinal—Regression models for ordinal data. (R package version 2023.12-4) [Computer software manual].
Clifton, C. J., Carlson, K., & Frazier, L. (2002). Informative prosodic boundaries. Language and Speech, 45(2), 87–114. http://doi.org/10.1177/00238309020450020101
Clifton, C. J., Carlson, K., & Frazier, L. (2006). Tracking the what and why of speakers’ choices: Prosodic boundaries and the length of constituents. Psychonomic Bulletin & Review, 13, 854–861. http://doi.org/10.3758/BF03194009
Collins, P. (1991). Cleft and pseudo-cleft constructions in English. Routledge. http://doi.org/10.4324/9780203202463
Collins, P. (2006). It-clefts and wh-clefts: Prosody and pragmatics. Journal of Pragmatics, 38(10), 1706–1720.
Cooper, W. E., Eady, S. J., & Mueller, P. R. (1985). Acoustical aspects of contrastive stress in question–answer contexts. Journal of the Acoustical Society of America, 77(6), 2142–2156. http://doi.org/10.1121/1.392372
Cooper, W. E., & Paccia-Cooper, J. (1980). Syntax and speech. Harvard University Press.
Crain, S., & Steedman, M. (1985). On not being led up the garden path: The use of context by the psychological syntax processor. In D. R. Dowty, L. Karttunen, & A. M. Zwicky (Eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 320–358). Cambridge University Press.
Davidse, K. (2000). A constructional approach to clefts. Linguistics, 38(6), 1101–1131. http://doi.org/10.1515/ling.2000.022
de Ruiter, L. E. (2015). Information status marking in spontaneous vs. read speech in story-telling tasks – Evidence from intonation analysis using GToBI. Journal of Phonetics, 48, 29–44. http://doi.org/10.1016/j.wocn.2014.10.008
Declerck, R. (1983a). Predicational clefts. Lingua, 61(1), 9–45. http://doi.org/10.1016/0024-3841(83)90023-2
Declerck, R. (1983b). ‘It is Mr. Y’ or ‘he is Mr. Y’? Lingua, 59(2–3), 209–246.
Declerck, R. (1984). The pragmatics of it-clefts and wh-clefts. Lingua, 64(4), 251–289.
Declerck, R. (1988). Studies on copular sentences, clefts and pseudo-clefts. Leuven University Press; Foris Publications. http://doi.org/10.1515/9783110869330
Delin, J. (1992). Properties of it-cleft presupposition. Journal of Semantics, 9(4), 289–306. http://doi.org/10.1093/jos/9.4.289
Delin, J. (1995). Presupposition and shared knowledge in it-clefts. Language and Cognitive Processes, 10(2), 97–120. http://doi.org/10.1080/01690969508407089
Den Dikken, M. (2013). Predication and specification in the syntax of cleft sentences. In K. Hartmann & T. Veenstra (Eds.), Cleft structures (pp. 35–70). John Benjamins.
Eady, S. J., & Cooper, W. E. (1986). Speech intonation and focus location in matched statements and questions. Journal of the Acoustical Society of America, 80(2), 402–415. http://doi.org/10.1121/1.394091
Elfner, E. (2018). The syntax–prosody interface: Current theoretical approaches and outstanding questions. Linguistics Vanguard, 4(1), 20160081. http://doi.org/10.1515/lingvan-2016-0081
Féry, C., & Krifka, M. (2008). Information structure: Notional distinctions, ways of expression. In P. van Sterkenburg (Ed.), Unity and diversity of languages (pp. 123–136). John Benjamins.
Fodor, J. D. (1998). Learning to parse? Journal of Psycholinguistic Research, 27, 285–319.
Fodor, J. D. (2002). Prosodic disambiguation in silent reading. In Proceedings of North East Linguistic Society 32 (pp. 113–132).
Fowler, C. A. (1988). Differential shortening of repeated content words produced in various communicative contexts. Language and Speech, 31(4), 307–319.
Fowler, C. A., & Housum, J. (1987). Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language, 26(5), 489–504. http://doi.org/10.1016/0749-596X(87)90136-7
Frazier, L., Carlson, K., & Clifton, C. (2006). Prosodic phrasing is central to language comprehension. Trends in Cognitive Sciences, 10(6), 244–249. http://doi.org/10.1016/j.tics.2006.04.002
Frazier, L., & Clifton, C., Jr. (1996). Construal. MIT Press.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6(4), 291–325. http://doi.org/10.1016/0010-0277(78)90002-1
Frazier, L., & Gibson, E. (Eds.). (2015). Explicit and implicit prosody in sentence processing: Studies in honor of Janet Dean Fodor. Springer.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178–210. http://doi.org/10.1016/0010-0285(82)90008-1
Gee, J. P., & Grosjean, F. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15(4), 411–458. http://doi.org/10.1016/0010-0285(83)90014-2
Ghini, M. (1993). Phi-formation in Italian: A new proposal. Toronto Working Papers in Linguistics, 12(2).
Grillo, N., Aguilar, M., Roberts, L., Santi, A., & Turco, G. (2018). Prosody of classic garden path sentences: The horse raced faster when embedded. In Proceedings of Speech Prosody 2018 (pp. 284–288).
Grillo, N., Santi, A., Aguilar, M., Roberts, L., & Turco, G. (2023, August 31–September 2). Garden-path no more: How prosody resolves the complement clauses/relative clauses ambiguity [Conference presentation]. Architectures and Mechanisms for Language Processing 29, San Sebastian, Spain.
Grillo, N., Santi, A., & Turco, G. (2025). Shaping rhythm to keep balance: The structural implications of temporal modulation. In L. Meyer & A. Strauss (Eds.), Rhythm of speech and language. Cambridge University Press.
Grillo, N., & Turco, G. (2016). Prosodic disambiguation and attachment height. In Proceedings of Speech Prosody 2016 (pp. 1176–1180).
Grodner, D., Gibson, E., & Watson, D. (2005). The influence of contextual contrast on syntactic processing: Evidence for strong-interaction in sentence comprehension. Cognition, 95(3), 275–296.
Guo, B., Grillo, N., Mattys, S., Santi, A., Sloggett, S., & Turco, G. (2023, August 31–September 2). Prosody disambiguates string-identical Connected Clauses and Relative Clauses [Conference presentation]. Architectures and Mechanisms for Language Processing 29, San Sebastian, Spain.
Guo, B., Grillo, N., Mattys, S., Santi, A., Sloggett, S., & Turco, G. (2024a, September 5–7). Clefted garden-paths: On the incremental semantic processing of Tense Harmony [Conference presentation]. Architectures and Mechanisms for Language Processing 30, Edinburgh, Scotland.
Guo, B., Grillo, N., Mattys, S., Santi, A., Sloggett, S., & Turco, G. (2024b). The prosody of Clefted Relatives: A new window into prosodic representations. In Proceedings of Speech Prosody 2024 (pp. 1215–1219).
Guo, B., Mattys, S., Santi, A., Sloggett, S., Turco, G., & Grillo, N. (2025, September 4–6). Reanalysis as last resort: Coercion in Tense Harmony violations [Conference presentation]. Architectures and Mechanisms for Language Processing 31, Prague, Czechia.
Gussenhoven, C. (2004). The phonology of tone and intonation. Cambridge University Press.
Gussenhoven, C. (2007). Types of focus in English. In C. Lee, M. Gordon, & D. Büring (Eds.), Topic and focus: Cross-linguistic perspectives on meaning and interpretation (pp. 83–100). Springer.
Hedberg, N. (1990). Discourse pragmatics and cleft sentences in English. [Doctoral dissertation, University of Minnesota].
Hedberg, N. (2000). The referential status of clefts. Language, 76(4), 891–920.
Hedberg, N. (2013). Multiple focus and cleft sentences. In K. Hartmann & T. Veenstra (Eds.), Cleft structures (pp. 227–250). John Benjamins.
Hedberg, N., & Fadden, L. (2007). The information structure of it-clefts, wh-clefts and reverse wh-clefts in English. In N. Hedberg & R. Zacharski (Eds.), The grammar–pragmatics interface: Essays in honor of Jeanette K. Gundel (pp. 49–76). John Benjamins. http://doi.org/10.1075/pbns.155.05hed
Herment, S., & Leonarduzzi, L. (2012). The pragmatic functions of prosody in English cleft sentences. In Proceedings of Speech Prosody 2012 (pp. 713–716).
Higgins, F. R. (1973). The pseudo-cleft construction in English. [Doctoral dissertation, Massachusetts Institute of Technology].
Hoeks, M., Toosarvandani, M., & Rysling, A. (2023). Processing of linguistic focus depends on contrastive alternatives. Journal of Memory and Language, 132, 104444. http://doi.org/10.1016/j.jml.2023.104444
Hulsey, S., & Sauerland, U. (2006). Sorting out relative clauses. Natural Language Semantics, 14(2), 111–137. http://doi.org/10.1007/s11050-005-3799-3
Kiss, K. É. (1998). Identificational focus versus information focus. Language, 74(2), 245–273.
Kohler, K. J., & Gartenberg, R. (1991). The perception of accents: F0 peak height versus F0 peak position. Arbeitsberichte des Instituts für Phonetik der Universität Kiel (AIPUK), 25, 219–242.
Kügler, F., & Gollrad, A. (2015). Production and perception of contrast: The case of the rise-fall contour in German. Frontiers in Psychology, 6, 1254.
Ladd, D. R. (2008). Intonational phonology. Cambridge University Press. http://doi.org/10.1017/CBO9780511808814
Ladd, D. R., & Morton, R. (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics, 25(3), 313–342. http://doi.org/10.1006/jpho.1997.0046
Lambrecht, K. (2001). A framework for the analysis of cleft constructions. Linguistics, 39(3), 463–516. http://doi.org/10.1515/ling.2001.021
Lee, Y.-c., Wang, B., Chen, S., Adda-Decker, M., Amelot, A., Nambu, S., & Liberman, M. (2015). A crosslinguistic study of prosodic focus. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4754–4758).
Lenth, R. V. (2023). Emmeans: Estimated marginal means, aka least-squares means. (R package version 1.8.7) [Computer software manual].
Lieberman, P. (1963). Some effects of semantic and grammatical context on the production and perception of speech. Language and Speech, 6(3), 172–187. http://doi.org/10.1177/002383096300600306
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In Proceedings of Interspeech 2017 (pp. 498–502).
Nespor, M., & Vogel, I. (1986). Prosodic phonology. Foris.
Pell, M. D. (2001). Influence of emotion and focus location on prosody in matched statements and questions. Journal of the Acoustical Society of America, 109(4), 1668–1680. http://doi.org/10.1121/1.1352088
Percus, O. (1997). Prying open the cleft. In Proceedings of North East Linguistic Society 27 (pp. 337–351).
Pickering, M. J., & Van Gompel, R. P. (2006). Syntactic parsing. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (pp. 455–503). Elsevier.
Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. [Doctoral dissertation, Massachusetts Institute of Technology].
Pierrehumbert, J. B. (2017). Comparing PENTA to autosegmental-metrical phonology. Prosodic Theory and Practice.
Poschmann, C., & Wagner, M. (2016). Relative clause extraposition and prosody in German. Natural Language & Linguistic Theory, 34(3), 1021–1066.
Prince, E. F. (1978). A comparison of wh-clefts and it-clefts in discourse. Language, 54(4), 883–906.
Reeve, M. (2010). Clefts. [Doctoral dissertation, University College London].
Rooth, M. (1985). Association with focus. [Doctoral dissertation, University of Massachusetts, Amherst].
Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1(1), 75–116.
Schachter, P. (1973). Focus and relativization. Language, 49(1), 19–46.
Sedivy, J. C. (2002). Invoking discourse-based contrast sets and resolving syntactic ambiguities. Journal of Memory and Language, 46(2), 341–370.
Selkirk, E. (1986). On derived domains in sentence phonology. Phonology, 3, 371–405. http://doi.org/10.1017/S0952675700000695
Selkirk, E. (1996). The prosodic structure of function words. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 187–213). Lawrence Erlbaum.
Selkirk, E. (2000). The interaction of constraints on prosodic phrasing. In M. Horne (Ed.), Prosody: Theory and experiment: Studies presented to Gösta Bruce (pp. 231–261). Springer Netherlands. http://doi.org/10.1007/978-94-015-9413-4_9
Selkirk, E. (2009). On clause and intonational phrase in Japanese: The syntactic grounding of prosodic constituent structure. Gengo Kenkyu, 136, 35–73.
Selkirk, E. (2011). The syntax-phonology interface. In J. Goldsmith, J. Riggle, & A. C. L. Yu (Eds.), The handbook of phonological theory (pp. 435–484). John Wiley & Sons. http://doi.org/10.1002/9781444343069.ch14
Sharvit, Y. (2003). Tense and identity in copular constructions. Natural Language Semantics, 11(4), 363–393.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25, 193–247. http://doi.org/10.1007/BF01708572
Snedeker, J., & Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48(1), 103–130.
Speer, S. R., Warren, P., & Schafer, A. J. (2011). Situationally independent prosodic phrasing. Laboratory Phonology, 2, 35–98.
Swerts, M., Strangert, E., & Heldner, M. (1996). F0 declination in read-aloud and spontaneous speech. In Proceedings of the Fourth International Conference on Spoken Language Processing (pp. 1501–1504).
Truckenbrodt, H. (1995). Phonological phrases: Their relation to syntax, focus, and prominence. [Doctoral dissertation, Massachusetts Institute of Technology].
Truckenbrodt, H. (1999). On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry, 30(2), 219–255. http://doi.org/10.1162/002438999554048
Turk, A., & Shattuck-Hufnagel, S. (2014). Timing in talking: What is it used for, and how is it controlled? Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1658), 20130395. http://doi.org/10.1098/rstb.2013.0395
Wagner, M. (2005). Prosody and recursion. [Doctoral dissertation, Massachusetts Institute of Technology].
Wagner, M. (2010). Prosody and recursion in coordinate structures and beyond. Natural Language & Linguistic Theory, 28, 183–237.
Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25(7–9), 905–945.
Watson, D. (2002). Intonational phrasing in language production and comprehension. [Doctoral dissertation, Massachusetts Institute of Technology].
Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes, 19(6), 713–755.
White, L., & Turk, A. E. (2010). English words on the Procrustean bed: Polysyllabic shortening reconsidered. Journal of Phonetics, 38(3), 459–471.
Xu, Y. (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication, 46(3), 220–251. http://doi.org/10.1016/j.specom.2005.02.014
Xu, Y. (2019). Prosody, tone, and intonation. In W. F. Katz & P. F. Assmann (Eds.), The Routledge handbook of phonetics (pp. 314–356). Routledge. http://doi.org/10.4324/9780429056253-13
Xu, Y., Lee, A., Prom-on, S., & Liu, F. (2015). Explaining the PENTA model: A reply to Arvaniti and Ladd. Phonology, 32(3), 505–535. http://doi.org/10.1017/S0952675715000299
Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics, 33(2), 159–197. http://doi.org/10.1016/j.wocn.2004.11.001
Zubizarreta, M. L. (2016). Nuclear stress and information structure. In C. Féry & S. Ishihara (Eds.), The Oxford handbook of information structure. Oxford University Press. http://doi.org/10.1093/oxfordhb/9780199642670.013.008