In a discourse, a crucial task for listeners is to keep track of information which is presupposed, or established, with the speaker, and that which is new in the common ground (Krifka, 2008). Focus marking helps listeners identify the new information, and correctly reject false alternatives to it (Fraundorf, Benjamin, & Watson, 2013; Fraundorf, Watson, & Benjamin, 2010). Presupposed information, on the other hand, should be established, and is therefore not expected to be falsified. For example, it should be easier to say “no” to “Did the sailor put on the raincoat?” after hearing (1a) than after (1b) (bold marks contrastive prominence; the captain example used in the introduction is one of the experimental stimuli).
|(1)||a.||The **captain** put on the raincoat.|
|b.||The captain put on the **raincoat**.|
The easier rejection of the false alternative sailor results from the function of focus in language processing. Words that are focus-marked are more salient than those that are not. Therefore, false alternatives to captain should be easier to detect if captain is focus-marked, e.g., by contrastive prosodic prominence as in (1a). There are, however, multiple ways of marking focus, including prosodic prominence, as in this example, but also syntactic markers such as clefts, and sentence position. It is not yet clear how listeners use and integrate these different cues, and how this varies across languages. In this study, we report on related psycholinguistic experiments in Mandarin Chinese (Chinese) and English which aimed to investigate how these different markers affect discourse encoding and therefore the rejection of false alternatives. Chinese and English are interesting to compare, as they have reasonably similar systems for the prosodic, clefting, and positional marking of focus, though different phonological means of marking prosodic prominence (Y. Chen, Lee, & Pan, 2016; Ladd, 2008; Lambrecht, 2001).
Two main conceptions of focus are widely accepted in the current literature, and they are best viewed as orthogonal to each other (Calhoun, 2010; Vallduví, 2016). One is that focus is the part of the utterance which updates the common ground, or is new in relation to an implicit or explicit question-under-discussion (hereafter QUD) presupposed in the preceding discourse (Ginzburg, 1994; Roberts, 1996; Vallduví, 2016). This will be called QUD-focus. For example, the captain is the focus in (2b) (where […]F shows the focus), which answers an implicit or explicit QUD like (2a) in the preceding discourse.
|(2)||a.||Who put on the raincoat?|
|b.||[The captain]F put on the raincoat.|
The other main definition of focus involves contrastive alternatives. Starting from the perspective of alternative semantics (Rooth, 1985, 1992), focus is defined as indicating “the presence of alternatives that are relevant for the interpretation of linguistic expressions” (Krifka, 2008, p. 247). This will be called contrastive focus. For example, the focus the captain in (2b) indicates a set of contextually-relevant alternatives, e.g., sailor or pirate. In this example, the captain is the focus by either definition; however, as we discuss further below, in some cases, different elements in a sentence can carry each type of focus.
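The alternatives-based definition can be made concrete with a worked formula. The following is only a sketch in the spirit of Rooth (1992): the predicate name put-on, the set name C, and the particular alternatives listed are our illustrative assumptions, not notation taken from the cited works. The ordinary semantic value of (2b) is a single proposition, while its focus semantic value is the set of propositions obtained by replacing the focused element with other individuals; the contextually relevant alternatives form a subset of that set.

```latex
% Ordinary semantic value of (2b): a single proposition.
[\![\,[\text{the captain}]_F \text{ put on the raincoat}\,]\!]^{o}
  = \textit{put-on}(\text{captain}, \text{raincoat})

% Focus semantic value: vary the focused position over individuals x.
[\![\,[\text{the captain}]_F \text{ put on the raincoat}\,]\!]^{f}
  = \{\, \textit{put-on}(x, \text{raincoat}) \mid x \in D_e \,\}

% Contextually relevant alternatives: a subset C of the focus value
% that contains the ordinary value (illustrative members only).
C = \{\, \textit{put-on}(\text{captain}, \text{raincoat}),\;
         \textit{put-on}(\text{sailor}, \text{raincoat}),\;
         \textit{put-on}(\text{pirate}, \text{raincoat}) \,\}
```

On this sketch, a false alternative such as sailor corresponds to a member of C other than the ordinary value.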
Both types of focus should affect how listeners process the discourse information. For example, QUD-focus enhances the salience of words, so it follows that listeners pay more attention to words that are focus-marked (e.g., Akker & Cutler, 2003; Cutler, 1976; Cutler, Dahan, & van Donselaar, 1997; Cutler & Fodor, 1979; Ip & Cutler, 2017; Kember, Choi, & Cutler, 2016; Kember, Choi, & Yu, 2016; Kember, Choi, Yu, & Cutler, 2019; Yan & Calhoun, 2019; Yan, Calhoun, & Warren, 2019). Focus marking also results in better encoding and remembering of focused information, so it helps listeners detect false alternatives to words when they are focus-marked (e.g., Sanford, Sanford, Molle, & Emmott, 2006). In contrast, presupposed information, which is assumed to be established between the interlocutors, should be harder to verify, as presupposed information is not expected to be falsified.
Contrastive focus also has a similar effect on the rejection of false alternatives to focused words. As mentioned above, contrastive focus implies the presence of alternatives (Rooth, 1992). Therefore, it follows that the main processing effect is to enhance the encoding of a set of alternatives to the focus within the utterance. In other words, focus marking not only strengthens the mental representation of what has happened, but also what has not happened, in relation to the focused word (Fraundorf et al., 2013, 2010).
Therefore, it follows from either the QUD-focus or the contrastive focus definition that a false alternative to a focus-marked word should be rejected more quickly than when the word is not focus-marked. For example, the false alternative in a question like (3) should be rejected more quickly and more accurately after hearing (1a) than after (1b).
|(3)||Did the sailor put on the raincoat?|
The focus marker that has been investigated most with regard to its role in processing is prosodic prominence. It is not yet clear, however, what other cues listeners use to encode discourse information (i.e., focused words and focus alternatives) beyond prosodic prominence when processing a discourse. For example, syntactic clefting is known to be an important focus marker in Chinese and English (Lambrecht, 2001; Paul & Whitman, 2008), but its role in encoding discourse information has been much less investigated in the psycholinguistic literature. There have been a few studies that have looked at the effect of syntactic clefting on memory for focused words, but it is not clear whether syntactic clefting also facilitates more immediate processing (Birch, Albrecht, & Myers, 2000; Birch & Garnsey, 1995; Kember et al., 2019). In addition, in both Chinese and English, sentence-final elements (in canonical sentences, usually objects) are where new information tends to occur, and also where the nuclear prominence usually occurs (Calhoun, 2010; Y. Chen et al., 2016; Ladd, 2008). Previous research has experimentally confirmed in English that final objects have a default focus bias, even if they are not overtly focus-marked (Carlson, Dickey, Frazier, & Clifton, 2009; Harris & Carlson, 2018). There is less work on this in Chinese (but see S.-H. Chen, Chen, & He, 2012). More interestingly, the interaction between prosodic prominence, clefting, and default focus position in encoding discourse information is not at all clear.
In Section 1.1, we further describe prosodic, positional, and clefting cues to focus in Chinese and English. In Section 1.2, we then review previous studies looking at the effect of focus on language processing, concentrating on the research on the role of focus in encoding discourse information explored in this study. In Section 1.3, we present relevant research on the role of the default focus position in language processing. In Section 2.1, we outline research questions that were investigated and predictions based on the previous research findings. In Sections 3 and 4, we then describe the results of related experiments which were carried out in Chinese and English to investigate the interaction of prosodic prominence, clefting, and default focus position, in encoding discourse information.
In both Chinese and English, prosodic prominence is a key marker of focus (Calhoun, 2010; Y. Chen & Gussenhoven, 2008; Krifka, 2008; Wang & Xu, 2006; Xu, 1999). Prosodic prominence is realized by pitch accenting in English, i.e., the focused word typically carries the nuclear pitch accent and is thus the most prominent word in the intonational phrase (Calhoun, 2010) (see examples in Section 4). Nuclear pitch accented words typically have a movement in fundamental frequency (F0) associated with the lexically stressed syllable and a drop in F0 following the accented syllable, at least in declaratives, as well as longer duration and higher intensity (Breen, Fedorenko, Wagner, & Gibson, 2010; Kügler & Calhoun, to appear). In Chinese, nuclear prosodic prominence is realized through pitch register, as the lexical tone determines the local F0 curve of each syllable (Y. Chen & Gussenhoven, 2008; Wang & Xu, 2006; Xu, 1999). The pitch range in the focused word is expanded, and the region following the focus is compressed (see Figure 1, which was also a stimulus used in the Chinese experiment described in Section 3). The focused word is also realized with longer duration and higher mean intensity (e.g., S.-w. Chen, Wang, & Xu, 2009; Y. Chen & Gussenhoven, 2008; Xu, 1999).
In both languages, however, there is an asymmetry in the realization of the nuclear accent in phrase-initial and phrase-final position, or subject and object position in a canonical sentence. In English, a nuclear accent on the subject is usually marked by a large high movement in F0 (often labelled L+H* in the ToBI system, Brugos, Shattuck-Hufnagel, & Veilleux, 2006), with a fall following the accent and low pitch for the rest of the intonation phrase, i.e., it is the most phonetically prominent word in the utterance (see Figure 5 in Section 4). However, on the phrase-final object, while the nuclear accent can be marked by a high, rising accent, it will be perceived as nuclear even if the accent is downstepped, low, or not as phonetically prominent as an earlier accented word (Calhoun, 2010). The situation is essentially the same in Chinese. As in English, when the focus is on the subject, the pitch range of the subject word is expanded and following pitch range is heavily reduced (see Figure 1). However, when the focus is on the final object, which is also the default position for nuclear prominence in Chinese, the pitch range in the pre-focal region is still relatively wide (Xu, 1999).
This asymmetry can be taken as evidence for two types of focus marking in the two languages: prosodic prominence and positional marking. It is widely claimed that the phrase-final position is the default position for focus in English and Chinese (e.g., Calhoun, 2010; Y. Chen et al., 2016; Ladd, 2008; G. Ward & Birner, 2006; Xu, 1999); although sources differ as to whether this position should be defined syntactically (the focus is the most deeply embedded element) or by prosodic position (the focus is the final strong element in the intonation phrase). In final position, as the position of the element already indicates focus, phonetically strong prosodic marking is not needed to further indicate focus, whereas in subject position it is; hence the asymmetry. This kind of asymmetry has been noted across many languages, including many Bantu languages where it is claimed that there is no prosodic marking of focus at all for non-subjects (e.g., Downing & Hyman, 2016; Zerbian, 2007; Zimmermann, 2011). In some approaches, ‘focus as alignment,’ i.e., focus marked by a particular position in prosodic structure, is posited as an alternative cross-linguistic ‘universal’ to ‘focus as prominence’ (Féry, 2013).
In both Chinese and English, syntactic clefting can also be used to mark focus (Y. Chen et al., 2016; Lambrecht, 2001; Paul & Whitman, 2008; G. Ward & Birner, 2006). While there are a number of different types of clefts in both languages, we will concentrate on it-clefts in English, and 是…的 (SHI…DE) clefts in Chinese, as these are well-studied for their focus properties and are similar in function in the two languages. In the following, (4) and (5) are subject clefts, and (6) and (7) are object clefts.
|(4)||It was [the captain]F who put on the raincoat.|
|(6)||It was [the raincoat]F that the captain put on.|
While the function of clefts in English and Chinese seems to be similar, their form is different. In English, the focused constituent is fronted. For example, in object clefts, as in (6), when the object noun raincoat is focused, it is fronted to the cleft clause from the sentence-final position in canonical word order. In Chinese, clefts are marked morphosyntactically using the 是…的 (SHI…DE) construction, without changing the word order (Fang, 1995). For instance, for subject focus, the copula 是 (SHI) occurs immediately before the subject, and 的 (DE) either before or after the object. When 的 (DE) appears before the object, as in (5), the sentence is past tense (Hole, 2011), and this pre-object 的 (DE) is largely restricted to Northern Mandarin speakers. For object focus, as in (7), the copula 是 (SHI) occurs before the verb, and 的 (DE) before the object (see Simpson & Wu, 2002; Paul & Whitman, 2008; Hole, 2011 for an overview of the 是…的 (SHI…DE) cleft construction and the difference between pre-verbal and post-verbal 的 [DE]).
In both languages, there is evidence that object clefts are less common, and may be harder to process, than subject clefts. In Chinese, there is some debate about whether object focus can be marked by the 是…的 (SHI…DE) construction; at the least this seems to be restricted to Northern speakers (Hole, 2011; Paul & Whitman, 2008; Teng, 1979). Our previous acceptability judgment study shows that while object clefts were judged as acceptable by (Northern) Chinese listeners as marking object focus, they were less acceptable than canonical order sentences (with stress on the object), while subject clefts and canonical order (with stress on the subject) were equally acceptable to mark subject focus (Yan, Calhoun, & Warren, 2020) (see further below). In English, object clefts or object relative clauses, both of which involve a change of word order, have been shown to be more difficult to process than subject clefts, which involve ‘regular’ word order (MacDonald, 2002; Traxler, Morris, & Seely, 2002; also see a discussion of the asymmetry between subject and object clefts in Tily, Fedorenko, & Gibson, 2010).
As is shown in (4)–(7), in both languages, the nuclear prominence normally falls on the cleft head (e.g., captain in (4) and (5)). That is, focus is marked by both prosodic prominence and syntactic clefting on the cleft head (e.g., captain). In both languages, these clefts are usually analyzed as involving an existential presupposition which is not there with canonical order sentences (Hedberg, 2013; Hole, 2011; Lambrecht, 2001; Onea, 2019; Paul & Whitman, 2008). For example, for (4) and (5), it is pragmatically presupposed that someone put on a raincoat. Hence, these clefts should only be felicitous in discourse contexts where this presupposition is either already available in the context, or where it can be accommodated; whereas in canonical word order this does not need to be the case for the utterance to be felicitous.
In both languages, while the nuclear prominence normally falls on the cleft head, it can also fall on a different constituent, e.g., raincoat in the following examples (8) and (9). In these cases, focus is marked prosodically on raincoat and syntactically on captain:
|(8)||It was [the captain]F who put on [the raincoat]F.|
|(English-subject cleft with prosodic prominence on the object)|
These ‘mismatch’ constructions have not received much attention in the experimental literature (though see Yan & Calhoun, 2019; Calhoun, Wollum, & Kruse-Va’ai, 2019; Kember et al., 2019), but have been shown to be used in naturally occurring speech in both Chinese and English (Delin & Oberlander, 1995; Hedberg, 2013; Hole, 2011; Lambrecht, 2001; Onea, 2019; Prince, 1978). These constructions have a number of different discourse functions. Most relevant here is their use as what Hedberg (2013) calls ‘vice-versa clefts,’ first documented by Ball and Prince (1978), where each of the focus-marked elements is a contrastive focus (for more detailed discussion see Calhoun et al., 2019). They are also akin to ‘second occurrence focus’ constructions (see overview in Baumann, 2016). These constructions are felicitous given a more complex existential presupposition than for head-stressed clefts that there are a number of individuals (e.g., 船长 ‘captain’ and 水手 ‘sailor’) and a number of different items to wear (e.g., 雨衣 ‘raincoat’ and 夹克 ‘jacket’), and that various individuals put on various items, e.g., in the following context:
|(10)||The weather got colder, and the captain and the sailor on the ship put on their raincoat and jacket.|
As with the existential presupposition for head-stressed clefts, this presupposition should be available in the discourse context, or be able to be accommodated. These presuppositions add more difficulty and complexity to cleft structures, which might have consequences for language processing (Yan & Calhoun, 2019).
While both the arguments in (8) and (9) are marked as contrastively focused, the question of which is the QUD-focus is less clear, and may differ between English and Chinese. Opinion is divided in the previous theoretical research on this issue (see discussion in Calhoun et al., 2019). As discussed further in the next section, in our previous experimental work, we showed that English listeners, when presented with stimuli like (8) in contexts like (10), judged the QUD-focus to be on the cleft head, i.e., in line with the syntactically marked focus, while Chinese listeners judged the QUD-focus to be on the nuclear stressed word, i.e., in line with the prosodically marked focus (Calhoun et al., 2019; Yan et al., 2020). However, judgements were less strong than for head-stressed clefts and canonical word order sentences. In Calhoun et al. (2019), we argue that these structures may be compatible with QUD-focus on either the cleft head or the word carrying nuclear prominence, depending on the context.
Clefts have long been argued to carry an exhaustive inference, which canonical word order sentences do not (É Kiss, 1998; Krifka, 2008; Molnár, 2006). An exhaustive inference should rule out other alternatives in the context of the proposition, in a similar way to exclusive particles like only. For example, (4) and (5) would imply that no one else but 船长 ‘the captain’ put on the raincoat. Experimental work in recent years has shown that, in both English and Chinese, any exhaustivity inference does not seem to be as clear as this: While experiments differed in the contexts and tasks used, the general finding was that any implication of exhaustivity with clefts was clearer than for canonical word order sentences, but not as strong as with exclusive particles like only (e.g., Destruel et al., 2013; Liu & Yang, 2016; see overview in Onea, 2019). In his review, Onea (2019) concludes that the likely situation is that “clefts imply that no other true answer to that issue is more informative than the canonical inference” (p. 415), which may or may not lead to the interpretation of exhaustivity depending on the context.
A considerable amount of evidence has been found across a range of languages that focused words have a processing advantage over unfocused words (e.g., Akker & Cutler, 2003; Cutler, 1976; Cutler et al., 1997; Cutler & Fodor, 1979; Ip & Cutler, 2017; Kember, Choi, & Cutler, 2016; Kember, Choi, & Yu, 2016; Kember et al., 2019; Yan & Calhoun, 2019; Yan et al., 2019). Focused words are recognized faster and remembered better. For example, in phoneme-monitoring experiments in English and Chinese, phonemes have been shown to be recognized faster in focused words or in words where the preceding intonation contour predicts that they will be in focus (Akker & Cutler, 2003; Cutler, 1976; Cutler & Fodor, 1979; Ip & Cutler, 2017). This provides cross-linguistic evidence that prosodic prominence enhances processing of focused words in languages in which prosodic prominence is a major cue to focus. In most of this work, the contexts and utterance types used do not allow us to distinguish whether the focus involved was QUD-focus or contrastive focus; rather, these processing effects are for focus in general.
This evidence, however, is mostly confined to prosodic prominence as a way of marking focus. It is still poorly understood how other linguistic cues, such as syntactic clefting, are used, let alone the interaction between different cues. A recent study by Calhoun et al. (2019) looked at how syntactic clefting and contrastive prosodic prominence affect focus interpretation in two unrelated intonation languages, English and Samoan, using a metalinguistic judgement task of QUD-focus. In English, the focus was consistently judged to align with the prosodic prominence in canonical sentences. In head-stressed clefts, listeners judged the focus to be on the cleft head, but there was no increase in the likelihood of a focus judgment over prosody alone. On the other hand, clefting seemed to outweigh prosody in the cases where the two cues clashed, as in (8), with the focus still judged to be on the cleft head, though with lower agreement between participants.
Using the same sentence conditions (canonical order versus clefting, and prosodic prominence on subjects and objects) as in Calhoun et al. (2019), Yan et al. (2020) investigated focus interpretation using a question-answer appropriateness rating task in Chinese. The results showed that, as with English, Chinese listeners use prosodic prominence to identify the focus in canonical sentences, with very high appropriateness ratings to the question-answer pairs. For head-stressed clefts, the cleft head was perceived as being in focus, but ratings were not higher than for the canonical sentences. For the ‘mismatch’ sentences, the prosodically prominent word was judged to be the focus by Chinese listeners, although these were rated lower than the other conditions. Consistent clefting never improved appropriateness responses, but inconsistent clefting lowered ratings. Together, these studies show the relative weighting of prosodic prominence and clefting is different across languages, and particularly in Chinese and English, in metalinguistic judgement tasks.
These studies used untimed judgment tasks. There are few studies looking at the interaction of prosodic and other types of focus-marking in processing tasks. Akker and Cutler (2003) looked at focus cued by prosodic prominence and/or semantic context, i.e., a cueing question, e.g., “which man?” cued corner as the focus in “the man on the corner.” Using the phoneme monitoring paradigm (e.g., participants respond as fast as possible when they hear the target phoneme /k/), it was found that both the prosodic and semantic cues were very effective, but these two cues to focus were not additive, which means that there was no extra processing advantage when the contextual and prosodic cues to focus aligned. However, Kember et al. (2019) used a memory task to look at the effect of prosodic prominence and clefting in spoken sentences in Korean and English. They found that both prosodic and clefting cues to focus enhanced memory for focused words, but the relative effects differed in the two languages, i.e., clefting cues were more effective than prosodic cues in Korean, but they were equally effective in English with the combination of clefting and prosodic cues most effective. Therefore, listeners’ use of different cues to focus depends not only on the language, but also on the demands, and time-course, of the task.
Recently, an increasing number of studies have shown that prosodic prominence increases activation of alternatives to focused words, compared to unfocused words, in Germanic languages and Chinese (Braun, Asano, & Dehé, 2019; Braun & Biezma, 2019; Braun & Tagliapietra, 2010; Gotzner, 2017; Husband & Ferreira, 2016; Spalek, Gotzner, & Wartenburger, 2014; Yan & Calhoun, 2019; Yan et al., 2019). Yan and Calhoun (2019) looked at the effects of both prosodic prominence and clefting on the priming of contrastive alternatives in Chinese. They used the lexical priming paradigm, which taps into the immediate word-level activation of discourse referents, for utterances presented out of a discourse context. It was found that only prosodic prominence facilitated the lexical activation of contrastive alternatives. Syntactic clefting slowed down the response times in general. This is likely because, out of a discourse context, there was an initial processing cost due to the need to accommodate the presupposition required by clefts (see Section 1.1). Further, as discussed in Section 1.1, depending on the context, clefts may carry an implicature of exhaustivity. This increases the competition between the focused word and its alternatives (see further in Gotzner, 2017).
Studies that investigated the role of focus in the longer-term representation of focused words and focus alternatives have shown that these are remembered better than unfocused words and their alternatives (Fraundorf et al., 2013, 2010; Sanford et al., 2006; Spalek et al., 2014). Using the change detection technique, Sanford et al. (2006) found that after hearing “The money from the wallet had gone missing,” false alternatives such as purse in “The money from the purse had gone missing” were easier to detect if wallet carried contrastive prominence (i.e., L+H*). Similar effects were found in reading tasks by P. Ward and Sturt (2007), using focus cued by semantic context, and when italicization was used to mark focus (Sanford et al., 2006); as well as in an earlier study when pseudo-clefts were used to mark focus (Sturt, Sanford, Stewart, & Dawydiak, 2004). Fraundorf et al. (2010) found that focus improved not only memory for focused words but also memory for discourse-mentioned alternatives to focused words. Using a true/false verification task, Fraundorf et al. (2010) tested the rejection of mentioned and unmentioned false alternatives and found that contrastive prominence facilitated the rejection of mentioned alternatives, but did not facilitate the rejection of unmentioned alternatives when the true/false verification task was performed the next day. Therefore, contrastive pitch accenting results in better encoding of what did not happen in relation to the focus. Fraundorf et al. (2013) later found similar effects in a reading task when font emphasis (e.g., capitals and italics) was used to mark focus.
As discussed in Section 1.1, in both English and Chinese the default focus position is final, and therefore, sentence-final objects are expected to be focused. This positional cue to focus has an interesting effect on processing, in that, in English and Chinese, final objects have been previously found to have a default focus bias, even if they are not otherwise focus-marked (e.g., if they do not carry nuclear prominence) (Ayers, 1996; Carlson et al., 2009; S.-H. Chen et al., 2012; Harris & Carlson, 2018). Note that this effect is task dependent: Final objects that were not nuclear accented were not interpreted as focused in our metalinguistic judgment tasks, as discussed above (Calhoun et al., 2019; Yan et al., 2020).
For example, Harris and Carlson (2018) looked at the effect of accenting and phrase position on naturalness judgments of focus-sensitive ellipsis structures like the following:
|(11)||a.||Danielle didn’t pass the **quiz**, let alone the final.|
|b.||Danielle didn’t pass the **quiz**, let alone Kayla.|
|c.||**Danielle** didn’t pass the quiz, let alone the final.|
|d.||**Danielle** didn’t pass the quiz, let alone Kayla.|
In these structures, the interpretation of the second clause (let alone…) depends on its link to the focused element in the first clause. Harris and Carlson (2018) found that, when the antecedent of the NP in the second clause was the subject (Danielle-Kayla), accenting had a strong effect on the naturalness of the sentence, i.e., (11d) was much more acceptable than (11b). Sentences where the antecedent of the NP in the second clause was the final object (quiz-final) were in general more acceptable, and further, when the antecedent was not accented, i.e., (11c), these were rated lower than when it was (11a), but the difference was relatively small. Harris and Carlson (2018) attribute these results to the effect of ‘enduring focus’: “Locations that typically bear default focus continue to provide potential locations for focus, regardless of overt markers of focus.” In a second experiment, they tried to tease apart whether the default focus bias comes from objects or phrase final position using ellipsis constructions with objects with relative clauses. They conclude that there is a stronger bias for phrase final position, although they were only comparing parts of an object, not objects and non-objects.
Ayers (1996) found similar effects of positional differences between the processing of subjects and objects in an early study of English. In her study, for sentences where participants had to reject information which occurred later in the sentence (object nouns), no difference was found when object nouns in the source sentence carried nuclear accents, compared to when they did not. But when participants had to reject information which occurred early in the sentence (subject nouns), participants responded faster when the subject noun had the nuclear accent, compared to when the subject noun did not. Ayers (1996) conjectured that the word position effect might have arisen because any effects of prosody for the object sentences might have been “obscured by the effect of reprocessing right at the end of the sentence.” This can also be interpreted as the role of default focus position.
Sentence-final objects also hold a default focus in Chinese, but very few studies have investigated the role of default focus in the perception of focus or in language processing in Chinese. S.-H. Chen et al. (2012) is one of them. They investigated a number of linguistic cues (including default focus position) that are used to encode focal information in both Chinese and English, using a verification task. In the task, participants listened to stimulus recordings with different locations for prosodic prominence (subject versus object), with different sentence structures (cleft [marked by 是 (SHI) in Chinese and by it-clefts in English] versus noncleft), and with different word positions (pre-verbal and post-verbal position; post-verbal position is the default focus position in SVO sentences in both English and Chinese). Some stimulus examples are shown in (12a) and (12b).
|(12)||a.||Does the turtle chase the cat?|
|(Canonical with prosodic prominence on the object)|
|b.||Is it the turtle that chases a cat?|
|(Cleft with prosodic prominence on the subject)|
|(13)||a.||Picture 1: The turtle chases the rabbit.|
|b.||Picture 2: The monkey chases the cat.|
Participants were also visually presented with two pictures describing either the event in (13a) or (13b), each of which contradicted one of the two nouns in the recording that they heard (e.g., turtle and cat). Participants were instructed to select one picture and then correctly describe the picture using a similar sentence structure to the stimulus recording. Correct response rates were expected to be higher if the word being corrected was marked by a stronger cue to focus. The theoretical basis for this hypothesis is that, in discourse, speakers mark new information, or the information that updates the common ground (QUD-focus), to help listeners identify such information. Therefore, compared to new information (e.g., cat in [12a], as cat carried prosodic prominence), presupposed information (e.g., turtle in [12a]) should be more likely to be assumed to be true and more difficult to correct. Thus, a higher number of responses would be expected to correct Picture 1 in (13a) than Picture 2 in (13b).
Different patterns were found for the two languages (Chinese and English) in this task. The relative importance of the linguistic cues in encoding focal information is shown in (14):
|(14)||Chinese: default focus position > prosodic prominence > clefting|
|English: clefting > default focus position > prosodic prominence|
In Chinese, prosodic prominence was more effective than clefting in encoding focal information. It was the other way around for English, i.e., clefting was preferred. Further, default focus position was more effective than prosodic prominence in both Chinese and English, but clefting was more effective than default focus position in English.
To sum up, in both Chinese and English, it is well-established that prosodic prominence confers processing advantages in relation to the encoding of discourse information. However, morphosyntactic means of marking focus, e.g., clefts, have received far less attention. There have been a few studies that have looked at the effect of syntactic clefting on memory for focused words (Birch et al., 2000; Birch & Garnsey, 1995; Kember et al., 2019), but it is not clear whether syntactic clefting also facilitates more immediate processing. Syntactic clefting is claimed to mark focus in both Chinese and English, but it has not been well tested whether it facilitates language processing. Further, the relative importance of morphosyntactic and prosodic cues in language processing is still poorly understood (cf. Calhoun et al., 2019; Yan & Calhoun, 2019). Across languages, when multiple focus cues fall on one word, it is very likely that the word will be perceived as focal. However, the processing effects are less understood if the cues do not fall on the same word (but see e.g., Kember et al., 2019; Yan & Calhoun, 2019), though this kind of ‘mismatch’ sentence is found in naturally occurring contexts (see Section 1.1). In addition, the role of default focus bias in language processing has not been well investigated, especially in Chinese. Given the findings that the weighting of prosodic prominence, syntactic clefting, and default focus position is different in the two languages, Chinese and English are interesting to compare to investigate language-specific weighting of these cues.
In the current study, we used a speeded false alternative rejection task to look at the effects of prosodic prominence and clefting, as well as default focus position, on the speed of correct rejection of false alternatives to the focus-marked word in both Chinese and English. This task taps into the encoding of discourse referents in relation to other discourse information. In the experiment, following an appropriate discourse context, participants heard a sentence with prosodic, clefting, and/or positional cues to focus on the subject and/or object, e.g., The captain put on the raincoat. They then had to respond to a follow-up yes/no question about the sentence, in which either the subject or the object was incorrect, e.g., Did the sailor put on the raincoat? This task should be easier, and therefore responses faster, if there were effective cues to focus on the incorrect word in the preceding utterance for that language, i.e., captain-sailor. As discussed above, focus-marking is predicted to facilitate rejection of false alternatives by either the QUD-focus or contrastive focus definitions. In the canonical order and head-stressed clefts, QUD-focus and contrastive focus are prosodically and syntactically marked on the same word. However, in the ‘mismatch’ clefts, with stress in the main clause, contrastive focus is marked on both the subject and object, while QUD-focus may be on either, as discussed above. Responses to these sentences are therefore particularly relevant to separating the role of each of these types of focus. We return to this point in the final discussion. The Chinese and English experiments were part of the same broad research project, but they were not originally planned to be parallel experiments, hence there are some minor differences in methodology between them. However, due to the strikingly similar findings and valuable cross-linguistic contribution, we decided it was useful to report them together.
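The Chinese and English experiments aimed to address the following research questions:
1. Is there an asymmetry in the effect of prosodic prominence or clefting on the processing of subjects versus objects?
2. Does prosodic prominence on a word result in faster rejection of false alternatives to it, compared to no prominence?
3. Does syntactic clefting consistent with the focus result in faster rejection of false alternatives, and/or inconsistent clefting in slower rejection, compared to canonical order?
4. What is the relative effectiveness of prosodic prominence and clefting?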
For the first research question, for both languages, our prediction is that there will be an asymmetry in the effect of prosodic prominence and clefting in correctly rejecting false alternatives to subjects and objects. That is, we expect that focus cues will have a greater effect for subjects than for objects, as objects are in the default focus position in both languages (see Section 1.3). The one exception to this is for object clefts in English, where objects are fronted so they are no longer in the final default focus position. For the second research question, we predict that for both languages, prosodic prominence will result in faster rejection of false alternatives compared to no prosodic prominence. But if the default focus effect is stronger than prosodic prominence, we would only expect this effect for words which are in the non-default focus position (i.e., subjects in both languages, and objects in object clefts in English). For the third research question, our prediction is that consistent clefting plays a facilitatory role (i.e., resulting in faster rejection of false alternatives relative to canonical order), and inconsistent clefting plays an inhibitory role (i.e., resulting in slower rejections relative to canonical order). Similar to the prediction for the second research question, if the default focus effect is stronger than clefting, we would only expect clefting to have an effect on words which are in the non-default focus position. Furthermore, for object clefts, due to the word order change in English and the relative uncommonness of these structures in Chinese, they may be harder to process than canonical sentences in both languages. For the last research question, we expect prosodic prominence to be more effective than clefting in Chinese and the other way around in English, given the previous research on the relative importance of these two cues in each language (see Section 1.2).
A total of 36 near-monolingual native Northern Mandarin Chinese speakers (13 females and 23 males; mean age = 21.4, SD = 2.1, age range = 16–27) were recruited from the student population at Henan Polytechnic University in China. They reported that they had received formal compulsory English education in China, but that they did not speak other languages at home and were not fluent in any other languages. They had not lived outside China for more than six months, nor did any report they had lived in an English-speaking country. The participants received supermarket vouchers in recognition of their participation. None of them reported any hearing or reading difficulties.
Forty-eight sentences were constructed as the experimental stimuli, e.g., the Critical Sentences in Table 1. Two prosodic prominence locations (subject, object), three syntax conditions (canonical, subject cleft, object cleft), and two question types (subject question, object question) resulted in twelve experimental conditions (see details below). All three factors were manipulated within-subjects and within-items. Twelve lists of 48 experimental stimuli were constructed in a Latin square design so that each sentence was in a different condition in each list. This resulted in four sentences in each condition. Each participant saw only one list.
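The list construction described above can be illustrated with a minimal sketch. It assumes a simple rotation scheme, which is one common way of building a Latin square and not necessarily the procedure used for this experiment; the function name and condition labels are hypothetical.

```python
from itertools import product

# Factor levels as described in the text; the label strings are illustrative.
STRESS = ["subject", "object"]
SYNTAX = ["canonical", "subject_cleft", "object_cleft"]
QUESTION = ["SQ", "OQ"]


def build_latin_square_lists(n_items=48):
    """Return one list per condition; each list pairs every item with a condition.

    Assumption: conditions are rotated by one step from list to list, so that
    each item appears in a different condition in each list.
    """
    conditions = list(product(STRESS, SYNTAX, QUESTION))  # 2 x 3 x 2 = 12
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        lists.append([
            (item, conditions[(item + list_idx) % n_cond])
            for item in range(n_items)
        ])
    return lists


lists = build_latin_square_lists()
```

With 48 items and 12 conditions, this rotation places four items in each condition within every list, and each item occurs once in every condition across the 12 lists.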
|Context: The weather got colder. The captain and the sailor put on their raincoat and jacket.|
|Connecting question: 可以再多告诉我一些信息吗？ Can you tell me more?|
|canonS (15a); canonO (15b); ScleftS (15c); ScleftO (15d); OcleftS (15e); OcleftO (15f);|
|‘False alternative’ questions:|
|SQ: 水手穿上了雨衣吗？ Did the sailor put on the raincoat?|
|OQ: 船长穿上了夹克吗？ Did the captain put on the jacket?|
The 48 sentences described a simple, plausible event in the past tense, using commonly occurring nouns and verbs. Both subject and object nouns had two syllables. All sentences had seven syllables in the canonical order version. For each sentence, six versions were created: canonS (canonical word order with contrastive prominence on the subject noun, as in [15a]), canonO (canonical word order with contrastive prominence on the object noun, as in [15b]), ScleftS (subject cleft with contrastive prominence on the subject noun, as in [15c]), ScleftO (subject cleft with contrastive prominence on the object noun, as in [15d]), OcleftS (object cleft with contrastive prominence on the subject noun, as in [15e]), and OcleftO (object cleft with contrastive prominence on the object noun, as in [15f]).
As shown in Table 1, each test item consisted of a written context, an audio connecting question, an audio critical stimulus with varying focus marking, and a written question about an alternative. The context contained two conjoint noun phrases, one in subject position and one in object position. These conjoint phrases established the alternatives to the subject and object nouns that appeared in the critical sentence. The context was followed by an open connecting question (可以再多告诉我一些信息吗? “Can you tell me more?”) in order to make the context and the critical sentence connect more naturally. This connecting question was in turn followed by the critical sentence with one of six types of focus marking. Finally, this critical sentence was followed by a ‘false alternative’ question that asked about the false alternative, i.e., it asked about the alternative that was mentioned in the context but not repeated in the critical stimulus. This question was either about the subject (SQ) or object (OQ) of the critical stimulus. The question required a “no” response as the correct answer.
A further 48 filler trials were constructed following the same structure as the critical trials, i.e., a context sentence, a question, an answer, and a further question. This led to a total of 96 trials per participant. The context sentences in the fillers were more varied: some included explicit alternatives, some included alternatives that could be inferred from a set (e.g., man and woman can be implied by couple), and some included no explicit alternatives. Among the 48 fillers, 24 had answers which had the same structure as one of the critical sentences (canonO, canonS, OcleftO, OcleftS, ScleftO, ScleftS), i.e., four sentences for each of the six versions. These 24 fillers required “yes” as the correct response. In the other half of the fillers, the answers had different sentence structures, such as subject-verb, subject-adverb-verb-object, etc. The sentences also differed in length, ranging from four to sixteen characters (syllables). Of these 24 fillers, 12 required “yes” as the correct response, and 12 required “no” as the correct response. To keep the experiment manageable in terms of time, we did not add further trials to balance the number of “yes” and “no” responses. The imbalance in expected responses did not seem to be an issue, and accuracy rates were high for both sets of fillers, i.e., those with “yes” and “no” responses. The ‘false alternative’ questions in fillers that required “no” responses had wrong information about a range of parts of the answer, such as the verb.
The critical sentences in Table 1 (and the equivalent sentences in filler trials) were recorded directly to hard drive using Praat (Boersma & Weenink, 2018) by a trained female native Mandarin speaker (first author) in a soundproof room at Victoria University of Wellington through a USB-based head-mounted microphone (sampling rate: 32000 Hz; bit rate: 512 kbps; bit depth: 16). The connecting question (see Table 1) was recorded by a male native Mandarin speaker. The intensity of all the sentences was re-scaled to 50 dB. All critical sentences were checked impressionistically by two native Mandarin speakers for the location of contrastive prominence (see Figure 1 for examples of canonS and canonO, Figure 2 for examples of ScleftS and ScleftO, and Figure 3 for examples of OcleftS and OcleftO). The figures were drawn in Praat with pitch tracks modified by the built-in smoothing function.
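As an illustration of what re-scaling intensity to a fixed level involves, the following sketch applies a single gain to a waveform so that its mean intensity reaches a target value. It is not the Praat procedure itself: the reference pressure of 2e-5 Pa is an assumption made here to mirror the usual dB convention, and the function names are hypothetical.

```python
import math

# Assumed reference pressure (Pa) for the dB scale; an assumption of this sketch.
P_REF = 2e-5


def mean_intensity_db(samples):
    """Mean intensity of a waveform in dB relative to P_REF."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square / P_REF ** 2)


def scale_intensity(samples, target_db=50.0):
    """Multiply all samples by one gain so the mean intensity equals target_db."""
    gain = 10 ** ((target_db - mean_intensity_db(samples)) / 20)
    return [s * gain for s in samples]


# Toy input: one second of a 220 Hz sine wave at a 32000 Hz sampling rate.
tone = [0.1 * math.sin(2 * math.pi * 220 * n / 32000) for n in range(32000)]
scaled = scale_intensity(tone)
```

Because the same gain is applied to every sample, relative differences in loudness within a sentence (e.g., on the prominent word) are preserved; only the overall level changes. A completely silent input would need separate handling, since its intensity is undefined.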
It was decided to use read stimuli performed by a trained speaker, rather than stimuli produced in a more naturally-occurring situation (e.g. Roettger, Mahrt, & Cole, 2019), in order to ensure that the location of prosodic prominence was consistent and unambiguous across all sentence types (canonical and clefts). Particularly for the ‘mismatch’ sentences, these would be relatively hard to elicit in naturally-occurring dialogues. The stimuli used in this study are the same as those used in Yan et al. (2020). As discussed above, in that study we collected appropriateness ratings for the sentences in dialogue contexts, and showed that the sentences were judged highly acceptable when the prosodic prominence (and clefting) matched the focus cued by a preceding wh-question (see Section 1.1).
In order to confirm that the sentences did indeed differ according to the intended position of the contrastive prominence, acoustic measurements (duration, mean F0, max F0, min F0, and mean intensity) of the subject and object nouns were obtained using ProsodyPro (Xu, 2013). As focus is marked through pitch range expansion in Chinese, F0 range was calculated as the difference between max F0 and min F0. Table 2 shows the mean values and standard deviations for each of these measurements in the six sentence conditions.
|Sentence condition||Word position||Duration||Mean F0||Intensity||F0 range|
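The F0 range measure and the per-condition summary can be sketched as follows. The token values below are made up purely for illustration and are not taken from Table 2; the dictionary keys and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def f0_range(max_f0, min_f0):
    """F0 range of one word: the difference between its max and min F0."""
    return max_f0 - min_f0


def summarise_f0_range(tokens):
    """Mean and SD of F0 range for each (sentence condition, word position).

    `tokens` is an iterable of dicts with the hypothetical keys
    'condition', 'position', 'max_f0', and 'min_f0'.
    """
    groups = defaultdict(list)
    for t in tokens:
        key = (t["condition"], t["position"])
        groups[key].append(f0_range(t["max_f0"], t["min_f0"]))
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in groups.items()
    }


# Made-up example tokens (not real measurements).
example_tokens = [
    {"condition": "canonS", "position": "subject", "max_f0": 300.0, "min_f0": 200.0},
    {"condition": "canonS", "position": "subject", "max_f0": 320.0, "min_f0": 200.0},
    {"condition": "canonS", "position": "object", "max_f0": 240.0, "min_f0": 200.0},
]
summary = summarise_f0_range(example_tokens)
```

If prominence is realised through pitch range expansion, the word carrying prominence in a given condition would be expected to show the larger mean F0 range in such a summary.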
The experiment was administered using OpenSesame v. 3.1 (Mathôt, Schreij, & Theeuwes, 2012), and was run in a quiet computer room at Henan Polytechnic University, China. Participants listened to the sentences over closed-ear headphones. The entire session was conducted in Chinese. Participants received written instructions on the computer screen, and the instructions were also repeated orally by the experimenter.
Participants first saw a context in the centre of the computer screen, and were instructed to press any key to proceed when they had read it, with no time limit. After pressing any key, they heard the connecting question as in Table 1 in a male voice, and after a 500 ms pause an answer (one of the six versions of the critical sentences) in a female voice. The screen was blank with a black background during the playback of the audio sentences. After the critical sentence they saw a ‘false alternative’ question in the centre of the computer screen, and had to decide as fast as they could whether the answer was “yes” or “no”. Right-handed participants pressed the “z” key (labelled as 是 “yes”) with their non-dominant hand for a yes response, and the “m” key (labelled as 否 “no”) with their dominant hand for a no response. The assignment of keys was reversed for left-handed participants so that the “no” key was always pressed by the dominant hand. The RTs were measured from when the ‘false alternative’ question appeared to when the response key was pressed. The time limit for this key press was six seconds. After the key press or six seconds, the experiment moved to the next trial automatically. All 96 trials were randomized for each participant. Participants could have a break if they wanted when they were at the screen showing the context.
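The response coding in this procedure can be summarised in a short sketch. This is not the OpenSesame script used in the experiment; the function names and the returned dictionary are hypothetical, and only the key assignment and the six-second limit are taken from the description above.

```python
TIMEOUT_MS = 6000  # six-second response window, as described in the text


def key_mapping(handedness):
    """Map response keys to answers so that 'no' is on the dominant hand."""
    if handedness == "right":
        return {"z": "yes", "m": "no"}
    if handedness == "left":
        return {"z": "no", "m": "yes"}
    raise ValueError(f"unknown handedness: {handedness!r}")


def score_trial(key, rt_ms, handedness, correct_answer):
    """Code one trial; `key` is None when no key was pressed in time."""
    if key is None or rt_ms is None or rt_ms > TIMEOUT_MS:
        return {"response": None, "correct": False, "rt_ms": None}
    response = key_mapping(handedness)[key]
    return {
        "response": response,
        "correct": response == correct_answer,
        "rt_ms": rt_ms,
    }


# A right-handed participant rejecting a false alternative after 1300 ms.
trial = score_trial("m", 1300, "right", correct_answer="no")
```

Keeping the “no” response on the dominant hand for all participants means that the critical “no” responses are always made with the same (dominant) hand relative to the participant.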
Six practice trials in a fixed order were played before the main experiment. The practice trials followed the same format as the main experiment, except that after six seconds participants heard a warning beep, but could still respond. They were told that the warning beep would sound in the practice phase if they did not respond within six seconds. They were also told that the warning beep would not sound in the main part of the experiment, but that the experiment would move on to the next trial if they did not respond within six seconds. They received feedback on their response in the practice phase, but not in the main phase. The entire experiment lasted approximately 25 minutes. Demographic information such as biological sex, age, hometown, and English proficiency was collected using a paper form at the end of the experiment.
Overall accuracy for test and filler items was 96% with the lowest accuracy for any participant being 88.5%. No participants were excluded on the basis of accuracy levels. As the overall accuracy was very high, no analysis of factors affecting accuracy is reported. The response time analysis was based on correct responses to critical trials, i.e., on 1,656 data points (excluding 72 [4.2%] incorrect responses). The RTs were log transformed, which was the best transformation compared with no transformation, inverse transformation, and square root transformation. After modeling, data points of residuals whose standard deviations were larger than 2.5 were further eliminated (excluding 32 [1.9%] responses). The resulting count of trials for the RT analysis was 1,624. The back-transformed data were used when plotting the predicted values.
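The trial counts and the residual-based trimming step can be made concrete with a minimal sketch. The trimming function assumes that residuals are standardised by their sample standard deviation and that any point beyond 2.5 is dropped; the fitted values in the toy example come from a plain mean, used here only as a stand-in for the mixed effects model, and all names are hypothetical.

```python
import math
from statistics import mean, stdev

# Trial counts reported in the text.
total_critical = 36 * 48            # 36 participants x 48 critical trials = 1728
correct_trials = total_critical - 72  # 1656 correct responses
final_trials = correct_trials - 32    # 1624 trials after residual trimming


def log_transform(rts_ms):
    """Natural-log transform of response times in milliseconds."""
    return [math.log(rt) for rt in rts_ms]


def keep_mask(observed, fitted, threshold=2.5):
    """True for data points whose standardised residual is within threshold."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    sd = stdev(residuals)
    return [abs(r) / sd <= threshold for r in residuals]


# Toy data: twenty typical responses and one very slow response.
toy_rts = [1000] * 20 + [5000]
toy_log = log_transform(toy_rts)
toy_fitted = [mean(toy_log)] * len(toy_log)  # stand-in for model predictions
mask = keep_mask(toy_log, toy_fitted)
```

In the toy example only the single very slow response is removed. In practice the fitted values would come from the regression model, and the model would be refitted on the trimmed data.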
Mixed effects regression models were built to test how RTs were affected by a number of factors, using the R package lme4. For the RT analysis, transformed reaction times were the dependent variable in linear mixed effects regression. The full model for RT included key experimental predictors, item factors, and participant factors. The key experimental predictors were stress position (subject, object), syntax (canonical, subject cleft, object cleft), and ‘false alternative’ question type (subject question, object question), as well as all the interactions between the three. The item factors included the centred position of the trial in the experiment. The participant factors included their age, sex, hometown, and their English language proficiency. In addition to the fixed effects, the random effects, motivated by the literature and justified by the data, included intercepts for participants and items, and random slopes for the interactions between the key experimental factors by participants and by items. The full model also included the random slope for the centred position of the trial in the experiment by participants. Only the factors that were significant or of central interest were kept and reported below. Following the model selection procedure used in Yan and Calhoun (2019), if the initial model did not converge, we simplified it by reducing random structures, i.e., removing the slopes that had the lowest variance scores until the model converged. Then the step function in the R package lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017) was used to eliminate non-significant fixed and random effects. The alpha-level we used for step function elimination was 0.1. As mentioned above, we always kept key factors and their interaction in the model, as they were of central interest (see Wei et al., 2012).
The fixed effects included in the final model for RTs are shown in the ANOVA table in Table 3, which also shows the significance of these effects. The random effects structure in the final model consisted of intercepts for Participants and Items. As expected, responses became faster over the course of the experiment when all other factors are at their intercept values (centred trial: β = –0.003, SD < 0.001). Male participants were faster than female participants at intercept (sex: β = –0.16, SD = 0.064). None of the other participant factors were significant.
The final model showed main effects of syntax, stress, question type, two-way interactions between question type and each of syntax and stress, as well as a three-way interaction between syntax, stress, and question type (see Table 3). Back-transformed fitted RTs are shown in Figure 4. The average response time was 1323 ms when the questions were about subjects, and 1403 ms when the questions were about objects. The average response time was 1344 ms when the subject word in the critical stimulus carried stress (canonS, OcleftS, and ScleftS), and 1382 ms when the subject word did not carry stress (canonO, OcleftO, and ScleftO). For manipulations of syntax, the average reaction times showed that answers to questions following canonical order sentences were the fastest (1331 ms), followed by subject clefts (1366 ms) and object clefts (1390 ms), averaged across both stress positions and question types.
Planned comparisons were conducted to investigate effects of prosodic prominence and clefting by question type (SQ versus OQ) (see Table 4 for summary). We first compared prosodic prominence on the subject noun with prosodic prominence on the object noun for each of the syntax types (canonical, subject, and object clefts) and for each of the two ‘false alternative’ question types (SQ and OQ). For questions that were about subjects (SQ), the comparisons showed faster “no” responses when these questions followed critical sentences with stress on the subject (canonS, OcleftS, and ScleftS) than when they followed critical sentences with stress on the object, for all sentence types (canonS-canonO: β = –0.19, SE = 0.03, z = –2.98, p = 0.01; ScleftS-ScleftO: β = –0.09, SE = 0.03, z = –2.91, p = 0.01; OcleftS-OcleftO: β = –0.14, SE = 0.03, z = –4.65, p < 0.001). This shows that prosodic prominence was an effective and consistent cue across different sentence types in rejecting false alternatives for subject questions.
|Prosodic prominence, with consistent or inconsistent clefting|
|Subject questions (SQ)||Object questions (OQ)|
|Clefting, with consistent stress|
|Subject questions (SQ)||Object questions (OQ)|
|Clefting, with inconsistent stress|
|Subject questions (SQ)||Object questions (OQ)|
However, when the questions were about objects (OQ), the pattern was different. The “no” responses were faster when these questions followed object clefts with stress on objects than when they followed object clefts with stress on subjects (OcleftO-OcleftS: β = –0.1, SE = 0.03, z = –3.3, p = 0.005). But there was no difference in the speed of “no” responses to object questions after both stress versions of the canonical word order sentences (canonS-canonO: β = 0.02, SE = 0.03, z = 0.74, p = 0.55) and after both stress versions of the subject clefts (ScleftS-ScleftO: β = 0.01, SE = 0.03, z = 0.42, p = 0.73). This shows that prosodic prominence was not a consistent cue across different sentence types in rejecting false alternatives for object questions.
The next set of planned comparisons looked at the differences between different syntactic conditions (canonical, subject, and object clefts) by question type. In the first set of comparisons, we compared cases where the stress was consistent with that asked by the question, i.e., subject stress for SQ, and object stress for OQ. The question was whether, in these cases, consistent clefting would further enhance response speed, and/or whether inconsistent clefting would inhibit and slow response speed. The comparisons showed that when the stress position was consistent with the intended focus from the question, then “no” responses to subject questions (i.e., subject questions after canonS, ScleftS, and OcleftS) were equally fast across the syntax types (p values for each comparison >0.1). “No” responses to object questions were slower after subject clefts than after canonical sentences and object clefts, though the differences were only marginally significant (ScleftO-OcleftO: β = 0.06, SE = 0.03, z = 2.19, p = 0.051; ScleftO-canonO: β = 0.06, SE = 0.03, z = 2.12, p = 0.058). “No” responses to object questions after canonical sentences with object stress did not differ from those after object clefts with object stress (canonO-OcleftO: β = 0.001, SE = 0.03, z = 0.04, p = 0.97). This shows that, for subject questions, when the stress is consistent with the intended focus, clefting (consistent or inconsistent) has no effect on response times; however, for object questions, inconsistent clefting (subject clefts) weakly inhibits responses.
The final set of comparisons looked at cases where the stress was inconsistent with the question, i.e., object stress for SQ, and subject stress for OQ. In these cases, clefting would be the only overt cue to focus. When the stress position was inconsistent with the intended focus invoked by the question, there was no significant difference between responses to subject questions following canonical sentences and subject clefts (canonO-ScleftO: β = 0.01, SE = 0.03, z = 0.34, p = 0.77). However, subject questions received faster “no” responses after both canonical sentences with object stress and subject clefts with object stress than after object clefts with object stress (canonO-OcleftO: β = –0.07, SE = 0.03, z = –2.44, p = 0.03; ScleftO-OcleftO: β = –0.08, SE = 0.03, z = –2.78, p = 0.02). For object questions, when the stress position was inconsistent with the intended focus invoked by the question, “no” responses to object questions were faster after canonical sentences with subject stress than after object clefts with subject stress (canonS-OcleftS: β = –0.07, SE = 0.03, z = –2.49, p = 0.03), the opposite direction to that predicted. Responses to questions after subject clefts (ScleftS) did not differ from those after either canonical sentences (canonS) or object clefts (OcleftS) (both p values > 0.1). This shows that, for subject questions, the clefting cue on its own is not enough to enhance response speed, but inconsistent clefting (object clefts) slows responses. For object questions, consistent clefting (object clefts) actually slowed responses compared to canonical order.
A false alternative rejection task was used in Chinese to look at the role of prosodic, clefting, and positional focus cues in the encoding of discourse information, i.e., looking at the effectiveness of each of these cues to focus in speeding the correct rejection of false alternatives to words in spoken utterances given their (non-)focus marking. The study addressed four main questions (see Section 2.1): whether there is an asymmetry in the effect of prosodic prominence or clefting on the processing of subjects versus objects; whether prosodic prominence on a word results in faster rejection of false alternatives to it compared to no prominence; whether syntactic clefting consistent with the focus results in faster rejection of false alternatives, and/or inconsistent clefting in slower rejection compared to canonical order; and what the relative effectiveness of prosodic prominence and clefting is.
Addressing the first research question, it is clear that there is an asymmetry in the processing of subjects and objects. For subjects, prosodic prominence, and to a lesser extent clefting, affected the speed of rejection of false alternatives. However, for objects, the effect of overt focus marking was weak or non-significant for most comparisons (see Table 4). This is consistent with final objects having a default focus bias, as discussed in Section 1.3. In discussing the rest of the research questions, we therefore treat subjects and objects separately. Responses to object questions were slower overall than to subject questions. We analyze this as a simple positional effect, as the false word occurred later in the ‘false alternative’ question.
In relation to the second research question (see Section 2.1), subject questions received faster “no” responses when prosodic prominence was on the subject, compared to when prominence was on the object, reflecting easier rejections of false alternatives (see Table 4). This held for all sentence types: canonical sentence, subject, and object clefts. This is consistent with prosodic prominence being an effective cue to focus on subjects in Chinese.
However, for object questions, there was little to no effect of prosodic prominence. Responses to object questions were only faster with prosodic prominence on the object rather than the subject for object clefts (see Table 4). This may be due to the object clefts with subject stress (OcleftS) being in general harder to process as marking object focus (see further below). Overall, prosodic prominence did not facilitate rejection of false alternatives to objects, consistent with a default focus bias in this position.
In relation to the final two research questions (see Section 2.1), the role of clefting and the relative effectiveness of prosodic prominence and clefting in the correct rejection of false alternatives, the results showed that although clefting played a role, this was in general inhibitory rather than facilitatory, showing clefting was in general less effective than prosodic cues. As these questions are connected, they are discussed together.
For subject questions, when the prosodic prominence was on the subject, clefting had no effect (no difference between canonS, OcleftS, and ScleftS; see Table 4). When the prosodic prominence was inconsistent (i.e., on the object), consistent clefting (subject clefts) did not facilitate response speed (compared to canonical sentences), but inconsistent clefting (object clefts) slowed responses, compared to both canonical and subject clefts. This likely shows an inhibitory effect of inconsistent clefting when prosodic prominence is also inconsistent. However, it could also be a general effect that object clefts are harder to process, as they were the slowest of all the sentence conditions.
For object questions, when the prosodic prominence was on the object (consistent), consistent clefting (object cleft) did not further facilitate response times (compared to canonical order; see Table 4). However, inconsistent clefting did weakly inhibit responses, which were marginally significantly slower after subject clefts (ScleftO) than after canonical sentences (canonO) and object clefts (OcleftO). When the prosodic prominence was on the subject (inconsistent), however, there was no effect of inconsistent clefting, i.e., subject clefts (ScleftS) were not different from object clefts (OcleftS) or canonical sentences (canonS). When the clefting was consistent (OcleftS), responses were actually slower (compared to canonS). This suggests OcleftS sentences are particularly unacceptable, or hard to process, as marking object focus, as discussed above.
Overall, the results showed that there is a clear asymmetry in the processing of subjects and objects, as responses to object questions were largely equally fast no matter whether the object in the preceding critical sentence had overt prosodic prominence, clefting, or both. This is consistent with positional focus marking, i.e., the default focus bias, being stronger than overt prosodic prominence or clefting. For subjects, on the other hand, prosodic prominence and clefting affected response times differently, with prosodic prominence proving more effective than clefting. In general, consistent clefting did not facilitate responses, but inconsistent clefting inhibited processing.
Sixty native New Zealand English speakers took part (40 females, 19 males, and 1 other gender; mean age = 21.9, SD = 7.2; age range = 18–61). They reported that they were not fluent in any other languages and did not speak other languages at home. The participants received supermarket vouchers in recognition of their participation. None of them reported any hearing or reading difficulties.
The construction of the English sentences was in general similar to that of the Chinese stimuli (Section 3.1.2). As with the Chinese stimuli, there were 48 English test items, each of which included a context sentence, a connecting question, a critical sentence, and a ‘false alternative’ question (see Table 5). Unlike in the Chinese stimuli, in the context sentence of the 48 English stimuli, the alternatives were either explicitly mentioned or inferable from a set, e.g., man and woman can be implied by couple. Eighteen items had two sets of explicitly mentioned alternatives, 13 had one, and 17 had none.
|Context: The couple helped the traveller and his friend find a hotel.|
|They were both very thankful.|
|Connecting question: Can you tell me more?|
|canonS (16a); canonO (16b); ScleftS (16c); ScleftO (16d); OcleftS (16e); OcleftO (16f);|
|‘False alternative’ questions:|
|SQ: Did the friend thank the woman?|
|OQ: Did the traveller thank the man?|
|(16)||a.||canonS: [ S ]F V O|
|[The traveller]F thanked the woman.|
|b.||canonO: S V [ O ]F|
|The traveller thanked [the woman]F.|
|c.||ScleftS: It COP [ S ]FREL V O|
|It was [the traveller ]F who thanked the woman.|
|d.||ScleftO: It COP [ S ]FREL V [ O ]F|
|It was [the traveller ]F who thanked [the woman ]F|
|e.||OcleftS: It COP [ O ]FREL [ S ]F V|
|It was [the woman ]F who [the traveller]F thanked.|
|f.||OcleftO: It COP [ O ]FREL S V|
|It was [the woman ]F who the traveller thanked.|
A total of 288 experimental sentences (48 sentences × 6 sentence types) were constructed. The design of the English experiment differed somewhat from the Chinese one, in that there were two extra question conditions intended to test the confirmation of focused words, e.g., participants had to respond “yes” to “Did the woman thank the traveller?” after hearing one of the critical sentences. Therefore, there were 24 conditions (6 sentence conditions × 4 question conditions). As in the Chinese experiment, the factors were manipulated within-subjects and within-items. Twenty-four lists of 48 experimental stimuli were constructed in a Latin square design so that each sentence was in a different condition in each list, giving two sentences in each condition per list. Each participant saw only one list. We report only the results for the “no”-response questions here, as they are comparable with the Chinese results reported above. Broadly, participants found the “yes”-response task very straightforward: responses were fast and there were few significant differences between conditions. The “yes”-response trials can be regarded as fillers, like those in the Chinese experiment.
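The list construction described above can be illustrated with a minimal sketch. It assumes a simple rotation scheme for the Latin square (the rotation actually used is not specified here), and the two "yes"-response question labels (`SQ_yes`, `OQ_yes`) are hypothetical names introduced only for illustration.

```python
from itertools import product

# Sentence conditions as labelled in example (16).
SENTENCE_TYPES = ["canonS", "canonO", "ScleftS", "ScleftO", "OcleftS", "OcleftO"]
# "SQ_no"/"OQ_no" stand for the 'false alternative' questions;
# the two "_yes" labels are hypothetical names for the confirmation questions.
QUESTION_TYPES = ["SQ_no", "OQ_no", "SQ_yes", "OQ_yes"]
CONDITIONS = list(product(SENTENCE_TYPES, QUESTION_TYPES))  # 6 x 4 = 24 conditions

N_ITEMS = 48


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Rotate items through conditions so that each item appears in a
    different condition in each list (assumed rotation scheme)."""
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        # Item i in list k receives condition (i + k) mod 24.
        assignment = {
            item: conditions[(item + list_idx) % n_cond]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
print(len(lists), "lists;", len(lists[0]), "items per list")
```

Under this scheme, each of the 24 lists contains two items per condition, and each item occurs in all 24 conditions across the lists.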
The English sentences were recorded to hard drive using Praat by a trained female native New Zealand English speaker (second author) in a soundproof room at Victoria University of Wellington through a head-mounted microphone (sampling rate: 44100 Hz; bit rate: 705 kbps; bit depth: 16) (see examples in Figures 5, 6 and 7). The intensity of all the sentences was re-scaled to 50 dB.
We have not run an equivalent acceptability judgment study to Yan et al. (2020) to test the naturalness of the English sentences, though the English sentences were produced in a similar way as the Chinese sentences. However, similar sentences and contexts were used in Calhoun et al. (2019), where a variety of evidence was presented that the participants found the sentences sufficiently natural, including the intonational consistency with which they were produced and the judgements of native speaker consultants.
As with the Chinese stimuli described in Section 3.1.2, in order to confirm that the sentences did indeed differ according to the intended position of the contrastive prominence, duration, mean F0, and mean intensity were extracted; these are shown in Table 6.
|Sentence condition||Word position||Duration||Mean F0||Intensity|
The English experiment was run at a computer lab at Victoria University of Wellington. As the design originally tested critical stimuli where the correct answer was “yes” as well as “no,” half the participants pressed the “no” key using their dominant hand and the other half with their non-dominant hand. We then included the correct response key side as a factor and it was not significant. The procedure for the English experiment was otherwise the same as the Chinese experiment. Demographic information of participants (e.g., age, gender) was collected through an online adapted version of the Bilingual Language Profile (BLP) prior to the experiment (see Birdsong, Gertken, & Amengual, 2012; Calhoun et al., 2019).
Overall accuracy for test and filler items was 98%, with the lowest accuracy for any participant being 81%. No participants were excluded on the basis of accuracy levels. The response time analysis was based on correct responses to critical trials, i.e., on 1403 data points (excluding 37 [2.57%] incorrect responses). The RTs were inverse transformed, which was the best transformation compared with no transformation, log transformation, and square root transformation. After modeling, data points whose residuals were more than 2.5 standard deviations from the mean were further eliminated (excluding 25 [1.78%] responses). The resulting count of trials for the RT analysis was 1378. The back-transformed data were used when plotting the predicted values. Mixed effects regression models were built following the procedure detailed in Section 3.2.1. The participant factors did not include English proficiency, as it was not applicable to English native speakers. Otherwise, the factors were the same as in the Chinese experiment.
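The trimming steps above can be sketched as follows. This is an illustration under stated assumptions, not the analysis script used in the study: the exact form of the inverse transform is assumed to be 1000/RT, the function names are hypothetical, and the fitted values would in practice come from the mixed effects model.

```python
from statistics import mean, stdev


def inverse_transform(rt_ms, scale=1000.0):
    """Inverse-transform RTs in ms (assumed form: scale / RT)."""
    return [scale / rt for rt in rt_ms]


def back_transform(inv_rt, scale=1000.0):
    """Map inverse-transformed values back to milliseconds for plotting."""
    return [scale / v for v in inv_rt]


def keep_mask(observed, fitted, threshold=2.5):
    """Flag data points whose standardized residual lies within the threshold."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    m, s = mean(residuals), stdev(residuals)
    return [abs((r - m) / s) <= threshold for r in residuals]


# Trial bookkeeping, using only the counts reported in the text.
n_correct, n_incorrect, n_trimmed = 1403, 37, 25
n_critical = n_correct + n_incorrect            # all critical "no" trials
pct_incorrect = 100 * n_incorrect / n_critical  # share of incorrect responses
pct_trimmed = 100 * n_trimmed / n_correct       # share removed as residual outliers
n_final = n_correct - n_trimmed                 # trials entering the RT analysis

print(n_critical, round(pct_incorrect, 2), round(pct_trimmed, 2), n_final)
```

The bookkeeping reproduces the reported percentages: the incorrect responses are taken as a share of all critical trials, and the trimmed responses as a share of the correct ones.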
The fixed effects included in the final model for RTs are shown in the ANOVA table in Table 7, which also shows the significance of these effects. The random effects structure in the final model consisted of intercepts for Participants and Items and the slope for syntax by Items. As expected, responses became faster over the course of the experiment when all other factors are at their intercept values (centred trial: β = 0.02, SD = 0.002, p < 0.001).4
The final model showed main effects of syntax, stress, and question type, two-way interactions between question type and each of syntax and stress, but no significant three-way interaction between syntax, stress, and question type (see Table 7). For comparison, a model was run with the same factors as in Table 7 without the non-significant three-way interaction. The interactions between question type and each of stress and syntax were significant (stress: p < 0.001; syntax: p = 0.004). The interaction between syntax and stress was marginally significant (p = 0.09). The main effects were all significant (stress: p = 0.5; syntax: p < 0.001; question type: p < 0.001).
As discussed above, it is meaningful to conduct further comparisons of the factors in an interaction, even when it is not significant, when the interaction is key to testing the predictions of the experiment (Wei et al., 2012). Therefore we further analyzed the three-way interaction. Back-transformed fitted RTs are shown in Figure 8. The average response time was 1227 ms for subject questions and 1283 ms for object questions. The average response time was 1243 ms when subject words carried stress (canonS, OcleftS, and ScleftS) and 1267 ms when object words carried stress (canonO, OcleftO, and ScleftO). For manipulations of syntax, the average reaction times showed that answers to questions following subject clefts were the fastest (1175 ms), followed by canonical order sentences (1213 ms), and object clefts slowest (1377 ms) averaged across both stress positions and question types.
Planned comparisons were conducted to investigate effects of prosodic prominence and clefting by question type (SQ versus OQ) (see Table 8 for summary). As with the analysis of the Chinese experiment, we first looked at the effect of prosodic prominence by syntax type. For questions that were about subjects (SQ), the comparisons showed faster “no” responses when these questions followed subject stressed sentences than when they followed object stressed sentences in canonical word order (canonS-canonO: β = 0.66, SE = 0.26, z = 2.54, p = 0.025), but these were only marginally faster for subject clefts (ScleftS-ScleftO: β = 0.51, SE = 0.26, z = 1.98, p = 0.08), and there was no difference for object clefts (OcleftS-OcleftO: β = 0.38, SE = 0.26, z = 1.47, p = 0.2). This shows that prosodic prominence was an effective cue in rejecting false alternatives for subject questions, but this interacted with syntax.
|Prosodic prominence, with and without consistent clefting|
|Subject questions (SQ)||Object questions (OQ)|
|Clefting, with consistent stress|
|Subject questions (SQ)||Object questions (OQ)|
|Clefting, with inconsistent stress|
|Subject questions (SQ)||Object questions (OQ)|
When the questions were about objects (OQ), the “no” responses did not differ according to whether these questions followed object stressed or subject stressed sentences for canonical and subject cleft sentences (canonO-canonS: β = –0.007, SE = 0.26, z = –0.03, p = 0.98; ScleftO-ScleftS: β = –0.15, SE = 0.26, z = –0.57, p = 0.59), but there was a marginally significant difference for object clefts (OcleftO-OcleftS: β = –0.58, SE = 0.28, z = –2.06, p = 0.07). This shows that prosodic prominence generally did not enhance rejection of false alternatives for object questions. The one exception was for object clefts with stress on the subject (OcleftS), which may be particularly hard to process (see further below), similar to what was found for Chinese (see Section 3.3).
Planned comparisons were also conducted between different syntax types (canonical, subject, and object clefts) for each of the two ‘false alternative’ question types (SQ and OQ). We first compared cases where the stress was consistent with that asked by the question, i.e., subject stress for SQ, and object stress for OQ. The comparisons showed that when the stress position was consistent with the intended focus invoked by the question, consistent clefting did not further facilitate response speed (canonS-ScleftS: β = –0.31, SE = 0.28, z = –1.08; p = 0.32). However, when there was inconsistent clefting (object clefts), “no” responses to subject questions following object clefts were slower than those following canonical word order and subject clefts (OcleftS-canonS: β = –1.35, SE = 0.28, z = –4.82, p < 0.001; OcleftS-ScleftS: β = –1.66, SE = 0.28, z = –5.95, p < 0.001). For object questions, consistent clefting likewise did not facilitate responses (canonO-OcleftO: β = 0.41, SE = 0.28, z = 1.46; p = 0.2). For inconsistent clefting, there was a weak inhibition for subject clefts compared to object clefts (OcleftO-ScleftO: β = –0.73, SE = 0.28, z = –2.6; p = 0.07), but not compared to canonical order (canonO-ScleftO: β = –0.17, SE = 0.29, z = –0.59; p = 0.6).
The final set of comparisons was when the stress position was not consistent with the intended focus invoked by the question. For subject questions, consistent clefting did not speed responses, i.e., subject clefts compared to canonical sentences (canonO-ScleftO: β = –0.45, SE = 0.29, z = –1.59, p = 0.18). However, inconsistent clefting slowed responses, i.e., object clefts compared to subject clefts and canonical order (canonO-OcleftO: β = 1.07, SE = 0.28, z = 3.81, p < 0.001; ScleftO-OcleftO: β = 1.53, SE = 0.28, z = 5.47, p < 0.001). For object questions, consistent clefting, i.e., object clefts, slowed responses compared to canonical sentences and subject clefts, all with subject stress (canonS-OcleftS: β = 0.92, SE = 0.28, z = 3.23, p = 0.003; ScleftS-OcleftS: β = 1.66, SE = 0.28, z = 5.95, p < 0.001). This is opposite to the predicted effect of consistent clefting. There was no significant difference between canonical sentences and subject clefts (canonS-ScleftS: β = –0.31, SE = 0.29, z = –1.09, p = 0.32), so inconsistent clefting was not inhibitory in this case.
A false alternative rejection task was used in English, with a similar design to that in the Chinese experiment reported above, to look at the role of prosodic, clefting, and positional cues in the encoding of discourse information, i.e., looking at the effectiveness of each of these cues to focus in speeding the correct rejection of false alternatives to words in spoken utterances given their (non-)focus marking. The study addressed four main questions (see Section 2.1): whether there is an asymmetry in the effect of prosodic prominence or clefting on the processing of subjects versus objects; whether prosodic prominence on a word results in faster rejection of false alternatives to it compared to no prominence; whether syntactic clefting consistent with the focus results in faster rejection, and/or inconsistent clefting in slower rejection, of false alternatives compared to canonical order; and what the relative effectiveness of prosodic prominence and clefting is.
For the first research question, as with the Chinese results, it is clear that there is an asymmetry in the processing of subjects and objects. For subjects, prosodic prominence and syntactic clefting affected the speed of rejection of false alternatives for most comparisons (see Table 8). However, for objects, the effect of overt focus marking was weak or non-significant for most comparisons. Similarly to the Chinese results, we attribute this to the default focus bias for final objects (see Section 1.3).
In relation to the second research question, the results showed that for subject ‘false alternative’ questions, prosodic prominence on the subject decreased response times significantly for canonical word order sentences, and marginally significantly for subject clefts, but not for object clefts (OcleftS compared to OcleftO). The result for object clefts could be because object clefts with subject stress are particularly hard to process, as noted in Section 3.3 for Chinese (see further below). Overall, this suggests that prosodic prominence was generally effective in speeding the correct rejection of false alternatives to the subject noun, although clefting had more of an effect than for Chinese.
For object questions, prosodic prominence had no effect for canonical and subject cleft sentences, although “no” responses were marginally faster for object clefts with head stress (OcleftO versus OcleftS). Once again, this could be because OcleftS structures are particularly hard to process. Therefore, for object questions, no effect of prosodic prominence was found, consistent with a default final object bias.
In relation to the final two research questions, the role of clefting and the relative effectiveness of prosodic prominence and clefting in the correct rejections of false alternatives, the English experiment showed similar results to the Chinese results. Consistent clefting did not speed responses, whether or not there was consistent prosodic prominence. However, inconsistent clefting slowed rejections, though in a slightly different set of conditions to what was found for Chinese. For subject questions, inconsistent clefting slowed responses (object clefts compared to subject clefts and canonical order) whether or not there was consistent prosodic prominence. For Chinese, inhibition was only found in the inconsistent stress cases, suggesting the inhibitory effect of clefting is stronger in English. For object questions, with consistent stress, there was only a weak slowing for subject clefts compared to object clefts, and not compared to canonical order, as was found for Chinese. With inconsistent stress, similarly to the Chinese results, consistent clefting (OcleftS) actually slowed responses compared to canonical order or subject clefts, contrary to predictions. As noted above, these sentences seem to be particularly hard to process.
Overall, positional cues seem to outweigh overt focus cues in both languages, which shows a final default focus bias. In both languages, prosodic cues to focus were more effective than clefting, with clefting having a largely inhibitory rather than facilitatory effect. The effect of clefting seems to be somewhat stronger in English than in Chinese.
We reported the results of two nearly parallel experiments looking at the interaction between prosodic, clefting, and positional cues in the encoding of discourse information in Mandarin Chinese and English, using a false alternative rejection task. It was found that, in both languages, there was a clear effect of positional marking, or default focus position, as shown by an asymmetry in the responses to questions probing focus on the subject versus object. For subjects, prosodic prominence and clefting systematically affected response times, whereas for objects, the effect of these cues was either absent or weak. This is consistent with final objects carrying a default focus bias, so that responses to object focus questions were facilitated whether or not there was overt focus marking. For subjects, prosodic prominence, which was realized as contrastive prominence in both languages, facilitated rejection of false alternatives to the subject, although this effect was stronger in Chinese, judging by comparing the patterns of results for the two languages as shown in Table 8. In both languages, consistent clefting (是…的 [SHI…DE] clefts in Chinese and it-clefts in English) did not facilitate response speeds compared to canonical word order. However, inconsistent clefting generally inhibited rejection of false alternatives, although this effect was stronger in English. These results show that referent encoding in discourse is complex, with listeners affected by, and needing to integrate, multiple cues. Further, the processing effects of these cues are language-specific and task-dependent. We elaborate on the role of prosodic, clefting, and positional cues in turn below.
In both Chinese and English, responses were generally faster when the prosodic prominence was consistent with the focus in the question for subject focus, i.e., prominence on the subject. This is consistent with previous findings for both languages that prosodic prominence strengthens the encoding of focused words and their alternatives, as reviewed in Section 1.2. In particular, these results showed that prosodic prominence makes detection of false alternatives to the focus easier, confirming earlier results for English (Fraundorf et al., 2010), and showing this for the first time for Chinese; extending Yan et al.’s (2019) finding that prosodic prominence activates alternatives to the focused word in Chinese (see also Yan & Calhoun, 2019). Like that study, these results also show that prosodic prominence as expressed through pitch range expansion, as in Chinese, is functionally equivalent to pitch accenting, as in Germanic languages, in terms of its focus effects (see also Ip & Cutler, 2017). As noted above, prosodic prominence did not consistently facilitate the encoding of object focus in either language (see further below).
The facilitation effect from prosodic prominence for subject focus could arise because of the positive effect of prominence on the subject. However, it could also be an inhibition effect of prosodic prominence on the object in the non(-subject)-focus marking condition, or both a facilitation and an inhibition effect. Given the design of our experiment, we cannot distinguish between these possibilities, as there were only two prosodic conditions. However, Fraundorf et al.’s (2010) study on contrastive focus using memory tasks had stimuli with multiple contrastive accents in a sentence to look at this question, and concluded consistent accenting had a facilitatory effect, rather than inconsistent accenting an inhibition effect. Therefore, in our experiment, it is more likely that contrastive prominence had a facilitatory effect. But future studies with a similar design to Fraundorf et al. (2010) would be needed to confirm this.
For the effect of clefting, we distinguished between facilitatory and inhibitory effects, i.e., whether consistent clefting resulted in faster responses, and whether inconsistent clefting resulted in slower responses, compared to canonical order. Consistent clefting did not have a facilitatory effect for either subject or object focus in either language, whether or not the prosodic prominence was consistent (see Table 8). This is similar to the pattern of findings for Chinese reported in Yan et al. (2020) using an untimed appropriateness rating task, and in Yan and Calhoun (2019) using lexical decision tasks with sentence primes (see Section 1.2). In general, it seems that listeners in Chinese do not pay attention to clefting as a positive cue to focus. However, there was a somewhat different pattern of the findings for English using a meta-linguistic focus judgment task reported in Calhoun et al. (2019): While stressed-head clefts were no more likely to be judged as focused than the equivalent canonical sentences (e.g., canonS = ScleftS), in the ‘mismatch’ sentences, the focus was judged to be on the cleft head rather than the stressed word, i.e., the clefting overrode the prosodic cue.
The lack of a facilitatory effect of clefting on processing seems, therefore, to be due both to cross-linguistic differences, and to the nature of the task and its time course. In English, syntactic clefting seems to be a stronger cue than in Chinese, so in an untimed meta-judgment task, clefting was an effective cue to focus position. This is consistent with findings by S.-H. Chen et al. (2012) on the relative importance of syntactic clefting and prosodic prominence in Chinese and English, extending their findings on 是 (SHI) clefts to 是…的 (SHI…DE) clefts for Chinese. However, clefting is not an effective cue in immediate processing in either language. We suggest this is because the required implicatures of clefts carry an initial processing cost (see Section 1.1). Clefting has been found to have a positive facilitatory effect in English in a memory task (Kember et al., 2019), again suggesting the lack of effect here is because of the short-term processing cost of clefts. It should also be noted that all of the sentences in our studies on Chinese and English included a contrastive prominence. It could be possible that clefting would have a stronger effect if sentences with only non-contrastive prosodic prominence were used.
In both languages, inconsistent clefting had an inhibitory effect on response times in rejecting false alternatives. This effect was more consistent in English than in Chinese, and also stronger for subject focus than object focus (see Table 8). Again, this suggests that listeners are relatively more sensitive to prosodic cues in Chinese and to clefting cues in English. We suggest the inhibitory effect is because the inconsistent clefting causes listeners to generate an implicature which is irrelevant to answering the question, which exerts a processing cost.
As discussed at the end of Section 1.2, while focus-marked words are predicted to be easier to reject under either the QUD-focus or contrastive focus definitions of focus, responses to the ‘mismatch’ sentences are germane to the question of which is more relevant to performance in the ‘false alternative’ rejection task. In the ‘mismatch’ sentences (ScleftO and OcleftS), both subject and object are marked as contrastively focused, but our previous studies would seem to show that the QUD-focus is interpreted to be on the stressed word in Chinese and the cleft head in English (Calhoun et al., 2019; Yan et al., 2020). As discussed above, we did not find a facilitatory effect of clefting in these sentences in either language, inconsistent with our previous result for English. This suggests that QUD-focus is not the relevant type of focus in this task. On the other hand, we found that inconsistent clefting slowed responses in both languages. This suggests that listeners are attending to this contrastive focus marking, and it slows processing in this task when it is inappropriate. However, clefting is a less effective cue to contrastive focus than prominence in both languages. More generally, we submit that while the distinction between these types of focus is useful in discussing the semantics of focus, we can to an extent abstract away from it in evaluating the effectiveness of different cues to focus in processing.
One somewhat unexpected finding was that, in both Chinese and English, object clefts with subject stress (OcleftS) seemed to be particularly hard to process. For example, participants were slower to respond to these sentences than canonical order sentences with subject stress (canonS), and in English, also subject clefts with subject stress (ScleftS) (see Table 8). This was unexpected as the object cleft should make these sentences more consistent with object focus than either canonical order or subject clefts. We speculate that the reason for this unexpected finding is that object clefts were in general harder to process in both languages. Response times to object clefts were slower overall than the other syntactic types in both languages, e.g., 59 ms slower in Chinese and over 160 ms slower in English than canonical word order sentences (see Sections 3.2 and 4.2). Object clefts seem to be less expected to mark focus (see Section 1.1). Further, in English, in object clefts the constituent order is reversed, so the object precedes the subject, making these harder to process. When the prosodic prominence is also unexpected, as in the ‘mismatch’ cases (OcleftS), these are very hard to process.
In both Chinese and English, prosodic prominence and syntactic clefting had clear effects for subjects, as just discussed, whereas for objects these effects were absent or weak (see Table 8). This is consistent with previous findings for English and Chinese that words in the final default focus position are processed as focused even if they are not overtly focus marked (Ayers, 1996; S.-H. Chen et al., 2012; Harris & Carlson, 2018) (see Section 1.3). Notably, these results are very different from the findings in our other studies using meta-linguistic judgments, where the effects of prosodic prominence and clefting on focus judgments were similar for subjects and objects (Calhoun et al., 2019; Yan et al., 2020). This suggests that words in final position are not interpreted as focused per se, but rather, consistent with Harris and Carlson’s (2018) proposal, they carry ‘enduring focus,’ meaning in processing they are still potential locations for focus. Although it is beyond the scope of this article to consider this fully, these results are consistent with theoretical proposals that posit ‘focus as alignment’ as a separate constraint to ‘focus as prominence’ active in many languages (see Section 1.1, cf. Féry, 2013).
There is a question, then, of what carries the default focus: words in the object role, words in sentence/phrase final position, or words in the expected position of the nuclear accent. Given the design of our experiment, and the usual alignment between these three in English and Chinese, it is difficult to tell these apart. However, we found no difference between response times to object questions between subject clefts and canonical sentences. In subject clefts, the nuclear accent is expected to be on the clefted subject, not the phrase-final word (see Section 1.1), but the default bias was still found. This suggests the bias is for sentence/phrase final words to be focused, not words in the expected nuclear accent position. In Harris and Carlson’s (2018) study, they found the effect was more consistent with a default final position/expected accent position bias, than a grammatical object bias. Therefore, it appears this is most likely to be a positional final bias. However, future studies with different designs are needed to confirm this.
It should be noted that it is possible that the effects found here could also result from recency effects. For object questions, the false alternative appeared at the end of the sentence, so participants had to read the whole question in order to detect the false information conveyed by the object word before they could answer “no.” Because of this, response times for object questions were slower overall. This may also have resulted in reprocessing at the end of the sentence, which masked any effect of focus marking, as suggested by Ayers (1996) for her similar experiments (see Section 1.2). We think this is less likely, however, as the effects are similar to those reported in Harris and Carlson (2018), where the recency explanation does not arise given the different design of their experiments. We also did observe some differences in response times for object focus, in sentences involving object clefts, suggesting prosodic prominence and clefting can have an effect in this position.
Our results reveal that the task of using different cues to keep track of discourse referents, and subsequently to be able to reject false alternatives to referents in the discourse model, is a complex one. Listeners are sensitive both to the weighting of different cues to focus in their language, and also apparently integrate different cues at different stages of processing, so that use of these cues varies depending on the task. In this study, we have shown that, in immediate discourse processing, Chinese and English listeners use prosodic, clefting, and positional cues to focus in generally similar ways. However, overall, Chinese listeners were relatively more sensitive to prosodic cues, and English listeners to clefting; although prosodic cues were more effective overall in both languages. This is consistent with findings on the relative importance of prosodic prominence and clefting in Chinese and English in previous studies on focus interpretation and processing, although this needs to be further validated in naturally occurring contexts rather than in the lab situation. However, it may be somewhat surprising given the heavy emphasis on prosodic prominence as the key marker of focus in English (and other Germanic languages) in most previous psycholinguistic research. We would expect that these different weightings are related to the relative use of these cues in production in the two languages. This could be looked at using a corpus-based approach. We would further expect there would be languages in which clefting cues to focus are more frequently used than prosodic, and which would hence be weighted higher by listeners in those languages, e.g., Kember et al.’s (2019) findings for Korean in a memory-based task.
Further, we would expect the positional final focus bias not to occur in a language where focus was not usually final, e.g., Hungarian (for an overview of prosodic focus marking systems including interactions with phrase position see Kügler & Calhoun, to appear).
This research establishes cross-linguistic similarities and differences in the role of prosodic prominence, clefting, and default focus position in encoding discourse information in Chinese and English. In keeping with the theme of this special collection, we hope that this study encourages further work on the relative importance and use of prosodic cues in relation to other cues in speech processing, to increase our understanding of how this varies across languages.
1The first tier of (5) indicates Chinese characters. The following abbreviations are used in glosses in the second tier: PRF = perfective aspect, COP = copula, S = subject, V = verb, O = object, REL = relativizer. 的 (DE) is glossed as DE following the current literature such as Paul and Whitman (2008) and Hole (2011). Note that Mandarin Chinese 的 (DE) has multiple uses (apart from its association with a past tense reading in this paper), including its functions as a complementizer and a nominalizer (see e.g., Paul & Whitman, 2008; Xie, 2012).
The experiments reported here were approved by the Human Ethics Committee of Victoria University of Wellington (approval number 24753 for the Chinese experiment, and 23369 for the English experiment). All participants gave written informed consent to take part.
Thank you to Paul Warren for much useful discussion of the design and analysis of the study. Thanks to three reviewers, and the guest editor, for valuable comments on earlier versions of the paper. Thanks to Minxia Wang for hosting the first author, and help in recruiting the Chinese participants. Thanks to Emma Wollum for assistance with the English experiment, including creating the materials, recruiting participants, running the experiment, and preliminary data analysis. Thanks to Lisa Woods for help in advising on the analysis of the data. Finally thanks to all the participants for taking part.
This work was supported by ‘the Fundamental Research Funds for the Central Universities’ (grant number 2020kfyXJJS106), Huazhong University of Science and Technology, and a Faculty of Humanities and Social Sciences Research Grant (grant number 218221), Victoria University of Wellington, awarded to the first author Mengzhu Yan, and also a Marsden Fund Council grant (grant number 15-VUW-015) from New Zealand Government funding, awarded to the second author Sasha Calhoun.
The authors have no competing interests to declare.
The study was originally conceived by the second author Sasha Calhoun. The Chinese experiment was mainly designed and built by the first author Mengzhu Yan with contributions by the second author Sasha Calhoun. The English experiment was conducted by the second author. Both authors contributed to the write-up of the paper.
Akker, E., & Cutler, A. (2003). Prosodic cues to semantic structure in native and nonnative listening. Bilingualism: Language and Cognition, 6(2), 81–96. DOI: https://doi.org/10.1017/S1366728903001056
Birch, S. L., Albrecht, J. E., & Myers, J. L. (2000). Syntactic focusing structures influence discourse processing. Discourse Processes, 30(3), 285–304. DOI: https://doi.org/10.1207/S15326950dp3003_4
Birch, S. L., & Garnsey, S. M. (1995). The effect of focus on memory for words in sentences. Journal of Memory and Language, 34(2), 232–267. DOI: https://doi.org/10.1006/jmla.1995.1011
Birdsong, D., Gertken, L., & Amengual, M. (2012, January). Bilingual Language Profile: An easy-to-use instrument to assess bilingualism. Retrieved from https://sites.la.utexas.edu/bilingual/
Boersma, P., & Weenink, D. (2018). Praat: Doing phonetics by computer [computer program]. Version 6.0.37. Retrieved from: http://www.praat.org
Braun, B., Asano, Y., & Dehé, N. (2019). When (not) to look for contrastive alternatives: The role of pitch accent type and additive particles. Language and Speech, 62(4), 751–778. DOI: https://doi.org/10.1177/0023830918814279
Braun, B., & Biezma, M. (2019). Prenuclear L*+H activates alternatives for the accented word. Frontiers in Psychology, 10, 1993. DOI: https://doi.org/10.3389/fpsyg.2019.01993
Braun, B., & Tagliapietra, L. (2010). The role of contrastive intonation contours in the retrieval of contextual alternatives. Language and Cognitive Processes, 25(7–9), 1024–1043. DOI: https://doi.org/10.1080/01690960903036836
Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic correlates of information structure. Language and Cognitive Processes, 25(7–9), 1044–1098. DOI: https://doi.org/10.1080/01690965.2010.504378
Brugos, A., Shattuck-Hufnagel, S., & Veilleux, N. (2006). Transcribing prosodic structure of spoken utterances with ToBI. Retrieved from: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911-transcribing-prosodic-structure-of-spoken-utterances-with-tobijanuary-iap-2006/index.htm
Calhoun, S. (2010). The centrality of metrical structure in signaling information structure: A probabilistic perspective. Language, 86, 1–42. DOI: https://doi.org/10.1353/lan.0.0197
Calhoun, S., Wollum, E., & Kruse-Va’ai, E. (2019). Prosodic prominence and focus: Expectation affects interpretation in Samoan and English. Language and Speech. DOI: https://doi.org/10.1177/0023830919890362
Carlson, K., Dickey, M. W., Frazier, L., & Clifton, C. (2009). Information structure expectations in sentence comprehension. The Quarterly Journal of Experimental Psychology, 62(1), 114–139. DOI: https://doi.org/10.1080/17470210701880171
Chen, S.-H., Chen, S.-C., & He, T.-H. (2012). Surface cues and pragmatic interpretation of given/new in Mandarin Chinese and English: A comparative study. Journal of Pragmatics, 44(4), 490–507. DOI: https://doi.org/10.1016/j.pragma.2011.12.006
Chen, Y., & Gussenhoven, C. (2008). Emphasis and tonal implementation in Standard Chinese. Journal of Phonetics, 36(4), 724–746. DOI: https://doi.org/10.1016/j.wocn.2008.06.003
Chen, Y., Lee, P., & Pan, H. (2016). Topic and focus marking in Chinese. In C. Féry & S. Ishihara (Eds.), Oxford handbook of information structure. UK: Oxford University Press. DOI: https://doi.org/10.1093/oxfordhb/9780199642670.013.34
Cutler, A. (1976). Phoneme-monitoring reaction time as a function of preceding intonation contour. Attention, Perception, & Psychophysics, 20(1), 55–60. DOI: https://doi.org/10.3758/BF03198706
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), 141–201. DOI: https://doi.org/10.1177/002383099704000203
Cutler, A., & Fodor, J. A. (1979). Semantic focus and sentence comprehension. Cognition, 7(1), 49–59. DOI: https://doi.org/10.1016/0010-0277(79)90010-6
Delin, J., & Oberlander, J. (1995). Syntactic constraints on discourse structure: The case of it-clefts. Linguistics, 33(3), 465–500. DOI: https://doi.org/10.1515/ling.1995.33.3.465
Destruel, E., Velleman, D., Onea, E., Bumford, D., Xue, J., & Beaver, D. (2013). A crosslinguistic study of the non-at-issueness of exhaustive inferences. In F. Schwarz (Ed.), Experimental perspectives on presuppositions (pp. 135–156). Switzerland: Springer. DOI: https://doi.org/10.1007/978-3-319-07980-6_6
Downing, L., & Hyman, L. (2016). Information structure in Bantu. In C. Féry & S. Ishihara (Eds.), Oxford handbook of information structure. UK: Oxford University Press. DOI: https://doi.org/10.1093/oxfordhb/9780199642670.013.010
É Kiss, K. (1998). Identificational focus versus information focus. Language, 74(2), 245–273. DOI: https://doi.org/10.2307/417867
Féry, C. (2013). Focus as prosodic alignment. Natural Language and Linguistic Theory, 31, 683–734. DOI: https://doi.org/10.1007/s11049-013-9195-7
Fraundorf, S. H., Benjamin, A. S., & Watson, D. G. (2013). What happened (and what did not): Discourse constraints on encoding of plausible alternatives. Journal of Memory and Language, 69(3), 196–227. DOI: https://doi.org/10.1016/j.jml.2013.06.003
Fraundorf, S. H., Watson, D. G., & Benjamin, A. S. (2010). Recognition memory reveals just how CONTRASTIVE contrastive accenting really is. Journal of Memory and Language, 63(3), 367–386. DOI: https://doi.org/10.1016/j.jml.2010.06.004
Gotzner, N. (2017). Alternative sets in language processing. Cham: Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-52761-1
Harris, J. A., & Carlson, K. (2018). Information structure preferences in focus-sensitive ellipsis: How defaults persist. Language and Speech, 61(3), 480–512. DOI: https://doi.org/10.1177/0023830917737110
Hedberg, N. (2013). Multiple focus and cleft sentences. In K. Hartmann & T. Veenstra (Eds.), Cleft structures (pp. 227–250). Amsterdam/Philadelphia: John Benjamins. DOI: https://doi.org/10.1075/la.208.08hed
Hole, D. (2011). The deconstruction of Chinese shì…de clefts revisited. Lingua, 121(11), 1707–1733. DOI: https://doi.org/10.1016/j.lingua.2011.07.004
Husband, E. M., & Ferreira, F. (2016). The role of selection in the comprehension of focus alternatives. Language, Cognition and Neuroscience, 31(2), 217–235. DOI: https://doi.org/10.1080/23273798.2015.1083113
Ip, M. H. K., & Cutler, A. (2017, August). Intonation facilitates prediction of focus even in the presence of lexical tones. In Proceedings of interspeech 2017: Situated interaction (pp. 1218–1222). Stockholm, Sweden. DOI: https://doi.org/10.21437/Interspeech.2017-264
Kember, H., Choi, J., & Cutler, A. (2016). Processing advantages for focused words in Korean. In J. Barnes, A. Brugos, S. Shattuck-Hufnagel & N. Veilleux (Eds.), Proceedings of Speech Prosody 2016 (pp. 702–705). Boston, USA. DOI: https://doi.org/10.21437/SpeechProsody.2016-144
Kember, H., Choi, J., & Yu, J. (2016). Searching for importance: Focus facilitates memory for words in English. In C. Carignan & M. D. Tyler (Eds.), Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology (pp. 181–184). Parramatta, Australia.
Kember, H., Choi, J., Yu, J., & Cutler, A. (2019). The processing of linguistic prominence. Language and Speech. DOI: https://doi.org/10.1177/0023830919880217
Krifka, M. (2008). Basic notions of information structure. Acta Linguistica Hungarica, 55, 243–276. DOI: https://doi.org/10.1556/ALing.55.2008.3-4.2
Kügler, F., & Calhoun, S. (to appear). Prosodic encoding of information structure: A typological perspective. In C. Gussenhoven & A. Chen (Eds.), The Oxford handbook of language prosody. Oxford, UK: Oxford University Press.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82(13), 1–26. DOI: https://doi.org/10.18637/jss.v082.i13
Ladd, D. R. (2008). Intonational phonology (second ed.). Cambridge, UK: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511808814
Lambrecht, K. (2001). A framework for the analysis of cleft constructions. Linguistics, 39(3), 463–516. DOI: https://doi.org/10.1515/ling.2001.021
Liu, Y., & Yang, Y. (2016). Exhaustivity in Mandarin shi…(de) sentences: Experimental evidence. In M. Köllner & R. Ziai (Eds.), Proceedings of European summer school in logic language and information 2016 student session (pp. 167–179). Bolzano: Free University of Bozen.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1), 35–54. DOI: https://doi.org/10.1037/0033-295X.109.1.35
Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324. DOI: https://doi.org/10.3758/s13428-011-0168-7
Molnár, V. (2006). On different kinds of contrast. In V. Molnár & S. Winkler (Eds.), The architecture of focus (pp. 197–233). Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110922011.197
Onea, E. (2019). Exhaustivity in it-clefts. In C. Cummins & N. Katsos (Eds.), The Oxford handbook of experimental semantics and pragmatics. Oxford, UK: Oxford University Press. DOI: https://doi.org/10.1093/oxfordhb/9780198791768.013.17
Paul, W., & Whitman, J. (2008). Shi … de focus clefts in Mandarin Chinese. Linguistic Review, 25(3–4), 413–451. DOI: https://doi.org/10.1515/TLIR.2008.012
Prince, E. F. (1978). A comparison of wh-clefts and it-clefts in discourse. Language, 54(4), 883–906. DOI: https://doi.org/10.2307/413238
Roberts, C. (1996). Information structure in discourse: Towards an integrated formal theory of pragmatics. In J. H. Yoon & A. Kathol (Eds.), OSU working papers in linguistics (Vol. 49, pp. 91–136). The Ohio State University. (Revised version in Semantics and Pragmatics, 5, 1–69, 2012). DOI: https://doi.org/10.3765/sp.5.6
Roettger, T. B., Mahrt, T., & Cole, J. (2019). Mapping prosody onto meaning–the case of information structure in American English. Language, Cognition and Neuroscience, 34(7), 841–860. DOI: https://doi.org/10.1080/23273798.2019.1587482
Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1(1), 75–116. DOI: https://doi.org/10.1007/BF02342617
Sanford, A. J. S., Sanford, A. J., Molle, J., & Emmott, C. (2006). Shallow processing and attention capture in written and spoken discourse. Discourse Processes, 42(2), 109–130. DOI: https://doi.org/10.1207/s15326950dp4202_2
Simpson, A., & Wu, Z. (2002). From D to T–Determiner incorporation and the creation of tense. Journal of East Asian Linguistics, 11(2), 169–209. DOI: https://doi.org/10.1023/A:1014934915836
Spalek, K., Gotzner, N., & Wartenburger, I. (2014). Not only the apples: Focus sensitive particles improve memory for information-structural alternatives. Journal of Memory and Language, 70, 68–84. DOI: https://doi.org/10.1016/j.jml.2013.09.001
Sturt, P., Sanford, A. J., Stewart, A., & Dawydiak, E. (2004). Linguistic focus and good-enough representations: An application of the change-detection paradigm. Psychonomic Bulletin & Review, 11(5), 882–888. DOI: https://doi.org/10.3758/BF03196716
Tily, H., Fedorenko, E., & Gibson, E. (2010). The time-course of lexical and structural processes in sentence comprehension. Quarterly Journal of Experimental Psychology, 63(5), 910–927. DOI: https://doi.org/10.1080/17470210903114866
Traxler, M. J., Morris, R. K., & Seely, R. E. (2002). Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47(1), 69–90. DOI: https://doi.org/10.1006/jmla.2001.2836
Vallduví, E. (2016). Information structure. In M. Aloni & P. Dekker (Eds.), The Cambridge handbook of formal semantics (pp. 728–755). UK: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9781139236157.024
Ward, G., & Birner, B. (2006). Information structure and non-canonical syntax. In L. R. Horn & G. Ward (Eds.), The handbook of pragmatics (pp. 152–174). Blackwell Publishing Ltd. DOI: https://doi.org/10.1002/9780470756959.ch7
Ward, P., & Sturt, P. (2007). Linguistic focus and memory: An eye movement study. Memory & Cognition, 35(1), 73–86. DOI: https://doi.org/10.3758/BF03195944
Wei, C., Carroll, R. J., Harden, K. K., & Wu, G. (2012). Comparisons of treatment means when factors do not interact in two-factorial studies. Amino Acids, 42(5), 2031–2035. DOI: https://doi.org/10.1007/s00726-011-0924-0
Xie, Z. (2012). The modal uses of de and temporal shifting in Mandarin Chinese. Journal of East Asian Linguistics, 21(4), 387–420. DOI: https://doi.org/10.1007/s10831-012-9093-8
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27(1), 55–105. DOI: https://doi.org/10.1006/jpho.1999.0086
Xu, Y. (2013). ProsodyPro – A tool for large-scale systematic prosody analysis. In Proceedings of tools and resources for the analysis of speech prosody (TRASP 2013) (pp. 7–10). Aix-en-Provence, France.
Yan, M., & Calhoun, S. (2019). Priming effects of focus in Mandarin Chinese. Frontiers in Psychology, 10, 1985. DOI: https://doi.org/10.3389/fpsyg.2019.01985
Yan, M., Calhoun, S., & Warren, P. (2019). The role of prosody in priming alternatives in Mandarin Chinese. In S. Calhoun, P. Escudero, M. Tabain & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 (pp. 437–441). Canberra, Australia: Australasian Speech Science and Technology Association.
Yan, M., Calhoun, S., & Warren, P. (2020). Prosody or syntax? The perception of focus in Mandarin Chinese. In Proceedings of the 10th International Conference on Speech Prosody, Tokyo, Japan 2020 (pp. 347–351). Tokyo, Japan: International Speech Communication Association. DOI: https://doi.org/10.21437/SpeechProsody.2020-71
Zerbian, S. (2007). Subject/object-asymmetry in Northern Sotho. In K. Schwabe & S. Winkler (Eds.), On information structure, meaning and form: Generalizations across languages. Amsterdam/Philadelphia: John Benjamins. DOI: https://doi.org/10.1075/la.100.18zer
Zimmermann, M. (2011). The grammatical expression of focus in West Chadic: Variation and uniformity in and across languages. Linguistics, 49(5), 1163–1213. DOI: https://doi.org/10.1515/ling.2011.032