1. Introduction

A longstanding question in linguistic research is whether the relationship between form and meaning is arbitrary or motivated. If the relationship is motivated, the question is how far and by what factors it is constrained. Early linguistic research assumed an arbitrary signifier (de Saussure, 1993), with no inherent relationship between form and meaning. More recent approaches have brought possible relationships between form and meaning to the fore again; see the overviews and discussions in Nuckolls (1999), Hinton et al. (2006), and Dingemanse et al. (2015). Underlying motivations for non-arbitrary relationships that have been discussed include:

  • sound symbolism: e.g., iconicity in expressing repetition, duration, or size (e.g., Dingemanse et al., 2015)
  • systematicity: e.g., regularly recurring sound sequences in semantically related words (e.g., Dingemanse et al., 2015)
  • geography: e.g., acoustic adaptation to forest or mountainous environments (Maddieson & Coupé, 2015)
  • physiology: e.g., sound inventory adaptations to auditory impairments of the speakers due to wide-spread middle ear infections, e.g., in Australian languages (Butcher, 2006)
  • ethology: e.g., pitch and vowel frequencies indexing affective meanings (Gussenhoven, 2004; Ohala, 1984)

A prosodic example where non-arbitrariness and its potential underlying motivations have been described and analyzed in detail is the prosody of polar questions, that is, of questions answered with ‘yes’ or ‘no.’ Cross-linguistically, polar questions are marked in a variety of ways ranging from syntactic strategies such as inversion to morphological strategies such as question particles and prosodic strategies such as pitch movement. Pitch is most prominently used in declarative questions, that is, in questions which are morphosyntactically identical to their counterpart statements. Strong cross-linguistic and cross-cultural similarities have been noted in the use of pitch. While a range of pitch contours is attested, the pattern found most often is a low or falling pitch in statements, contrasting with a high or rising pitch in questions.

Ohala (1984) and Gussenhoven (2004) investigate underlying reasons for these cross-linguistic tendencies in the use of pitch, focusing on the hearer and the meanings the hearer associates with the pitch. Ohala (1984, p. 2) and Gussenhoven (2004, pp. 79–96) propose that human speakers and hearers are equipped with an innately specified frequency code (Ohala) or production and frequency code (Gussenhoven). The frequency code associates fundamental frequency with physical size. High fundamental frequency is linked to small vocalizers. It is associated with affective meanings such as ‘friendly,’ ‘submissive,’ and ‘desirous of the receiver’s goodwill,’ and with informational meanings such as uncertainty. Low fundamental frequency is linked to large vocalizers. It is associated with affective meanings such as ‘authoritative’ or ‘aggressive’ and informational meanings such as certainty. Ohala (1984, p. 5) links these associations to question formation. He argues that someone asking a question wishes to obtain missing information. The person is, therefore, in a weak position and uses high pitch to appeal to the goodwill and cooperation of the receiver. A low-pitched statement, on the other hand, signals a strong position of informational independence.

Rialland (2007, 2009) approaches yes/no question prosody from an Africanist perspective and provides evidence against a cross-linguistic universal that links high or rising pitch with questions. Using data from 78 and 119 African languages, respectively, Rialland (2007, 2009) shows that low-pitched questions are not an exception but the norm in many families of the Niger-Congo phylum, and that low-pitched questions also occur frequently in other languages of the Macro-Sudan belt. Moreover, Rialland (2007) includes other prosodic markers in addition to pitch in question and statement intonation and establishes a typology of polar question intonation based on co-occurrences of the various question markers.

Rialland (2007, pp. 37–38) distinguishes between high-pitched question markers in tense prosodies, as in (1), and non-high-pitched question markers in lax prosodies, as in (2). High-pitched markers all relate to pitch. Non-high-pitched markers relate to pitch as well as to a range of other types of prosodic marking.

    (1) High-pitched yes/no question markers in tense question prosody
      • cancellation or reduction of downdrift, register expansion
      • raising of last high tone(s) (not necessarily sentence-final)
      • cancellation or reduction of final lowering
      • final high tone or rising intonation
      • final high-low melody
    (2) Non-high-pitched yes/no question markers in lax question prosody
      • final low tone or falling intonation
      • final polar tone or mid tone
      • lengthening (vocalic mora or considerable vowel lengthening)
      • breathy termination
      • cancellation of penultimate lengthening
      • open vowel

Cancellation or reduction of downdrift refers to a reduction or suspension of gradual phonetic lowering of pitch throughout an utterance (Rialland, 2007, pp. 39–40). Register expansion is understood as a global expansion of the pitch range in which tones are realized, which comes about as a result of raising high tones (Rialland, 2007, p. 39). Register expansion differs from final high tone or rising intonation, which is understood as a local rather than global effect (Rialland, 2007, p. 41). Register expansion may also include a higher pitch onset (e.g., in Rialland, 2007, p. 40). Cancellation or reduction of final lowering refers to a reduction or suspension of a final lowering that typically occurs in statements (Rialland, 2007, p. 42). Lengthening is understood as a process that either adds a mora to the last syllable or lengthens the vowel considerably so that there are even greater durational effects. Most often, lengthening co-occurs with other markers, for example, breathy termination, which draws out the final vowel (Rialland, 2007, p. 45). Breathy termination refers to a combination of various processes. Statements end with an abrupt decrease in intensity, and a glottal stop. Questions end with a lengthened vowel, a gradual intensity decrease, and a gradual opening of the glottis, which eventually leads to voicelessness (Rialland, 2007, p. 46).

Rialland (2007, pp. 50–51) points to the frequent occurrence and co-occurrence of open vowels, low tones, falling intonation, lengthening, and at times breathy termination to mark questions. This leads her to propose that the non-high-pitched question marker comprises a lax yes/no question prosody characterized by vowel opening, relaxing of the vocal cords (inducing pitch lowering), and glottal opening. Lax question prosody occurs, for example, in Ncam (Gur, Niger-Congo). Speakers of Ncam mark questions with falling intonation, breathy termination, vowel lengthening, and a final vowel /-à/, which may assimilate to the preceding vowel (Rialland, 2009, pp. 930–932). Lax prosody contrasts with the high-pitched question markers that combine into a tense prosody. Tense prosody is characterized by rising intonation, vocal cord tension, and glottal adduction (Rialland, 2007, p. 51). Tense prosody occurs, for example, in Mende (Mande, Niger-Congo). Mende speakers mark questions with register expansion, downdrift reduction, and rising intonation (Rialland, 2007, p. 43). In addition to lax and tense prosodies, a range of languages show hybrid prosodies and combine markers from the lax and tense sets. Kanuri speakers (Nilo-Saharan) mark questions by adding /-wá/, combining an open vowel, which is a lax marker, with a high tone, which is a tense marker (Rialland, 2009, p. 944).

Cahill (2012, 2013) uses Rialland’s (2007, 2009) typology to describe question formation in four Gur languages (Buli, Deg, Safaliba, Konni) and two Kwa languages (Adele, Chumburung). Unlike the clear lax prosody patterns found by Rialland, Cahill (2012, 2013) finds only hybrid prosodies. All the languages in Cahill’s sample combine lax and tense question markers. All six languages mark questions with a form of falling final pitch and final lengthening. Chumburung1 and Deg show evidence of breathy termination. All six languages also make use of higher initial pitch, and all languages except Chumburung show evidence of register expansion. Cahill (2013, p. 3) concludes that hybrid languages may actually be fairly common and suggests that an overall raising may be universal after all, even in languages with the typical Niger-Congo lax prosody.

The descriptions and typologies put forward by Rialland and Cahill seem to contrast strongly with previous research. The focus of earlier literature is indeed on the (near-) universal occurrence of rises in questions and on pitch as a question marker (e.g., Bolinger, 1978; Hermann, 1942; Ultan, 1969). However, the same literature acknowledges a range of realizations of questions, including intonation patterns other than final rises and other prosodic markers. Descriptions of different realizations of the higher intonation in questions include different types, combinations, and locations of falls and rises as well as overall higher pitch (Hermann, 1942; Ultan, 1969). Falling question intonation has been recognized for Ewe, Efik, and Shilluk (Hermann, 1942), Kikuyu (Samarin, 1952), Chitimacha, Fanti, and Grebo (Ultan, 1969), and Swahili (Bolinger, 1978). Additional prosodic markers mentioned in the literature are stress, intensity, pause, tempo and rhythm, duration (particularly vowel length), phonation mode,2 as well as interrogative gestures such as eyebrow lifts, forward inclination of the head, or a mouth left open at the end of the utterance (Bolinger, 1978, pp. 475–477; Greenberg, 1969; Hermann, 1942, pp. 141–142; Ultan, 1969, pp. 32–42, citing Greenberg 1969, pp. 32–34, p. 42, citing Bolinger 1957). The cross-linguistic trend towards high-pitched questions and the West African countertrend towards low-pitched questions make it difficult to maintain an argument for a universal, innate, and ethologically or biologically motivated cause for high-pitched question prosody. It would imply that either the innateness hypothesis is wrong, or that speakers of some West African languages are somehow different from speakers of other languages, or that we are still missing pieces of the puzzle.

This study presents data from Ikaan, a West African Benue-Congo language spoken in Nigeria. Ikaan question prosody shows both high-pitched patterns in line with the cross-linguistic trends and low-pitched lax patterns in line with other West African languages. Ikaan speakers show both rich use of pitch as tone with multiple functions in the lexicon and grammar and rich use of pitch as one prosodic dimension in a wider prosodic range with additional strategies such as duration, intensity, and phonation mode. Based on an in-depth analysis of the data set, the study re-examines the discussions, arguments, and conclusions in Rialland (2007, 2009), Ohala (1984), and Gussenhoven (2004) from the perspective of Ikaan, offers an alternative perspective on the lax markers proposed by Rialland, and expands the set of prosodic strategies motivated by the frequency code, thus providing a broader range of data and evidence for the wider discussion of arbitrariness vs. motivation of form and meaning in general.

Section 2 of the paper provides essential linguistic background information on Ikaan. Section 3 lays out the methodology of the study, giving information on the experimental setup, data preparation, and data analysis. Section 4 presents the results of the prosodic analysis of the data. Section 5 discusses the results in detail, showing first that on their own, none of the prosodic markers can bear the full functional load of distinguishing statements from questions, and that a morphophonological approach with a sentence-final clitic also fails to comprehensively capture the prosodic patterns observed. Second, the Ikaan results are discussed in their wider West African context and their wider cross-linguistic context. Section 6 concludes the paper.

Original recordings and the transcriptions of all data presented in the paper are accessible from the archived deposits in Salffner (2010a, 2014) and are referred to here by their recording name (e.g., ikaan170) and, where applicable, the identifier of the annotation (e.g., ikaan093.024 or ikaan170, 1q).

2. Essential Ikaan language background

Ikaan is a lect of the Ukaan cluster (ISO 639-3: kcf). The Ukaan cluster consists of four lects with different degrees of intelligibility that are spoken in five villages in south-western Nigeria. While Ukaan is unanimously classified as a Benue-Congo language within the larger Niger-Congo family, its classification within Benue-Congo is controversial. Ukaan is variously regarded as being an isolate or near-isolate within the Benue-Congo family (Bankale, 2008; Ohiri-Aniche, 1999; Segerer, n.d.; Williamson, 1989), related to Edoid languages (Abiodun, 1999; Agoyi, 2001; Elugbe, 2012), part of Western Benue-Congo (Blench, 1989), or part of Eastern Benue-Congo (Blench, 1994/2005; Connell, 1998; Williamson & Blench, 2000).

Ikaan is a tone language in which tone is active in the phonology and plays an important role in the lexicon, grammar, and at the interfaces of phonology with other levels of the grammar. There are two underlying tones, a high tone (H) and a low tone (L). A detailed description and analysis of the tonal system of Ikaan is given in Salffner (2010b).

In Ikaan, lexical tonal melodies distinguish minimal pairs from one another, as in (3).

    (3) èwùr   L L   ‘hair’       vs.   èwúr   L H   ‘dew’
        òjèn   L L   ‘relative’   vs.   òjén   L H   ‘wife’

Grammatical tonal melodies express for example tense, aspect, and mood, as in (4), or negation, as in (5).

    (4) dʒɛ̀kʊ́rà      L H L      ‘I slept/am asleep.’
        dʒɔ́ɔ̀kʊ̀rá     H L L H    ‘I used to sleep.’
        dʒɛ́ɛ́kʊ́rá     H H H H    ‘I am sleeping.’
        dʒáàkʊ́rá     H L H H    ‘I will sleep.’
        dʒɔ̀ɔ́kʊ́rà     L H H L    ‘if I slept/am asleep’
        dʒɛ̀ɛ́kʊ́rá     L H H H    ‘if I am sleeping’
    (5) dʒɛ̀kʊ́rà      L H L      ‘I slept/am asleep.’
        dʒɛ̀ɛ́kʊ́ràɡ    L H H L    ‘I did not sleep/I am not asleep.’

At the pragmatic level, tonal morphemes encode pragmatic functions such as focus or a predicative use of a noun, as in (6).3

    (6) èɡù    L L   ‘house’   vs.   éɡù    H L   ‘house [focused]’ or ‘It’s a house.’
        òhwó   L H   ‘bone’    vs.   óhwó   H H   ‘bone [focused]’ or ‘It’s a bone.’

At the interface between phonology and semantics, tonal processes that do or do not trigger downstep distinguish between modifiers with predicating function (no downstep) and modifiers with referential function (downstep), as in (7).

    (7) ɔ̀tá ɔ̀jã́wã́   [ɔ̀tɔ́ ɔ̀jã́wã́]   L H L H H   ‘[the] new lamp’
        ɔ̀tá ɔ̀nɪ́      [ɔ̀tɔ́ ɔ́nɪ́]      L H H H     ‘[the] person’s lamp’

Downstep in Ikaan is typologically highly unusual. High tones are downstepped to ꜜH after underlying floating low tones, as shown in the schematic pitch levels given in (8). The high tone on the demonstrative nɛ́ːn ‘that’ in (8) is preceded by an underlying floating low tone, which is lexically present but does not surface as an overt low tone.

    (8) ɛ̀ná    nɛ́ːn
        L H    LFL H
        cow    DEM
        [ ̲    ̅    ―]
        L H LFL H
        ‘that cow’ (ikaan164.053)

The pitch track of the utterance in (8), shown in Figure 1, shows that the high tone that occurs after the underlying floating low tone is pronounced with lower pitch than the previous high tone.

Figure 1 

Pitch track of ɛ̀ná nɛ́ːn ‘that cow.’

High tones are not downstepped if they occur after overt surface low tones, as shown in the schematic pitch levels in (9).

    (9) dʒɛ̀-r̥án    ɔ̀wɔ́ɡ    àrákpà
        L H         L H      L H L
        1SG-cook    soup     beans
        [  ̶   ̅   ̶   ̅   ̶   ̅   ̲  ]
        L H L H L H L
        ‘I cooked bean soup.’ (ikaan124.064)

The pitch track in Figure 2 shows that all high tones are at the same level and are not lowered after the surface low tones.

Figure 2 

Pitch track of dʒɛ̀r̥án ɔ̀wɔ́ɡ àrákpà ‘I cooked bean soup.’

Examples (3) to (9) show how heavily tone is involved at many levels of the lexicon and grammar, how easily tonal changes lead to meaning changes, and how complex the processes are that determine the pitch at which a tone is realized.

Statements and polar questions in Ikaan show identical morphosyntactic structures, as shown in (10).

    (10) a. ɔ̀-nɔ́
            3SG-fall
            ‘S/he fell.’ (ikaan170, 1s)
         b. ɔ̀-nɔ́
            3SG-fall
            ‘Did s/he fall?’ (ikaan170, 1q)

Participant observation of natural embodied language use and informal testing in a quasi-experimental, disembodied setup outside natural and contextualized language use both suggest that speakers of Ikaan are able to distinguish questions and statements even though they are morphosyntactically identical. In natural language use, gestural cues such as raised eyebrows, an inquisitive facial expression, and a slight tilt of the head additionally distinguish questions from statements. In an informal perception test, speakers listened to audio recordings of statements and questions so that context as well as gestural cues were absent. The fact that hearers could still tell statements from questions suggests that the audio signal itself is sufficient to distinguish statements and questions. When speakers were asked how they knew that a given utterance was a question rather than a statement, some speakers said ‘It goes up,’ with one speaker saying, after probing, that ‘it’ was the tone that went up. Speakers did not comment on any other features of the utterances, either in the gestural domain or in the speech signal.

The study reported here focuses on the speech signal in statements and questions and investigates which prosodic parameters may play a role in their distinction.4

3. Method

3.1. Experimental setup

The study describes and analyzes acoustic data consisting of Ikaan statements and questions recorded from 6 speakers. The test set consists of 19 predications, each pronounced as a statement and a question, resulting in 38 utterances per speaker. Statements and questions were given in English, and speakers gave translation equivalents in Ikaan. This paper includes data from the 6 participants listed in Table 1,5 who are all native speakers of the Ikaan lect of Ukaan as spoken in Ikakumo, Ondo State, south-western Nigeria, but who also speak or understand a variety of other languages. Information on the various languages the participants speak or understand was collected in separate sociolinguistic background interviews. The interview for the speaker recorded in ikaan177 has not yet been conducted. The speaker recorded in ikaan175 has since passed away and no interview data is available for him. For both of these speakers, the information on the languages they speak and understand is based on participant observation and informal conversations. Language proficiency in Nigerian English varies among speakers, with some speakers being more proficient in Nigerian English and others more fluent in Nigerian Pidgin English.

Table 1

Background information on speakers included in the investigation.

recording | gender | generation | languages spoken | languages understood | source of language information
ikaan170 | male | parent | Ikaan, Yoruba, Nigerian English, Hausa | Ebira | ikaan269
ikaan171 | male | elder | Ikaan, Yoruba, Nigerian English | Arigidi, Ebira | ikaan270
ikaan172 | female | elder | Ikaan, Yoruba, Nigerian English | Ebira | ikaan325
ikaan173 | male | elder | Ikaan, Yoruba, Nigerian English, Ebira | (none listed) | ikaan335
ikaan175 | male | elder | Ikaan, Yoruba, Nigerian English, Hausa | possibly Ebira | observation
ikaan177 | male | adolescent | Ikaan, Yoruba, Nigerian English | possibly Ebira | observation

Speakers were recorded with a headset microphone (Shure SM10A) and a digital audio recorder (Roland R-09). For more details on the speakers and recording sessions, see recording and speaker metadata in the archived deposit in Salffner (2014). For the data set presented here, visual cues observed during participant observation and in the data collection cannot be described and analyzed because the data was not video-recorded. Since then, video recordings of question formation have been obtained in staged communicative contexts and are available from Salffner (2014).

3.2. Transcription and annotation of the data

Using Praat (Boersma & Weenink, 2015), the audio recordings were segmented into utterances and individual segments. The data was annotated with phonemic and narrow phonetic transcriptions of segments, tonemic mark-up of tonal targets, translations into English, identifiers for each utterance, and notes and comments. The following sections give more details on the annotation conventions for the individual layers.

3.2.1. Pitch

For investigating question markers that make use of pitch, tonal targets of the lexical and grammatical tones were marked as points. The exact point at which the pitch was measured depended on the sequence of tones and the resulting pitch perturbations. In sequences of identical tones where the pitch was level throughout, e.g., in low-low sequences, the tone point was put at the beginning of the last-but-one well-established period of the vowel, or two periods earlier if the pitch dropped into a voiceless consonant, e.g., before [k ʃ tʃ hj]. In sequences of different tones where the pitch shows transitions, e.g., in low-high sequences, the tone point was put where the tonal target had reached its highest or lowest point and had started to plateau. This could be within the vowel or later, e.g., in syllables ending in a nasal consonant. If there was a phonetic pitch-raising effect following, such as a pitch bump before the transition to a voiceless fricative or to a breathy or voiceless vowel, the tone point was put on the plateau before the bump. For contour tones on long vowels, e.g., low-high sequences on one vowel, tone points were put at the second and at the last well-established period of the vowel. Because of breathy and creaky voice, measuring the pitch of final low tones was done on the last well-established period rather than during creaky or breathy realizations. For creaky vowels, this meant that the pitch might still be going down after the point of measuring, for example in ikaan173, 6q, as shown in Figure 3 on the left. In breathy vowels, the pitch track occasionally showed an upwards movement on the last high tone, for example in ikaan177, 19q, as shown in Figure 3 on the right.

Figure 3 

Pitch measurements in creaky (left) and breathy realizations (right).

3.2.2. Phonation mode

For investigating phonation mode, utterance-final vowels were transcribed auditorily with narrow phonetic transcription indicating the presence of modal, breathy or creaky voice, voicelessness, and the presence and manner of release of a final glottal stop. All auditory transcriptions were verified acoustically using cues for phonation modes and glottal stops in the waveform and spectrogram.

3.2.3. Duration

All utterances were segmented into vowels and consonants. For duration, the final vowel was measured. The duration of statement vowels included any creaky phases and the closure of unreleased glottal stops. If glottal stops occurred, these were segmented separately because some releases came very late, some did not occur, and sometimes there was no glottal stop at all. The duration for question vowels (and statement vowels that ended in breathy voice or voicelessness) included any breathy and voiceless parts.

The beginning of a vowel at the beginning of an utterance was set at the first well-defined period. The end of a vowel within a word was set at the last vowel period before the consonant pattern had established itself. The beginning of a vowel within an utterance was set when the consonant pattern had stopped and the vowel period pattern was beginning to establish itself. The clear transitions to and from nasals, fricatives, and voiceless approximants visible in the spectrogram were also used to identify the boundaries between segments. Transition periods to and from plosives were marked up as part of the vowel as long as there was a fully established vowel pattern. The boundaries of voiced approximants that occurred adjacent to their vowel counterparts, i.e., [i ɪ j] and [u ʊ w], were distinguished by differences in intensity, with approximants having a slightly lower intensity than vowels. Finally, the end of a final consonant was set at the point where there was no more energy visible in the spectrogram.

The end of a vowel at the end of a word was more difficult to determine. In statements, vowels may be followed by unreleased or released glottal stops, and the release may come after a prolonged closure. Vowels ending in creaky voice were segmented after the resonances of the last glottal pulse had died down. If there was an unreleased glottal stop following, the closing phase of the glottal stop was segmented and transcribed as part of the vowel. Unreleased glottal stops were identified auditorily. If there was a released glottal stop following, the glottal stop was annotated as its own interval, and the boundary was set after the release or after the secondary articulations of the release, such as strong release, had ceased. In questions, breathy vowels peter out so that the transition between a voiceless vowel and silence is not clearly defined. Vowels ending in breathy voice and voicelessness were segmented after the formant resonances of the voicing or the turbulent noise source had died down or strongly reduced.

3.2.4. Intensity

For intensity, the beginning and end of the entire utterance was annotated according to the conventions for segmenting vowels and consonants outlined in section 3.2.3. Intensity was then measured over the duration of the entire utterance.

3.2.5. Other annotation levels

Each utterance was annotated with comments on the presence of any of the prosodic markers described by Rialland (2007) and with general notes on the data.

3.3. Data measurements, conversion, and calculation

Pitch data was measured in Hertz. To normalize the pitch data across the male speakers and the female speaker, the Hertz values were converted to semitones using 1 Hz as the reference fundamental frequency.
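
The conversion can be sketched as follows. This is a minimal illustration that assumes the standard logarithmic definition of the semitone (12 semitones per doubling of frequency); the function names are hypothetical and are not taken from the scripts used in the study.

```python
import math

REFERENCE_HZ = 1.0  # reference fundamental frequency stated in the text


def hz_to_semitones(f0_hz, reference_hz=REFERENCE_HZ):
    """Convert a frequency in Hz to semitones relative to reference_hz."""
    if f0_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / reference_hz)


def semitones_to_hz(semitones, reference_hz=REFERENCE_HZ):
    """Inverse conversion, useful for reading semitone values back as Hz."""
    return reference_hz * 2.0 ** (semitones / 12.0)


# 128 Hz is seven octaves above 1 Hz, i.e. 7 * 12 = 84 semitones.
print(hz_to_semitones(128.0))  # 84.0
```

Because the scale is logarithmic, a doubling of frequency always corresponds to 12 semitones, so pitch differences expressed in semitones can be compared across speakers with different pitch ranges.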

For pitch onset, the frequency of the utterance-initial low tone in each utterance was measured, and the utterance-initial low tone of a statement was compared to the utterance-initial low tone of its counterpart question.

For register expansion, the utterance-initial low tone of each utterance and the first high tone in each utterance were measured, and the difference between the two tones was calculated as the pitch range of the register. The pitch range of a statement was then compared to the pitch range of its counterpart question. For utterances 12 and 14, where two high tones followed the low tone, the second high was used as a reference point because this was where the pitch target was reached. Utterance 3, which consists entirely of low tones, is not included in the calculations. For ikaan177, utterance 15 could not be measured because for both the statement and the question, the utterance-initial low tone was too creaky to measure pitch.
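
The register comparison described above amounts to two subtractions per statement/question pair. The sketch below illustrates this with invented semitone values; the function names and the use of `None` for pairs that cannot be measured (such as an all-low utterance or a tone too creaky to measure) are assumptions made for illustration only.

```python
def register_range(initial_low_st, first_high_st):
    """Pitch range of the register in semitones: reference high tone minus initial low tone."""
    return first_high_st - initial_low_st


def register_expansion(statement, question):
    """Question range minus statement range; positive means a wider register in the question.

    Each argument is an (initial_low_st, first_high_st) tuple in semitones.
    """
    return register_range(*question) - register_range(*statement)


def mean_expansion(pairs):
    """Average expansion over measurable pairs; pairs containing None are skipped."""
    diffs = [register_expansion(s, q) for s, q in pairs
             if s is not None and q is not None]
    if not diffs:
        raise ValueError("no measurable statement/question pairs")
    return sum(diffs) / len(diffs)


# Invented values, not data from the study.
pairs = [
    ((80.0, 85.0), (83.0, 88.5)),  # range 5.0 vs. 5.5
    (None, None),                  # unmeasurable pair, skipped
    ((81.0, 85.0), (84.0, 88.5)),  # range 4.0 vs. 4.5
]
print(mean_expansion(pairs))  # 0.5
```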

For final lowering, the last high tone and the last low tone in an utterance were measured, and the difference in semitones was calculated. The difference in pitch drop in statements was then compared to the difference in pitch drop in questions. Since not all utterances contained a high tone, and not all utterances ended in low tones, only eight statement/question pairs (utterances 6, 7, 10, 11, 13, 14, 17, and 18) out of 19 statement/question pairs could be compared.

For phonation mode, the narrow phonetic transcriptions of the final vowels were grouped into four categories: modal, breathy, creaky, and mixed breathy/creaky. Table 2 gives an overview of the categories, the patterns in each category, and examples for each pattern.

Table 2

Phonation mode categories, vowel patterns, and examples for each pattern.

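category | pattern | examples
modal | V | a
creaky | VV̰ʔ | aa̰ʔ
creaky | VV̰ʔ̚ | aa̰ʔ̚
creaky | Vʔ | ɔʔ
creaky | Vʔ̚ | ɔʔ̚
creaky | V̰ʔ̚ | a̰ʔ̚
creaky | VV̰ | iḭ
breathy | VV̤V̥ | aa̤ḁ
breathy | VV̤ | aa̤
breathy | VV̥ | aḁ
breathy | V̤V̥ | i̤i̥
mixed breathy/creaky | various | aa̰a̤ʔ̚, ɛɛ̰ɛ̥, ĩĩ̤ʔ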
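
The grouping in Table 2 can be approximated by a simple rule over the diacritics in the narrow transcription. The sketch below is an assumption inferred from the patterns in the table (creaky voice or a glottal stop counts as creaky; breathy voice or voicelessness counts as breathy; both together count as mixed). It is not the procedure used in the study, where the transcriptions were grouped by the author, and the function name is hypothetical.

```python
import unicodedata

BREATHY = "\u0324"       # combining diaeresis below (breathy voice)
VOICELESS = "\u0325"     # combining ring below (voiceless)
CREAKY = "\u0330"        # combining tilde below (creaky voice)
GLOTTAL_STOP = "\u0294"  # IPA glottal stop


def phonation_category(transcription):
    """Assign a final-vowel transcription to one of the four Table 2 categories."""
    # Decompose precomposed letters so that diacritics become separate code points.
    decomposed = unicodedata.normalize("NFD", transcription)
    creaky = CREAKY in decomposed or GLOTTAL_STOP in decomposed
    breathy = BREATHY in decomposed or VOICELESS in decomposed
    if creaky and breathy:
        return "mixed breathy/creaky"
    if creaky:
        return "creaky"
    if breathy:
        return "breathy"
    return "modal"


print(phonation_category("a"))                   # modal
print(phonation_category("aa\u0330\u0294"))      # creaky
print(phonation_category("aa\u0324a\u0325"))     # breathy
```

The nasalization tilde above a vowel (U+0303) is a different code point from the creaky-voice tilde below (U+0330), so nasal vowels are not misclassified under this rule.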

Durations of the final vowels were measured in milliseconds for all vowel-final statements and their corresponding questions. Based on this, the duration difference in milliseconds for each statement/question pair was calculated. Underlyingly /m/-final utterances are vowel-final on the surface in both statements and questions and were also included. For /ɡ/-final utterances, the combined duration of the last vowel and the /ɡ/ was measured in statements and later compared to the final vowel in the question (for more information on /m/-final and /ɡ/-final utterances see section 5.1.2. below). Utterances ending in consonants other than /m ɡ/ in statements were excluded from the calculations because no useful counterpart to the inserted final vowel in questions could be found.

To investigate intensity effects, the mean utterance intensity of each statement and question was measured in decibels, and the difference in mean intensity for each statement/question pair was calculated.

3.4. Statistical analysis

The values for pitch onset, register expansion, final lowering, final lengthening, and intensity constitute numerical data, and one-way ANOVAs were conducted to determine the effect of utterance type on each of these prosodic features. The values for phonation mode constitute categorical data and were tested with a bivariate chi-square test. For each test set, the null hypothesis was that the values or distribution for statements and questions did not differ, and the research hypothesis was that the values or distribution systematically differed in magnitude. Probability values were interpreted using Bonferroni-adjusted p levels, obtained by dividing each conventional level by the six tests, differentiating between three levels of significance: significant (*) at p < 0.00833 (0.05/6), very significant (**) at p < 0.00167 (0.01/6), and highly significant (***) at p < 0.00017 (0.001/6).
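
The Bonferroni thresholds and the F statistic of a one-way ANOVA can be reproduced with a few lines of code. The sketch below uses only the standard library and invented sample values; the function names are hypothetical, and obtaining the p value from the F distribution would additionally require a statistics package, which is not shown here.

```python
def bonferroni_thresholds(alphas=(0.05, 0.01, 0.001), n_tests=6):
    """Divide each conventional significance level by the number of tests."""
    return [alpha / n_tests for alpha in alphas]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, means) for x in group)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


print([round(t, 5) for t in bonferroni_thresholds()])  # [0.00833, 0.00167, 0.00017]

# Invented values for two utterance types, not data from the study.
statements = [1.0, 2.0, 3.0]
questions = [2.0, 3.0, 4.0]
print(one_way_anova_f([statements, questions]))  # (1.5, 1, 4)
```

With two utterance types, the between-groups degrees of freedom are always 1, and the within-groups degrees of freedom equal the number of measured utterances minus 2.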

4. Results

Based on the statement and question markers observed in Ikaan and described by Rialland (2007, 2009), this section presents results for pitch onset, register expansion, final lowering, phonation mode, final lengthening, and intensity, as well as preliminary observations for downdrift and final high tones.

4.1. Pitch onset

Ikaan questions have a higher pitch onset than their statement counterparts. Utterance-initial low tones in statements are realized at 84.0 semitones on average (SE = 0.37), whereas utterance-initial low tones in questions are realized at 87.8 semitones on average (SE = 0.40). An example of higher initial pitch in questions is shown in (11) and in Figure 4.

Figure 4 

Pitch onset differences in ɔ̀kʊ́rà ‘s/he slept’ as statement (left) and question (right).

    (11) a. ɔ̀-kʊ́rà
            3SG-sleep
            ‘S/he slept.’ (ikaan175, 7s)
         b. ɔ̀-kʊ́rà
            3SG-sleep
            ‘Did s/he sleep?’ (ikaan175, 7q)

The initial low tone in the statement in (11) is pronounced at 81.5 semitones. The initial low tone in the question is pronounced at 84.7 semitones, a difference of 3.3 semitones.

Speakers are consistent in using higher onset pitch in questions than in statements. The analysis of variance shows that the effect of utterance type on onset pitch is highly significant across speakers, with F (1, 224) = 28.1, p < 0.00017.
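For readers less familiar with the test, the textbook computation behind a one-way ANOVA F statistic can be sketched as follows. The function `one_way_anova` and the two short lists are assumptions made for illustration; the lists are toy values, not Ikaan measurements, and the sketch does not claim to reproduce the software used in the study.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for illustration only (NOT Ikaan data).
statements = [1.0, 2.0, 3.0]
questions = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova(statements, questions)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With two groups, the first degree of freedom is always 1 and the second is the number of observations minus 2, so under this formula a value reported as F (1, 224) corresponds to 226 observations.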

4.2. Register expansion

For most speakers, there is a tendency for the pitch distance between low and high tones to be slightly greater in questions than in statements; that is, for most speakers, questions are pronounced with a slightly wider pitch register than statements. The pitch difference between low and high tones in statements is 4.6 semitones on average (SE = 0.15), whereas the pitch difference in questions is 4.9 semitones on average (SE = 0.16). An example of register expansion is shown in (12) and in Figure 5.

Figure 5 

Register differences in òʃèrèké ‘s/he cut sugarcane’ as statement (left) and question (right).

    (12) a. ò-ʃì     èrèké
            3SG-cut  sugar.cane
            [òʃèrèké]
            ‘S/he cut sugar cane.’ (ikaan170, 19s)
         b. ò-ʃì     èrèké
            3SG-cut  sugar.cane
            [òʃèrèké]
            ‘Did s/he cut sugar cane?’ (ikaan170, 19q)

In the statement, the pitch difference between the first low tone and the high tone is 5.0 semitones. In the question, the difference is 5.4 semitones. The range of the register in the question is 0.4 semitones greater than in the statement.

While there is a tendency towards an expanded register, the pattern does not hold for the whole data set. The speaker in ikaan172, the only female speaker in the data set, on average has a narrower register in questions. The analysis of variance shows that the effect of utterance type on register expansion is not significant across speakers, with F (1, 212) = 1.1 and p = 0.3036.

4.3. Final lowering

In Ikaan tonal phonology, surface low tones and mid tones are allotones of an underlying low tone. In a phonological process, phonological low tones before high tones are raised to mid, as in the process in (13) and the examples with schematic tone levels in (14).

    (13) /L/ → [M] / ___ /H/
    (14) a. èjímò      ɛ̀ːdʒ      →  [ējímɛ̀ːdʒ]
            egg.plant  1SG.POSS
            tone levels: M H L
            ‘my eggplant’ (ikaan036.020)
         b. èjímò      ɛ̀bɔ́       →  [ējímɛ̄ːbɔ́]
            egg.plant  1PL.POSS
            tone levels: M H M H
            ‘our eggplant’ (ikaan036.023)
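The raising rule in (13) can be illustrated with a small Python sketch. It assumes, for simplicity, one tone per surface syllable and raising only of a low tone that immediately precedes a high tone; the function name `raise_low_before_high` and the list representation are hypothetical and serve only to make the rule concrete.

```python
def raise_low_before_high(tones):
    """Apply /L/ -> [M] / ___ /H/ to a list of underlying tones ('L' or 'H')."""
    surface = []
    for i, tone in enumerate(tones):
        next_tone = tones[i + 1] if i + 1 < len(tones) else None
        if tone == "L" and next_tone == "H":
            surface.append("M")  # low tone raised to mid before a high tone
        else:
            surface.append(tone)  # all other tones surface unchanged
    return surface


# Assumed underlying tone patterns for the surface syllables in (14).
print(raise_low_before_high(["L", "H", "L"]))       # pattern of (14a)
print(raise_low_before_high(["L", "H", "L", "H"]))  # pattern of (14b)
```

Under these assumptions the phrase-final low tone in (14a) stays low simply because no high tone follows it, which is the point made in the next paragraph.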

At first sight, the phrase-final lower low tone in (14) might be interpreted as a prosodic process of final lowering. However, since only low tones are affected by this potential lowering and high tones are not lowered, a phonological tonal process of low tone raising before high tones captures the observation in a more generalized way. That being said, at the prosodic level there may still be a prosodic final lowering process that may overlay the phonological processes and that may differ in extent in statements and questions.

Speakers indeed show a wider pitch difference in questions than in statements between the last of the high tones and the final low tone. In statements, this pitch difference is 9.1 semitones on average (SE = 0.4), whereas in questions this pitch difference is 12.4 semitones on average (SE = 0.6). This pitch difference is illustrated in the example in (15) and the pitch track in Figure 6.

Figure 6 

Final lowering differences in òbénò ‘s/he strolled’ as statement (left) and question (right).

    (15) a. ò-bénò
            3SG-stroll
            ‘S/he strolled.’ (ikaan175, 17s)
         b. ò-bénò
            3SG-stroll
            ‘Did s/he stroll?’ (ikaan175, 17q)

In the statement, the pitch difference between the high tone and the final low tone is 8.7 semitones. In the question, the difference is 12.1 semitones. That is, the drop from the high to the low tone is 3.4 semitones larger in the question than in the statement.
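To give a sense of scale for these intervals, a semitone difference can be converted into a frequency ratio with the standard equal-tempered relation of 12 semitones per octave. The helper name `semitones_to_ratio` is an assumption made for this sketch.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for an interval in semitones (12 semitones = 1 octave)."""
    return 2.0 ** (semitones / 12.0)


# Drops from the last high tone to the final low tone in example (15).
for label, drop_st in (("statement", 8.7), ("question", 12.1)):
    ratio = semitones_to_ratio(drop_st)
    print(f"{label}: {drop_st} st, frequency ratio {ratio:.2f}")
```

On this reading, the final drop in the question in (15) spans slightly more than an octave, whereas the drop in the statement stays well within one.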

The utterance-final wider pitch drop is used consistently by all speakers. The analysis of variance shows that the effect of utterance type on final lowering is highly significant across speakers, with F (1, 94) = 24.2, p < 0.00017.

4.4. Other pitch markers

There are no indications of phonological boundary tones such as final high, low, mid, or polar tones, as described for other languages by Rialland (2007, 2009). Reduction of downdrift and raising of the last high tone cannot be investigated in detail with the existing dataset, but a preliminary description can be presented.

4.4.1. Reduction of downdrift

Downdrift is best observed in sequences of identical tones such as all-low sentences or all-high sentences. Downdrift in Ikaan has not been investigated in detail. All-high utterances consisting of more than one word rarely occur in Ikaan and are not included in this dataset. An all-low utterance exists in this dataset in utterance 3 ɔ̀jã̀nà ɔ̀bɛ̀ɡɛ̀ [ɔ̀jã̀nɔ̀bɛ̀ɡɛ̀] ‘s/he bought plantain.’

The graphs in Figure 7 plot pitch trajectories for each speaker for the low tones in ɔ̀jã̀nà ɔ̀bɛ̀ɡɛ̀ ‘s/he bought plantain’ as a statement and as a question. For most speakers, the first four tones are fairly level and the final low tone shows a downward drop.

Figure 7 

Downdrift in ɔ̀jã̀nɔ̀bɛ̀ɡɛ̀ ‘s/he bought plantain’ as statement (left) and question (right).

Table 3 and Table 4 list the pitch values in semitones for each low tone for each speaker. For the final low tone, the pitch is given at the beginning (L5(b)) and at the end (L5(e)) of the tone-bearing vowel. For each speaker, the pitch difference between the first and the last low tone is given, for both the beginning and the end of the vowel that bears the last low tone. Finally, the pitch difference between the first and the last low tone is given in semitones and as a percentage, averaged across all speakers.

Table 3

Downdrift in utterance 3 (statement).

speaker       L1    L2    L3    L4    L5(b)  L5(e)  L5(b) – L1  L5(e) – L1
ikaan170      80.9  80.3  79.9  79.1  79.5   77.8   –1.4        –3.1
ikaan171      86.1  85.2  86.2  83.3  83.8   78.9   –2.3        –7.2
ikaan172      91.4  92.4  91.6  91.4  90.9   81.7   –0.4        –9.7
ikaan173      85.3  84.8  83.5  82.7  83.1   79.3   –2.2        –6.0
ikaan175      83.4  82.6  81.6  81.8  81.9   77.1   –1.5        –6.3
ikaan177      83.1  83.2  82.7  82.3  83.7   78.1   0.6         –5.0
average (st)                                        –1.2        –6.2
average (%)                                         –1.4        –7.2

Table 4

Downdrift in utterance 3 (question).

speaker       L1    L2    L3    L4    L5(b)  L5(e)  L5(b) – L1  L5(e) – L1
ikaan170      81.9  81.3  81.2  80.8  81.5   79.2   –0.4        –2.6
ikaan171      88.2  86.9  87.0  86.5  86.5   80.3   –1.6        –7.9
ikaan172      92.9  93.4  93.5  92.6  93.8   86.5   0.9         –6.4
ikaan173      87.6  87.5  87.2  87.4  86.8   81.9   –0.8        –5.7
ikaan175      83.6  82.7  82.3  82.3  82.0   78.3   –1.6        –5.4
ikaan177      85.5  85.4  85.0  84.7  85.0   80.8   –0.5        –4.7
average (st)                                        –0.7        –5.4
average (%)                                         –0.8        –6.2
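The semitone averages in Tables 3 and 4 can be recomputed from the listed pitch values, as in the following sketch. The dictionary layout and the function name `mean_drift` are assumptions for illustration. Because the tables list rounded pitch values, differences recomputed this way may deviate from the printed difference columns by about 0.1 semitones.

```python
# (L1, L5(b), L5(e)) in semitones, copied from Table 3 (statements)
# and Table 4 (questions).
STATEMENT = {
    "ikaan170": (80.9, 79.5, 77.8),
    "ikaan171": (86.1, 83.8, 78.9),
    "ikaan172": (91.4, 90.9, 81.7),
    "ikaan173": (85.3, 83.1, 79.3),
    "ikaan175": (83.4, 81.9, 77.1),
    "ikaan177": (83.1, 83.7, 78.1),
}
QUESTION = {
    "ikaan170": (81.9, 81.5, 79.2),
    "ikaan171": (88.2, 86.5, 80.3),
    "ikaan172": (92.9, 93.8, 86.5),
    "ikaan173": (87.6, 86.8, 81.9),
    "ikaan175": (83.6, 82.0, 78.3),
    "ikaan177": (85.5, 85.0, 80.8),
}


def mean_drift(table):
    """Mean of L5(b) - L1 and of L5(e) - L1 across speakers, in semitones."""
    n = len(table)
    to_begin = sum(l5b - l1 for l1, l5b, _ in table.values()) / n
    to_end = sum(l5e - l1 for l1, _, l5e in table.values()) / n
    return to_begin, to_end


for name, table in (("statement", STATEMENT), ("question", QUESTION)):
    to_begin, to_end = mean_drift(table)
    print(f"{name}: L5(b) - L1 = {to_begin:.2f} st, L5(e) - L1 = {to_end:.2f} st")
```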

Comparing the pitch of the first tone and the beginning of the last tone, in statements there seems to be a slight tendency towards downdrift for some speakers, while other speakers show less downdrift, and one speaker shows a very steady pitch with a rise at the beginning of the last vowel. In questions, there is a tendency towards a steadier pitch (cf. Figure 7) with less downdrift, but again there is variation among the speakers, with some speakers showing more of a downward drift, others less, and one speaker showing an upward tendency until the beginning of the last vowel.

Overall, it seems that there may be some downdrift in statements and that this may be somewhat suspended in questions. However, compared to Connell’s (2001) reports on Hausa, using data from Lindau (1986), the Ikaan data shows less slope. Further research needs to be done to establish the extent to which there is downdrift in Ikaan and whether it is suspended in questions.

4.4.2. Raising of last H(s)

The raising of high tones was tested on utterance 15 in (16) and utterance 16 in (17). Utterance 14, which also contains more than one high tone, was excluded because it includes an object introduced by the locative marker b- ‘LOC.’ This marker seems to co-occur with register lowering in this context, which itself would mask any raising of the high tone.

    (16) a. ɔ̌ː-wɔ̃̀     ʊ̀hjã́             òjíd
            3SG-drink  alcoholic.drink  palm.tree
            ‘S/he drank palm wine.’ (15s)
         b. ɔ̌ː-wɔ̃̀     ʊ̀hjã́             òjíd       í
            3SG-drink  alcoholic.drink  palm.tree  EPV
            ‘Did s/he drink palm wine?’ (15q)
    (17) a. ɔ̌ː-mà     òjètẽ́j
            3SG-beat  small.child
            ‘S/he beat the child.’ (16s)
         b. ɔ̌ː-mà     òjètẽ́j       í
            3SG-beat  small.child  EPV
            ‘Did s/he beat the child?’ (16q)

Both statements end in consonants and receive an epenthetic vowel in the question (for more information on epenthetic vowel insertion, see section 4.6.). Therefore both utterances have an additional tone in the question.

Figure 8 plots the pitches of all high tones in semitones for all speakers. The trajectories suggest an upward movement for all speakers in utterance 15. In utterance 16, there is more variation between the speakers.

Figure 8 

High tone trajectories in utterance 15 (top) and utterance 16 (bottom), with statements (left) and questions (right).

Table 5 shows pitch differences in semitones between the first and last high tones for all speakers and averages across all speakers.

Table 5

Raising of last H(s): Pitch differences between the first and last high tone.

            utterance 15           utterance 16
speaker     statement  question    statement  question
ikaan170    0.5        0.6         1.0        –0.2
ikaan171    0.0        0.7         0.5        1.0
ikaan172    1.5        0.3         1.4        0.8
ikaan173    0.0        0.7         2.0        0.9
ikaan175    –0.6       0.0         0.3        –0.2
ikaan177    0.1        0.9         1.2        2.3
all         0.2        0.5         1.1        0.8

In both statements and questions, the final high tone is usually higher than the first high tone in the utterance. However, this difference is wider in the question in utterance 15 and narrower in the question in utterance 16 compared to their counterpart statements, so that from the little data available here, no pattern emerges.

4.5. Phonation mode

Ikaan shows contrasting phonation modes in statements and questions. Ikaan statements tend to end in glottalization. They end with an abrupt decrease in intensity, sometimes occur with creaky voice on the final vowel, and may have an utterance-final released or unreleased glottal stop. Questions end in breathiness and gradually decrease in intensity with the final vowel becoming gradually breathier until voicing stops. An example showing the full range of phonation mode transitions is given in (18).

    (18) a. ɔ̀-nɛ́
            3SG-defecate
            [ɔ̀nɛ́ɛ̰ʔ]
            ‘S/he defecated.’ (ikaan173, 2s)
         b. ɔ̀-nɛ́
            3SG-defecate
            [ɔ̀nɛ́ɛ̤ɛ̥]
            ‘Did s/he defecate?’ (ikaan173, 2q)

The graphs on the left in Figure 9 show the spectrogram and waveform for the utterance as a statement. The final vowel begins with modal voice, then goes through a very brief period of creaky voice, and ends in a released glottal stop. The waveform shows the abrupt amplitude decrease. The spectrogram shows the weak pulses characteristic of creak, voicelessness during the closure of the glottal stop, the release of the glottal stop, and turbulent noise after the release of the stop in the frequency area of the second and third formant of the preceding vowel [ɛ].

Figure 9 

Phonation mode differences in ɔ̀nɛ́ ‘s/he defecated’ as statement (left) and question (right).

The graphs on the right in Figure 9 show the spectrogram and waveform for the utterance as a question. The final vowel begins in modal voice, goes through a period of breathy voice, and ends in voicelessness. The waveform shows a less abrupt decrease in amplitude. The spectrogram first shows strong vertical striations for modal voice and clearly defined formants, then weaker striations and more high frequency noise for breathy voice, with weaker and less clearly defined formants. Finally, the voice bar and vertical striations disappear but energy concentrations remain within the frequency area of the second and third formants for the vowel [ɛ].

Table 6 shows how many times each phonation mode occurred in statements and questions, across all speakers. The distribution of phonation modes is not equal within each set. Within statements, the predominant tendency is towards creaky realizations. Within questions, the main tendency is towards breathy realizations. This distribution pattern was tested statistically with a bivariate chi-square test and found to be highly significant, with χ²(df = 3) = 103.7 at p < 0.00017.

Table 6

Phonation mode categories in statements and questions (instances and percent).

                  statement       question
phonation mode    number  %       number  %
breathy           21      23.3    102     89.5
creaky            54      60.0    1       0.9
modal             12      13.3    9       7.9
mixed             3       3.3     2       1.8
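The chi-square statistic can be recomputed from the counts in Table 6 with the standard Pearson formula. The sketch below uses plain Python; the names `OBSERVED` and `chi_square` are assumptions introduced for illustration, and the sketch does not claim to reproduce the software used in the study.

```python
# Observed counts from Table 6: phonation mode -> (statements, questions).
OBSERVED = {
    "breathy": (21, 102),
    "creaky": (54, 1),
    "modal": (12, 9),
    "mixed": (3, 2),
}


def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(observed.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for count, col_total in zip(row, col_totals):
            # Expected count if phonation mode were independent of utterance type.
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square(OBSERVED)
print(f"chi-square(df = {dof}) = {statistic:.1f}")
```

With four phonation categories and two utterance types, the table has (4 − 1) × (2 − 1) = 3 degrees of freedom, matching the value reported above.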

4.6. Final lengthening

Final vowels in Ikaan questions are longer in duration than final vowels in Ikaan vowel-final statements, and longer than the combined duration of the pre-final vowel and final /ɡ/ in /ɡ/-final statements. In statements, the duration of the final vowel is 188ms on average (SE = 8.0), whereas in questions the final vowel is 277ms on average (SE = 10.7). Examples of longer vowels in both contexts are shown in (19) and (20).

    (19) a. ɔ̀-kpɪ́
            3SG-hear
            [ɔ̀kpɪ́]
            ‘S/he understood.’ (ikaan170, 5s)
         b. ɔ̀-kpɪ́
            3SG-hear
            [ɔ̀kpɪ́ːː]
            ‘Did s/he understand?’ (ikaan170, 5q)
    (20) a. ɔ̀-kɪ́ɡ
            3SG-go
            [ɔ̀kɪ́k]
            ‘S/he went.’ (ikaan171, 8s)
         b. ɔ̀-kɪ́ɡ
            3SG-go
            [ɔ̀kɪ́ːː]
            ‘Did s/he go?’ (ikaan171, 8q)

Figure 10 shows waveforms and spectrograms of the vowel-final utterance in (19). The final vowel in the statement is 140ms long, whereas the final vowel in the question is 233ms long, a difference of 93ms.

Figure 10 

Final lengthening in ɔ̀kpɪ́ ‘s/he heard’ as statement (left) and question (right).

Figure 11 shows waveforms and spectrograms of the /ɡ/-final utterance in (20). The sequence of pre-final vowel and /ɡ/ in the statement is 330ms long, while the final vowel in the question is 430ms long, a difference of 100ms.

Figure 11 

Final lengthening in ɔ̀kɪ́ɡ ‘s/he went’ as statement (left) and question (right).

The differences in the length of the final vowels in statements and questions are consistent across speakers. The analysis of variance shows that the effect of utterance type on the length of the final segment is highly significant across speakers, with F (1, 226) = 48.0, p < 0.00017.⁶

4.7. Intensity

Statements and questions differ in overall intensity, with questions generally being louder than statements. The average overall intensity of statements is 58dB (SE = 0.42), whereas the average overall intensity of questions is 59.6dB (SE = 0.38). An example of higher intensity in questions is shown in (21) and in Figure 12.

Figure 12 

Intensity differences in ɔ̀nɔ́ ‘s/he fell’ as statement (left) and question (right).

    (21) a. ɔ̀-nɔ́
            3SG-fall
            ‘S/he fell.’ (ikaan172, 1s)
         b. ɔ̀-nɔ́
            3SG-fall
            ‘Did s/he fall?’ (ikaan172, 1q)

The statement in (21) is pronounced with a mean intensity of 65.5dB. The question is pronounced with a mean intensity of 67.3dB. The difference in intensity between the statement and the question is 1.8dB.

The analysis of variance shows that the effect of utterance type on the intensity of the utterance is significant across speakers, with F (1, 226) = 7.7, p = 0.0059. This holds true even though questions contain a rather long interval of lower-intensity breathiness and voicelessness at the end of the utterance, which lowers the mean intensity.

4.8. Summary of question markers

Figures 13–18 give an overview of the markers discussed here, showing averages and standard errors for all markers except phonation mode, where the columns show numbers of instances. The results suggest the differences between statements and questions in Ikaan laid out in Table 7, with the significance of each marker indicated by asterisks.

Figure 13 

Onset pitch.

Figure 14 

Pitch difference between initial L and H.

Figure 15 

Pitch difference between last H and final L.

Figure 16 

Phonation mode of final vowel (statements in grey, questions in black).

Figure 17 

Duration difference of the final vowel.

Figure 18 

Intensity difference.

Table 7

Overview of use of prosodic markers in Ikaan statements and questions.

marker              statement  question  p
pitch onset         lower      higher    ***
register expansion  narrower   wider     not significant
final lowering      narrower   wider     ***
phonation mode      creaky     breathy   ***
final lengthening   shorter    longer    ***
intensity           quieter    louder    *

5. Discussion

The results presented in the previous section show that speakers of Ikaan consistently use a wide range of prosodic markers to distinguish statements from questions, and that these usage patterns are statistically significant. A more detailed investigation of the data, however, shows that the situation is not quite so straightforward. The following sections discuss how individual prosodic markers may in fact fail to apply, and how prosodic changes are accompanied by segmental phonological changes that might find alternative explanations in a morphophonological analysis. Following this closer look at the Ikaan data, Ikaan question prosody is discussed in the context of West African languages and the lax question prosody proposed by Rialland (2007, 2009), and in the context of the frequency code (Gussenhoven, 2004; Ohala, 1984).

5.1. Language-internal problems with the prosodic markers

Even though the prosodic markers described in the previous section are used fairly consistently across speakers, a more detailed investigation of the data reveals a more complex picture than suggested by the relatively straightforward results of the simple quantitative analysis. Prosodic markers may fail to apply because their context is not met, because speakers vary in their usage, because markers are decontextualized or seem unnecessary for hearers, and because the prosodic strategies co-occur with unexpected segmental phonological processes.

5.1.1. Prosodic markers fail to apply, vary, are decontextualized, or are jumped by hearers

The first language-internal issue with the use of the prosodic markers is that the prosodic question markers used by Ikaan speakers often rely on prosodic processes which require certain prosodic contexts in order to apply. These contexts, however, are sometimes just not given by the prosodic structure of the utterance and therefore the process cannot apply.

For example, without any high tones present, there is no pitch difference between low and high tones. If, in a larger data set, the wider pitch register proved to be a statistically significant marker in its own right, speakers would still not be able to use it to mark questions in all-low utterances as shown in (22).

    (22) a. ɔ̀-jã̀nà  ɔ̀bɛ̀ɡɛ̀
            3SG-buy  plantain
            ‘S/he bought plantain.’ (3s)
         b. ɔ̀-jã̀nà  ɔ̀bɛ̀ɡɛ̀
            3SG-buy  plantain
            ‘Did s/he buy plantain?’ (3q)

Similarly, final lowering cannot distinguish questions from statements in utterances ending in high tones, as shown in (23), because high tones are not affected by final lowering.

    (23) a. ɔ̀-nɔ́
            3SG-fall
            ‘S/he fell.’ (1s)
         b. ɔ̀-nɔ́
            3SG-fall
            ‘Did s/he fall?’ (1q)

In utterances ending in consonants, phonation mode changes and final lengthening are not applied. The final consonant in the statements does not show creaky phonation, even in the preceding vowel, as shown in (24) and Figure 19. Additionally, the durations of final consonants do not contrast with a longer vowel in the question counterpart.

Figure 19 

Modal voice in the statement ɔ̌ːwʊ̀hjã́òjíd ‘S/he drank palm wine.’

    (24) ɔ̌ː-wɔ̃̀     ʊ̀hjã́             òjíd
         3SG-drink  alcoholic.drink  palm.tree
         [ɔ̃̌ːwʊ̃̀hjã́òjít]
         ‘S/he drank palm wine.’ (15s)

A second issue in the use of the prosodic markers is that speakers of Ikaan noticeably vary in their use of some prosodic markers. Some of this variation applies to particular utterances or phonological contexts; other variation shows differences between speakers in the same context.

In the utterance ɔ̀há ‘s/he saw,’ all speakers use breathy phonation instead of creaky phonation in statements, as shown in Table 8. Why speakers use this breathy phonation is not yet clear, though the verb ha ‘see’ shows slightly different segmental behaviour as well, such as not undergoing vowel deletion or vowel assimilation across word boundaries, as other vowel-final verbs would do.

Table 8

Phonation modes in ɔ̀há ‘s/he saw’ as statement and question for all speakers.

speaker statement question

ikaan170 ɔ̀háa̤ḁ ɔ̀háa̤ḁ
ikaan171 ɔ̀háa̤ḁ ɔ̀háa̤ḁ
ikaan172 ɔ̀háa̤ḁ ɔ̀háa̤ḁ
ikaan173 ɔ̀háa̰a̤ḁʔ̚ ɔ̀háa̤ḁ
ikaan175 ɔ̀háa̤ ɔ̀háa̤
ikaan177 ɔ̀háa̤ḁ ɔ̀háa̤ḁ

Nasal vowels do not take part in the creaky vs. breathy distinction as much as oral vowels. In some cases, statements are not marked with changes in phonation mode. In other cases, questions are left unmarked. In yet other cases, both statements and questions only show modal voice, as in (25) and the waveform and spectrogram in Figure 20, both of which show modal voicing throughout.

Figure 20 

Waveform and spectrogram for òʃéʃèdũ̀ ‘s/he stood up’ as statement (left) and question (right).

    (25) a. ò-ʃéʃèdũ̀
            3SG-stand.up
            [òʃéʃèdũ̀]
            ‘S/he stood up.’ (ikaan172, 13s)
         b. ò-ʃéʃèdũ̀
            3SG-stand.up
            [òʃéʃèdũ̀]
            ‘Did s/he stand up?’ (ikaan172, 13q)

Across all utterances, Ikaan speakers use creaky phonation in 60% of all statements and breathy phonation in 89.5% of all questions (cf. Table 6 above). Modal phonation plays a minor role, with 13.3% of statements and 7.9% of questions realized with modal voice. This is different for final nasal vowels, as shown in Table 9, which gives the number of instances and percentage of occurrence of each phonation mode in statements and questions ending in nasal vowels. Whereas all phonation modes are attested in both contexts (with the exception of creaky voice in questions), there is a noticeably higher proportion of modal voice realizations in nasal vowels, with just over a third in statements and just under a third in questions.

Table 9

Phonation mode categories in statements and questions ending in nasal vowels.

                  statements      questions
phonation mode    number  %       number  %
breathy           5       20.8    16      66.7
creaky            9       37.5    0       0.0
modal             9       37.5    7       29.2
mixed             1       4.2     1       4.2
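The contrast in the share of modal voice between all utterances (Table 6) and utterances ending in nasal vowels (Table 9) can be checked directly from the counts. The dictionary names and the helper `share` in this sketch are assumptions made for illustration.

```python
# Counts per phonation mode, copied from Table 6 (all final vowels)
# and Table 9 (final nasal vowels only).
ALL_FINAL_VOWELS = {
    "statement": {"breathy": 21, "creaky": 54, "modal": 12, "mixed": 3},
    "question": {"breathy": 102, "creaky": 1, "modal": 9, "mixed": 2},
}
NASAL_FINAL_VOWELS = {
    "statement": {"breathy": 5, "creaky": 9, "modal": 9, "mixed": 1},
    "question": {"breathy": 16, "creaky": 0, "modal": 7, "mixed": 1},
}


def share(counts, mode):
    """Percentage of instances realized with the given phonation mode."""
    return 100 * counts[mode] / sum(counts.values())


for utterance_type in ("statement", "question"):
    overall = share(ALL_FINAL_VOWELS[utterance_type], "modal")
    nasal = share(NASAL_FINAL_VOWELS[utterance_type], "modal")
    print(f"{utterance_type}: modal {overall:.1f}% overall, {nasal:.1f}% in nasal vowels")
```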

Between themselves, speakers vary considerably in how strongly they realize phonation mode differences in statements and questions. The speaker in ikaan173 uses phonation mode most consistently, with clear strong glottal closures and releases in the majority of statements, and clear long breathy and voiceless phases in the majority of questions, as shown in the phonation mode patterns and number of instances for each pattern in Table 10. Incidentally, he is also the speaker who has been exposed to the largest range of fieldworking linguists, as he is the person to whom linguists are first sent when they arrive in the village to collect data.

Table 10

Phonation mode patterns in ikaan173.

statement question

pattern number pattern number

VV̰ʔ 8 VV̤V̥ 12
1 VV̤ 3
1 VV̥ 2
V 1 VV̰V̤ 1
VV̤ 2
VV̥ 1

The speaker in ikaan177, on the other hand, does not use creaky voice at all and uses breathy voice in both statements and questions, albeit more so in questions than in statements, as shown in the count in Table 11.

Table 11

Phonation mode types in ikaan177.

phonation mode type statement question

breathy 7 17
modal 7 1

This does not mean, however, that this speaker does not distinguish between statements and questions using phonation mode. Instead, he seems to make a more subtle distinction, using modal voice and breathy phonation without voicelessness more in statements, and breathy phonation with voicelessness more in questions, as shown in Table 12.

Table 12

Phonation mode patterns in ikaan177.

phonation mode pattern statement question

V 7 1
0 1
VV̤ 4 4
VV̥ 2 3
VV̤V̥ 1 9

The range of variation across speakers and contexts shows that even though prosodic markers are used by speakers with statistically significant usage patterns, there is variation hidden within this usage that hearers in real life have to deal with.

A third issue in the use of prosodic markers is that pitch onset and intensity as prosodic markers might bring to the fore a more conceptual problem: Out of context, one might not expect hearers to be able to tell if an utterance is higher or louder than usual.

Tonal differences and pitch prosodies are not marked by absolute pitch points. Instead, tone and the use of pitch in prosody are considered to be relative: a pitch is compared to other tones in an utterance to establish whether it is high or low (Pike, 1948). Without a reference point, hearers might not be able to tell whether an utterance is pronounced with higher pitch. Studies by Wong and Diehl (2003) and Honorof and Whalen (2005), however, show that listeners can indeed identify lexical tones in isolation and can locate a speaker’s pitch in their pitch range successfully, even out of context.

Like tone, perceived loudness is also not marked by absolute intensity or other acoustic and articulatory correlates. Rather, loudness is perceived as a relative difference and again might not be detectable for hearers out of context. As with tone and pitch, however, experimental studies have shown that hearers pick up on a range of acoustic and articulatory markers to judge loudness, e.g., sound pressure level (Stevens, 1955) or vocal effort (Allen, 1971), and it is possible that hearers can do this without context, too. These studies therefore suggest that the problem of tone, pitch, and intensity being relative might not be a problem after all.

In addition to perception of acoustic markers, a hearer’s familiarity with a speaker may play a role in both pitch and intensity perception. Given that the Ikaan-speaking community is very small and seems to have been very small historically, it is conceivable that all Ikaan speakers today know each other at least to some degree, and speakers would have known each other historically. As a “society of intimates” (cf. Trudgill, 2011), speakers and hearers of Ikaan may be at least somewhat familiar with each other’s natural pitch and intensity ranges and might therefore be in a position to exploit relative markers such as pitch onset and intensity to a greater degree than members of larger communities where speakers are not always familiar with each other.

The fourth issue in the use of question markers is that hearers recognized statements and questions early on, before utterance-final markers applied. The production data from the speakers shows a range of utterance-final prosodic markers: Lowering of final low tones and increased duration of final vowels in questions, and creaky and breathy termination in statements and questions respectively. Even though these markers are used consistently by speakers, hearers do not seem to have to rely on them in all cases. In the informal perception experiment conducted among speakers and hearers of Ikaan, hearers often correctly identified questions and statements well before the end of the utterance and did not seem to have to wait for utterance-final cues.

Still, the utterances produced by the Ikaan speakers show that speakers do not have to rely on morphosyntactic markers to mark statements and questions, and that they use a range of prosodic markers for this task. Yet the lack of context for some markers, the variation between and within speakers, and the possibility that relative utterance-initial markers may not be perceivable and that utterance-final markers may not be necessary all show that a single marker on its own may not be sufficient to mark questions reliably. It seems that it is only in combination that the cues manage to cover all the contexts, and that the cues are employed by speakers as a robust multi-dimensional prosodic bundle that marks statements and questions as a package.

That being said, in addition to the problems with the prosodic markers discussed above, segmental phonology is involved in Ikaan question formation, which casts some doubt on a prosody-only approach. As already seen in (16) and (17) above, some consonant-final questions are produced with a final [i] or [ɪ]. A prosodic approach cannot easily account for this vowel and a morphophonological approach, as outlined below, might be more insightful.

5.1.2. Morphophonology to the rescue? A clitic solution?

So far, the data indicates that a multidimensional prosodic package of pitch change, phonation mode changes, final lengthening, and intensity changes distinguishes questions from statements. While not all elements apply equally in all contexts and while not all speakers employ the markers in the same way, together the prosodic cues seem to cover all contexts.

One context, however, can be interpreted differently, and that is the realization of final lengthening in consonant-final questions, as described in section 4.6. In these contexts, we encounter vowel lengthening, as in (26), but also consonant deletion with accompanying long vowels, as in (27), and vowel insertion, as in (28).⁷

    (26) vowel lengthening
         a. ɔ̀-kpɪ́
            3SG-hear
            [ɔ̀kpɪ́ɪ̤ɪ̥]
            ‘S/he understood.’ (ikaan171, 5s)
         b. ɔ̀-kpɪ́
            3SG-hear
            [ɔ̀kpɪ́ɪ̤ɪ̥]
            ‘Did s/he understand?’ (ikaan171, 5q)
    (27) consonant deletion
         a. ɔ̀-kɪ́ɡ
            3SG-go
            [ɔ̀kɪ́k]
            ‘S/he went.’ (ikaan171, 8s)
         b. ɔ̀-kɪ́ɡ
            3SG-go
            [ɔ̀kɪ́i̤]
            ‘Did s/he go?’ (ikaan171, 8q)
         c. ɔ̀-mâm
            3SG-laugh
            [ɔ̀mâa̰ʔ]
            ‘S/he laughed.’ (c)
         d. ɔ̀-mâm
            3SG-laugh
            [ɔ̀mâa̤ḁ]
            ‘Did s/he laugh?’ (ikaan171, 11q)
    (28) vowel insertion
         a. ɔ̌ː-mà     òjètẽ́j
            3SG-beat  small.child
            [ɔ̌ːmòjètẽ́j]
            ‘S/he beat the child.’ (ikaan171, 16s)
         b. ɔ̌ː-mà     òjètẽ́j       í
            3SG-beat  small.child  EPV
            [ɔ̌ːmòjètẽ́jíi̤i̥]
            ‘Did s/he beat the child?’ (ikaan171, 16q)

The long final vowels in the questions are pronounced with varying degrees of breathy voice and voicelessness, as described in section 4.5. above. If, however, we were to interpret breathy voice and voicelessness not as surface representations of phonation mode but as a manifestation of the consonant /h/, an alternative to the analysis of phonation mode as proposed in section 4.5. arises.

Well-formedness constraints in Ikaan phonology require segmental sequences to be made up of alternating consonants and vowels. Within words, VV or CC sequences do not occur. Across word boundaries, sequences of V ## V or C ## C are resolved. If consonants come into contact across word boundaries, an epenthetic vowel is inserted. The epenthetic vowel is always a high vowel but takes its rounding and its tone from the preceding vowel, as shown in (29).

    1. (29)
    1. a.
    1. òjíd
    2. tree
    1. -V
    2. EPV
    1. nɔ̀ː
    2. DEM
    1.  
    1. [òjíd í nɔ̀ː]
    2.  
    1. ‘this tree’ (ikaan164.090)
    1.  
    1. b.
    1. èkpòd
    2. hare
    1. -V
    2. EPV
    1. nɛ̀ː
    2. DEM
    1.  
    1. [èkpòd ù nɛ̀ː]
    2.  
    1. ‘this hare’ (ikaan164.081)
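
The quality of the epenthetic vowel in (29) can be sketched as a small function. This is a minimal illustration and not part of the original analysis: the name `epenthetic_vowel`, the set of rounded vowels, and the restriction to simple high and low tone marks are assumptions made for the example, and any further harmony in the inserted vowel is not modelled.

```python
import unicodedata

# Assumed set of rounded vowels (illustrative, not an exhaustive Ikaan inventory).
ROUNDED_VOWELS = set("uʊoɔ")
# Combining grave accent (low tone) and combining acute accent (high tone).
TONE_MARKS = {"\u0300", "\u0301"}


def epenthetic_vowel(preceding_vowel: str) -> str:
    """Return a high vowel that copies rounding and tone from the preceding vowel."""
    decomposed = unicodedata.normalize("NFD", preceding_vowel)
    base = decomposed[0]
    # Keep only the tone marks; nasalization and other diacritics are dropped.
    tone = "".join(ch for ch in decomposed[1:] if ch in TONE_MARKS)
    high_vowel = "u" if base in ROUNDED_VOWELS else "i"
    return unicodedata.normalize("NFC", high_vowel + tone)


print(epenthetic_vowel("\u00ed"))  # preceding vowel in (29a)
print(epenthetic_vowel("\u00f2"))  # preceding vowel in (29b)
```

For the two vowels preceding the insertion site in (29), the sketch returns [í] and [ù], matching the transcriptions.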

If, however, the final consonant is /m/ or /ɡ/, the consonant is deleted, as shown in (30).

    1. (30)
    1. a.
    1. ɛ̀r̥áɡʊ̃́m
    2. sheep
    1. nɛ̀ː
    2. DEM
    1.  
    1. [ɛ̀r̥áɡʊ̃́ nɛ̀ː]
    2.  
    1. ‘this sheep’ (ikaan096.007)
    1.  
    1. b.
    1. èwóɡ
    2. finger
    1. nɛ̀ː
    2. DEM
    1.  
    1. [èwó nɛ̀ː]
    2.  
    1. ‘this finger’ (ikaan096.021)
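
The choice between epenthesis and deletion at a word boundary can be summarized in a short sketch. The segment lists, the placeholder `V` for the epenthetic vowel, and the function names are assumptions made for illustration. V ## V contacts are left unchanged here because their resolution is not described in this section.

```python
import unicodedata

# Assumed vowel symbols, used only to tell vowels from consonants.
VOWELS = set("iɪeɛaɔoʊu")
# Final consonants that delete instead of triggering epenthesis: /m/ and /ɡ/ (U+0261).
DELETING_CONSONANTS = {"m", "\u0261"}
EPV = "V"  # placeholder for the epenthetic vowel, as in the glosses


def is_vowel(segment: str) -> bool:
    """Check the base character of a segment, ignoring tone and length marks."""
    return unicodedata.normalize("NFD", segment)[0] in VOWELS


def repair_junction(word1: list[str], word2: list[str]) -> list[str]:
    """Resolve a C ## C contact by deletion (/m/, /ɡ/) or by vowel epenthesis."""
    if is_vowel(word1[-1]) or is_vowel(word2[0]):
        return word1 + word2  # no consonant contact to repair
    if word1[-1] in DELETING_CONSONANTS:
        return word1[:-1] + word2  # deletion, as in (30)
    return word1 + [EPV] + word2  # epenthesis, as in (29)


print(repair_junction(["o", "j", "i", "d"], ["n", "ɔ"]))
print(repair_junction(["e", "w", "o", "\u0261"], ["n", "ɛ"]))
```

Tone and length marks are omitted from the sample calls for readability; segments carrying them are handled in the same way because only the base character is inspected.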

In most consonant-final utterances, statements simply end in a consonant. If the consonant is voiced, it undergoes devoicing, as in (31).

    1. (31)
    1. a.
    1. ò-ʃédʒ
    2. 3SG-steal
    1. í
    2. EPV
    1. b-
    2. LOC
    1. ɛ̀ːwɪ́
    2. goat
    1. ɛ̀ːdʒ
    2. 1SG.POSS
    1.  
    1. [òʃédʒíbɛ̀ːwɪ́ɛ̀ːʃj]
    2.  
    1. ‘S/he stole my goat.’ (ikaan172, 14s)
    1.  
    1. b.
    1. ɔ̌ː-wɔ̃̀
    2. 3SG-drink
    1. ʊ̀hjã́
    2. alcoholic.drink
    1. òjíd
    2. palm.tree
    1.  
    1. [ɔ̌ːwʊ̃̀hjã́òjít]
    2.  
    1. ‘S/he drank palm wine.’ (ikaan171, 15s)

In questions, on the other hand, a vowel is added at the end, similar to the vowel insertion process in (29). The inserted vowel shows the properties of an epenthetic vowel—it is high and takes its rounding and tone from the preceding vowel. Like other final vowels in questions, it becomes breathy and/or voiceless, as in (32).

    1. (32)
    1. a.
    1. ò-ʃédʒ
    2. 3SG-steal
    1. í
    2. EPV
    1. b-
    2. LOC
    1. ɛ̀ːwɪ́
    2. goat
    1. ɛ̀ːdʒ
    2. 1SG.POSS
    1.  
    1. [òʃédʒíbɛ̀ːwɪ́ɛ̀ːɟɪ̤̀ɪ̥]
    2.  
    1. ‘Did s/he steal my goat?’ (ikaan172, 14q)
    1.  
    1. b.
    1. ɔ̌ː-wɔ̃̀
    2. 3SG-drink
    1. ʊ̀hjã́
    2. alcoholic.drink
    1. òjíd
    2. palm.tree
    1.  
    1. [ɔ̌ːwʊ̃̀hjã́òjídi̋i̤i̤]
    2.  
    1. ‘Did s/he drink palm wine?’ (ikaan171, 15q)

Speakers are consistent in inserting epenthetic final vowels, and no variation regarding the presence or absence of the vowel has been found. The only differences between speakers are in the actual durations of the vowel and in the presence of the various phases of devoicing. Figure 21 shows the speaker in recording ikaan171, who pronounces the consonant-final statement with a strongly released final devoiced plosive and the corresponding question with an inserted final vowel that shows all three phases of phonation mode (modal, breathy, and voiceless).

Figure 21 

Vowel insertion in a consonant-final question.

Utterances ending in /ɡ/ or /m/ behave differently from other consonant-final utterances and do not end in inserted vowels. Instead, they show deletion processes like that shown in (30). Final /ɡ/ surfaces and devoices in statements but is deleted in questions, as shown in (33). The final vowel in the question becomes breathy and then devoices.

    1. (33)
    1. a.
    1. ɔ̀-kɪ́ɡ
    2. 3SG-go.to
    1.  
    1. [ɔ̀kɪ̰́k]
    2.  
    1. ‘S/he went.’ (ikaan175, 8s)
    1.  
    1. b.
    1. ɔ̀-kɪ́ɡ
    2. 3SG-go.to
    1.  
    1. [ɔ̀kɪ́ɪ̤ɪ̥]
    2.  
    1. ‘Did s/he go?’ (ikaan175, 8q)

Figure 22 gives waveforms and spectrograms of both the statement and the question, showing the devoiced consonant in the statement and the breathy and voiceless vowel in the question.

Figure 22 

/ɡ/-final utterance ɔ̀kɪ́ɡ ‘s/he went’ as statement with final /ɡ/ (left) and question without final /ɡ/ (right).

Final underlying /m/ is deleted in both statements and questions. In both utterance types, the final sound of the utterance is the vowel that precedes the underlying /m/, as shown in (34). The final vowel in the statement is 184 ms long and shows creaky voice, while the final vowel in the question is longer, with a duration of 286 ms, and becomes breathy and then voiceless, as shown in Figure 23.

Figure 23 

/m/-final utterance ɔ̀mâm ‘s/he laughed’ as statement (left) and question (right).

    1. (34)
    1. a.
    1. ɔ̀mâm
    2. 3SG-laugh
    1.  
    1. [ɔ̀mâa̰ʔ]
    2.  
    1. ‘S/he laughed.’ (ikaan171, 11s)
    1.  
    1. b.
    1. ɔ̀mâm
    2. 3SG-laugh
    1.  
    1. [ɔ̀mâa̤ḁ]
    2.  
    1. ‘Did s/he laugh?’ (ikaan171, 11q)
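
The utterance-final patterns in (31) to (34) can be collected in a lookup table. This is an informal summary sketch: the process labels and the function name `final_processes` are assumptions made for illustration, and only consonant-final utterances are covered.

```python
G = "\u0261"  # IPA /ɡ/

# Processes observed at the end of consonant-final utterances, keyed by
# (final consonant class, utterance type). Labels are informal descriptions.
FINAL_PROCESSES = {
    ("other", "statement"): ["final consonant kept", "devoicing if voiced"],
    ("other", "question"): ["epenthetic vowel inserted", "breathy and/or voiceless vowel"],
    (G, "statement"): ["final consonant kept", "devoicing"],
    (G, "question"): ["final consonant deleted", "breathy then voiceless vowel"],
    ("m", "statement"): ["final consonant deleted", "creaky vowel"],
    ("m", "question"): ["final consonant deleted", "lengthened vowel", "breathy then voiceless vowel"],
}


def final_processes(final_consonant: str, utterance_type: str) -> list[str]:
    """Look up the processes for an underlying final consonant and utterance type."""
    key = final_consonant if final_consonant in ("m", G) else "other"
    return FINAL_PROCESSES[(key, utterance_type)]


for utterance_type in ("statement", "question"):
    print(utterance_type, final_processes("d", utterance_type))
```

The table makes visible that every question ends in a vowel, whether through insertion or deletion, while statements end in a vowel only after underlying /m/.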

How can the presence of inserted vowels and the deletion of final consonants be explained? Instead of following a prosodic approach with phonation mode changes, we might assume that there is a sentence-final clitic =h, whose function is to mark questions. This clitic =h ‘QU’ attaches to the last word in the utterance. If the utterance-final word ends in a vowel, the sequence of vowel and =h ‘QU’ might sound like [aa̤ḁ] or [ah], which is what can be observed in the data. This approach is promising for three reasons.

First, if we allow for a range of variation in the phonetic realization of /h/, a sentence-final clitic =h ‘QU’ accounts for both vowel-final and consonant-final utterances. The fact that the sentence-final clitic =h ‘QU’ consists of a consonant would explain why a wellformedness constraint prohibiting consonant sequences would be violated and why, as a result, vowel insertion and consonant deletion occur to repair these violations.

Second, sentence-final clitics are not unknown in Ikaan and are attested elsewhere in the grammar. Negation is expressed (among other markers) with a sentence-final clitic =ɡ ‘NEG,’ as shown in (35).

    1. (35)
    1. a.
    1. dʒɛ̀-há
    2. 1SG-see
    1. èr̥ìd
    2. smoke
    1. ‘I saw smoke.’
    1.  
    1. b.
    1. dʒɛ̌ː
    2. 1SG.NEG-see
    1. èr̥ìd
    2. smoke
    1. ì
    2. EPV
    1. = ɡ
    2. = NEG
    1. ‘I did not see smoke.’ (ikaan096.153)

Third, unlike the use of phonation mode changes as a strategy for making grammatical distinctions, which would be rare, clitics as grammatical markers are cross-linguistically wide-spread and well-established.

However, while a clitic analysis seems reasonable from a phonological and a cross-linguistic perspective, it has several drawbacks.

Firstly, a clitic =h ‘QU’ would show highly unusual phonotactics for Ikaan. The phoneme /h/ does not occur in final position in any other word or construction in Ikaan, so the question-final position would be its only final occurrence. Secondly, while a clitic =h ‘QU’ does account for the breathiness and voicelessness in questions, it cannot account for the contrast with creak and glottal stops in statements. A breathy vs. creaky contrast, however, integrates both contexts. Thirdly, with a segmental clitic, the tight prosody package that marks statements and questions becomes less coherent. Phonation mode on the other hand fits perfectly into a prosodic bundle. Finally, a clitic =h ‘QU’ analysis would lose the parallels that can be drawn with Gur languages, where Rialland (2009) also finds breathy phonation as a question marking strategy.

But if not because of the presence of a consonantal segmental clitic, how can the vowel insertion and consonant deletion in consonant-final utterances be explained?

From a language-internal perspective, other parts of Ikaan phonology may be able to shed light on this behaviour. While it is perfectly possible to change the phonation mode of a voiced consonant from modal to breathy and voiceless, devoicing is not a distinctive enough process in Ikaan. Final devoicing is already employed as a process to mark phrase boundaries, as in (31) and (33), where /dʒ d ɡ/ devoice at the phrase boundary to [tʃ t k] respectively. Devoicing of final consonants would, therefore, not be a sufficient marker for identifying questions. With devoicing already in use elsewhere, a language-internal reasoning could be that vowel insertion and consonant deletion are used to provide a vowel that can subsequently be gradually devoiced by breathy phonation as a question marking strategy.

From a comparative perspective, the inserted vowel could be a reflex of the low vowel that forms part of the historic lax prosody proposed by Rialland (2009). Rialland (2007, p. 49) finds reflexes of this vowel occurring in many languages in different forms, as for example in Tikar, where the final vowel has two surface forms [ɛ] and [a]. The surface form shows vowel harmony with the preceding vowel, with [ɛ] following the vowels [i e ɛ] and [a] following the vowels [u o a] (Stanley, 1991).
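
The Tikar alternation described above amounts to a simple mapping from the preceding vowel to the surface form of the final vowel. The sketch below is illustrative only: the function name is an assumption, and it expects plain vowel symbols without tone marks.

```python
# Preceding vowels that select each surface form of the final vowel,
# following the description attributed to Stanley (1991) above.
SELECTS_OPEN_MID_FRONT = set("ieɛ")  # followed by [ɛ]
SELECTS_LOW = set("uoa")             # followed by [a]


def tikar_final_vowel(preceding_vowel: str) -> str:
    """Return the surface form of the final vowel after the given vowel."""
    if preceding_vowel in SELECTS_OPEN_MID_FRONT:
        return "ɛ"
    if preceding_vowel in SELECTS_LOW:
        return "a"
    raise ValueError(f"vowel not covered by the description: {preceding_vowel!r}")


print(tikar_final_vowel("e"))
print(tikar_final_vowel("o"))
```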

The inserted vowel could also be interpreted as an instance of final lengthening (see section 4.6.), as discussed for Ncam (Rialland, 2009, pp. 930–931, citing Podi 1995) and Moba (Rialland, 1984, p. 261). These languages have similar vowel insertion processes occurring with consonant-final utterances, as in Ncam in (36) or Moba in (37). For Moba, the vowel insertion is explained as a process that provides segmental material for the realization of the melody as well as the breathy termination. Both of these processes require a final vowel, which is reminiscent of a more archaic stage of the language (Rialland, 1984, p. 261; 2009, p. 933).

    1. (36)
    1. Ncam
    1. a.
    1. tī nyát
    2. nights
    1. EPV
    1. ‘nights’
    1.  
    1. b.
    1. tī nyát
    2. nights
    1. -āːà
    2. QU
    1. ‘nights?’
    1. (37)
    1. Moba
    statement    question    gloss
    kút ̀         kúdí        ‘make’
    nìb ̀         nìbì        ‘people’
    bát ́         bádí        ‘cook’

To sum up, a morphosyntactic solution employing a consonantal clitic seems compelling at first because it accounts for the occurrence of segmental processes. On the other hand, a clitic solution does not account for the second, contrasting phonation mode in the language and does not capture the parallels with both phonation mode changes and vowel insertion processes attested in other Niger-Congo languages that have similar prosodic packages to encode questions. A prosodic approach, including breathy voice rather than a clitic =h, both captures the language-internal prosodic package better and explains the similarities with other languages.

This prosodic approach sees pitch as a form that has many functions in the phonology, lexicon, and grammar of Ikaan. At the prosodic level, pitch functions as one of the many prosodic layers for marking yes/no questions. The prosodic packages for marking statements and questions with their various layers can be viewed as two prosodic morphemes. Each of the prosodic morphemes has a range of prosodic allomorphs that combine different layers of prosodic markers depending on the various segmental and tonal contexts (cf. Liu et al., 2013).

But where do these many dimensions of prosody come from and how can their use be explained? In a regional context this is discussed by Rialland (2007, 2009) and from a cross-linguistic perspective by Ohala (1984) and Gussenhoven (2004). Both perspectives are discussed in the following sections.

5.2. Regional context

As a Benue-Congo language, Ikaan is part of the Niger-Congo phylum, for which Rialland (2007, 2009) proposes a lax question prosody. Rialland (2007, pp. 37–38) distinguishes question markers into tense strategies involving high pitch, and lax strategies involving low pitch, phonation mode, duration, and vocalic elements. Regarding their geographical and phylogenetic distribution, languages with tense question prosodies, which go with the cross-linguistic trend, occur at the periphery of the Niger-Congo area. They are attested in the west in Atlantic languages, in the north in central-western and north-western Mande languages, in the east in Kordofanian languages, and in the south in the Bantu languages (Rialland, 2009, pp. 929–930, 937–938). Tense question prosodies also occur in some Adamawa-Ubangi and Benue-Congo languages. Languages with lax question prosodies, which go against the cross-linguistic trend, occur in languages spoken in the core Niger-Congo area, where the languages with the highest number of typically African features are found (Rialland, 2009, p. 930). They are attested in Gur, Kwa, and Kru languages, and in south-eastern Mande. As with tense prosodies, lax question prosodies also occur in some Adamawa-Ubangi and Benue-Congo languages.

Some languages in Rialland’s samples use question markers from both the tense and the lax set. Rialland calls these hybrid question prosodies. Regarding their geographical distribution, a cluster of hybrid prosody languages is present at the north-eastern fringe of the lax prosodic area, in Kordofanian languages and in some Nilo-Saharan languages (Kanuri, Krongo). Other hybrid languages seem to be scattered throughout Niger-Congo, e.g., Baule (Kwa) with a lax low boundary tone and potentially a tense register expansion, Bambara (Mande) with lax final /-a/ or /-wa/ and tense rising intonation, and Izon (Ijoid) with a lax final low tone and tense high tone raising. Rialland (2009, pp. 938, 945) briefly discusses contact or convergence as causes, but because of her focus on lax prosody she does not go into more detail on hybrid prosody languages.

Ikaan uses both cross-linguistically more common tense question markers and some of the typical West African lax question markers described by Rialland (2007, 2009). Table 13 gives an overview of the Ikaan question markers in the classification proposed by Rialland. Register expansion, though not statistically significant across speakers, is included as a potential question marker at least for some speakers.

Table 13

Ikaan question markers in the classification proposed by Rialland (2007, 2009).

    question prosody    question marker
    tense               higher pitch onset
                        (register expansion)
    lax                 expanded final lowering
                        breathy termination
                        final lengthening
                        final vowel insertion
    unclassified        creaky termination in statements
                        increased intensity
                        final consonant deletion
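
The three-way distinction between tense, lax, and hybrid question prosodies can be sketched as a classification over sets of markers. This is an illustrative sketch and not Rialland's procedure: the marker labels are taken from Table 13 and from the languages mentioned above, while the function name and the handling of unclassified markers are assumptions.

```python
# Marker labels as used in Table 13 and in the surrounding discussion.
TENSE_MARKERS = {
    "higher pitch onset", "register expansion", "high boundary tone",
    "rising intonation", "high tone raising",
}
LAX_MARKERS = {
    "expanded final lowering", "breathy termination", "final lengthening",
    "final vowel insertion", "low boundary tone", "final low tone",
}


def classify_prosody(markers: set[str]) -> str:
    """Classify a question prosody by the marker sets it draws on."""
    has_tense = bool(markers & TENSE_MARKERS)
    has_lax = bool(markers & LAX_MARKERS)
    if has_tense and has_lax:
        return "hybrid"
    if has_tense:
        return "tense"
    if has_lax:
        return "lax"
    return "unclassified"


ikaan = {
    "higher pitch onset", "expanded final lowering", "breathy termination",
    "final lengthening", "final vowel insertion", "increased intensity",
}
print(classify_prosody(ikaan))  # hybrid
```

Markers outside both sets, such as increased intensity, do not affect the result, which mirrors their unclassified status in Table 13.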

How has this mix of question markers come about? Rialland attributes the presence of lax question markers to the lax prosody of the historic question marker /-à/, which originated in Niger-Congo and later spread across the Macro-Sudanic belt to Afro-Asiatic and Nilo-Saharan languages (Rialland, 2009, p. 929). Tense question markers are attributed to contact with languages at the Niger-Congo periphery, e.g., in the case of Bambara (Rialland, 2009, p. 938). To account for the mixed prosody in Ikaan, we would, therefore, expect to find lax markers among linguistic relatives of Ikaan and tense markers among unrelated geographical neighbours of the language.

5.2.1. Inheriting lax prosodic markers from historic /-à/

Ukaan is part of the Benue-Congo family, in which Rialland (2007) finds lax prosodies in the Edoid, Cross-River, Plateau, Nupoid, and Idomoid subfamilies, as well as languages with high-pitched prosodic markers, e.g., Efik and Obolo (Cross-River), Igbo (Igboid), and Yoruba (Defoid).

Edoid has received the most attention in the more recent literature as a potential close relative for Ikaan (Abiodun, 1999; Agoyi, 2001; Elugbe, 2012). Edoid languages show strong patterns of lax prosody with low pitch markers such as final low boundary tones and a final vowel /-à/ (Rialland, 2009, p. 939), though there is no indication of breathy termination. Still, inheritance of lax prosodic markers through Edoid is conceivable. Some Edoid languages, however, follow different patterns. Yekhee (North-central Edoid), the geographically closest Edoid language to Ikaan in Rialland’s sample, uses tense prosody, marking questions with a high boundary tone. Engenni (Delta-Edoid) uses hybrid prosody with register expansion, downdrift reduction, a low boundary tone, and a final low-toned open vowel /-à/ and possibly final /-e/.

The lax and tense markers used in Edoid languages do not overlap well enough with Ikaan to single out Edoid as a particularly close match and candidate for inheritance of the lax question prosody. Low boundary tones are not attested in Ikaan, final vowels are attested but not with the same vowel quality and not with a low tone, and Edoid languages show no match for the creaky and breathy terminations found in Ikaan. Where Ikaan markers do match with features of individual Edoid languages, such as with Engenni, they do not fit the overall trend in Edoid.

5.2.2. Acquiring tense prosodic markers through contact

Ikaan speakers live in a context of rich multilingualism and language contact so that contact and convergence can easily be argued to play a role in Ikaan acquiring tense question markers. Speaker multilingualism is widespread today and it is reasonable to believe that at least some degree of multilingualism has been there for a long time. In addition, some of the oral history describes past migrations, which would have contributed to language contact and multilingualism in the past.

The main contact languages in the area are Akokoid languages, Edoid languages, dialects of Yoruba, and Ebira. In the last three generations, Nigerian Pidgin English and Nigerian English have become contact languages in some domains of language use for many but not all speakers. This highlights a problem with contact as a contributor of tense markers. Apart from Pidgin and Nigerian English, all the contact languages are Benue-Congo languages, which makes them linguistic relatives as well as contact languages. They may not be close linguistic relatives and are more likely to be distant cousins in the extended family, yet they still all descend from Benue-Congo, which in turn descends from Niger-Congo. Presumably all these languages would have inherited the same lax prosody and historic question marker /-à/. Leaving this problem aside for a moment, let us discuss these languages in terms of them being contact languages that may have introduced tense prosodic markers to Ikaan.

Akokoid languages are in close contact with Ikaan because of local trade, intermarriage, and child fostering. Unfortunately, Akokoid languages are not yet well described, so it is difficult to say anything about their contribution.

Edoid languages show predominantly lax prosodic markers (cf. above). Engenni may have contributed register expansion as a tense question marker, but this may be difficult to uphold as a contact-induced change. There is no known contact with Engenni from the oral history or other sources, and Engenni is located south of any historic migration routes that are described for the Ikaan-speaking community.

Ebira is a close contact language of Ikaan because there has been a substantial Ebira minority population in the villages for at least four generations. Many Ikaan speakers today also speak or at least understand Ebira. Ebira belongs to the Nupoid family, where Rialland finds lax prosodies in Gwari. Question prosody in Ebira is not yet described in detail, although Scholz (1976, pp. 53–54) gives the data in (38) and (39) to illustrate grammatical functions of tone.

    1. (38)
    1. máā-vɛ́
    2. máá-vɛ́
    1. ‘I came.’ (completive)
    2. ‘Did I come?’ (question)
    1. (39)
    1. ɔ́ɔ̄-càká-á
    2. ɔ́ɔ̄-càká-á-ā
    1. ‘He broke it.’ (statement)
    2. ‘Did he break it?’ (question)

If (38) is seen as contrasting statement and question rather than completive and question, a tonal change from mid to high could indicate a question. This is not a marker listed by Rialland (2007). In (39) a final vowel /ā/ with a mid tone distinguishes the question from the statement. Both the vowel and the lower tone are typical lax markers as listed by Rialland (2007). Scholz (1976, pp. 55–62) also discusses downdrift and final lowering, but only for statements and not in questions. With a possible tonal change from mid to high as a tense marker and a final vowel /a/ with a lower tone as lax markers, Ebira might be a hybrid prosody language. Ikaan speakers do not seem to have borrowed either of these markers into their language though, so that contact between Ebira and Ikaan cannot be seen as a contributing factor to the Ikaan hybrid prosody.

Yoruba is arguably the language with which Ikaan is in the closest contact. All Ikaan speakers in Ikakumo (Ondo) are also native speakers of the Yoruba dialect spoken in Akoko, and Yoruba is the lingua franca in the Akoko area, at least within Ondo State. In Standard Yoruba (Fajobi, 2005), yes/no questions have a higher onset pitch and a more expanded register than their corresponding statements. The last high tone may also be raised and the final syllable may be lengthened. In addition, Fajobi states that downdrift may be suspended, but does not show data to illustrate this. If Ikaan only had lax markers before its contact with Yoruba, it could indeed have acquired higher pitch onset and register expansion from Yoruba.

5.2.3. Ikaan realization of the historic /-à/ question marker

If Rialland’s proposal of a lax prosody and a historic /-à/ question marker is seen as a set of prosodic and segmental markers, it can be broken down into a bundle made up of

  • specifications for duration or weight (a mora)
  • specifications for segments (an open vowel)
  • specifications for tone (a low tone)
  • specifications for phonation mode (breathy)

If Ikaan had inherited the historic /-à/ question marker, Ikaan would have lost segmental specifications since the vowels that occur in consonant-final utterances are not low but high vowels [i ɪ u ʊ]. In addition, the historic [-à] would have lost tonal specifications since there is no trace of a low tone or a low boundary tone.

Ikaan would have retained duration or weight specifications such as final lengthening and vowel insertion, and phonation mode specifications such as breathy voice and voicelessness. In addition, Ikaan would have increased the contrastiveness of breathy voice through creaky voice in the opposing declarative sentences.

5.2.4. Tense and lax prosodies: How to fit in intensity?

Assuming that we can account for hybrid prosody through inheritance from Niger-Congo and contact with Yoruba, this still leaves increased intensity as a question marker that cannot easily be included in a lax/tense distinction because it is not easily explained by laryngeal settings.

Increased intensity is the result of increased respiratory activity. A voice becomes louder if more air is pushed out of the lungs and flows across the glottis. The increased airflow blows the vocal folds apart wider and more forcefully. The vocal folds are then sucked together with more force, resulting in a more forceful and, therefore, louder glottal pulse. In addition, more airflow across the glottis may also result in higher pitch. The higher pitch in louder voices is not necessarily a result of more tension in the vocal folds. Instead, it is a side effect of the increased airflow, which leads to increased speed across the glottis, which in turn leads to lower pressure along the edges of the glottis, which in turn sucks the vocal folds together more quickly, which then leads to a higher frequency.

If intensity cannot fit into a tense vs. lax distinction, how else can we account for the fact that Ikaan questions are generally louder than statements? One possibility is to pursue the links between intensity and higher pitch. Alku et al. (2002) show that on the one hand loudness may affect pitch, in that raised intensity may lead to higher pitch. On the other hand, pitch may also affect loudness, in that speakers may increase the pitch of their voices to produce a louder voice. If the latter holds true for Ikaan, raised intensity may be a by-product of the higher onset pitch. If the former holds true, it leaves open the question why speakers produce a louder voice in the first place.

A second possibility would be to investigate how focus is realized phonetically in Ikaan. If increased intensity is a focus marker, then it could be that the entire yes/no question is in focus and is therefore louder. Increased intensity would then be a marker of the function ‘focus’ rather than a marker of the type ‘question.’

A third possibility is to re-think this classification into tense and lax markers, which is based on the physiology of what speakers do. Instead, increased intensity could be seen as part of a package with which speakers want to achieve something, such as catching the hearer’s attention by making the question stand out and persuading the hearer to answer their question. In addition to linking intensity in with other prosodic markers, this approach would have the added advantage of fitting in with the ethologically-based hearer-centred approach of the frequency code (Ohala, 1984) and the production and frequency code (Gussenhoven, 2004).

5.3. Cross-linguistic context – from speaker-centred back to hearer-centred?

Hybrid question prosodies can to some degree be explained by language contact. This is not satisfactory, however, when the contact language which donated the outside feature is from the same language family as the recipient language and should have inherited the same features as the recipient language.

In terms of historical linguistics, it might help to take a different approach. This would have two consequences. First, instead of assuming that the low question prosody was the original prosody of the Niger-Congo phylum and that languages with hybrid prosodies lost some of their purity through contact, the trajectory may have been from the cross-linguistically widespread tense prosody languages via hybrid question prosody languages to lax prosody languages. Languages with hybrid prosody would be of interest to linguistic research because they would have introduced lax aspects into the prosody. Languages with lax prosody would be of interest to linguistic research because they are the logical extension of the process of using lax prosodic markers, against the cross-linguistic trend.

The second consequence would be that Rialland’s (2007, 2009) classification of West African languages would not be sufficient anymore. Based on speaker physiology during speech production, only pure tense prosody languages and pure lax prosody languages find a neat classification in Rialland’s model. Hybrid languages simply cannot be explained sensibly. In Ikaan, higher pitch (tense laryngeal settings) and breathy phonation plus voicelessness (lax laryngeal settings) mark questions. Non-high pitch (lax) and creaky phonation plus glottal stops (tense) mark statements. The prosodic markers for the contrasting functions of statement and question do not divide along the lines of laryngeal settings because each function combines both tense and lax settings. Hybrid languages like Ikaan might motivate a shift away from Rialland’s classification based on speaker physiology during speech production to a classification that includes properties of the acoustic signal and their affective interpretation by the hearer. So where can links between tone and phonation mode be found that might offer an alternative explanation?

Older surveys of question intonation mention a wide range of prosodic question markers (Bolinger, 1978; Greenberg, 1969; Hermann, 1942; Ultan, 1969). None of these, however, attempt to explain how and why these markers co-occur with pitch markers. Ohala’s frequency code approach (1984) and Gussenhoven’s production and frequency code approach (2002, 2004) investigate the underlying reasons as to why pitch has such an important role in questions, but focus only on pitch and do not include other prosodic markers.

According to the frequency code model, someone asking a question is in a position of uncertainty and therefore weakness because of missing information. The speaker uses high pitch to signal to the hearer affective meanings of friendliness in order to appeal to the goodwill and cooperation of the hearer. At the same time, the speaker needs to catch the attention of the hearer in the first place, which requires a balance between attention on the one hand and friendly, appealing, and non-threatening behaviour on the other hand. In the development of question prosody, an increase of intensity as seen in Ikaan could have been an attention catcher, as long as it was not too much of an increase so that it would be perceived as shouting or threatening behaviour. Higher pitch would signal friendliness and non-threatening behaviour, but how could phonation mode have come about in the development of the statement and question prosodies?

Ohala’s frequency code draws on and extends Morton (1977). Morton describes and analyzes a wide variety of close contact animal vocalizations by different bird and mammal species. He shows links between low-frequency harsh vocalizations and hostile communicative meanings on the one hand, and high-frequency pure-tone vocalizations and friendly, frightened or appeasing communicative meanings on the other hand. Morton argues that these links are not arbitrary but would arise through evolutionary pressures. His proposals are intended as a model to compare the evolution of vocal communication in any species, including humans, and would motivate including creaky voice as a marker for statements.

From an experimental perspective, Xu et al. (2013) show that hearers rate breathy voices in both men and women as more attractive. Speakers with attractive voices in turn receive more support during social interactions (Sarason, 1985), which would explain why the use of breathy voice may be advantageous in the development of a question prosody.

An active area of research that investigates the affective and social meanings indexed by the use of language is sociolinguistics. By its very definition, sociolinguistic research is specific to individual groups and cultures. However, language is common to humans as a species and if the frequency code is innate to humans it should underlie all culture-specific variation.

Podesva (2013) reviews older literature on links between gender and phonation mode in British Englishes. In these accents, creak is traditionally associated with masculinity (Esling, 1978; Henton & Bladon, 1988; Stuart-Smith, 1999a, 1999b). More recent research finds that this link has weakened. Studies such as Yuasa (2010), Mendoza-Denton (2011) and Podesva (2013) find that American women use creak to index traditionally male attributes such as an authoritative stance or toughness. The sociolinguistic meanings indexed by creaky voice in these studies overlap very well with the meanings Ohala (1984) and Gussenhoven (2004) associate with low pitch. In Ohala’s frequency code, low pitch is associated with derived meanings such as ‘dominant’ or ‘aggressive.’ In Gussenhoven’s production and frequency code, low pitch signals termination as well as masculinity, authoritativeness, assertiveness, and protectiveness.

While there are indicators linking creaky voice and low pitch, it is not as easy to make a parallel case for linking breathy voice and high pitch. Empirical research on the meanings indexed by breathy voice is not as rich as for creaky voice, though some research does exist. Dehqan et al. (2010) establish baseline acoustic parameters for Iranian men and women and find a greater harmonic-to-noise ratio, which is linked to added breathiness, among women than among men. With their acoustic focus, however, they do not discuss whether the breathiness itself indexes femininity, and they do not discuss which affective meanings would be associated with femininity in Iranian societies. Kajino and Moon (2011) investigate interview-style speech of two Japanese adult video actresses and find that breathy voice is exploited to index sexual invitation, though creaky voice is also used in the same context by the same speakers. Callier (2012) investigates actors’ voices in a Mandarin TV drama, finding correlations between phonation mode, on the one hand, and gender and good vs. bad characters on the other. The strongest users of creaky and harsh voice are the ‘villainous’ female and male characters respectively. The two ‘good’ female characters use very little non-modal voice, but the two ‘good’ male characters use high rates of breathy voice. If the use of breathy voice in these studies can be interpreted as indexing friendly, non-threatening behaviour that is desirous of the receiver’s goodwill, and if studies like these are representative of a wider trend, the meanings associated with high pitch and the meanings associated with breathy voice show some similarities and can serve as a first empirical finding to motivate including breathiness in the frequency code model together with high pitch.

Sociolinguistic research shows that speakers do use prosodic markers based on phonation mode to signal affective meanings similar to those signalled by pitch for questions and statements. Prosodic research has not yet systematically investigated phonation mode in statements and questions, so it is unknown whether speakers outside Africa employ phonation mode to mark the statement-question contrast. But if speakers were able to distinguish questions from statements using only pitch markers, why would they add layers of non-pitch markers that result in very complex prosodic systems? For West African tone languages the answer may be that in these languages pitch is very busy already and higher question tone may be difficult to implement, as suggested as early as Hermann (1942, p. 364). In languages such as Ikaan, pitch as tone carries meaning and functions at various levels, from the lexicon to the morphology, syntax, and the interface between semantics, syntax, and phonology. There is very little wriggle room at the categorical tonal level; small changes in tonal placement and pitch contour affect the meaning or structure of an utterance. If morphosyntactic segmental and tonal markers do not contribute, other prosodic markers in addition to pitch are brought in to share the work. The multifunctional nature of tone triggers a multidimensional form of prosody where intonation makes systematic use of other prosodic markers to relieve pitch of some of the burden. In Ikaan, creaky voice is combined with low pitch to mark and/or reinforce the marking of statements. Breathy voice is used to help high pitch to mark and/or reinforce the marking of questions. This use of phonation mode is possible because the affective meanings indexed by low pitch and creakiness vs. high pitch and breathiness overlap. The result is a hybrid prosody, from the perspective of the laryngeal settings of the speaker, but a unified question or statement prosody from the perspective of the affective meanings signalled to the hearer. The Gur, Kwa, and Kru languages at the core of Niger-Congo may have taken the use of pitch and phonation mode even further than Ikaan: Speakers of these languages leave out high pitch altogether and use only breathy phonation to signal to the hearer the affective meanings associated with question formation.

6. Conclusion

There are various ways for speakers to mark the function ‘question.’ There are syntactic means such as inversion in German, morphological means such as the question particle ʃé in Yoruba, and intonational means, particularly for declarative questions. In intonation, many languages use pitch, and in many languages, statements are marked with low and/or falling pitch whereas questions are marked with high and/or rising pitch. As an underlying explanation for this pattern, models have suggested innate codes for human speakers. These codes propose that the mapping between form and meaning is not arbitrary, but instead ethologically motivated, linking low pitch to large vocalizers, strength, certainty, and independence, and high pitch to small vocalizers, weakness, uncertainty, and a desire to evoke goodwill and collaboration in the hearer (Gussenhoven, 2004; Ohala, 1984).

In many West African languages, however, questions are marked with low or falling intonation. Moreover, some West African languages use phonation mode as an additional marker, often co-occurring with supporting markers such as lengthening of the vowel. This important countertrend has been identified by Rialland (2007, 2009), and has led her to propose a West African question prosody with lax laryngeal settings that can be traced back to a historic /-à/ marker. This proposal accounts well for the lax-only prosodies; the tense-only prosodies are of course already accounted for by the general cross-linguistic trend. What throws a spanner in the works, however, are languages that combine tense and lax markers. Since there were fewer of those in Rialland’s sample, and since the mixed languages did not occur in the core-area languages, Rialland (2009) explained these mixed situations by contact and convergence.

This paper discussed the prosodic encoding of yes/no questions in Ikaan, a language that combines both tense and lax markers, and related the results to findings for West African languages, as well as to cross-linguistic tendencies and models. The paper showed how speakers of Ikaan distinguish questions from statements with higher onset pitch, expanded final lowering of low tones, breathy termination, final lengthening of vowels and vowel insertion or consonant deletion, and increased intensity. In addition, speakers use creaky termination to mark statements. There is variation among speakers in the use of some of the prosodic markers. In addition, not all markers apply in all phonological contexts, some markers would not be expected to be recognizable in utterances in isolation, and some hearers did not have to hear the realization of a prosodic marker before being able to tell whether an utterance constituted a statement or a question. This suggested that individual markers on their own are not sufficient to distinguish statements from questions. Instead, it is the bundle of prosodic markers that works as a prosodic morpheme to mark the distinction. Different prosodic allomorphs of the morpheme are used in different contexts so that other question markers pick up the functional load when one marker cannot apply.

The Ikaan question markers were shown not to fit well into the tense vs. lax distinction proposed by Rialland (2007). While contact and convergence could be argued to account for the two tense markers used by Ikaan speakers, these tense markers would have to have come from another Benue-Congo language. This would simply move the problem of the occurrence of tense markers to another language that should have inherited a lax prosody. Also, intensity would not be accounted for by a tense vs. lax model. Instead of Rialland’s approach based on the physiology of the speaker, a different, more hearer-oriented approach was proposed. This approach tied in with the frequency code (Ohala, 1984) and the production and frequency code (Gussenhoven, 2004). Drawing on research findings from animal communication, experimental prosodic research, and sociolinguistic literature on phonation modes, it showed how creaky and breathy voice could be included in these codes to index meanings similar to those indexed by low and high pitch respectively.

Follow-up work on Ikaan needs to test how far the cues that seem salient from the production data are also salient as perceptual cues for Ikaan hearers. Ideally, these cues would be tested with data from a speaker whom the hearers do not know, so that familiarity with individual speakers can be excluded as a factor (although this is a rather artificial scenario given the reality of the community, as most people know each other at least to some degree). However, even if perceptual saliency can be established for the individual cues, it would still be difficult to make generalizations about how hearers perceive questions in Ikaan. This is because most Ikaan speakers today also speak at least Yoruba and Nigerian English or Nigerian Pidgin English, and have passive, if not active, knowledge of Ebira; they often also speak other languages such as Hausa or other Nigerian languages. Without further research on multilingual speakers, it would be next to impossible to tell how the speakers’ phonologies are organized and which language(s) hearers draw on as a reference frame when judging test recordings.

Further follow-up work could investigate other types of questions, both in Ikaan and in the other languages Ikaan speakers know and understand. For example, there are other ways of expressing yes/no questions, including question particles that add shades of meaning to the utterance (see Note 5), and there are of course wh-questions. It is not yet clear whether these types of questions are also marked with the same prosodic markers. Finally, the current findings for controlled elicitation sessions could be tested against more natural data to see how speakers mark statements and questions in less controlled contexts. Such data has already been collected for Ikaan, Yoruba, and English and is awaiting analysis (see data under the blocks worlds theme in Salffner, 2014).

The data presented here, however, already shows that languages with ‘hybrid’ question prosody and prosodies using lax laryngeal settings encourage us to reconsider the frequency code and the production and frequency code from a broader perspective. Leaving the meanings signalled to the hearer as they are, the encoding through fundamental frequency can be expanded to a multidimensional view of prosody that integrates a range of prosodic markers into the model. Further research on the frequency code could incorporate tone languages with complex prosodies into these models to account for phonation mode distinctions, duration, and intensity changes from an ethological perspective. Languages such as Ikaan, therefore, not only push us to expand the notion of prosody and offer a test bed for investigating an expanded idea of prosody, but also shed more light on the link between form and meaning beyond the arbitrariness of the signifier.

Additional Files

The additional files for this article can be found as follows:

Appendix A

Notes on transcription, glossing and citation references to archived data. DOI: https://doi.org/10.5334/labphon.94.s1

Appendix B

List of utterances elicited as statements and questions from speakers. DOI: https://doi.org/10.5334/labphon.94.s2