Lenition is a pervasive phenomenon in human language. Although lenition patterns have been described in hundreds of languages (e.g., Kirchner, 1998; Lavoie, 2001), fundamental questions about the nature of these patterns are still a topic of debate in the phonetic and phonological literature. Several recent studies address the question of how best to describe, measure, and quantify lenition in phonetic terms (e.g., Kingston, 2008; Hualde, Simonet, & Nadeu, 2011; Warner & Tucker, 2011; Bouavichith & Davidson, 2013; Ennever, Meakins, & Round, 2017; Cohen Priva & Gleason, 2019). Understanding the functional nature of lenition and its place in phonological grammar requires that we first understand what lenition does to sounds; while there is broad agreement that lenition tends to shorten consonants and render them louder or more vowel-like, there is limited consensus on the most principled ways to measure these properties.
This paper explores consonant lenition in a corpus of field recordings of Campidanese (also called Campidanian) Sardinian, a language with complex lenition patterns that interact with voicing, manner, and length contrasts (Virdis, 1978; Bolognesi, 1998). We show that the duration and change-in-intensity algorithm devised by Ennever et al. (2017) can be fruitfully extended to a different language and to a more heterogeneous corpus. We also compare the measurements from this algorithm to various alternatives in use by other researchers: intensity slope extrema (Kingston, 2008; Hualde et al., 2011), intensity minima (Bouavichith & Davidson, 2013; Cohen Priva & Gleason, 2019), and qualitative phonetic features such as the presence or absence of stop bursts and voicing (Warner & Tucker, 2011; Bouavichith & Davidson, 2013). One way of characterizing the results is that all of these measurements succeed at capturing information about lenition patterns. Comparison of regression models with and without mediating factors, however, allows us to draw some preliminary conclusions about which phonetic factors are causally prior to others (Ennever et al., 2017; Cohen Priva & Gleason, 2019).
The paper also provides the first detailed phonetic description of Campidanese consonants. The consonant system has been described in broad phonetic and phonological terms (Virdis, 1978; Bolognesi, 1998), and has figured in debates about theoretical phonology (e.g., Lubowicz, 2002; Hayes & White, 2015). But there are only a few sources of detailed quantitative data on Campidanese: Frigeni (2009) focuses on sonorants and Cossu (2013) on vowels. The current study helps clarify the nature of consonants, contrasts, and lenition in Campidanese, which have been the subject of some uncertainty and disagreement in phonological descriptions of the language.
The remainder of this section describes recent developments in the measurement of lenition, reviews phonological descriptions of Campidanese, and discusses the relationship between phonetic studies of lenition and overarching questions about its fundamental nature. Section 2 describes the materials in the Campidanese corpus. Section 3 tests the robustness and consistency of various methods for measuring duration and intensity. Section 4 presents phonetic results on prosodically-conditioned lenition-fortition patterns and explores their causal structure. Section 5 discusses the implications of the findings for the theory of lenition.
‘Lenition’ is a label assigned to a large and heterogeneous set of phonetic and phonological patterns (see Honeybone, 2008 for a thorough history of the term). It is generally agreed to involve some notion of reduction or weakening, in articulatory terms (Donegan & Stampe, 1979; Kirchner, 1998), acoustic terms (Kingston, 2008; Katz, 2016), or featural/informational terms (Harris, 2003; Ségéral & Scheer, 2008). While the use of the term lenition varies quite a bit between researchers, there are certain processes that are universally considered to be ‘core’ cases. This study concerns two such processes: voicing lenition, where voiceless obstruents become voiced; and spirantization lenition, where stops become continuants.
We single out the lenition ‘versions’ of these processes, sometimes referred to as ‘sonorization’ (Szigetvári, 2008) or ‘continuity lenition’ (Katz, 2016). There are other processes that affect the voicing or continuancy of obstruents but differ from the lenition patterns described here in terms of their characteristic contexts, interaction with phonological contrast, or phonetic characteristics (e.g., final devoicing, assibilation). The continuity lenition processes studied here are typologically widespread, defined as those that affect consonants in intervocalic position in every language in which they occur, with extensions to some non-intervocalic consonants in some languages (Kirchner, 1998; Lavoie, 2001). Both voicing and spirantization have a strong typological tendency to be complemented by strengthening or fortition at the beginning of prosodic domains (Gurevich, 2003; Katz, 2016). Both processes tend to increase the intensity of consonants and thus their similarity to surrounding vowels or other sonorant sounds (Kingston, 2008; Bouavichith & Davidson, 2013; Ennever et al., 2017). And both processes tend to reduce the duration of consonants (Kingston, 2008; Hualde et al., 2011; Ennever et al., 2017). While virtually all camps agree that intensity, duration, and qualitative features (e.g., voicing and manner) are key acoustic properties in this type of lenition-fortition pattern, there are significant questions about how best to measure each of them.
Intensity measurements are affected by many factors extrinsic to language, such as the physical recording setup, level of background noise, and general loudness of a speaker’s voice. This means that raw intensity measurements may not be reliable for tracking linguistic properties. Researchers have proposed various ways to control for such factors: measurements of consonant intensity relative to other consonants in a recording session with the same underlying features (Cohen Priva & Gleason, 2019), relative to some part of a flanking vowel or transition (Warner & Tucker, 2011; Ennever et al., 2017), or measurements of intensity slope/velocity during the transitions to and from flanking vowels (Kingston, 2008; Hualde et al., 2011). These procedures, however, may introduce their own problems: Because some of them reflect the intensity of adjacent vowels, they may not reliably isolate consonantal lenition effects (Bouavichith & Davidson, 2013; Cohen Priva & Gleason, 2019). This is not only a methodological issue. It reflects uncertainty about the relevant notion of intensity for lenition-fortition patterns in terms of a speaker’s production, perception, or mental representation. Intensity may be static (pertaining to the consonant alone), relative (to something in the proximal or distal context), dynamic (pertaining to rates of change), or some mixture of these possibilities. In Section 4, we compare several types of intensity measurement with regard to lenition-fortition patterns at multiple levels of prosodic boundary. We show that all measurements pattern similarly in boundary-driven lenition, but differ with regard to stress and features of underlying representations.
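The distinction between static, relative, and dynamic intensity can be made concrete with a small sketch. The function name `intensity_measures`, the toy contour, and the hand-placed consonant boundaries below are illustrative assumptions, not the procedures used by any of the studies cited above.

```python
from typing import Dict, List

def intensity_measures(times_ms: List[int], db: List[float],
                       c_start: int, c_end: int) -> Dict[str, float]:
    """Static, relative, and dynamic intensity measures for one VCV token.

    times_ms/db: an intensity contour (milliseconds, dB) spanning the VCV.
    c_start/c_end: rough consonant boundaries (hypothetical annotations).
    """
    cons = [d for t, d in zip(times_ms, db) if c_start <= t <= c_end]
    flank = [d for t, d in zip(times_ms, db) if t < c_start or t > c_end]
    # Slope (dB per ms) between successive samples of the contour.
    slopes = [(db[i + 1] - db[i]) / (times_ms[i + 1] - times_ms[i])
              for i in range(len(db) - 1)]
    return {
        "static_min_db": min(cons),                  # consonant alone
        "relative_drop_db": max(flank) - min(cons),  # relative to flanking vowels
        "max_falling_slope": min(slopes),            # dynamic: into the consonant
        "max_rising_slope": max(slopes),             # dynamic: out of the consonant
    }

# The same made-up token at two recording levels (10 dB apart).
times_ms = list(range(0, 110, 10))
quiet = [70, 70, 68, 62, 56, 55, 55, 58, 64, 69, 70]
loud = [d + 10 for d in quiet]
m_quiet = intensity_measures(times_ms, quiet, 30, 70)
m_loud = intensity_measures(times_ms, loud, 30, 70)
```

In this toy case the static minimum shifts with the recording level, while the relative and slope-based measures do not. The relative measure, however, depends on the maximum of the flanking vowels, which is the sense in which it may fail to isolate consonantal effects.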
Measuring segment duration is hard even in the clearest and slowest speech. Because segments or gestures in actual speech are produced in overlapping and interactive ways (e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Browman & Goldstein, 1986), it is frequently the case that no point in the acoustic record corresponds straightforwardly to the beginning or end of a segment (Ladefoged, 2003). Even acoustic regions that appear to correspond straightforwardly to vowels contain perceptual information about flanking consonants (Raphael, 1972; Sussman & Shore, 1996). Conversely, acoustic regions associated with consonants contain perceptual information about adjacent vowels (Winitz, Scheib, & Reeds, 1972; Yeni-Komshian & Soli, 1981), and this information increases with consonantal intensity or sonority (Katz, 2013). The fact that lenited consonants have increased intensity and are more similar to surrounding vowels makes measuring duration especially difficult. Ennever et al. (2017) give an insightful overview of this issue and some of the points here are drawn from that discussion.
The most frequent criteria for placing boundaries between vowels and consonants in lenition (and other kinds of) studies pertain to changes in intensity: The onset of a stop, for instance, is frequently marked at a point where formant structure or pitch periods in the preceding vowel disappear from the visual display of a spectrogram, and the onset of an approximant at a point where the spectrogram notably lightens (e.g., Hualde et al., 2011; Warner & Tucker, 2011; Bouavichith & Davidson, 2013). The location of such points, however, depends in part on the display settings of phonetic software, the intensity of other sounds within the viewing window, and the researcher’s visual acuity and judgment. Another issue here is that duration marked using visually obvious intensity changes may not be measuring the same thing for different manners of consonant. One could imagine, for instance, that stop closure gestures will noticeably reduce the intensity of a preceding vowel relatively early in their trajectories, while approximant constriction gestures could take longer to exert a noticeable effect. For these reasons (as well as practical efficiency), Kingston (2008) and Ennever et al. (2017) develop automated duration-measurement algorithms based on changes in intensity. Kingston’s algorithm measures the interval from the greatest downward intensity slope (corresponding to consonantal constrictions) to the greatest upward slope (corresponding to the release of constrictions); Ennever et al. (2017) use smoothing splines fit to intensity contours, and propose a threshold at a fraction of the slope extrema, meant to correspond more closely to articulatory landmarks in consonantal constriction gestures. In Section 3, we extend the Ennever et al. algorithm to a new language, to multiple speakers in sometimes sub-optimal recording conditions, and to more heterogeneous phonological materials with regard to stress, prosodic position, and consonantal features. 
We also investigate optimal settings for the free parameters in the algorithm.
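The logic of slope-based duration measurement can be sketched as follows. This is a simplified illustration and not the implementation of Kingston (2008) or Ennever et al. (2017): it omits spline smoothing, works on a coarse made-up contour, and assumes that the thresholded boundaries lie outward from the slope extrema. The function name and the `fraction` parameter are hypothetical.

```python
from typing import List, Optional

def slope_based_duration(times_ms: List[float], db: List[float],
                         fraction: Optional[float] = None) -> float:
    """Consonant duration (ms) from the slopes of an intensity contour.

    fraction=None: interval from steepest fall to steepest following rise.
    fraction=f:    boundaries moved outward for as long as the slope stays
                   at least f times as steep as the relevant extremum.
    """
    n = len(db) - 1
    slopes = [(db[i + 1] - db[i]) / (times_ms[i + 1] - times_ms[i])
              for i in range(n)]
    mids = [(times_ms[i] + times_ms[i + 1]) / 2 for i in range(n)]
    i_fall = min(range(n), key=slopes.__getitem__)
    if i_fall + 1 >= n:
        raise ValueError("no contour samples after the steepest fall")
    # Steepest rise is sought only after the steepest fall.
    i_rise = max(range(i_fall + 1, n), key=slopes.__getitem__)
    if fraction is not None:
        fall_threshold = fraction * slopes[i_fall]   # negative value
        while i_fall > 0 and slopes[i_fall - 1] <= fall_threshold:
            i_fall -= 1
        rise_threshold = fraction * slopes[i_rise]   # positive value
        while i_rise < n - 1 and slopes[i_rise + 1] >= rise_threshold:
            i_rise += 1
    return mids[i_rise] - mids[i_fall]

# Made-up VCV contour sampled every 10 ms.
times_ms = list(range(0, 110, 10))
db = [70, 70, 68, 62, 56, 55, 55, 58, 64, 69, 70]
dur_extrema = slope_based_duration(times_ms, db)
dur_threshold = slope_based_duration(times_ms, db, fraction=0.3)
```

On this contour the extrema-to-extrema interval is 50 ms, and a threshold at 30% of the extrema widens it to 70 ms. The choice of `fraction` stands in for the kind of free parameter whose settings are at issue.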
A third type of measurement frequently used in lenition studies is the categorical presence or absence of manner-related phonetic features, such as bursts, formants, and voicing. Indeed, this is the implicit methodology of virtually all phonological descriptions of lenition phenomena, where the linguist generally classifies segments by ear; this is why processes like spirantization and voicing tend to be described in the first place as changes in phonological features. More laboratory-oriented approaches use acoustic software to assess the presence or absence of such features, subsequently quantifying their probabilities of occurring across different consonants and phonetic environments to derive an index of lenition (e.g., Lavoie, 2001; Hualde et al., 2011; Warner & Tucker, 2011; Bouavichith & Davidson, 2013). Ennever et al.’s (2017) criticism of visual-inspection procedures also holds for these qualitative features: Bursts, formants, and voicing all exist on a continuum of strength or magnitude, and detecting them may involve many factors beyond the intrinsic phonetic properties of consonants. They illustrate this point dramatically with a comparison of formant ‘breaks’ in two spectrograms; we give a similar illustration in Figure 1 with a comparison of weak bursts drawn from our Campidanese materials.
The spectrograms each show a VCV sequence. A researcher in the midst of classifying a large number of segments would probably classify the consonant on the left as having no burst (and possibly a ‘break’ in formants), and the one on the right as having a (weak) burst. The ‘trick’ here is that these are two spectrograms of the same token; the dynamic range setting in Praat (Boersma & Weenink, 2018) is the only difference between them. If the measurement of such qualitative phonetic features depends on software settings in addition to actual phonetic properties, one could doubt whether these features are appropriate for phonetic studies.
More generally, one could argue that categorical classifications of this type arbitrarily impose binary restrictions on underlyingly continuous properties: Formants, voicing, bursts, and frication can be present at higher or lower levels of intensity and perceptibility. Qualitative classifications, on this view, are best thought of as (visual and auditory) perceptual data from a two-alternative forced-choice task performed by the researcher: As such we expect them to be subject to all of the random and non-random sources of error familiar from perceptual studies, but we also expect them to contain non-trivial information about the underlying properties of the stimuli. In Section 4, we compare categorical phonetic features to continuous measurements of intensity and duration. The question to be asked is not whether one type of measurement is superior to the other, but whether the information provided by the two modes of measurement is largely redundant or at least partially complementary.
Sardinian is a Romance language or dialect continuum descended from the Latin brought to Sardinia by Roman invaders in the last few centuries BCE. Wagner (1941) and Blasco Ferrer (1984) give comprehensive historical accounts of the development of the language; the historical information in this section is drawn from those works and from Bolognesi’s (1998) concise summary. Inhabitants of the island were isolated from the rest of the former Roman empire by the sixth or seventh century CE, and this is ostensibly when Sardinian’s divergence from other early Romance varieties accelerated. Successive waves of invaders imposed Catalan, Spanish, and Italian as official languages in Sardinia, each of which has left some linguistic trace in parts of the island. In response to the chaos and repeated conquests that followed the fall of the Roman empire, Sardinian residents fled coastal cities and settled in isolated villages in the mountainous interior of the island. The result, persisting to some extent to the present day, is extensive geographical variation, with lexical items and pronunciations differing from one village to the next; this can be clearly seen in the dialectological studies of Blasco Ferrer (1984), Contini (1987), and Cossu (2013).
Despite this pervasive variation, most specialists agree that Sardinian varieties can be coherently classified into three main dialect groups; Campidanese is the dialect continuum spoken in most of the southern half of the island. The synchronic phonology of Campidanese is described in some detail by Virdis (1978) and Bolognesi (1998). Blasco Ferrer (1984) presents a wealth of historical information and broad phonetic transcriptions from a variety of towns and villages in the Campidano; Wagner (1941) is the seminal work on the linguistic history of Sardinia and includes information on Campidanese; and Cossu (2013) presents broad phonetic transcriptions from many Campidanese towns and villages, along with acoustic data on the vowel system. Contini and Boë (1972) and Frigeni (2009) discuss the phonetics and phonology of Campidanese sonorants and vowels in great detail, also presenting acoustic data on vowel nasalization. Molinu (1998, 2017) gives phonological analyses of alternations and phonotactic patterns, mainly focused on the Logudorese variety but also touching on Campidanese.
Despite this fairly extensive literature, we are not aware of any detailed acoustic investigation of Campidanese obstruents, nor of the consonant system more generally. This is surprising, because the phonology of Campidanese obstruents is extremely intricate and interesting. The description that follows is based on Virdis’ (1978) and Bolognesi’s (1998) accounts, though most data come from our fieldwork. We describe consonant categories and their underlying representations (URs) in line with previous accounts, though our phonetic data raise doubts about the precise nature of URs and surface forms for some consonants.
In absolute initial position, including citation forms, Campidanese contrasts voiced and voiceless stops, voiceless fricatives, voiceless affricates, nasals, and a lateral liquid, as shown in Table 1. There are several other consonants marginally present in initial position, shown in parentheses; these are either extremely rare or limited to recent loanwords. Note that [r] and [ɖ] are rare only in word-initial position, and [v] is rare only in absolute initial position; they are common elsewhere (to be described below).
|Marginal sounds||(v, w)||(dz, r)||(ɖ, dʒ, j)|
When words beginning with consonants appear following a vowel, most obstruents lenite, as in (1). The voiceless stops, voiceless fricatives (except for /ʃ/ and the initial /s/ in the definite determiner sa/su), and /tʃ/ are said to become voiced fricatives. In some varieties, word-initial /l/ becomes [β]. The status of the voiced stops is less clear: While Bolognesi (1998) claims that they categorically fail to spirantize, Virdis (1978) states that they show variable spirantization. Transcriptions from Blasco Ferrer (1984) and Cossu (2013) seem to match Virdis’s description; Bolognesi acknowledges the discrepancy and suggests that it may be a regional difference. Both authors agree that voiced stops other than [ɖ] optionally delete, and that this is more common in careful, slow registers than in casual speech. Example (1) contains only penultimate-stress words. This is the dominant pattern in the language, though there is a significant class of antepenultimate stress words (impressionistically, this class seems to us to be larger in Sardinian than in Italian) and a few final-stress ones, including verbal infinitives and participles.
|(1)||Utterance-medial lenition in Campidanese|
In word-medial intervocalic position, the consonant inventory is similar, but lexical length contrasts emerge. For nasals, this is described as a straightforward duration contrast. For /r/, the contrast is between a tap and a trill or approximant. For /l/, etymological singletons from Latin surface as [β], while some etymological geminates surface as [l]. For most obstruents, the contrast between underlying singletons and geminates is described in the same way as the contrast between lenited and unlenited word-initial forms, as in (1): UR geminates are realized similarly to the citation forms in the left column, while UR singletons are realized similarly to the post-vocalic lenis forms in the right column. Illustrations of these fortis/lenis or length contrasts are shown in (2). Note that the invariant lenis property of short medial stops entails absence of the contrast between UR voiced and voiceless singletons attested in initial position. In addition to the initial consonants shown in Table 1, several consonants appear mainly or only in word-medial position. This includes [ɖ] and [r] from Table 1, which are sparsely attested in word-initial position, as well as [ɾ] and [ɲ], which are unattested word-initially.
|(2)||Medial obstruents contrast for lenition or length|
|[noβu] ‘new’||[aribu] ‘I arrive’||[apu] ‘I have’|
|[bɾoðu] ‘broth’||[kuaɖu] ‘horse’||[totu] ‘all’|
|[kazu] ‘cheese’||[lasu] ‘I leave’|
|[meziʒeɖa] ‘small table’||[pitʃoku] ‘young man’|
|[donai] ‘to give’||[anːaðu] ‘I swim’|
|[oβia(ða)] ‘3s wanted’||[olu] ‘I want’|
|[maɾi] ‘sea’||[ariu] ‘river’|
In addition to the UR length contrasts in (2), post-lexical or ‘false’ geminates can be created word-initially. We do not fully discuss post-lexical geminates in this paper; they are described as patterning similarly to medial lexical geminates, with the exception of voiced stops (see Bolognesi, 1998; Ladd & Scobbie, 2003 for details). We did elicit such segments and they are included in the results presented in Section 4.
For the remainder of the paper, we refer to the contrasting underlying series of consonants using capital letters and quantity based on previous theoretical descriptions, although there are significant questions about URs in some cases. For instance, the initial consonants from (1) appear as voiceless stops, voiceless fricatives, voiced stops, and liquids in citation forms. We refer to these series as /T/, /S/, /D/, and /L/, respectively. The long consonant series from (2) will be referred to as /TT/, /DD/, etc.
The Campidanese consonant system is particularly interesting for studying lenition, due to the tight relationship between length, manner, and lenition. In contexts where sonorants display length contrasts, obstruents display contrasts between lenis and fortis realizations. In contexts where lenition affects domain-initial obstruents, we find a lack of lexical geminates. Investigating this system in more detail is important to the theory of lenition because issues surrounding the interaction of lenition with underlying contrasts in length and manner have been at the center of the theoretical lenition literature for several decades. We review some of this literature in the next section.
The phonetic grounding and functional roots of lenition have been approached in a wide variety of frameworks. Articulatory theories propose that the gestures associated with consonants are shortened and undershot, resulting in shorter and less constricted consonants (e.g., Donegan & Stampe, 1979; Kirchner, 1998). The ultimate cause of synchronic and diachronic lenition in this approach is the tendency for humans to minimize articulatory effort (Lindblom, 1983). Other approaches instead locate the functional motivation for lenition on the side of the listener. One such approach claims that lenition is fundamentally geared towards helping the listener recover prosodic constituents (Keating, 2006; Kingston, 2008; Katz, 2016). The idea is that lenited consonants are relatively vowel-like, fortified consonants less vowel-like. If lenis forms occur domain-medially between vowels, they disrupt the speech stream less than fortis forms would. And if fortis forms occur initially in prosodic domains, they disrupt the speech stream more. Aligning auditory discontinuities with prosodic boundaries, even probabilistically, should help listeners chunk the speech stream into constituents; Katz and Fricke (2018) provide some preliminary evidence from a word-segmentation experiment that this prediction is correct. Another version of the listener-oriented approach holds that it is information that is aligned with prosodic boundaries (Harris, 2003; Cohen Priva, 2017). In this view, lenition is a way of aligning less perceptually salient phonetic realizations with positions that convey less information. Because initial positions tend to be highly informative, more salient or stronger consonant articulations are preferred initially. Here, the ultimate goal of lenition is to direct the listener’s attention to more informative points in an utterance.
While a single study can’t definitively settle such overarching debates, the varied approaches to lenition form an important backdrop to this study because of what they have in common: All make substantially similar predictions about what lenition is like in phonetic terms. Though auditory theorists posit continuity and disruption as the goals of lenition and fortition, it is generally accepted that these goals are accomplished by reducing and strengthening consonantal gestures. All else being equal, shortened or less extreme consonantal constriction gestures will result in shorter and more sonorous consonants, and shorter and more sonorous consonants in the context of surrounding vowels will result in decreased perceptual salience. So all theories agree that initial consonants should be stronger, more disruptive to or dissimilar from vowels, and more salient. These similarities are a positive sign: They suggest that various approaches are converging on similar results, perhaps because those results are robust.
One of the questions that comes up in all of these approaches is whether temporal reduction and the manipulation of consonantal manner and laryngeal features contribute independently to lenition-fortition patterns, or whether one might suffice to explain the other. From an articulatory standpoint, the question is whether a single articulatory target is sufficient for describing the continuum of fortis and lenis realizations of a given segment, in the presence of undershoot for shorter realizations (Ennever et al., 2017). From the acoustic standpoint, the question is whether prosodically-driven variation in the intensity characteristics (and related manner features) for a segment can be predicted by variation in the duration of that segment (Cohen Priva & Gleason, 2019). We attempt to address these questions at multiple levels of prosodic boundary in Section 4.
Ennever et al. (2017), studying Gurindji, and Cohen Priva and Gleason (2019), studying American English, both use a form of mediation analysis to ask questions about the causal structure of lenition. The basic idea in these analyses is to use regression models to first examine the magnitude and robustness of some structural or informational effect on lenition-related phonetic parameters, and then see how that effect changes when other phonetic parameters are incorporated as predictors. If there is a large effect of structural factor S on phonetic parameter P1, but that effect largely vanishes when phonetic parameter P2 is incorporated into the model, it suggests that P1 is not being directly affected by differences in S, but instead the relationship is mediated by P2.
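The comparison of models with and without a mediating predictor can be illustrated with a toy example. Everything here is an assumption made for illustration: the data are invented so that a binary structural factor S (1 = domain-initial) affects duration (P2), while the intensity drop (P1) depends only on duration; ordinary least squares stands in for whatever fuller regression models a real analysis would use.

```python
from typing import List

def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Invented tokens: S shifts duration by 40 ms; intensity drop tracks duration.
S, P2, P1 = [], [], []
for i in range(20):
    s = i % 2
    wobble = (-10, -5, 0, 5, 10)[(i // 2) % 5]  # duration variation unrelated to S
    duration = 60 + 40 * s + wobble              # P2, in ms
    wiggle = 0.5 if i < 10 else -0.5             # residual variation unrelated to S
    S.append(s)
    P2.append(duration)
    P1.append(0.2 * duration + wiggle)           # P1, in dB

# Model 1: P1 ~ 1 + S.  Model 2: P1 ~ 1 + S + P2.
total_effect = ols([[1.0, s] for s in S], P1)[1]
direct_effect = ols([[1.0, s, d] for s, d in zip(S, P2)], P1)[1]
```

In this constructed case the coefficient for S is about 8 dB in the first model and essentially zero once duration is included, which is the pattern that would be read as mediation by P2. With real data the comparison is a matter of degree, and the direction of causation is an interpretation rather than something the regression itself establishes.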
Ennever et al. (2017) and Cohen Priva and Gleason (2019) find that intensity differences associated with more fortis and more lenis variants are largely or wholly mediated by changes in duration, and thus may not require a separate target for segmental features such as continuancy. This is ostensibly different from featural contrasts in URs, which are expected to be specified with constriction targets or phonological features pertaining to voicing and manner. Note that in Gurindji, neither voicing nor continuancy is contrastive, so the view of lenition as changing phonological features would run into difficulties even in the absence of such mediation effects. In this study, we explicitly compare the interaction of duration and intensity as instantiated in boundary-driven fortition-lenition patterns to the interaction as instantiated in stress-driven patterns, as well as UR contrasts for length, voicing, and manner. We also extend the mediation analysis to categorical phonetic features such as formants and voicing, asking how these features explain or are explained by continuous phonetic measurements.
We refer to the set of characteristically intervocalic lenition processes including voicing and spirantization as continuity lenition, to distinguish them from processes such as debuccalization that are not characteristically favored in intervocalic contexts, and that arguably do not share the continuity/disruption profile of the processes investigated here (Ségéral & Scheer, 2008; Smith, 2008; Szigetvári, 2008; Katz, 2016). All of the generalizations in this study are limited in scope to continuity lenition and its complementary fortition.
Speakers were recruited through the social and professional networks of the second author, a native of Uta. While most people in the area speak at least a bit of Sardinian, it is far from universal and Italian is the dominant language in educational and business settings. We sought out people with exposure to Sardinian in early childhood who reported speaking ‘a good Sardinian.’ We eventually recorded 17 speakers of varying ages, hometowns, and levels of Sardinian/Italian dominance. Two subjects’ recordings are not analyzed here, due to poor audio quality. Information for the 15 speakers in the current study is shown in Table 2.
|Subject||Hometown(s)||First language||Spoken w/family||Sex||Age|
|4||San Gavino Monreale, Sanluri||Ital||Ital||M||36|
|6||San Sperate||No answer||both||F||45|
The speakers come from areas to the north and northwest of Cagliari, the capital of Sardinia. According to the classifications used by Blasco Ferrer (1984), they speak Western Campidanese; about half are from the area known as the Trexenta, which Blasco Ferrer treats as a sub-dialect. All speakers displayed the most characteristic phonetic aspect of Western Campidanese: vowel nasalization (Contini & Boë, 1972; Frigeni, 2009). Several of these towns, notably Uta and San Sperate, are within 15 kilometers of Sestu, whose dialect Bolognesi (1998) describes.
There are minor lexical and phonological differences amongst the speakers, though impressionistically the sample seems reasonably phonetically homogeneous. Speakers from the Trexenta have a number of forms reflecting diachronic and possibly synchronic (see Bolognesi, 1998) metathesis of medial /r/-stop clusters: forms that surface elsewhere as [eɾβa] ‘grass,’ [kaɾðu] ‘thistle,’ [saɾðu] ‘Sardinian’ are realized in the Trexenta as [eβɾa], [kaðɾu], [saðɾu]. Some of the Trexenta speakers also lack post-lexical geminates following verb forms that generate geminates in other varieties; these speakers instead epenthesize into contact clusters, e.g., /at/+/tastau/ → [aði ðastau] versus [atastau] ‘3s has tasted.’ No other systematic differences have been noted between speakers or groups of speakers. And we will see in Section 4 that most statistical models of phonetic realizations derive little or no benefit from including by-subject variation.
Most subjects were recorded in their homes, with a Samson C01 condenser microphone set on a table or other flat surface 2–4 feet from their mouths. Two subjects were recorded in a university office. The recordings are not laboratory quality. There is a fair bit of ambient noise and echo, and levels are lower than desired for several of the subjects. One of the promising aspects of the Ennever et al. (2017) measurement procedure is its reported effectiveness with sub-optimal field recordings of casual speech; this is one reason why we used their procedure as a starting point. In a few cases, ambient noise or jostling of the microphone obscured part or all of a segment; sounds in the vicinity of such events were not analyzed.
The data reported on here come from a translation task. The second author read Italian sentences aloud to the subjects, asking them to translate those sentences into Sardinian. Each subject translated 25–30 sentences, in a pseudo-random order. We analyzed roughly 400 total utterances from the 15 speakers included here. The Italian sentences were chosen in an attempt to elicit a balanced set of consonants in various syntactic contexts that we hoped would correspond to prosodic boundaries of various sizes.
There are several possible issues with this task. One is difficulty. In general, subjects had little trouble with the task. Occasionally, they struggled to find a word or collocation, or simply made speech errors. Sounds following the filled or unfilled pauses and restarts associated with such disfluencies are categorized as ‘post-pausal’ in our analysis. Interestingly, these disfluencies had a strong tendency to occur in syntactic positions that are independently associated with higher-level prosodic boundaries (as gauged by duration and intensity) in our materials. We nonetheless treat these (and other) post-pausal consonants separately from those in the same syntactic position realized without a notable pause. This is partly for the sake of being conservative in testing for the existence of prosodic boundaries above the word, and partly because we expect the interplay of consonantal duration, intensity, and manner features to be quite different when the voicing of the preceding vowel has been allowed to completely die off.
Another possible issue with the translation task is lack of control over word choice. There may be a variety of Sardinian words or phrases that are plausible correspondents with a single Italian item. And there is also a fair bit of lexical variation between subjects: For instance, we elicited three distinct translations of the Italian verb seminare ‘to sow’ from multiple subjects: semiãi, prantai, and arai. Other words we intended to elicit never occurred at all: For instance, Bolognesi (1998) reports the word gattu for ‘cat’; all of our subjects instead used pisittu. While such lexical variation means that we were not always able to elicit the consonants we wanted in the variety of positions we wanted, there is no obviously superior procedure. Sardinian has no standard written form, and most speakers are not accustomed to reading and writing in the language. In any case, reading tasks might result in slower and more careful speech, which would be detrimental to studying lenition.
Finally, one may wonder whether the translation task exerts an Italian phonetic influence on Sardinian responses. We certainly can’t rule this out, but there are several reasons why we don’t think it is a major factor in our data. One is that the results here largely match previous phonological descriptions, and don’t show any obvious signs of Italianization. A second reason is that we also recorded spontaneous Sardinian conversation between each subject and the second author; while we have not yet transcribed and analyzed those data, it is clear from preliminary listening that nothing radically different is going on with regard to the consonant system. A final fact to note is that Sardinian is constantly in contact with Italian, and most speakers freely switch and mix the languages (see Bolognesi 1998, Ch. 1, for a brief description). So while the translation task is clearly somewhat ‘artificial,’ it is not completely alien to the everyday use of Sardinian.
Materials were annotated in Praat (Boersma & Weenink, 2018) by the first author, an experienced phonetician. For all relevant consonants, points corresponding to the general vicinity of transitions to and from adjacent sounds were marked in a text grid. These were not particularly precise; the initial point eventually served as input for the extraction of more precise duration measurements by script, as described below. For each consonant segmented in this manner, we recorded the information in (3) in the text grid:
(3) Annotations used in this study:
As noted in Section 1.1, the final four properties are somewhat subjective and arguably force a binary classification onto continuous phonetic properties. Indeed, there was quite a bit of phonetic ambiguity for all four properties, and judgments were difficult. Nonetheless, we suspect that these qualitative judgments, when carried out by an expert phonetician, may capture information that automated intensity measurements could miss. We attempt to assess the situation in Section 4.
The representation of prosodic domains here is quite indirect. Because there are few generally agreed upon principles for annotating prosodic boundaries in Sardinian, we instead have recorded information about a consonant’s position in morphosyntactic word and phrase structure. Intervocalic consonants, for instance, have been transcribed as being initial in a syllable (meaning not initial in any morphosyntactic constituent), word, larger syntactic phrase, or utterance. The syntactic positions singled out as being potentially ‘larger’ domains are initial in a matrix verb phrase (equivalent to post-subject in most cases), verbal argument/modifier (mainly DP and PP), or finite subordinate clause (CP). These codings were based on anecdotal evidence from Del Mar Vanrell, Ballone, Schirru, & Prieto (2015), where figures suggest that the relevant positions can be marked by a boundary tone, or by separate assignments of pitch accents to the preceding and following words. Consonants preceded by a noticeable pause are transcribed as ‘post-pausal’ regardless of their syntactic position. While it’s difficult to know what causes a pause in speech, our impression is that the post-pausal category is a mix of large prosodic phrase breaks, filled pauses, and restarts after planning or other errors. While there are very few studies of Sardinian prosody, there is some evidence that at least one level of constituent exists above the word but below the intonational phrase level (Jones, 1993; Del Mar Vanrell et al., 2015). The approach taken here is to compare consonants at the beginning and/or end of relatively large syntactic constituents with those at the edges of smaller constituents. Even if the relationship between syntax and prosody is not fully deterministic (and there is no reason to believe it is), this procedure may result in enough prosodic differences on average between structural positions to reveal prosodically-driven phonetic patterns. 
The results in Section 4 suggest that the strategy succeeded in this regard.
Ennever et al. (2017) have graciously made their code available on GitHub. We used that code, referred to as stop_lenition, as a starting point, making a few adjustments to adapt it to our materials.2 Ennever et al. (2017) give a comprehensive description of the algorithm, which extends a general approach developed by Kingston (2008).
The stop_lenition code is in R, and interfaces with Praat (Boersma & Weenink, 2018). It takes as input a series of sound files paired with Praat text grids that mark the origin of each consonant, that is, the rough location of the transition between that consonant and the preceding sound. The script bandpasses each sound file according to parameters set by the user, extracts intensity data from each frequency band of each sound file, then attempts to segment each consonant and measure various types of intensity extrema, changes, and slopes. The segmentation routine first retrieves intensity data in the vicinity of each origin using forward and backward search windows specified by the user. It then fits smoothing splines to the intensity contour in R, where smoothing is more or less extreme according to the smoothing parameter selected by the user. Next, stop_lenition searches the smoothed intensity contour for a local minimum (the pit) following the origin. If no intensity minimum is found, the consonant is considered unmeasurable and stop_lenition returns NA for all measurements. If it does find a pit, it then picks out: (1) the maximum downward slope of intensity to the left of the pit, ostensibly corresponding to closure (the closure velocity extreme, CVE); (2) the maximum upward slope of intensity to the right of the pit, corresponding to release (the release velocity extreme, RVE); and (3) the peak intensity following the pit, corresponding to the following vowel’s intensity (the right intensity peak). Duration is defined according to these intensity inflection points: Ennever et al. (2017) suggest that sensible demarcation criteria result from marking the onset of the consonant at the last point before the CVE where intensity slope reaches 60% of the CVE’s value, and the release of the consonant at the first point following the RVE where intensity slope reaches 60% of the RVE’s value.
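The segmentation logic just described can be illustrated with a minimal sketch. This is not the stop_lenition code itself (which is written in R); it is a simplified Python rendering under several assumptions: the intensity contour is assumed to be already bandpassed and smoothed, the whole list is treated as the search window, and the function name `segment_consonant`, its parameters, and the toy contour are hypothetical. The 60% criterion is implemented under one possible reading, in which we scan outward from each velocity extreme until the slope is no steeper than 60% of the extreme.

```python
def segment_consonant(intensity, origin, step_ms=10.0, threshold=0.6):
    """Segment one consonant from a smoothed intensity contour (dB values).

    Hypothetical sketch: `intensity` is a list sampled every `step_ms`,
    `origin` is the index of the hand-marked rough transition point.
    Returns None when no pit is found (the 'unmeasurable' case).
    """
    # slope[i] is the intensity change from sample i to sample i + 1 (dB/ms)
    slope = [(b - a) / step_ms for a, b in zip(intensity, intensity[1:])]

    # Pit: first strict local minimum at or after the origin
    pit = None
    for i in range(max(origin, 1), len(intensity) - 1):
        if intensity[i] < intensity[i - 1] and intensity[i] < intensity[i + 1]:
            pit = i
            break
    if pit is None:
        return None

    # CVE: steepest fall left of the pit; RVE: steepest rise right of it
    cve_idx = min(range(0, pit), key=lambda i: slope[i])
    rve_idx = max(range(pit, len(slope)), key=lambda i: slope[i])
    cve, rve = slope[cve_idx], slope[rve_idx]

    # Onset: scan leftward from the CVE until the slope is no steeper
    # than 60% of the CVE (assumed reading of the criterion)
    onset = cve_idx
    while onset > 0 and slope[onset] < threshold * cve:
        onset -= 1

    # Offset: scan rightward from the RVE with the mirror-image criterion
    offset = rve_idx
    while offset < len(slope) - 1 and slope[offset] > threshold * rve:
        offset += 1

    # Right intensity peak: highest intensity from the pit onward
    peak_idx = max(range(pit, len(intensity)), key=lambda i: intensity[i])

    return {
        "pit": pit, "cve": cve, "rve": rve,
        "onset": onset, "offset": offset,
        "duration_ms": (offset - onset) * step_ms,
        "right_peak_db": intensity[peak_idx],
    }


# Toy contour (invented values): vowel, constriction, following vowel
contour = [70, 70, 69, 66, 60, 54, 51, 50, 52, 57, 64, 69, 71, 72, 72]
result = segment_consonant(contour, origin=2)
```

On this invented contour, the pit falls at the 50 dB sample, and a monotonically falling contour would return `None`, mirroring the NA outcome described above.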
In addition to these intensity-based duration measurements, the algorithm also produces several intensity measurements. Ennever et al. (2017) investigate in detail the measurement labeled delta-i, the change in intensity from the measured onset of the consonant to the intensity pit. CVE and RVE are themselves potentially informative, and correspond to measurements used by Kingston (2008). We altered the script to attempt to extract a left intensity peak, one preceding the marked origin and presumably associated with a preceding vowel; as well as attempting to extract CVE, RVE, and intensity peaks even in cases where a pit could not be found. The latter attempt turned out to produce highly unreliable measurements that were not necessarily associated with any meaningful phonetic aspect of the consonants and vowels in question, and we ended up discarding the vast majority of those measurements. We also changed the rightward search window from 200 to 300 ms, to deal with the longer consonants found in our materials.
All of the measurements described here are illustrated in Figure 2, which shows a phrase medial /eu#ka/ sequence. This is a fairly successful application of the algorithm: The consonant onset appears to be close to the beginning of intensity movement indicating constriction and the offset is just before a rise in intensity indicating the release of that constriction. Left and right intensity peaks are near the midpoints of adjacent vowels. The intensity pit looks to be well aligned. The velocity extrema may show slight mismatches with the visual display of the intensity contour; this is because the Praat smoothing procedure is slightly different from the splines used by stop_lenition. The consonant onset and offset measurements here both precede to some extent the location where we would likely mark them by hand; this is a feature of the stop_lenition algorithm, which attempts to locate the beginning of consonantal closure gestures and the moment of release. All annotated consonants, consonant sequences, and null consonants (indicating hiatus, glide, or V-to-V transitions) were submitted to stop_lenition with a variety of settings for the free parameters. There were 4,973 annotated strings in total.
We investigate optimal settings for the free parameters in Section 3, comparing our results to those of Ennever et al. (2017). Consonant clusters other than obstruents followed by liquids were excluded from all analyses: We hope to analyze these clusters at some point, but they are not amenable to the methods used here. The final number of useable segments reported on in Sections 3–4 is thus 4,151. Duration measurements for utterance-initial sounds were also discarded; the stop_lenition algorithm is predicated on the assumption of a preceding vowel or sonorant sound, and does not work for utterance-initial sounds. We explore some measurements for utterance-initial sounds in Section 4.
The ‘final’ data set reported on in Section 4 is available as a supplementary file: https://doi.org/10.5334/labphon.184.s1.
A number of the data explorations in Sections 3–4 include constructing and comparing regression models of various properties of the consonants in the corpus. All of the regression modeling here was done with the lme4 package, version 1.1-14, in R (Bates, Maechler, Bolker, & Walker, 2015a). Continuous phonetic parameters were z-transformed by subject and modeled with linear mixed-effects regression including random intercepts by subject and word. Duration measurements were log-transformed before z-scoring. Categorical variables involving phonetic features (burst, voicing, etc.) or UR features were modeled with logit mixed-effects regression. Random effects of word were omitted for the models of UR features (UR stop, fricative, etc.) in Section 3, because these features do not vary for a segment within a word: giving the model access to the UR of the word would fully determine the value of the dependent variable, except for the occasional cases of words whose URs happen to contain segments with both values of the relevant feature.
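As an illustration of the transformation applied to duration before modeling, the following sketch log-transforms each duration and then z-scores within subject. It is a plain Python rendering rather than the R code used for the analyses; the function name `log_z_by_subject`, the data layout, and the example durations are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def log_z_by_subject(rows):
    """rows: list of (subject, duration_ms) pairs.

    Returns one z-score per row, computed on log duration relative to
    that subject's own mean and sample standard deviation.
    """
    logs_by_subject = defaultdict(list)
    for subject, duration in rows:
        logs_by_subject[subject].append(math.log(duration))

    # Per-subject centre and spread of log duration
    stats = {s: (mean(v), stdev(v)) for s, v in logs_by_subject.items()}

    return [
        (math.log(duration) - stats[subject][0]) / stats[subject][1]
        for subject, duration in rows
    ]


# Invented durations: subject s2 is more variable overall than s1,
# but after by-subject scaling the two are directly comparable
rows = [("s1", 60), ("s1", 90), ("s1", 135),
        ("s2", 40), ("s2", 80), ("s2", 160)]
z_scores = log_z_by_subject(rows)
```

Scaling within subject removes between-speaker differences in overall rate and level, so that the fixed effects reflect relative rather than absolute differences.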
We took two approaches to random effects structure. We first tried to fit maximal models, following Barr, Levy, Scheepers, and Tily (2013). As noted by Bates, Kliegl, Vasishth, and Baayen (2015b), these models are frequently overparameterized: more complex than justified by the data, and impossible to fit accurately. Our second approach was similar to the one advocated by Bates et al. (2015b): We added by-subject random slopes in blocks corresponding to coherent theoretical entities (position with regard to stress, UR features, etc.) and used a likelihood-ratio test to assess whether these parameters improved fit enough to justify the increased complexity of the resulting models. In the vast majority of cases, the answer was ‘no.’ The few exceptions are noted in the text in Sections 3–4.
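For readers unfamiliar with the likelihood-ratio test, its general form can be stated as follows. The notation is ours and is given only as a sketch: $\ell(\hat{\theta}_0)$ is the maximized log-likelihood of the model without a given block of random slopes, $\ell(\hat{\theta}_1)$ that of the model with the block added, and $k_0$ and $k_1$ are the numbers of estimated parameters in the two models. The chi-squared reference distribution is the standard textbook choice and is assumed here.

```latex
\Lambda = 2\left[\ell(\hat{\theta}_{1}) - \ell(\hat{\theta}_{0})\right],
\qquad
\Lambda \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{\,k_{1}-k_{0}}
```

A block of slopes is retained only when $\Lambda$ is large relative to this reference distribution.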
In this section we evaluate the robustness and utility of the stop_lenition algorithm under various settings for the smoothing and bandpass parameters. The procedure follows Ennever et al. (2017) initially, but we also describe a series of additional tests designed for our somewhat more varied and noisy materials.
We compared three settings of the smoothing parameter (spar) across 10 different frequency bands. The three spar values were 0.7, which Ennever et al. (2017) report as optimal for their materials, as well as one higher value (0.8) and one lower one (0.6). Spar represents the degree to which local fluctuations in intensity are ignored by the spline fit, so higher values represent smoother contours. The 10 frequency bands included 9 examined by Ennever et al. (2017), corresponding to different regions associated with f0, lower formants (mostly F1), higher formants (F2 and F3), and high-frequency noise. We also added an ‘omnibus’ band from 0–3200 Hz, because we suspected that all of the bands examined by Ennever et al. (2017) could contain complementary information, and that some bands could be more reliable for certain manners or places of articulation. Table 3 shows the proportion of failed segmentations for each combination of spar and band.
|Band (Hz)||spar 0.6||spar 0.7||spar 0.8|
As in Ennever et al.’s (2017) results, the ‘low formant’ bands in the 300–1200 Hz range tend to have the highest success rates, and the high-frequency noise and low-frequency voicing bands tend to have the highest failure rates. The omnibus 0–3200 Hz band appears to be slightly more successful even than the low formant bands for spar values 0.6 and 0.7. Unlike in the earlier results, measurement failure is a monotonic function of spar within each band: Higher spar values produce fewer successful segmentations. The other major difference from the earlier results is the overall lower success rates shown here: For Ennever et al.’s (2017) more homogeneous data, the vast majority of settings produced at least 90% success, and the most successful reached 99%.
Both the lower overall success rate and the monotonic effect of spar appear to be related to the more difficult nature of our materials. Inspection of several hundred tokens suggests that the reason spar 0.6 succeeds more often is that it frequently picks out spurious intensity movements and counts them as extrema. Virtually all of the tokens we examined where spar 0.6 succeeded and spar 0.7 failed were cases of this sort. A number of these were cases where consonants did not actually cause a notable drop in intensity; the fact that there are such consonants in our data set explains why the overall success rate here is lower than in the earlier study, where at least 99% of consonants contained a relevant intensity drop in some frequency band. Comparing spar 0.7 and 0.8, both mostly produce reasonable segmentations, and it is not entirely clear whether one is more principled or accurate than the other.
Because failure rate here is not reliably tracking measurement quality, we tried additional ways of assessing parameter settings. Ennever et al. (2017) use a comparison of how tightly correlated their duration, intensity drop, and intensity slope measurements are. The idea is that if the automated measurements are tracking acoustic properties associated with consonantal constriction gestures, they should reflect the physical laws of articulator movement, according to which the amplitude of a movement is positively correlated with its duration and velocity (Munhall, Ostry, & Parush, 1985). Using delta-i as a proxy for amplitude, measured duration as a proxy for articulatory duration, and CVE as a proxy for velocity, the prediction is that delta-i should be highly correlated with the product of the other two parameters. If the algorithm is instead picking out random fluctuations in intensity that are not related to the global trajectory of consonantal articulations, this correlation should be lower (though probably not 0, because the measurements are not entirely independent). This correlation, then, can be used to track how much different parameter settings are capturing signal associated with consonantal constrictions, as opposed to noise associated with other factors. Results are shown in Table 4.
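The consistency check described above can be sketched in a few lines. This is an illustrative Python rendering, not the code used to produce Table 4: the function names, the dictionary keys, and the three example tokens are hypothetical, and taking the absolute value of CVE (which is a negative slope) before multiplying is our assumption about how the sign is handled.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kinematic_consistency(tokens):
    """Correlate delta-i (amplitude proxy) with duration x |CVE|.

    Each token is a dict with keys 'delta_i' (dB), 'duration' (ms),
    and 'cve' (dB/ms, negative for a falling contour).
    """
    amplitude = [t["delta_i"] for t in tokens]
    product = [t["duration"] * abs(t["cve"]) for t in tokens]
    return pearson_r(amplitude, product)


# Invented tokens in which delta-i is exactly proportional to
# duration x |CVE|, so the correlation is at its ceiling
tokens = [
    {"delta_i": 10.0, "duration": 50.0, "cve": -0.4},
    {"delta_i": 20.0, "duration": 80.0, "cve": -0.5},
    {"delta_i": 30.0, "duration": 100.0, "cve": -0.6},
]
r = kinematic_consistency(tokens)
```

With real measurements, lower values of r for a given setting would suggest that the algorithm is tracking fluctuations unrelated to the consonantal gesture.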
|Band (Hz)||spar 0.6||spar 0.7||spar 0.8|
Interestingly, values of r here are uniformly high and don’t vary as much by setting as they did in Ennever et al.’s (2017) study. For the most part, correlations seem to increase with spar, but there are exceptions. Generally, the low-frequency voicing bands have the highest correlations, but all differences between bands are relatively small. While this test doesn’t provide much in the way of favoring some settings over others, it is at least reassuring to see that measurements correlate in the expected way and that this is relatively robust to parameter settings.
Our final test of the utility of various parameter settings is a more direct one. We fit logistic regression models to compare how well different settings did at predicting the difference between word-medial singleton and geminate consonants. The idea is to use a contrast for validation that is clearly expected to affect duration, and that is not one of the principal differences being investigated in the study (those being lenition-fortition patterns). The dependent variable is UR length, the fixed-effect predictor is the natural logarithm of duration, and the random effects are by-subject intercepts and slopes associated with duration. The statistic we used for comparing models is the test statistic z for the fixed effect of duration. This statistic, which is the estimated coefficient of the effect divided by the standard error inferred by the model, combines several desirable properties into one number. It will tend to grow larger as the contrast between short and long consonants grows larger. It will also tend to grow larger as the algorithm succeeds more often, because the standard error is reduced with increased sample size, all else being equal. And finally, it will grow larger as the measurements become more robust to within-subject and between-subject variability, because both kinds of variability contribute to higher standard errors. Results are shown in Table 5.
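The validation model and its test statistic can be written out as follows. The notation is ours and is offered as a sketch of the model described above: $i$ indexes tokens, $j$ indexes subjects, $u_{0j}$ and $u_{1j}$ are the by-subject random intercept and slope, and $\beta_1$ is the fixed effect of log duration.

```latex
\operatorname{logit} P(\text{long}_{ij})
  = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\ln(\text{duration}_{ij}),
\qquad
z = \frac{\hat{\beta}_1}{\widehat{\mathrm{SE}}(\hat{\beta}_1)}
```

Because $z$ scales the estimated effect by its uncertainty, a setting can raise $z$ either by producing a larger separation between categories or by reducing the standard error, for instance through more successful segmentations.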
|Band (Hz)||spar 0.6||spar 0.7||spar 0.8|
These results reveal several new patterns. It is immediately clear that spar 0.7 produces clearer duration contrasts than either the lower or higher value for most bands. The omnibus 0–3200 Hz band has a slight edge over the others, although the ‘low formant’ bands in the 300–1200 Hz range, which performed best in Ennever et al.’s (2017) study and one of which was selected as optimal there, are almost as good here. Again, one encouraging pattern is that separation tends to be good at most parameter settings; a z value of roughly 2 corresponds to a ‘statistically significant’ difference between long and short stops, but that is a very low bar. At the spar values 0.7 and 0.8, in particular, most settings produce a difference of 7–15 standard errors between the two categories, which is quite good. The comparison with Table 3 also shows that, as suspected, setting spar to 0.6 produces the greatest quantity of measurements at the cost of lowering the quality of those measurements.
As a final check, we wanted to test intensity separation for a straightforward contrast. For this purpose, we examined UR manner: geminate voiceless stop versus nasal, and singleton fricative versus nasal. The idea is that the fricative comparison should be relatively minimal in acoustic terms, because the most frequent realization for these segments is a voiced continuant; while the long stop comparison should be more extreme, because the most frequent realization for these segments is a voiceless stop. We examined logit mixed effects models of these manner contrasts with delta-i as predictor rather than duration. This test is conceptually similar to the one Cohen Priva and Gleason (2019) suggest for intensity, which involves item-total correlations. The quantitative details are different, but the basic idea is to show that measurements produce internally coherent and consistent results when split into categories on the basis of recording session (here, subject) and segment. Results are shown in Table 6.
|[T-N]||spar 0.6||spar 0.7||spar 0.8||[Z-N]||spar 0.6||spar 0.7||spar 0.8|
The results show that the omnibus 0–3200 Hz band is superior for both contrasts. For the stop-nasal contrast, spar value makes little difference, but for the fricative-nasal contrast, spar 0.7 is best. More generally, the larger numbers for the stop-nasal model confirm that it involves a larger intensity contrast than the fricative-nasal model. Finally, the fricative-nasal model shows reversed effects in the noise band, 3200–10000 Hz. This shows that the intensity of nasals relative to fricatives is higher in lower frequency bands, but in the band meant to capture obstruent noise the relationship is reversed, with fricatives having higher intensity. This is confirmation that the measurements are capturing meaningful differences between segments.
Given these results, we used frequency band 0–3200 Hz with spar 0.7 for the rest of our analyses. Beyond selecting settings for further analysis, the other purpose of these tests was to explore the reliability and utility of various settings for measuring duration and intensity. The results suggest that measurements are robust and informative across a range of settings. This is noteworthy in part because the stop_lenition algorithm was originally used for a rather different and more homogeneous set of materials, from a different language, with only one speaker, and without differences in stress or underlying length. That said, the authors clearly intended for the algorithm to be more generally applicable to the analysis of lenition phenomena; our findings here suggest that it is, and that the results are not hugely sensitive to settings for the free parameters. Given that Gurindji and Sardinian are completely unrelated, we think these results are reason to be hopeful that the stop_lenition algorithm could be applied to a wide variety of languages.
This section reports qualitative generalizations about the manner of Campidanese obstruents in various positions. Most aspects of the traditional descriptions discussed in Section 1.2 were broadly confirmed by the materials we gathered. But there are a few points of phonetic detail and phonological ambiguity worth clarifying before we begin to investigate quantitative aspects of the system in more detail.
Virtually all sources agree that word-initial voiceless stops lenite to voiced fricatives following a vowel in a preceding word, and that word-medial etymological short voiceless stops are realized in the same way (Virdis, 1978; Blasco Ferrer, 1984; Bolognesi, 1998; Cossu, 2013). While these consonants are indeed mainly continuants in our materials, it is worth noting that they very rarely include audible or visible frication noise. As in many (perhaps most) other ‘spirantizing’ languages, the most frequent realization here is an approximant (Peninsular Spanish, Romero, 1996; Logudorese Sardinian, Ladd & Scobbie, 2003; Japanese velars, Kawahara, 2006; Djapu, Chong, 2011; English, Bouavichith & Davidson, 2013; Kinande and Venezuelan Spanish, Katz, 2016). Rates of burst, formant, and frication presence are shown in Figure 3. Here and throughout the paper, we use capital letters to refer to classes of consonant by manner and voicing: /T/ for voiceless stops, /D/ for voiced, /S/ for fricatives, etc.
Figure 3 shows that 80–90% of these segments in non-post-pausal environments display visible formants throughout their closure, and less than 20% have visible/audible burst or frication. The plot also shows that manner features are drastically different in post-pausal and utterance-initial positions; this is consistent with the traditional description of these sounds as voiceless stops in citation forms, and either phonetic variability or a mix of prosodic positions in the contexts singled out as post-pausal. While ‘voiceless stop’ may be a reasonable phonological description, note that the presence of frication in a large proportion of post-pausal and utterance-initial tokens reflects the fact that these segments often have a weak, fricated release; frication was marked as present if noise persisted for more than 20 ms following the identified burst. In post-pausal positions, then, these segments are most often some kind of stop, but with a weak and variable release.
Two issues concerning underlying voiced stops in word-initial position require some clarification. One is the difference between Bolognesi’s (1998) description of Sestu Campidanese and all other descriptions of Campidanese varieties: Bolognesi claims that these stops systematically fail to undergo spirantization, while Virdis (1978), and implicitly Blasco Ferrer (1984) and Cossu (2013), claim that they sometimes undergo spirantization. Results for these stops across prosodic positions are shown in Figure 4.
Figure 4 shows that these segments are generally realized as voiced stops in post-pausal and utterance-initial positions, though a non-trivial portion (20–30%) devoice. In utterance-initial position bursts are sometimes weak or absent (though not to the same extent as the /T/ series). One implication is that either voicing or fricated/weak release are potential cues to the /T/-/D/ contrast in absolute initial position. Given that these segments don’t appear to contrast for voicing in any other position in the language, this raises the question of whether UR voicing is the right way to think of the contrast.
At the word- and phrase-initial levels, the /D/ series is highly variable, with 50–60% displaying bursts and about the same proportion displaying formants throughout closure (both features may be present in the same token, as in Figure 1). This clearly accords with Virdis’s (1978) description and not with Bolognesi’s (1998). While Bolognesi suggests that the difference may have to do with the specific towns examined in the two studies, we think there may be a simpler explanation. As we will see in Section 4.2, the /D/ series tends to be of longer duration, with a larger drop to a lower intensity level, than the /T/ series. In other words, the /D/ series in these positions is more stop-like than the /T/ series, even when not phonetically clear-cut stops. This could explain Bolognesi’s impression that they are stops and don’t undergo lenition. If we were forced into an IPA transcription, we would likely have transcribed some (though not all) of these tokens as stops even though they lack obvious bursts. Nonetheless, Figure 4 shows that these segments are more likely to be continuant and sonorous in non-post-pausal positions, just like the voiceless series. So while they do lenite less than the voiceless stops, in these data the voiced stop series still shows evidence of prosodically conditioned lenition. This is one way in which IPA symbols are not optimal for describing anything as variable and fine-grained as Campidanese consonants.
Another point of ambiguity in the previous literature pertains to manner and length features of historically long obstruents (and more recent borrowings into these categories). Both Virdis (1978) and Bolognesi (1998) note that geminates may either be conceived of as contrasting with short obstruents in length, or in manner/voicing. Both authors assume that if there are consistent manner or voicing correlates of the contrasts in question, these are to be preferred to the hypothesis of a length contrast. Both describe word-medial contrasts in terms of /DD/ and /TT/ surfacing as stops, opposed to a single short stop series surfacing as voiced fricatives. The /SS/ series is said to surface as voiceless fricatives, while the /S/ series surfaces as voiced.
Our results suggest that manner and voicing are not as straightforward as previously described. Both /TT/ and /SS/ tokens sometimes display voicing throughout their closure: the proportions are 35–45% for stops and 15–20% for fricatives. For long stops, about 20% of the /TT/ series and 40% of the /DD/ series lack visible or audible bursts. And about half of all /DD/ tokens display formants throughout closure (just like word-initial /D/). While none of these findings makes it impossible that the contrasts are represented in terms of voicing and manner, it is at least worth investigating the robustness of duration as a cue to these contrasts. Results for qualitative features and duration are shown in Figures 5 and 6.
Relative to UR singletons, word-medial geminates are longer, and are more likely to be voiceless (for /T/ and /S/), lack formants, and display bursts (for stops). Long and short sonorants also differ in duration. These results are consistent with medial obstruents contrasting phonologically for length, manner, voicing, or all of the above. If they contrast for manner and voicing, the phonetic implementation of those contrasts is probabilistic and variable. If the contrast is one of length, long and short categories overlap to some extent. One concrete result here is that duration differences for obstruents are at least as clear as those for nasals and liquids. That said, even the sonorants can’t be described as ‘pure’ length contrasts: Short /r/ is realized as [ɾ]; short /l/ is realized as [β]; and short nasals tend to be only weakly consonantal, sometimes indistinguishable from ‘vocalized’ tokens in post-tonic position that are normally described as vowel nasalization rather than a consonant (Frigeni, 2009).
An important question in the Campidanese literature is whether lenis and fortis with regard to prosodic structure is the same phonological contrast as singleton and geminate with regard to URs. The results in this section show that geminacy contrasts are marked by differences in duration and by probabilistic differences in manner features. In the following sections, we examine how duration, intensity, and manner-related phonetic features interact in lenition and in UR geminacy contrasts. To preview, we find that they interact quite differently in the two phenomena, in ways that are not consistent with the phonetic or phonological equation of length and lenition.
This section presents results on fortition/lenition patterns driven by prosodic boundary strength, and attempts to establish causal priority between the various phonetic and phonological parameters involved in these patterns. The analysis builds on techniques from Ennever et al. (2017) and especially Cohen Priva and Gleason (2019). Both papers propose that the influence of various factors on lenition (prosodic structure, speech rate, lexical frequency) is mediated by duration. More specifically, these authors question whether the intensity-related effects of prosody or other drivers of lenition can be understood partly or wholly as resulting from changes in duration, rather than any direct manipulation of intensity and/or manner.
The proposed methods for answering this question involve comparing various regression models with covarying effects. Cohen Priva and Gleason (2019), for instance, first model intensity as a function of stress adjacency (among other variables), then ask whether the effect of stress on intensity persists when duration is added to the model, or whether the effect diminishes or vanishes. The idea is that if stress exerts no effect on consonantal intensity independent of duration, then we need not posit any direct link between stress and intensity/manner features. Instead, the influence of stress on consonants can be modeled as solely involving duration, and intensity effects will emerge from the general relationship between consonant duration and (lowered) intensity. Cohen Priva and Gleason confirm this hypothesis for English, as well as the complementary hypothesis that effects of stress/frequency/speech rate on duration are not entirely mediated by intensity. Taken together, these results suggest that the effects of stress-driven lenition on intensity can be attributed mostly to differences in duration, but not vice versa. Ennever et al. (2017) investigate similar questions using a different intensity parameter (CVE, closure velocity extreme as defined in Section 2.5) and a prosodic variable more directly relevant to boundary strength (presence of a word boundary). Like Cohen Priva and Gleason, they find that the effects of some linguistic variables (word boundary, place of articulation) on intensity are largely mediated by duration.
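The logic of this model comparison can be illustrated with a toy example. The sketch below is a deliberate simplification: it uses ordinary least squares rather than the mixed-effects models of the studies cited, the function name `ols` and the variable names are hypothetical, and the data are invented so that intensity depends on duration alone while duration happens to be longer next to stress.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations.

    X is a list of rows, each beginning with 1.0 for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Invented data: stress-adjacent tokens are longer, and intensity drop
# is determined entirely by duration (drop = 2 x duration)
stress = [0, 0, 0, 1, 1, 1]
duration = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
intensity_drop = [2.0 * d for d in duration]

# Model 1: intensity ~ stress
b_without = ols([[1.0, s] for s in stress], intensity_drop)

# Model 2: intensity ~ stress + duration
b_with = ols([[1.0, s, d] for s, d in zip(stress, duration)], intensity_drop)

stress_effect_without = b_without[1]  # apparent effect of stress
stress_effect_with = b_with[1]        # effect once duration is included
```

In this constructed case, stress appears to affect intensity when duration is left out, and the effect vanishes once duration is added, which is the pattern that would indicate full mediation by duration.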
In this section, we extend this general methodology to multiple levels of prosodic prominence, and to qualitative phonetic features in addition to duration and intensity. The ultimate goal is to determine whether duration, intensity, or categorical feature changes have causal priority in lenition, and whether lenition differs in this regard from ‘normal’ phonological features, contrasts, or processes.
The models in this section examine the effects of prosodic boundary strength on duration and intensity, then examine whether those effects persist when duration and intensity are used to predict one another. Unless stated otherwise, all models in this paper include only by-subject and by-word intercepts as random effects; random slopes by subject were tested and found not to significantly affect fit in most models.
Our first model attempts to predict the change in intensity (delta-i) associated with consonants in terms of their prosodic initiality, position with regard to stress, and underlying manner. As in previous studies (Kingston, 2008; Ennever et al., 2017; Cohen Priva & Gleason, 2019), a simple additive model is used because there are not sufficient data to cross all factors. All models presented in this section include the following fixed effects:
The ‘consonants’ modeled here include null consonants, that is, cases of vowel hiatus or glide transitions. This is because such sequences form an interesting basis for comparison with lenited consonants (how similar to zero are short lenis approximants?) and because finding an effect of prosodic position on the duration or intensity of intervocalic transitions in such cases would be interesting in and of itself. Medial short stops, being ambiguous with regard to UR voicing, are treated as a separate category from voiced and voiceless stops in modeling, but grouped with the /T/ series in figures. Delta-i for short consonants across prosodic positions is shown in Figure 7.
Patterns here are somewhat variable and some consonants have sparse data in higher prosodic positions. There doesn’t seem to be any tendency for consonants to lenite word medially relative to word-initial position, and there may be an opposite effect. The other two steps in the scale, however, from word to phrase and phrase to post-pausal, show at least small fortition/lenition effects for most consonants (and hiatus). Fixed effects from the initial model of delta-i and the comparison model with duration as a predictor are shown in Table 7.
|Model 1||β||t||Model 2 (w/dur)||β||t||β decrease|
|UR: geminate||–0.79||–16.02||UR: geminate||–0.25||–8.06||0.68|
|UR: /D/||–0.02||–0.31||UR: /D/||0.16||4.08||9.00|
|UR: medial stop||0.13||1.59||UR: medial stop||0.16||3.36||–0.23|
|UR: /N/||0.67||9.87||UR: /N/||0.59||14.60||0.12|
|UR: /L/||0.89||10.02||UR: /L/||0.45||8.24||0.49|
|Random effs.||Var.||Random effs.||Var.|
|Word (int.)||0.06||Word (int.)||0.01|
|Sub. (int.)||0.00||Sub. (int.)||0.00|
Model 1 shows that word-initial consonants display successively larger intensity drops (more negative delta-i) in phrase-initial than word-initial position, and in post-pausal than phrase-initial position. Word-initial consonants tend to show smaller intensity drops than word-medial (reference level) ones. Intensity drops more after stressed vowels, and this effect is somewhat larger after nuclear stress. Underlyingly long consonants show much larger intensity drops than short ones. Finally, segments differ from one another in their inherent intensity by UR manner: Compared to the reference level of UR voiceless stops, fricatives show larger intensity drops, while sonorants and hiatus show smaller ones. Note that in these and most of the following linear models (though not the logistic ones), the variance associated with subjects is 0. This probably indicates that the variable could be dropped from the models, but we retain it because it is part of the study design and makes these models more comparable to the logistic ones reported below than they would otherwise be.
There are several points of interest in comparing the models with and without duration. First, the effect of duration is massive, which is expected given the tight correlation between delta-i and duration in our data (r = –0.83). A second point of interest is the change in estimated parameter values once duration is incorporated into the model. In Table 7, this is labeled ‘β decrease.’ It is the proportion of the original effect accounted for by the addition of duration in the second model. We follow Baguley (2009) in treating the coefficient as a simple effect size, and when comparing values across models (of the same data) we refer to how much of an effect is explained by other factors. The values for the prosodic variables, for instance, show that duration accounts for 87% of the intensity change associated with phrase-initial position, and 81% of the effect associated with post-pausal position, but only 43% of the effect associated with word-initial position. Numbers larger than 1 here indicate that the effect is reversed when duration is taken into account. Negative numbers indicate that effects get larger. So, for instance, nuclear (compared to non-nuclear) stress has a marginal effect of increasing the intensity drop associated with a following consonant. But once duration is taken into account, it more than explains that effect: Given the duration of these consonants, in the absence of any other effects we would expect them to show even larger intensity drops than they actually do.
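The tabled β decrease values are consistent with a simple proportional reduction, (β1 − β2)/β1, where β1 is the estimate from the baseline model and β2 the estimate once the mediator is added. The sketch below assumes that formula (it is inferred from the description above, not quoted from an analysis script) and applies it to the UR rows of Table 7; the function name is hypothetical.

```python
def beta_decrease(beta_base, beta_mediated):
    """Proportion of the baseline effect removed by adding the mediator.

    Values above 1 mean the sign of the effect reverses; negative values
    mean the effect grows once the mediator is included.
    """
    return (beta_base - beta_mediated) / beta_base

# (Model 1 beta, Model 2 beta) for the UR rows of Table 7.
table7 = {
    "UR: geminate": (-0.79, -0.25),
    "UR: /D/": (-0.02, 0.16),
    "UR: medial stop": (0.13, 0.16),
    "UR: /N/": (0.67, 0.59),
    "UR: /L/": (0.89, 0.45),
}

decreases = {name: round(beta_decrease(b1, b2), 2)
             for name, (b1, b2) in table7.items()}

for name, value in decreases.items():
    print(f"{name}: {value:.2f}")
```

The rounded results match the β decrease column of Table 7 for these rows, including the reversal for /D/ (a value above 1) and the enlarged effect for medial stops (a negative value).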
Another value to pay attention to is the ‘left-over’ effect once duration has been incorporated. For phrase-initial and post-pausal positions, residual effects in model 2 are about 1.30 and 1.18 standard errors, respectively. These are fairly small and variable; a p-value of 0.05 corresponds to roughly 2 standard errors plus a bit (the difficulty of defining the distribution of the test statistic under the null hypothesis for mixed models makes it hard to be more precise than that). So we can say that there are fairly robust effects of phrase-initial and post-pausal position on intensity, but most of the difference is explained by the increased duration associated with these positions, and the remaining effects taking duration into account are consistent with there being no difference at all at the population level. For word-initial versus medial position, on the other hand, duration explains less than half of the intensity difference and there is clearly an initial intensity boost above and beyond what duration would predict. This last result also shows that the diminution of effects on delta-i when duration is taken into account is not an inevitable consequence of the high correlation between duration and delta-i; it doesn’t happen for the word-level ‘anti-lenition’ effect.
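To give a rough sense of scale for these residual effects, the sketch below converts a test statistic expressed in standard errors into an approximate two-sided p-value. It assumes a standard normal null distribution, which is only an approximation for mixed models (as noted above, the true null distribution is hard to define), and the function name is hypothetical.

```python
import math

def two_sided_p(stat):
    """Two-sided p-value for a test statistic under a standard normal null."""
    return math.erfc(abs(stat) / math.sqrt(2))

# Residual effects reported above, in standard errors.
for label, stat in [("phrase-initial", 1.30), ("post-pausal", 1.18)]:
    print(f"{label}: {stat:.2f} SE, approximate p = {two_sided_p(stat):.2f}")
```

Under this approximation both residual effects fall well short of the conventional 0.05 threshold, which a statistic of about 1.96 standard errors would reach.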
Duration explains most of the effect associated with following a stressed vowel, but following a nuclear stress has an independent effect on intensity changes. And differences in delta-i between different manners of consonant are, unsurprisingly, not explained by differences in duration. Instead, the models suggest that relative to voiceless stops, we would expect all other manners of consonant to show much larger drops in intensity than they actually do if duration were the only factor mediating intensity. Another way of putting this is that the differences in intensity between the UR /T/ series and other segments are not entirely due to duration. This is unsurprising, because we expect that other manners of consonant are specified for features that affect intensity and are different from the features of the /T/ series, such as nasality and sonorancy.
Duration explains about two-thirds of the delta-i difference between short and long consonants (a β decrease of 0.68 in Table 7), but UR length also has an independent effect. This begins to answer the question of whether the geminate/singleton and fortis/lenis oppositions are represented identically. The answer for this data set appears to be that they are not: Differences between long and short consonants involve independent aspects of intensity and duration; differences between fortis and lenis consonants involve duration, with differences in intensity following from those duration differences.
The next model comparison involves the categorical features we annotated: presence of voicing and formants throughout the consonant, and presence of visible or audible burst and frication. Comparison of models with and without these features is shown in Table 8. We refer to differences in these features as ‘qualitative phonetic differences.’
|Delta-i||Model 1||Model 2 (qual. features)|
|Fixed effects||β||t||β||t||β decrease|
|UR: medial stop||0.13||1.59||–0.03||–0.38||1.23|
|Random Effects||Variance 1||Variance 2|
All qualitative phonetic differences have robust effects on delta-i, as one would expect. In contrast to duration, manner differences explain little of the effect of phrase-initiality on delta-i. For word-initial and post-pausal positions, qualitative phonetic differences explain a larger proportion of intensity differences, but there are still substantial residual effects of the prosodic variables. This means that the effects of prosodically conditioned lenition on intensity are not being driven principally by changes in phonetic manner or voicing. It also means that measuring delta-i captures information that qualitative phonetic features do not capture.
The previous section showed that differences in intensity associated with boundary-conditioned lenition are mainly explained by the predictable effects of duration, and not by differences in phonetic manner or voicing. The next question is whether the effect of prosodic structure on duration can be explained by intensity or qualitative phonetic features. If not, then we have an argument that duration is causally prior to manner in our lenition data. Duration of singleton consonants by UR and prosodic position is shown in Figure 8.
Both UR stop series and hiatus transitions show lengthening at successively higher boundaries above the word. Patterns are somewhat less clear for fricative and nasal series, where there are fewer data, but each series shows some lengthening at one or more levels. Both the /T/ and /S/ series show word-initial shortening effects, unexpected on the basis of lenition, but concordant with the intensity patterns discussed in Section 4.2.1. Comparisons of duration models with and without delta-i and manner features as predictors are shown in Tables 9 and 10.
|Duration||Model 1||Model 2 (delta–i)|
|Fixed effects||β||t||β||t||β decrease|
|Duration||Model 1||Model 3 (qual. feats.)|
|Fixed effects||β||t||β||t||β decrease|
Qualitative phonetic features explain little of the lengthening associated with phrase-initial and post-pausal positions. Delta-i explains some portion of the phrase-initial lengthening effect, but there is still a robust residual effect after taking delta-i into account. Delta-i explains a substantial portion of the post-pausal lengthening effect, and the residual effect is fairly small/variable. For both prosodic levels, the proportion of lengthening explained by delta-i is less than the converse from Table 7, and residual effects on duration are stronger than those on delta-i.
The shortening effect at word boundaries is entirely explained by intensity changes and qualitative phonetic features. This word-initiality effect, which goes in the opposite direction from lenition/fortition patterns, thus also seems to be implemented in completely different terms from those lenition/fortition patterns.
While we chose for modeling purposes to use Ennever et al.’s (2017) dynamic representation of intensity, delta-i, there are questions about the robustness and appropriateness of delta-i relative to other measurements used in the literature. Delta-i measures the drop in intensity during a VC transition, down to the minimum intensity of the consonant. Other possible measures include the intensity slopes or velocities associated with closures and releases (Kingston, 2008; Hualde et al., 2011; Ennever et al., 2017), and intensity minima derived from the unambiguously consonantal portion of the acoustic signal (Bouavichith & Davidson, 2013; Cohen Priva & Gleason, 2019). In this section we compare delta-i to ‘raw’ intensity-minimum (‘pit’) data, pit data centered by speaker and segment UR using Cohen Priva and Gleason’s (2019) procedure, and closure and release velocity extremes (CVE and RVE). Results for intensity pit are shown in Figure 9.
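The relationship between these measures can be made concrete with a toy intensity contour. The sketch below is a deliberately simplified illustration and not Ennever et al.'s (2017) algorithm or any of the other cited procedures: it assumes an evenly sampled contour in dB spanning a vowel-consonant-vowel sequence, takes delta-i as the minimum minus the highest preceding sample, and takes CVE and RVE as the steepest fall before the minimum and the steepest rise after it. The function name, the contour values, and the sampling rate are hypothetical.

```python
def intensity_measures(contour_db, rate_hz):
    """Simplified pit, delta-i, CVE, and RVE from an evenly sampled contour."""
    pit_idx = min(range(len(contour_db)), key=contour_db.__getitem__)
    pit = contour_db[pit_idx]
    # First differences scaled to dB per second; velocity[i] spans samples i..i+1.
    velocity = [(b - a) * rate_hz for a, b in zip(contour_db, contour_db[1:])]
    closing = velocity[:pit_idx]   # steps leading down to the minimum
    opening = velocity[pit_idx:]   # steps leading away from the minimum
    return {
        "pit": pit,
        # None when no preceding vowel is present, e.g. utterance-initially.
        "delta_i": pit - max(contour_db[:pit_idx]) if pit_idx > 0 else None,
        "cve": min(closing) if closing else None,
        "rve": max(opening) if opening else None,
    }

# Hypothetical VCV contour sampled at 100 Hz.
contour = [70, 70, 68, 60, 52, 50, 51, 58, 66, 70, 70]
measures = intensity_measures(contour, rate_hz=100)
print(measures)

# A contour that starts at its minimum has no measurable closure.
initial = intensity_measures([50, 51, 58, 66, 70], rate_hz=100)
print(initial)
```

In this simplified form, the closure-based values are undefined when the contour begins at its minimum, while the release-based value is still available, mirroring the measurement situation in utterance-initial position.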
Centering the pit data according to Cohen Priva and Gleason’s (2019) procedure made very little difference. The centered and uncentered data correlate at r = 0.73; z-scoring the raw data by subject brings this to r = 0.82. Though neither measure correlated with delta-i at such an extreme level (r = 0.30 for raw pit, r = 0.10 for centered pit), both produced regression models qualitatively similar to delta-i with regard to prosodic position: significantly lowered intensity in phrase-initial and post-pausal positions. The effect of phrase-initiality was smaller than with delta-i (not visible at all in Figure 9), and the effect of post-pausal position (where minimum intensity is by definition close to the background noise level) was much larger. The word-initial intensity boost suggested by delta-i data did not show up with pit measurements. In conjunction with the preceding sections, this means that the word-initiality effect flattens intensity transitions across a word boundary (Table 7) but does not affect the absolute intensity of the following consonant. One possibility is that this effect pertains to differences between reduced word-final vowels versus non-final vowels, rather than anything inherent to consonants.
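As an illustration of what by-subject standardization does to such correlations, the sketch below z-scores a set of intensity minima within speaker and correlates the result with the raw values. It is a minimal sketch under stated assumptions: it is not Cohen Priva and Gleason's (2019) centering procedure (which also conditions on segment UR), and the data, speaker labels, and function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore_by_group(values, groups):
    """Z-score each value within its group (assumes non-zero spread per group)."""
    out = [0.0] * len(values)
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        m = mean(values[i] for i in idx)
        s = pstdev([values[i] for i in idx])
        for i in idx:
            out[i] = (values[i] - m) / s
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical intensity minima (dB) from two speakers recorded at different levels.
pit = [50, 52, 54, 60, 62, 64]
speaker = ["A", "A", "A", "B", "B", "B"]

pit_z = zscore_by_group(pit, speaker)
r = pearson_r(pit, pit_z)
print(f"r(raw, z-scored by speaker) = {r:.2f}")
```

In this toy case the correlation between raw and standardized values is well below 1, because most of the raw variance reflects overall level differences between the two speakers, which standardization removes.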
The biggest difference between minimum intensity measurements and delta-i pertains to stress. Delta-i, recall, shows larger intensity drops following stressed than unstressed vowels, and following nuclear than non-nuclear stress (Table 7). Both pit measurements, on the other hand, show a large increase in intensity following a stressed vowel, and an even larger decrease in intensity after nuclear stressed vowels. In other words, consonants increase in intensity following stress, but they decrease in intensity relative to the preceding transition.
Modeling revealed a clear asymmetry between closure and release measurements. The two velocity measures, CVE and RVE, correlate at r = –0.30. CVE is highly variable and does not show clear effects of prosodic boundaries or stress. RVE shows robust effects of phrase-initial and post-pausal positions, though not word-initiality. Intensity transitions into the following vowel tend to be more abrupt at higher prosodic levels (and with UR geminates), consistent with fortition. RVE by consonant and prosodic boundary is shown in Figure 10.
Unsurprisingly, RVE differs from all other measures in showing a large effect of stress on a following vowel, where slopes are much steeper. On the other hand, a following nuclear stress causes a large drop in RVE relative to non-nuclear stress. This indicates that the amplitude rise for nuclear stressed vowels in Campidanese is more gradual than for other stressed vowels.
Regression comparisons of the type in Sections 4.2.1–4.2.2 show that neither duration nor delta-i fully explains the effects of prosody on RVE. In the following sections, therefore, we adopt delta-i and RVE as independent, complementary descriptions of intensity dynamics. This also allows us to extend our analysis to utterance-initial position: While neither duration nor closure onset can be coherently measured in acoustic terms at the beginning of an utterance, release velocity can be.
Sections 4.2.1–4.2.3 have shown increased duration, drops in intensity, and more abrupt releases at successively higher positions in the prosodic hierarchy above the word. The effects on delta-i largely reduce to changes in duration; effects on release velocity do not. None of the changes in duration, delta-i, or RVE can be reduced to changes in categorical manner and voicing features. This section attempts to determine whether lenition-driven changes in those qualitative phonetic features can themselves be explained by changes in duration, and whether qualitative features convey information about lenition above and beyond continuous phonetic measurements of intensity.
The modeling procedure here is similar to the previous sections, but with a few differences. These models are limited to UR obstruents, the only segments that vary in voicing, formants, burst, and frication. UR fricatives are excluded from the burst and frication models because burst rates are 0 and frication rates are at or near 100%. The models are logistic, predicting the (log odds of the) outcome of a binary variable, and as such they model one categorical phonetic parameter at a time. Logit models by default return a z score (derived from the Wald statistic) as the test statistic, rather than a t score (derived from estimates of the standard deviation of sample means).
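For readers less familiar with the logit scale, the sketch below shows how a coefficient and its z score relate to a standard error and an odds ratio. It uses the Model 1 estimate reported for UR medial stops in Table 11 (β = –0.51, z = –1.21) purely as a worked example; the derived standard error is back-calculated from those rounded values and is therefore approximate, and the function name is hypothetical.

```python
import math

def inv_logit(log_odds):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

# Model 1 estimate for UR medial stops in the voicing model (Table 11).
beta, z = -0.51, -1.21

se = beta / z                 # Wald z is the estimate divided by its standard error
odds_ratio = math.exp(beta)   # multiplicative change in the odds of voicing

print(f"approximate SE = {se:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
# A log-odds value of 0 corresponds to a probability of 0.5.
print(f"inv_logit(0) = {inv_logit(0)}")
```

A coefficient of this size multiplies the odds of voicing by about 0.6 relative to the reference level, although with a z score of this magnitude the estimate is not clearly distinguishable from no effect.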
Unlike linear models of duration and intensity, models of qualitative features tend to show large improvements in fit from including by-subject random slopes. This indicates that the effect of linguistic variables on features like voicing and bursts varies substantially between speakers, in a way that effects on duration and intensity do not.
Frication showed very weak effects of prosodic boundaries, not clearly different from 0, so no model comparisons were carried out. The other features all showed fairly clear effects of phrase-initial and post-pausal positions: Higher positions in the prosodic hierarchy are associated with lower probabilities of voicing and formant presence, and higher probability of bursts, all consistent with fortition (z values in the 2–5 range).
Prosodic parameters from models of voicing are shown in Table 11. Burst and formant models patterned very similarly, and are omitted here for the sake of brevity.
|Voicing||Model 1||Model 2 (duration)||Model 3 (RVE & delta-i)||Decrease M1-2||Decrease M1-3|
|UR: medial stop||–0.51||–1.21||–0.49||–1.17||–0.43||–0.99||0.04||0.16|
|Post-tonic × sub||0.12||0.13||0.10|
|Post-nuc. × sub||0.75||1.01||0.85|
|Geminacy × sub||1.10||1.20||0.89|
The mediating effects of continuous phonetic measurements on qualitative ones with regard to phrase-initiality are larger than the converse effects from the preceding sections (which entailed less than 10% reduction). In this and other models, duration explains 20–40% of the difference in qualitative phonetic features between word- and phrase-initial position. Residual effects of phrase-initiality on qualitative features have z scores in the 1.3–2.3 range, indicating trends that are not particularly robust, though possibly still evidence for an independent effect on qualitative features. Duration explains less of the post-pausal effects on these features (10–25%), and residual effects are large and robust for the most part (z scores in the 3–4 range). Models incorporating intensity showed a similar pattern: Phrase-initiality effects on qualitative phonetic features partially reduce to delta-i and RVE differences (20–50% effect decreases with residual z scores in the 1–2 range), but post-pausal effects do not (0–20% decreases, z scores in the 3–4 range).
It is important to note that these mediation estimates, unlike the ones in the preceding sections, are almost certainly overly conservative. This is because they exclude trials where the continuous predictors can’t be measured, and those missing trials will tend to be cases with extremely fortis consonants (mainly in domain-initial positions) and extremely lenis ones (those with no intensity pit, mainly in domain-medial positions). For instance, simply comparing baseline models (with no continuous phonetic predictors) with and without the missing-measurement trials results in a 12% reduction in the effect of phrase-initiality independent from the reduction shown above.
Until now, we have excluded utterance-initial consonants from models. This is because neither duration nor delta-i can be measured in this position, where VC transitions are absent. RVE and qualitative phonetic features, however, are coherent for utterance-initial consonants. These consonants are important because they form the basis for segmental generalizations about unlenited forms in phonological descriptions, which compare word-initial consonants in citation forms to those observed sentence-medially.
We compared models of RVE in post-pausal and utterance-initial positions with and without qualitative phonetic features, as well as the converse models of qualitative phonetic features with and without RVE. We found that logit models were almost impossible to fit and returned convergence warnings, due to the sparsity of data at such high prosodic levels. As a follow-up, we tried models including data from the phrase-initial position in addition to post-pausal and utterance-initial. Here models were easier to fit and returned clear results. Results for modeling RVE are shown in Table 12, results for voicing in Table 13. Again, we omit models of the other qualitative features, which patterned very similarly to voicing.
|RVE||Model 1||Model 2 (qualitative features)|
|Fixed effects||β||t||β||t||β decrease|
|Voicing||Model 1||Model 2 (RVE)|
|Fixed effects||β||z||β||z||β decrease|
Utterance-initial obstruents display much steeper release slopes than post-pausal ones; this effect is barely mediated by qualitative phonetic features. Utterance-initiality makes consonants more likely to be voiceless, formant-less, and fricated (this is due to the fricated release of many voiceless stops in this context). There is barely any effect on burst probability, so no further modeling of this variable was done. Inspection of Figures 3 and 4 suggests that the lack of effect here is because voiceless stops are somewhat more likely to display a burst in utterance-initial position, while voiced stops are somewhat less likely to do so; in general, burst rates for both types of segments are high in both positions.
For voicing, formants, and frication, which do show clear effects of utterance-initiality, 20–60% of the effects are captured by RVE. The residual effects after RVE is taken into account are fairly weak: frication comes the closest to showing an effect independent of RVE, with a z-score of 1.84. So while it’s not clear that RVE captures all of the information about utterance-initiality that qualitative phonetic features do, it is clear that it captures some or most of the same information, and significant amounts of additional information (as shown by the large residual effect of initiality on RVE).
Using Ennever et al.’s (2017) method for quantifying lenition, this study has produced evidence for a range of prosodic and featural influences on consonantal acoustics. Here we summarize the main findings and discuss their implications.
We found that the putative /T/ series is more affected by lenition than the /D/ series, in terms of qualitative phonetic features like voicing and continuancy. This is broadly consistent with Bolognesi’s (1998) description. However, the /D/ series does still frequently lenite, and both series are far more likely to lenite to noiseless approximants than noisy fricatives. These results differ from Bolognesi’s impressionistic description. We found that medial geminate obstruents differ from singletons in duration, intensity, and the qualitative phonetic features associated with fortition (more likely to be voiceless, audibly released, etc.). While Virdis (1978) and Bolognesi (1998) both assume that this means the relevant contrasts are featural rather than length-based, our modeling suggests that neither qualitative differences nor duration differences can be reduced to the other. This means that there is no principled basis on which to posit one contrast as ‘basic’ and the other as ‘enhancement’; they are specified independently at some level of grammar, and differ notably in this regard from lenis-fortis alternations.
We found evidence for at least one level of prosodic structure intermediate between the word and intonational phrase. This result provides quantitative confirmation of the native-speaker intuitions expressed by Del Mar Vanrell et al. (2015). The intermediate phrasal level is associated probabilistically with large syntactic constituents such as matrix predicates, verbal arguments and adjuncts, and clauses; these are opposed to smaller word/phrase boundaries such as prepositional complements of nouns and verbs following auxiliaries. While this syntactic coding of position undoubtedly misses some information about prosodic units, the results here show that it still detects reliable prosodic effects in the expected direction (fortition at larger boundaries). This is true even though the largest prosodic breaks, marked by a full pause, were treated separately.
We showed that lenition/fortition processes extend throughout the prosodic hierarchy at levels above the word. At each higher level of prosodic boundary, duration is longer, intensity is lower, intensity changes are more pronounced, and phonetic features characteristic of obstruents are more likely to be present. We found an ‘anti-fortition’ effect at word boundaries relative to word-medial position. The causal structure of this effect is different from fortition-lenition effects, and it may be a result of word-final vowel reduction rather than manipulation of consonants. The phrasal fortition-lenition pattern is not limited to obstruents: Sonorants, and even vowel-initial lexical items, all show at least one kind of duration or intensity effect at each level of prosodic phrase examined here. This is broadly consistent with a phonetic account of phrasal fortition-lenition, stated in terms of gestural properties and/or auditory targets (for disruption, salience, or some other property). It is less clear that the pattern could be accommodated in a feature-based phonological framework. And to the extent that intervocalic continuity lenition processes such as spirantization tend to be accompanied by these broader lenition-fortition patterns across languages (e.g., Pierrehumbert & Talkin, 1992; Turk, 1993 for English; Kingston, 2008; Hualde et al., 2011 for Spanish), it suggests that treating them as phonological processes may not be the right approach at all.
For lenition/fortition patterns above the word, we found that differences in intensity-movement during closure are mostly a consequence of changes in duration, and not vice versa. Neither duration nor intensity effects of prosodic phrase-initiality are mediated by qualitative phonetic changes; there is more evidence of mediation for post-pausal consonants, but robust independent effects on continuous measurements persist. Changes in qualitative phonetic features of obstruents between the word- and phrase-initial levels, and between the post-pausal and utterance-initial levels, are partially mediated by changes in duration and release velocity, even judged by an overly conservative procedure. Residual effects of prosody on qualitative features tend to be fairly weak and variable, though not completely eliminated. For the distinction between post-pausal and phrase-initial positions, however, there are large changes in both continuous phonetic measures and qualitative phonetic features, neither of which fully explains the other.
These results converge on arguments from word boundaries (Ennever et al., 2017) and stress adjacency and informational factors (Cohen Priva & Gleason, 2019) that duration is causally prior to intensity in lenition/fortition processes. We also extended this argument to show that variation in manner and voicing features across prosodic positions is largely accounted for by differences in duration and release velocity. It should be noted, however, that asymmetries are weaker for post-pausal consonants and it is consequently more difficult to disentangle the roles of duration, intensity, and qualitative features in this position. This is not entirely surprising: It simply shows that segmental dynamics are different when the acoustic energy of a preceding vowel is allowed to fully die out than when it is not, in ways that duration alone can’t predict.
Our results also show that only some intensity changes from lenition can be reduced to duration: Changes in release velocity between prosodic levels are not fully explained by duration. The independent role for release velocity suggests that both duration and some specification of articulatory stiffness or auditory disruption will be necessary to characterize boundary-driven lenition. Another thing this shows is that the use of closure measurements to the exclusion of release ones in characterizing lenition/fortition processes is not justified. While some articulatory approaches suggest that lenition can be adequately described solely in terms of undershoot and/or shallower transitions into the consonant closure (Ennever et al., 2017), our data clearly show that in Campidanese the velocity of the release is more relevant.
These results bear on one of the overarching theoretical questions discussed in Section 1.3: To what extent can the effects of lenition on consonant manner be reduced to effects on duration? The question is important for Campidanese because describing lenition as changes in consonant manner has been problematic for virtually all phonological theories and requires special theoretical mechanisms (Bolognesi, 1998; Lubowicz, 2002; Hayes & White, 2015; Storme, 2018). The results presented here justify cautious optimism that most of the changes in manner associated with Campidanese lenition at the phrasal and utterance levels (though not post-pausally) causally reduce to changes in duration. In other words, the theory may not require speakers to have voicing and spirantization lenition under active control. If so, then there is no theoretical issue and no special mechanisms are required to describe the system. That said, the ability of duration measurements to account for other lenition effects in these data is not complete; in the face of measurement error, it is unrealistic to expect one set of imperfect phonetic measurements to fully explain some other set of imperfect measurements. The data also suggest that while many manner-relevant parameters may be reduced to duration (and pause presence), the abruptness of consonantal releases (RVE) is probably under independent control.
While this study focused on prosodic boundaries, our modeling also teaches us about other structural and featural properties. In general, preceding and following stress, nuclear stress, and UR features (length, manner, voicing) all had robust effects on most of the continuous and categorical variables studied here. For most of these predictors, neither type of phonetic effect appeared to be mediated by the other, unlike phrase- and utterance-driven lenition-fortition patterns. This amounts to an argument that the phonetic properties targeted by boundary-driven fortition-lenition processes are not the same as the phonetic properties involved in UR contrasts, nor those affected by stress. This is a further argument that treating Campidanese lenition as a rule that changes manner or length features is not the correct approach, because it misses important generalizations about which parameters are under grammatical control.
Cohen Priva and Gleason (2019) find that in American English, intensity-based lenition for post-tonic consonants is largely mediated by duration, but this may be very specific to American English. Core lenition processes such as tapping and approximantization in American English do clearly interact with stress (e.g., Turk, 1993; Bouavichith & Davidson, 2013), possibly because both are conditioned by prosodic foot structure. But while stress-conditioned lenition is not unheard of in other languages, it is certainly not the norm (González, 2003; Bye & De Lacy, 2008). Our data show that the effect of prosodic boundaries on consonantal fortition-lenition can be quite cleanly separated from the effect of preceding or following stress.
We investigated various ways of measuring intensity. To sum up, all intensity measurements except CVE deliver qualitatively similar results for prosodic boundary-driven lenition above the word. Delta-i and RVE are somewhat more uniform across the various levels of phrasal lenition/fortition examined here, whereas intensity minima return much larger effects for pauses than for non-pausal phrase boundaries. One implication of this is that any of these choices for measuring intensity-related lenition is reasonable. Because phoneticians are almost always interested in the relative intensity of various items rather than their absolute intensity (or more properly, intensity relative to ambient sound pressure), the choice of intensity measurement makes little difference to our ability to detect effects, even with relatively noisy and heterogeneous recordings. And the fact that effects of prosodic boundaries on delta-i and RVE can be robustly detected even in the presence of variation in stress, accent, and vowel quality suggests that these measurements are less susceptible to interference from vowels than one might have suspected. That said, the various intensity measurements do differ quite a bit in their patterning with regard to stress and accent.
For stress, intensity minima tend to change more and in more complex ways in different post-tonic and non-post-tonic positions than delta-i and RVE; whether this is a desirable property depends on what one is trying to study. While post-tonic position is associated with increased intensity minima, post-nuclear position had a large negative effect, more than reversing the positive effect from non-nuclear stress. We suspect that this is due to a high-level prosodic feature observed in our materials: The ends of utterances tend to be marked by extreme lowering and compression of pitch and intensity, sometimes accompanied by glottalization. All post-nuclear consonants occur in the final two or three syllables of an utterance, which will generally be contained within this area of compression when it occurs. This may be why intensity minima are so low here. Delta-i, being defined relative to the local context, is more likely to factor out this high-level compression pattern, and the results are not terribly different from cases of stress elsewhere in the utterance. This shows that centering measurements by speaker or sound file does not necessarily succeed at factoring out variation in intensity extrinsic to consonant lenition; delta-i comes much closer to eliminating the influence of this variation.
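The contrast between file-centered minima and a locally defined measure can be made concrete with a toy calculation. The Python sketch below is an idealization under stated assumptions: the intensity values are invented, utterance-final compression is modeled as a uniform 10 dB offset, and `local_drop` (the fall from the preceding intensity peak to the consonantal minimum) is a simplified, hypothetical stand-in for a locally defined measure such as delta-i, not the algorithm of Ennever et al. (2017).

```python
from statistics import mean


def intensity_minimum(contour):
    """Lowest intensity value (dB) in a vowel-consonant-vowel contour."""
    return min(contour)


def local_drop(contour):
    """Fall (dB) from the highest point preceding the minimum to the minimum."""
    i_min = contour.index(min(contour))
    preceding_peak = max(contour[: i_min + 1])
    return preceding_peak - contour[i_min]


# Invented intensity contour (dB) for a consonant in mid-utterance
medial = [70.0, 72.0, 66.0, 60.0, 65.0, 71.0]

# The same contour inside an utterance-final region of compression,
# idealized here as a uniform 10 dB lowering
compression_db = 10.0
final = [x - compression_db for x in medial]

# Centering by sound file: subtract the mean intensity of the whole file
file_mean = mean(medial + final)
centered_min_medial = intensity_minimum(medial) - file_mean
centered_min_final = intensity_minimum(final) - file_mean

print("centered minimum, medial:", centered_min_medial)
print("centered minimum, final: ", centered_min_final)
print("local drop, medial:", local_drop(medial))
print("local drop, final: ", local_drop(final))
```

Under these assumptions the two tokens have identical local drops, while their file-centered minima still differ by the full amount of the compression, because a single per-file mean cannot track intensity changes that unfold within the file.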
Finally, there is the issue of whether impressionistic measurements of qualitative phonetic features can be abandoned and replaced by continuous properties of the intensity contour. Our results suggest that differences in burst, voicing, etc. associated with fortition-lenition patterns are mainly explained by continuous intensity measurements (and by the presence or absence of a preceding pause). But the same is not true for differences between geminates and singletons, UR contrasts for voice or manner, nor for the effect of stress on qualitative phonetic features. This suggests that, despite their seemingly less objective and principled basis, impressionistic phonetic judgments capture some kinds of information beyond what can be inferred from intensity slopes, changes, and minima. For lenition-fortition patterns specifically, however, there may not be any need to state generalizations in terms of categorical phonetic or phonological features: The only irreducible information about such patterns in our data comes from duration and RVE. Even differences between consonants in absolute initial and post-pausal positions were shown to follow from release velocity (RVE) in a way they do not follow from qualitative features.
There are several outstanding questions about Campidanese and lenition more generally that we hope to address in future research. While we gave a broad phonetic description of various UR categories in Section 4.1, a full exploration of the phonetics of UR manner, voicing, and length contrasts was not carried out here. Our data reveal several ambiguous or overlapping contrasts, such as the distinction between word-initial /T/ and /D/ series: Both are often realized as approximants phrase-medially, and in post-pausal positions there are only probabilistic differences in voicing and fricated release. Further investigation of these and other contrasts could plausibly help settle questions about their precise featural specifications, which have generated a fair bit of disagreement and theoretical difficulty in the phonological literature (Virdis, 1978; Bolognesi, 1998; Lubowicz, 2002; Hayes & White, 2015; Storme, 2018).
One particularly interesting area is the phonetic and phonological properties of post-lexical geminates, which are described by Bolognesi (1998) and investigated phonetically by Ladd and Scobbie (2003) in the related Logudorese variety. Our data suggest that post-lexical voiceless stop geminates are quite similar to word-medial long voiceless stops. Post-lexical voiced stop geminates, on the other hand, pattern similarly to word-initial voiceless singletons with regard to qualitative manner features (this is a variant of the pattern described by Bolognesi and Ladd & Scobbie), but similarly to word-initial voiced singletons with regard to duration. This complex pattern deserves further study.
Finally, the phonetics of utterance-initial consonants should be studied in more detail. While we have qualitative features and release measurements for many of these consonants, duration is often impossible to measure and there are many other phonetic factors involved in utterance-initial fortis-lenis or length contrasts (see e.g., Ridouane & Hallé, 2017 for a host of intricate F0, intensity, and timing effects in Tashlhiyt Berber). These consonants are particularly important because they generally form the basis for the URs posited by phonologists in impressionistic descriptions. The data here suggest that the contrast between ‘voiceless’ and ‘voiced’ stops is particularly variable and unclear in absolute initial position, so additional phonetic investigation would be valuable.
The additional files for this article can be found as follows:
Appendix A. FinalData.txt, the data used in the analyses in Section 4, in tab-delimited text format. DOI: https://doi.org/10.5334/labphon.184.s1
Appendix B. DataDescription.txt, an explanation of the variables in the data file. DOI: https://doi.org/10.5334/labphon.184.s2
1. Ennever et al. (2017), while they find that the effect of word-initiality on CVE is mediated by duration, still conclude that there is substantial variability in intensity not accounted for by duration. They suggest that the intensity target associated with a consonant is window-like rather than a point target.
2. We made the following adjustments to the stop_lenition code: (1) lines 30–31, added 0 and 3200 to default_band_floors and default_band_ceilings to test the ‘omnibus’ 0–3200 Hz band; (2) line 200, set relative_end_time to +300, to extend the rightward search window for long consonants; (3) changed the identify_events function to return timepoints for CVE and RVE, to search for a preceding intensity peak, and to attempt to extract intensity landmarks even when there is no pit present. This last alteration was the only substantial change we made to the code, but as mentioned in the text, it returned uninterpretable noise, so we have not posted that code publicly.
Many thanks to Bob Ladd for extensive help with both linguistic and practical aspects of this research. Erich Round shared his brilliant code, his expertise in helping to implement it for our materials, and some very useful comments on an earlier draft of this paper. Uriel Cohen Priva provided extensive and helpful discussion and comments on an earlier draft. Thanks as well to Maria Cristina Lavinio for logistic support, and Bruce Hayes for pointing the first author to Sardinian in the first place. This research has benefitted from discussion with Edward Flemming, Haike Jacobs, John Kingston, Donca Steriade, Anne-Michelle Tessier, and Maurizio Virdis. We are also very grateful to our Sardinian consultants for their time and hospitality.
The authors have no competing interests to declare.
The authors jointly designed elicitation materials. Gianmarco Pitzanti found and contacted participants, conducted the interviews, and helped extensively with understanding the lexical representations and variation present in the data. Jonah Katz made and processed the recordings, conducted acoustic and statistical analyses, and wrote the paper.
Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100, 603–617. DOI: https://doi.org/10.1348/000712608X377117
Barr, D., Levy, R., Scheepers, C., & Tily, H. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278. DOI: https://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015a). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. DOI: https://doi.org/10.18637/jss.v067.i01
Blasco Ferrer, E. (1984). Storia Linguistica della Sardegna. Tübingen: Niemeyer. DOI: https://doi.org/10.1515/9783111329116
Boersma, P., & Weenink, D. (2018). Praat: doing phonetics by computer [Computer program]. Version 6.0.43, retrieved 8 September 2018 from http://www.praat.org/.
Bouavichith, D., & Davidson, L. (2013). Segmental and prosodic effects on intervocalic voiced stop reduction in connected speech. Phonetica, 70, 182–206. DOI: https://doi.org/10.1159/000355635
Browman, C., & Goldstein, L. (1986). Towards an Articulatory Phonology. Phonology, 3, 219–252. DOI: https://doi.org/10.1017/S0952675700000658
Bye, P., & de Lacy, P. (2008). Metrical influences on lenition and fortition. In J. de Carvalho, T. Scheer, & P. Ségéral (Eds.), Lenition and Fortition (pp. 173–206). Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110211443.1.173
Chong, A. (2011). Lenition in Gaalpu: An Optimality Theoretic analysis. Australian Journal of Linguistics, 31, 473–490. DOI: https://doi.org/10.1080/07268602.2011.625601
Cohen Priva, U. (2017). Informativity and the actuation of lenition. Language, 93(3), 569–597. DOI: https://doi.org/10.1353/lan.2017.0037
Cohen Priva, U., & Gleason, E. (2019). The causal structure of lenition: A case for the causal precedence of durational shortening. Ms., Brown University. https://urielcpublic.s3.amazonaws.com/CohenPriva_Gleason-Lenition_Submitted.pdf.
Contini, M., & Boë, L. J. (1972). Voyelles orales et nasales du sarde campidanien. Phonetica, 25, 165–191. DOI: https://doi.org/10.1159/000259379
Del Mar Vanrell, M., Ballone, F., Schirru, C., & Prieto, P. (2015). Sardinian intonational phonology: Logudorese and Campidanese varieties. In S. Frota, & P. Prieto (Eds.), Intonation in Romance (pp. 317–349). Oxford, UK: Oxford University Press. DOI: https://doi.org/10.1093/acprof:oso/9780199685332.003.0009
Ennever, T., Meakins, F., & Round, E. (2017). A replicable acoustic measure of lenition and the nature of variability in Gurindji stops. Laboratory Phonology, 8(1), 1–32. DOI: https://doi.org/10.5334/labphon.18
Harris, J. (2003). Grammar-internal and grammar-external assimilation. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (pp. 281–284). Barcelona: Futurgraphic.
Hayes, B., & White, J. (2015). Saltation and the P-map. Phonology, 32, 267–302. DOI: https://doi.org/10.1017/S0952675715000159
Honeybone, P. (2008). Lenition, weakening, and consonantal strength: Tracing concepts through the history of phonology. In J. de Carvalho, T. Scheer, & P. Ségéral (Eds.), Lenition and Fortition (pp. 9–93). Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110211443.1.9
Hualde, J. I., Simonet, M., & Nadeu, M. (2011). Consonant lenition and phonological recategorization. Laboratory Phonology, 2, 301–329. DOI: https://doi.org/10.1515/labphon.2011.011
Katz, J. (2013). Asymmetries in English vowel perception mirror compression effects. Phonetica, 70, 93–116. DOI: https://doi.org/10.1159/000354535
Katz, J. (2016). Lenition, perception, and neutralisation. Phonology, 33(1), 43–85. DOI: https://doi.org/10.1017/S0952675716000038
Katz, J., & Fricke, M. (2018). Auditory disruption improves word segmentation: A functional basis for lenition phenomena. Glossa, 3(1), 38. DOI: https://doi.org/10.5334/gjgl.443
Kawahara, S. (2006). A faithfulness ranking projected from a perceptibility scale: The case of [+voice] in Japanese. Language, 82, 536–574. DOI: https://doi.org/10.1353/lan.2006.0146
Keating, P. (2006). Phonetic Encoding of Prosodic Structure. In J. Harrington, & M. Tabain (Eds.), Speech production: Models, phonetic processes, and techniques (pp. 167–186). New York: Psychology Press.
Ladd, D. R., & Scobbie, J. (2003). External sandhi as gestural overlap? Counter-evidence from Sardinian. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in Laboratory Phonology VI (pp. 164–182). Cambridge: Cambridge University Press.
Lavoie, L. (2001). Consonant Strength: Phonological Patterns and Phonetic Manifestations. New York: Garland. DOI: https://doi.org/10.4324/9780203826423
Liberman, A., Cooper, F., Shankweiler, D., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461. DOI: https://doi.org/10.1037/h0020279
Lindblom, B. (1983). Economy of Speech Gestures. In MacNeilage (Ed.), The Production of Speech. DOI: https://doi.org/10.1007/978-1-4613-8202-7_10
Lubowicz, A. (2002). Derived environment effects in OT. Lingua, 112, 243–280. DOI: https://doi.org/10.1016/S0024-3841(01)00043-2
Molinu, L. (2017). Fonetica, fonologia, prosodia: Sincronia. In E. Blasco Ferrer, P. Koch, & D. Marzo (Eds.), Manuale di Linguistica Sarda (pp. 339–358). Berlin: de Gruyter. DOI: https://doi.org/10.1515/9783110274615-021
Munhall, K., Ostry, D., & Parush, A. (1985). Characteristics of velocity profiles of speech movements. Journal of Experimental Psychology: Human Perception and Performance, 11(4), 457–474. DOI: https://doi.org/10.1037//0096-1523.11.4.457
Pierrehumbert, J., & Talkin, D. (1992). Lenition of /h/ and glottal stop. In G. J. Docherty, & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 90–117). Cambridge, UK: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511519918.005
Raphael, L. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English. Journal of the Acoustical Society of America, 51(4), 1296–1303. DOI: https://doi.org/10.1121/1.1912974
Ridouane, R., & Hallé, P. (2017). Word-initial geminates: From production to perception. In H. Kubozono (Ed.), The Phonetics and Phonology of Geminate Consonants (pp. 66–84). Oxford, UK: Oxford University Press. DOI: https://doi.org/10.1093/oso/9780198754930.003.0004
Ségéral, P., & Scheer, T. (2008). Positional factors in lenition and fortition. In J. de Carvalho, T. Scheer, & P. Ségéral (Eds.), Lenition and Fortition (pp. 131–172). Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110211443.1.131
Smith, J. (2008). Markedness, faithfulness, positions, and contexts: Lenition and fortition in Optimality Theory. In J. de Carvalho, T. Scheer, & P. Ségéral (Eds.), Lenition and Fortition (pp. 519–560). Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110211443.3.519
Storme, B. (2018). Derived environment effects and logarithmic perception. In G. Gallagher, M. Gouskova, & S. Yin (Eds.), Supplemental Proceedings of the 2017 Annual Meeting on Phonology. Washington, DC: Linguistic Society of America. DOI: https://doi.org/10.3765/amp.v5i0.4229
Sussman, H., & Shore, J. (1996). Locus equations as phonetic descriptors of consonantal place of articulation. Perception & Psychophysics, 58(6), 936–946. DOI: https://doi.org/10.3758/BF03205495
Szigetvári, P. (2008). What and where? In J. de Carvalho, T. Scheer, & P. Ségéral (Eds.), Lenition and Fortition (pp. 93–130). Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110211443.1.93
Warner, N., & Tucker, B. (2011). Phonetic variability of stops and flaps in spontaneous and careful speech. Journal of the Acoustical Society of America, 130(3), 1606–1617. DOI: https://doi.org/10.1121/1.3621306
Winitz, H., Scheib, M., & Reeds, J. (1972). Identification of Stops and Vowels for the Burst Portion of /p, t, k/ Isolated from Conversational Speech. Journal of the Acoustical Society of America, 51, 1309–1317. DOI: https://doi.org/10.1121/1.1912976
Yeni-Komshian, G., & Soli, S. (1981). Recognition of vowels from information in fricatives: Perceptual evidence of fricative-vowel coarticulation. Journal of the Acoustical Society of America, 70, 966–975. DOI: https://doi.org/10.1121/1.387031