Previous studies of the phonetics of Danish stops have neglected closure voicing. Danish is an aspiration language, but the aspirated stops /p t k/ are produced with shorter closure duration and less articulatory effort than the unaspirated stops /b d ɡ/. Furthermore, all Danish stops are characterized by some degree of glottal spreading during the closure. In this study, we use a corpus of Danish spontaneous speech (DanPASS) to investigate intervocalic voicing: its distribution across the two laryngeal categories, whether it patterns as a lenition phenomenon, and whether the aerodynamic environment predicts its distribution. We find that intervocalic voicing is not the norm for either set of stops and is particularly rare in /p t k/. Voiced tokens are mostly found in environments associated with lenition. We suggest that the glottal spreading gesture found in all Danish stops is a phonological mechanism blocking voicing, which is probabilistically lost in spontaneous speech. This predicts our results better than relying on laryngeal features like [voice] or [spread glottis]. The study fills a gap in our knowledge of Danish phonetics and phonology, and is also one of the most extensive corpus studies of intervocalic stop voicing in an ‘aspiration language.’
All things considered, the phonetics of Danish stops are extremely well-described, largely due to the keen understanding shown early on in the writings of Otto Jespersen (e.g.,
There is no closure voicing in absolute initial position in Danish, and negligible voicing in final position. Voicing is less well-understood in intervocalic position. An oft-repeated claim has it that medial stop allophones in Danish are almost always voiced (
Intervocalic voicing of underlyingly voiceless stops is phonetically well-understood but is a phonological conundrum. Voicing is usually difficult to maintain during closure, leading to the general assumption that the feature [voice] is phonologically marked in stops. Intervocalically, however, the vocal folds are initially adducted and tensed, and subglottal pressure is high, providing ideal conditions for closure voicing (
In this paper, we present an empirical study of intervocalic stop voicing in Danish, based on an existing corpus of spontaneous speech (the Danish Phonetically Annotated Spontaneous Speech corpus, or DanPASS;
Our results show that intervocalic voicing is very rare in /p t k/. Although much more frequent in /b d ɡ/, intervocalic voicing occurs in less than half of all /b d ɡ/ tokens. This rarity of intervocalic voicing is in essence the opposite conundrum of what we discussed above. Voicing
The paper is structured as follows: In the following subsections, we provide background on closure voicing in phonetics and phonology with special focus on intervocalic position, and give an overview of the phonetics and phonology of Danish stops. In Section 2, we summarize our research questions and motivate all our independent variables. In Section 3, we provide an overview of our methods: We introduce the corpus we use and our data treatment. In Section 4, we provide an exploratory analysis of the data. In Section 5, we describe the selection of a logistic mixed-effects regression model and the results of that model. In Section 6, we discuss our research questions in light of our results, and in Section 7, we briefly summarize the findings.
Closure voicing in stops is relatively ‘unnatural’ (e.g.,
Westbury and Keating (
Articulatory naturalness does not always translate directly into attested typological patterns. On the one hand, in accordance with articulatory naturalness, there is a strong implicational hierarchy regarding voiced stops in phonological inventories: In almost all cases, languages with voiced stops also have voiceless stops (e.g.,
Below, we will characterize three general approaches to the representation of laryngeal contrasts in the phonological literature, and the predictions they make with regard to intervocalic stop voicing. There is a huge literature on the topic, so some approaches will necessarily be missed, while others may be grouped together even if they differ in some respect. We will refer to these approaches as ‘concrete [voice]’ approaches, ‘abstract [voice]’ approaches, and gesture-based approaches.
The phonological feature [voice] has been conceptualized in different ways. It sometimes refers quite narrowly to the presence of voicing during closure, which is what we refer to as concrete [voice]. This is how [voice] is conceptualized in the laryngeal feature geometry of Lombardi (
Jessen and Ringen (
In abstract [voice] approaches, the feature need not directly refer to closure voicing. Chomsky and Halle (
If we expect a direct relationship between phonetics and phonology, then there should be a correspondence between phonologically and phonetically unmarked material. Given the aerodynamic account of stop voicing above, this means that a phonologically unmarked stop should be voiceless initially and voiced intervocalically. It also means that phonetic reduction will be positionally defined: Devoicing of [voice] stops is a lenition phenomenon syllable-initially, whereas voicing of stops without [voice] is a lenition phenomenon intervocalically.
Gesture-based approaches to phonological representation can straightforwardly account for these positional markedness relations. One such approach is Articulatory Phonology (
These are a few predictions about the patterning of intervocalic stop voicing based on different conceptualizations of laryngeal representation: In concrete [voice] approaches, closure voicing is a necessary and sufficient criterion for [voice] and a different feature like [spread glottis] is needed to represent aspiration. From a concrete [voice] perspective, we would predict essentially categorical intervocalic voicing of all stops in ‘true voice’ languages, since [voice] ensures voicing in one category, and there are no available phonological mechanisms to counteract voicing in the other (unmarked) category. We would predict varying degrees of intervocalic voicing of unmarked stops in ‘aspiration’ languages, and very little voicing in [spread glottis] stops (following
Standard Danish has six phonemic stops, /b d ɡ p t k/. A common analysis of Danish holds that there are strong and weak syllabic positions: Strong position refers to onsets before full vowels, while weak position refers to codas and onsets before neutral vowels (e.g.,
Similar to some traditions of English transcription, /b d ɡ/ are in narrow transcription usually given as [b̥ d̥ ɡ̊], indicating that they are voiceless but phonetically lenis. /p t k/ are usually transcribed phonetically as [pʰ tˢ kʰ], with the superscript
The terms fortis and lenis are used in quite distinct ways in the phonetic and phonological literature. One use is as an essentially arbitrary label for stop contrasts in languages where the contrast does not depend on voicing. Fortis–lenis has often been used in this sense when discussing Germanic languages, where the historical voiced-voiceless distinction has a diverse set of phonetic reflexes in the modern languages (
Regarding the [b̥]-style notation of lenis voiceless stops, this is only briefly mentioned in the
Overall, little has been written about closure voicing in Danish stops, and to our knowledge, no quantitative studies have been made of the topic. In essence, what we know from the existing literature is that all stops show some degree of voicing during the first portion of the closure when they occur between other voiced sounds (
Articulatory studies of carefully read speech have shown that intervocalically before stressed syllables, both /b/ and /p/ are accompanied by a glottal opening gesture during the closure (
From a phonological perspective, Iverson and Salmons (
Kingston and Diehl (
As mentioned in Section 1 above, a number of facts about Danish stops make it difficult to predict the relative likelihood of intervocalic voicing. First of all, most of the relevant literature seems to assume that intervocalic voicing is categorical or near-categorical. Some say that muscular tension is overall low in Danish stops, increasing the chances of voicing; but all Danish stops are also characterized by a glottal opening gesture, decreasing the chances of voicing. Closure duration is shorter and muscular tension weaker in the production of /p t k/ relative to /b d ɡ/, but /p t k/ also have a glottal opening gesture of greater magnitude.
The results may allow us to compare some of the predictions from different approaches to phonological laryngeal specification. If [spread glottis] is indeed the only active laryngeal feature, we would predict at most variable voicing in /b d ɡ/, and little voicing in /p t k/ (following
This paper is partly hypothesis testing, and partly exploratory in nature. We set out with the following research questions (RQ) in mind:
The known facts about Danish stop production point in different directions. If the vocal folds were in a neutral, adducted position during the closure, one would expect a higher likelihood of continuous closure voicing in /p t k/, since they have shorter closure duration (
From an aerodynamic perspective, voicing is natural in intervocalic stops, and there is evidence that voicing is actively blocked in all Danish stops. We test whether intervocalic voicing is more common in environments where we would generally expect lenition, which would be predicted from gesture-based underlying representations.
In addition to phonological laryngeal category and lenition, a host of other phonetic and extraphonetic factors are known to or can be expected to affect the probabilistic occurrence of consonant voicing (as established by e.g.,
The detailed annotations of the DanPASS corpus (see Section 3.1) allowed us to test how a large number of mostly categorical predictors may affect closure voicing. These predictors relate to segmental, prosodic, morphosyntactic, and other factors, which are discussed in the following subsections. Variables are capitalized when they are first mentioned.
We coded the stops themselves according to Laryngeal Category and Place of Articulation. There is really no theory-neutral way to refer to the two laryngeal stop series. Here, we use ‘aspirated’ and ‘unaspirated’ as short-hand terms for the surface contrast between /p t k/ and /b d ɡ/ in distinct speech, as discussed in more detail in Section 1.2.
We expect place of articulation to influence the likelihood of voicing, such that occlusions further back in the oral cavity reduce the chance of voicing. This is aerodynamically motivated (see Section 1.1 above for more details), and is reflected typologically: Voiced velar stops are less common than alveolar ones, which are in turn less common than bilabial ones (
The quality of surrounding vowels is expected to have an influence on the likelihood of closure voicing; note that Danish has an exceptionally complex vowel system (see
In locating intervocalic stops, Approximants were also considered vowels. We assume that approximants occurring immediately before the intervocalic stop decrease the chances of voicing, simply because approximants are less sonorous than nuclear vowels (e.g.,
As discussed above, there is reason to assume that intervocalic voicing in Danish is a lenition phenomenon resulting from voicing continuing from the preceding vowel lasting throughout the closure. Therefore, we expect voicing to be more likely in environments that are generally associated with weakening. We expected surrounding Neutral Vowels to increase the chances of voicing, since the Danish neutral vowels [ə ɐ] generally occur in prosodically weak syllables (e.g.,
We expected Stress on the syllable in question to reduce the chances of voicing, since stress generally reduces the chances of lenition phenomena occurring. We expected stress on the preceding syllable to increase the chances of voicing, as it is unlikely for two syllables in a row to carry stress.
We expected the prosodic laryngealization phenomenon Stød to reduce the chance of voicing when adjacent, no matter whether on the preceding syllable or the syllable in question.
We coded the type of Morphological Boundary at which the intervocalic stop occurred. These include word boundaries, boundaries between roots and (derivational and inflectional) affixes, boundaries between separate parts of compounds, as well as no boundary if the intervocalic stop occurred morpheme-internally. It should be noted here that prefixes in Danish are exclusively derivational, while suffixes are mostly inflectional but can also be derivational. As consonants tend to be strong domain-initially (e.g.,
Additionally, we also coded words for being a member of either a Closed or an Open word class. Words from closed classes are often function words, and it is well-known that these often show significant phonetic reduction (e.g.,
In addition to the predictors already mentioned above, we included a lexical frequency measure. Lexical frequency is known to cause phonetic reduction, both in the course of language change (e.g.,
We also included a local measure of speech rate. Local Speech Rate should affect the chances of voicing for aerodynamic reasons: Unless inhibited, post-vocalic voicing should automatically continue for a certain amount of time during a stop closure (see Section 1.1). A higher speech rate also causes a shorter occlusion (as demonstrated for Danish by
We also coded the Individual Words, since Pierrehumbert (
Finally, we coded for a few extralinguistic factors pertaining to the speakers. Sex has been shown to have an influence on closure voicing, such that men are more likely than women to produce fully voiced stops (
We also coded the Individual Speakers. Sonderegger, Stuart-Smith, Knowles, Macdonald, and Rathcke (
The potential predictors and the directionality of their expected influence on closure voicing are summarized in
Potential predictors and the expected directionality of their influence on closure voicing.
Laryngeal Category | unaspirated > aspirated | |
Place of Articulation | bilabial > alveolar > velar | strongest effect for velar stops |
Adjacent Approximant | decreased | |
Adjacent High Vowel | decreased | strongest effect preceding the stop |
Adjacent Neutral Vowel | increased | |
Stress | unstressed > stressed | |
Preceding Stress | stressed > unstressed | |
Adjacent Stød | decreased | strongest effect preceding the stop |
Morphological Boundary | internal (no boundary) > inflectional > derivational > compound > word | |
Word Class Type | closed > open | |
Local Lexical Frequency | increases with frequency | |
Local Speech Rate | increases with speech rate | |
Lexical Item | random | |
Sex | men > women | |
Age | decreases with age | |
Individual Speaker | random | |
In order to answer our research questions, we used the DanPASS corpus (
The full DanPASS corpus consists of a number of monologues recorded in 1996, and a number of dialogues recorded in 2004. While the dialogues probably constitute a more natural speech setting, they are also somewhat more challenging to analyze. For this reason, the current study only makes use of the monologues. Monologues were recorded from 18 speakers, 13 men and 5 women. The speakers were between 20 and 68 years of age, with a mean age of 29 years. Overall, the monologues constitute 171 minutes of speech, with a mean duration of 9m27s of speech per speaker (range 6m13s – 15m49s). Technical details about the recordings can be found in Grønnum (
The recordings are accompanied by quite detailed annotations in Praat (
We used a Praat script to find stops that occur intervocalically in the DanPASS monologues, i.e., stops that do not occur initially in a prosodic phrase and are flanked on both sides by either vowels or central approximants. Approximants were included because there are well-defined phonological processes whereby they syllabify (
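The selection criterion described above (a stop that is not phrase-initial and is flanked on both sides by a vowel or an approximant) can be sketched in a few lines. This is a minimal illustration in Python, not the Praat script used in the study; the segment label sets and the function name `find_intervocalic_stops` are hypothetical, and the actual DanPASS annotation uses its own symbol inventory.

```python
# Hypothetical, simplified label sets for illustration only;
# the DanPASS transcription uses a richer symbol inventory.
VOWELS = {"a", "e", "i", "o", "u", "ə", "ɐ"}
APPROXIMANTS = {"j", "w"}
STOPS = {"b", "d", "ɡ", "p", "t", "k"}


def find_intervocalic_stops(phrase):
    """Return the indices of stops flanked by vowels or approximants.

    `phrase` is a list of segment labels for one prosodic phrase.
    The loop skips the first and last segment, so phrase-initial
    (and phrase-final) stops are never selected.
    """
    vocoids = VOWELS | APPROXIMANTS
    hits = []
    for i in range(1, len(phrase) - 1):
        if (phrase[i] in STOPS
                and phrase[i - 1] in vocoids
                and phrase[i + 1] in vocoids):
            hits.append(i)
    return hits


# The phrase-initial /d/ is skipped; the /k/ between two vowels is found.
example = ["d", "u", "k", "a", "n"]
intervocalic = find_intervocalic_stops(example)
```

Treating approximants and vowels as one class mirrors the decision to count approximants as vowels when locating intervocalic stops.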
Intervocalic stops in the DanPASS monologues by phonemic category.
Stop | Tokens | Range per speaker
/b/ | 189 | 3–25
/d/ | 1,278 | 28–167
/ɡ/ | 752 | 26–65
/p/ | 327 | 8–32
/t/ | 431 | 16–39
/k/ | 767 | 24–67
Total | 3,744 | 117–341
There are 303 unique lexical items in the data, with an average of 12.4 observations per item, albeit a median of just 2 observations (range 1–303). The variance in lexical frequency is rather extreme. There are 125 lexical items which occur just once, while the 10 most frequent items occur a total of 2,025 times.
For each of the intervocalic stops, we manually checked if it was voiced throughout the closure. This was done on the basis of visual inspection of the waveform: Constant periodicity up to the burst was taken as continuous closure voicing. Whenever stops from the /p t k/ series were fully voiced, they typically also had breathy voiced release. This method proved to be relatively straightforward to implement, although it is certainly a simplification of the complexity in the phonetic signal.
Waveforms exemplifying: 1) A fully voiced token of /b/ in the phrase <fr(a be)ˈgyndelsen> ‘from the start.’ 2) A mostly voiceless token of /ɡ/ in the same phrase as 1, <fra b(eˈgy)ndelsen>. 3) A fully voiced token of /k/ from the phrase <d(u ka)n> ‘you can’. 4) A mostly voiceless token of /k/ from the word <ˈf(irka)nt> ‘square.’
Recent studies by e.g., Davidson (
Ideally, we would be working with a continuous measure of closure voicing, possibly measuring both intensity and relative duration of voicing. However, this would require much more fine-grained segmentation of the sound files, and we did not have the resources to add this to the existing annotations. It is quite possible that true effects of some lower-level variables on voicing are masked in this study because of our relatively rough voicing measure.
All statistics used in the current study were calculated using the R statistical environment (
In this section, before proceeding to building a regression model, we take a closer look at the data and explore correlations between the individual predictors and the presence of intervocalic voicing.
Table of proportion of fully voiced tokens for each level of each categorical variable. Variables marked √ show correlations in agreement with our hypotheses in
Variable | Level | Fully voiced (%) | Fully voiced (n) | Agreement
Laryngeal category | Aspirated | 5.05 | 77 | √
Place of articulation | Bilabial | 17.25 | 89 | ÷
Preceding approximant | Absent | 25.81 | 832 | √
High vowel | Absent | 22.38 | 584 | ÷
Preceding high vowel | Absent | 25.18 | 748 | √
Neutral vowel | Absent | 22.31 | 747 | √
Preceding neutral vowel | Absent | 26 | 612 | ÷
Stress | Absent | 26.13 | 712 | √
Preceding stress | Absent | 25.31 | 637 | ÷
Stød | Absent | 26.39 | 792 | √
Preceding stød | Absent | 25.26 | 914 | √
Morphological boundary | Internal | 36.12 | 95 | ÷
Word class type | Open | 19.33 | 407 | √
Sex | Female | 20.96 | 192 | √
Stacked bar plots showing the proportions of tokens with and without continuous voicing for each level of each categorical variable. (Morphological boundary levels = internal, inflectional, derivational, compound, word).
Laryngeal Category shows a clear correlation in the expected direction. As we predicted above, intervocalic voicing is quite rare in /p t k/, where it was only found in 5% of all tokens. Intervocalic voicing is more common in /b d ɡ/, where it was found in 38% of all tokens. Hence, voicing is not the norm for /b d ɡ/, even though this is sometimes described as being essentially categorical. In total, continuous closure voicing is found in 24.6% of all intervocalic stops in the corpus.
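As a quick arithmetic check, the proportions quoted here can be recomputed from the token counts by phonemic category and the number of fully voiced aspirated tokens given in the tables. The sketch below is only a sanity check in Python; the count of fully voiced /b d ɡ/ tokens is not stated in the text, so it is back-calculated from the reported overall rate of 24.6% and should be read as an approximation.

```python
# Intervocalic tokens per stop, from the table of stops by phonemic category.
tokens = {"b": 189, "d": 1278, "g": 752, "p": 327, "t": 431, "k": 767}

# Fully voiced /p t k/ tokens, from the exploratory table.
voiced_aspirated = 77

aspirated_total = tokens["p"] + tokens["t"] + tokens["k"]
unaspirated_total = tokens["b"] + tokens["d"] + tokens["g"]
total = aspirated_total + unaspirated_total

# Share of fully voiced tokens among /p t k/.
pct_aspirated = 100 * voiced_aspirated / aspirated_total

# ASSUMPTION: the overall voiced count is estimated from the reported
# 24.6% overall rate; it is not given as a raw count in the text.
voiced_total_est = round(0.246 * total)
pct_unaspirated_est = (
    100 * (voiced_total_est - voiced_aspirated) / unaspirated_total
)

print(f"/p t k/: {pct_aspirated:.2f}% of {aspirated_total} tokens")
print(f"/b d g/: about {pct_unaspirated_est:.0f}% of {unaspirated_total} tokens")
```

Under these assumptions the recomputed values agree with the reported figures of roughly 5% for /p t k/ and 38% for /b d ɡ/.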
Place of Articulation does not pattern as predicted from our aerodynamically motivated expectations; as expected, bilabials are voiced more often than velars, but unexpectedly, alveolars are voiced at a much higher rate. Presumably, there are non-aerodynamic reasons for this. Alveolar stops are generally more frequent than other places of articulation, and they are found at a higher rate in function words. While the transcriptions do in principle indicate tapped realizations of the alveolar stops, they are also likely to be somewhat inconsistent, such that some realizations that are transcribed as alveolar stops are in fact alveolar taps [ɾ]; these are of course always voiced.
Preceding Approximants, as expected, are less likely than nuclear vowels to correlate with voicing in the following stop.
The behavior of High Vowels goes against our predictions; we expected high vowels to decrease the chances of voicing, in particular preceding the stop. In fact, high vowels preceding the stop just show a weak correlation in the expected direction, and high vowels in the same syllable correlate positively with voicing. This is contrary to our aerodynamically motivated predictions but could have a number of other explanations: High vowels are found in a number of very frequent function words; [ɪ ʊ] are both included in this group, and they are derived from underlying sequences of the approximants [ɪ̯ ʊ̯] assimilating with schwa. As such, there are predictable reasons why we might expect syllables with high vowels to frequently undergo phonetic reduction.
As predicted, Neutral Vowels in tautosyllabic position correlate positively with the presence of closure voicing. However, against expectations, neutral vowels in the preceding syllable show a slight correlation with the absence of closure voicing.
As predicted, voicing is more common in
Our predictions regarding Morphological Boundary Type mostly do not pan out. By far the most voiced stops are at inflectional boundaries, with derivational morphemes and morpheme-internal stops being voiced at approximately the same rate. Stops at word boundaries, by far the most common category, show intervocalic voicing at around chance rate, i.e., the same rate as the data set as a whole. Finally, stops at compound boundaries are rarely voiced. Given the complexity of this factor, we will hold off on interpreting these results further until we present the results of the regression model.
As predicted, Word Class Type correlates with closure voicing, such that members of the closed classes are voiced at a higher rate.
Sex correlates with voicing in the predicted direction, such that male speakers produce more voiced stops than female speakers.
Having discussed all categorical predictors, we now turn to the continuous ones.
Density plots showing the tokens with and without continuous voicing relative to continuous variables on a log-scale.
It is clearly (and logically) the case that the most Frequent words account for the most tokens in both the voiced and the voiceless group. It is also clearly the case that the words with very high frequency show a higher proportion of voiced tokens, and similarly that the words with medium frequency, particularly between 50–500, show a higher proportion of voiceless tokens.
As predicted, Speech Rate clearly correlates with voicing, such that voiceless tokens are more common during slow speech, and voiced tokens are more common during quick speech (recall that speech rate is coded as the duration of the syllables flanking the stop, so a low value equals high speech rate). In both lexical frequency and speech rate, the distribution of fully voiced tokens is visibly more peaked than that of tokens which are not fully voiced.
We also see a correlation in the expected direction between Age and voicing. Most speakers in the corpus are younger than 25 years old, so it follows naturally that most tokens, both voiced and voiceless, are also produced by this age group. It is, however, also the case that speakers in their thirties and forties produce a relatively higher proportion of voiceless stops.
Having examined the correlations that are found in the empirical data, we will now move on to analyzing the data with mixed-effects regression modeling.
Our data comes from a corpus that was not collected for our purposes, and we are interested in a large number of independent variables. Given the lack of experimental control and the partly exploratory nature of the study, our data is presumably not structured in a way that allows us to retain a maximal random effects structure; this is a common problem with mixed-effects models in linguistics (
The raw values of all our continuous variables are positively skewed, so they were log-transformed to approximate a normal distribution, and standardized to aid interpretation of the model.
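The transformation described above can be illustrated with a short sketch: take the logarithm of each raw value, then convert the logged values to z-scores. This is a hedged Python illustration rather than the R code used in the study; the helper name `log_standardize`, the use of the natural logarithm, the sample standard deviation, and the example values are all assumptions.

```python
import math
import statistics


def log_standardize(values):
    """Log-transform positive values, then z-score the result.

    ASSUMPTIONS: natural logarithm and sample standard deviation;
    the text does not state which base or estimator was used.
    """
    logged = [math.log(v) for v in values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


# Made-up, positively skewed values for illustration only.
raw = [1, 2, 2, 5, 12, 50, 303]
z = log_standardize(raw)
```

After this step a value of 0 corresponds to the mean of the logged variable and one unit corresponds to one standard deviation, which is what lets the model intercept be read as the prediction at average values of the continuous predictors.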
1.
Velar contrast: –⅓ bilabial, –⅓ alveolar, +⅔ velar
Bilabials versus alveolars: +½ alveolar, –½ bilabial
The five-level Morphological Boundary variable is rather complicated. Here we coded four theoretically-guided Helmert contrasts: 1) Internal Contrast, testing the distinction between morpheme-internal and non-morpheme-internal; 2) Affix Contrast, testing the distinction between affix-boundaries and non-affix-boundaries; 3) Affix Type Contrast, testing the distinction between derivational affix-boundaries and inflectional affix-boundaries; and 4) Compound Contrast, testing the distinction between word-boundaries and compound boundaries.
2.
Internal Contrast: +⅘ internal, –⅕ inflectional, –⅕ derivational, –⅕ compound, –⅕ word
Affix Contrast: +½ inflectional, +½ derivational, –½ compound, –½ word
Affix Type Contrast: +½ derivational, –½ inflectional
Compound Contrast: +½ compound, –½ word
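The contrast weights listed above can be written out as explicit tables and checked for two properties that make the coefficients easy to read: each contrast sums to zero across levels, and each pair of contrasts within a variable is orthogonal. The sketch below is illustrative Python, not the authors' R code; levels that a contrast does not mention are assumed to have weight 0, and the variable and function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Place of Articulation contrasts (unmentioned levels assumed to be 0).
PLACE = {
    "velar_contrast": {
        "bilabial": F(-1, 3), "alveolar": F(-1, 3), "velar": F(2, 3)},
    "bilabial_vs_alveolar": {
        "bilabial": F(-1, 2), "alveolar": F(1, 2), "velar": F(0)},
}

# Morphological Boundary contrasts (unmentioned levels assumed to be 0).
BOUNDARY = {
    "internal": {
        "internal": F(4, 5), "inflectional": F(-1, 5),
        "derivational": F(-1, 5), "compound": F(-1, 5), "word": F(-1, 5)},
    "affix": {
        "internal": F(0), "inflectional": F(1, 2),
        "derivational": F(1, 2), "compound": F(-1, 2), "word": F(-1, 2)},
    "affix_type": {
        "internal": F(0), "inflectional": F(-1, 2),
        "derivational": F(1, 2), "compound": F(0), "word": F(0)},
    "compound": {
        "internal": F(0), "inflectional": F(0),
        "derivational": F(0), "compound": F(1, 2), "word": F(-1, 2)},
}


def is_centered(weights):
    """A contrast is centered if its weights sum to zero."""
    return sum(weights.values()) == 0


def is_orthogonal(a, b):
    """Two contrasts are orthogonal if their dot product is zero."""
    return sum(a[level] * b[level] for level in a) == 0


def check(scheme):
    """True if every contrast is centered and all pairs are orthogonal."""
    centered = all(is_centered(w) for w in scheme.values())
    orthogonal = all(
        is_orthogonal(a, b) for a, b in combinations(scheme.values(), 2))
    return centered and orthogonal


place_ok = check(PLACE)
boundary_ok = check(BOUNDARY)
```

Because the two sides of each contrast differ by exactly one unit (for example, +⅘ versus –⅕), each coefficient can be read as the log-odds difference between the two groups being compared. Orthogonality of the weights only guarantees uncorrelated predictors when the levels are equally frequent, which is not the case in corpus data.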
The data is modeled using logistic mixed-effects regression.
Fixed effects selection: All independent variables were theoretically motivated in Section 2 above, and are all included in the model. We have no theoretical motivation for including interactions. However, we saw in Section 4 that voicing in /p t k/ is near-floor, and this could be masking true effects in the data. For this reason, we tested all possible interactions with Laryngeal Category in a random intercepts-only model, in case some effects could be found only in /b d ɡ/. Only significant interactions were kept.
Random effects selection: All meaningful by-speaker and by-item random slopes were then added to the model; Sex and Age can of course not vary by-speaker, and all by-item slopes for phonological or morphosyntactic variables are at least potentially problematic. We used strictly uncorrelated random effects; this leads to much higher convergence rates in logistic models, and Seedorff et al. (
Summary of the final model.
Simple fixed effects | Intercept, Laryngeal Category, Place of Articulation (velar contrast, bilabials versus alveolars), Preceding Approximant, Preceding High Vowel, High Vowel, Preceding Neutral Vowel, Neutral Vowel, Preceding Stress, Stress, Preceding Stød, Stød, Morphological Boundary (Internal Contrast, Affix Contrast, Affix Type Contrast, Compound Contrast), Word Class Type, Local Lexical Frequency, Local Speech Rate, Sex, Age |
Interactions with laryngeal category | Preceding Approximant, Preceding Stress, Local Speech Rate |
By-speaker random effects | Intercept, Laryngeal Category (zero variance), Velar Contrast, High Vowel, Stress, Stød, Internal Contrast, Affix Type Contrast, Compound Contrast, Local Speech Rate |
By-item random effects | Intercept, Age, Sex |
None of the included independent variables shows problematic collinearity; the variance inflation factor (VIF) is below 1.5 for all variables except those appearing in interaction effects.
The coefficients of a generalized linear model correspond to log-odds. These are suitable for regression modeling, as they are unbounded and normally distributed. Odds and odds ratios (ORs), on the other hand, are easier to interpret. To aid interpretability, we report the model coefficients and standard errors on the log-odds scale as well as odds (ratios), which correspond to exponentiated coefficients. The odds for the intercept can straightforwardly be interpreted as the odds of closure voicing with all other variables kept at zero. Since all variables are either contrast-coded or standardized, the ORs can be interpreted straightforwardly as the change in odds associated with that variable (see
The results of the logistic mixed-effects regression model described above are summarized in
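The relationship between log-odds, odds, and probability can be made concrete with a small sketch. The Python below is an illustration only: the helper names `format_odds` and `odds_to_probability` are hypothetical, and because it starts from coefficients rounded to two decimals, the resulting odds can differ in the last digit from odds computed on unrounded estimates.

```python
import math


def format_odds(log_odds, digits=2):
    """Exponentiate a log-odds coefficient and print it as 'x : 1' or '1 : x'.

    Positive coefficients give odds above 1 ('x : 1'); negative
    coefficients are inverted so the larger number is always shown.
    """
    odds = math.exp(log_odds)
    if odds >= 1:
        return f"{round(odds, digits)} : 1"
    return f"1 : {round(1 / odds, digits)}"


def odds_to_probability(odds):
    """Convert odds (as a single number) to a probability."""
    return odds / (1 + odds)


# A coefficient of 0.63 corresponds to odds of about 1.88 : 1.
neutral_vowel = format_odds(0.63)

# An intercept of -2.43 corresponds to a probability of roughly 8%
# when all other predictors are at zero.
intercept_probability = odds_to_probability(math.exp(-2.43))
```

Note that an odds ratio is a multiplicative change in odds, not in probability: doubling the odds from 1 : 11 to 2 : 11 raises the probability from about 8% to about 15%, which is less than a doubling.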
Summary of logistic mixed-effects regression model. √ indicates agreement with our hypotheses in
Predictor | Odds (ratio) | Estimate | SE | z | p | Sig. | Agreement
(intercept) | 1 : 11.34 | –2.43 | 0.42 | –5.72 | <.001 | *** | |
Laryngeal Cat., – asp., + unasp. | 20.15 : 1 | 3 | 0.39 | 7.76 | <.001 | *** | √ |
Place, Velar Contrast (+velar) | 1 : 3.57 | –1.27 | 0.3 | –4.22 | <.001 | *** | √ |
Place, – bilabial, +alveolar | 1.16 : 1 | 0.15 | 0.34 | 0.44 | 0.66 | ||
Preceding Approximant | 1.63 : 1 | 0.49 | 0.3 | 1.62 | 0.1 | ||
Preceding High Vowel | 1.1 : 1 | 0.1 | 0.16 | 0.59 | 0.55 | ||
High Vowel | 1 : 1.06 | –0.06 | 0.25 | –0.24 | 0.81 | ||
Preceding Neutral Vowel | 1.32 : 1 | 0.28 | 0.14 | 1.95 | 0.05 | . | |
Neutral Vowel | 1.88 : 1 | 0.63 | 0.23 | 2.77 | <.01 | ** | √ |
Preceding Stress | 3.7 : 1 | 1.31 | 0.24 | 5.41 | <.001 | *** | √ |
Stress | 1 : 1.94 | –0.66 | 0.21 | –3.11 | <.01 | ** | √ |
Preceding Stød | 1 : 9.53 | –2.25 | 0.52 | –4.36 | <.001 | *** | √ |
Stød | 2.11 : 1 | 0.75 | 0.23 | 3.18 | <.01 | ** | ÷ |
Bnd: Internal Contrast (+int) | 1.28 : 1 | 0.25 | 0.32 | 0.78 | 0.44 | ||
Bnd: Affix Contrast (+affix) | 4.79 : 1 | 1.57 | 0.36 | 4.33 | <.001 | *** | (√) |
Bnd: Affix Type Contrast (+infl.) | 3.13 : 1 | 1.14 | 0.61 | 1.87 | 0.06 | . | |
Bnd: Compound Contrast (+cp.) | 1.57 : 1 | 0.45 | 0.41 | 1.1 | 0.27 | ||
Word Class (– open, +closed) | 1 : 1.12 | –0.11 | 0.31 | –0.37 | 0.71 | ||
Local Speech Rate | 1 : 18.29 | –2.91 | 0.27 | –10.86 | <.001 | *** | √ |
Local Lexical Frequency | 1.85 : 1 | 0.61 | 0.27 | 2.25 | 0.02 | * | √ |
Sex (– f, +m) | 1.7 : 1 | 0.53 | 0.44 | 1.22 | 0.22 | ||
Age | 1 : 3.09 | –1.13 | 0.41 | –2.74 | <.01 | ** | √ |
Lar.cat. : Preceding approximant | 1 : 2.59 | –0.95 | 0.56 | –1.71 | 0.09 | . | |
Lar.cat. : Preceding stress | 1 : 3.7 | –1.31 | 0.47 | –2.76 | <.01 | ** | |
Lar.cat. : Local speech rate | 6.5 : 1 | 1.87 | 0.51 | 3.67 | <.001 | *** | |
In some cases, the results of the mixed effects model tell quite a different story than the exploratory analysis presented in Section 4. In these cases, the results of the mixed effects model should be taken as the best possible description of the data. The odds for the intercept mean that the relative likelihood of a stop being fully voiced is predicted as 11.34 times lower than not being fully voiced if all other variables are controlled for (i.e., kept at their average).
The significant variables overwhelmingly pattern as predicted. For the following categorical variables, this means that they are significant in the same (expected) direction as we saw in the exploratory analysis: Laryngeal Category, Neutral Vowel, Stress, and Preceding Stød. The effect of laryngeal category is very strong, with the odds of intervocalic voicing being more than 20 times higher in the unaspirated set. The odds of voicing are approximately doubled in syllables with neutral vowels as well as in unstressed syllables, and they are around 10 times lower immediately following syllables with stød.
The Place of Articulation variable patterns differently from what we saw in the exploratory analysis. The model finds that voicing in stops with a fronted occlusion, i.e., in bilabials and alveolars, is around four times more likely than in velar stops, but there is no significant difference between bilabials and alveolars. This is in line with our aerodynamically motivated predictions. Recall that alveolars were overall voiced at a much higher rate than other places of articulation; this effect disappears in a model that also takes e.g., stress and lexical item into account.
We actually see a fairly strong effect of Preceding Stress in the expected direction; voicing is around four times more likely following stressed syllables. This is interesting, because in the exploratory analysis there was essentially no correlation between preceding stress and voicing.
Unexpectedly, the Stød variable patterns in the opposite direction of our predictions and what we saw in the exploratory analysis. Closure voicing is found to be around twice as likely in syllables with stød. We return to this in the discussion in Section 6.3 below.
Only one of the contrasts for Morphological Boundary Type is found to have a significant effect on closure voicing: Affix-initial stops are voiced at a much higher rate (around four times) than stops at other morphological boundaries. There are good reasons to expect this at face value: /p t k/ are rarely found in affixes and never in inflectional affixes, affixes are almost never stressed, and affixes often have neutral vowels. However, these are all variables that we control for independently in the model, and because of this, we predicted that word-internal stops would be voiced at a higher rate than affixes. We return to this in the discussion.
Other categorical variables—Preceding Approximant, Preceding High Vowel, High Vowel, Preceding Central Vowel, Word Class Type, and Sex—have no significant influence on voicing in the model, although in some cases, there seemed to be clear correlations in the exploratory analysis. In all cases, we must assume that the correlation we saw at face value can be better explained by other (potentially random) variables in the data.
The influence of continuous predictors is visualized in
Plots showing the likelihood of fully voiced stops as a function of the continuous variables, as predicted from the mixed-effects model. The x-axes are standardized units. Note that y-axes differ due to the very high likelihood of voicing in very quick speech, so keeping y-axes identical would blur the effect in other variables.
Plots showing the likelihood of fully voiced stops for the interaction effects, as predicted from the mixed-effects model. The x-axes are standardized units. Note that y-axes differ due to the very high likelihood of voicing in very quick speech, so keeping y-axes identical would blur the effect in other variables.
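The standardized units on the x-axes follow the scaling described in the notes: each continuous variable has its mean subtracted and is then divided by two standard deviations. A minimal sketch of that transformation follows. The speech rate values are invented for illustration, and the use of the sample standard deviation is an assumption, since the notes do not specify which estimator was used.

```python
from statistics import mean, stdev


def standardize_two_sd(values):
    """Center values on their mean and divide by two standard deviations."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / (2.0 * s) for v in values]


# Hypothetical local speech rates; these numbers are not from the corpus.
speech_rates = [4.0, 5.0, 6.0, 7.0, 8.0]
scaled = standardize_two_sd(speech_rates)

print([round(x, 3) for x in scaled])
print(round(stdev(scaled), 3))  # 0.5
```

After this scaling, one unit on the x-axis corresponds to two standard deviations of the original variable, so the scaled variable itself has a standard deviation of 0.5.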
In this section, we discuss the results in relation to the research questions we presented in Section 2.
The strongest predictor of closure voicing is laryngeal category. There are two main findings here: 1) /p t k/ are voiced only very rarely, and much more rarely than /b d ɡ/, and 2) /b d ɡ/ are voiced commonly, albeit still at a lower-than-chance rate. The three major accounts of laryngeal representation in (Danish) stops that we presented in the introduction all have mechanisms that can account for the second finding.
Abstract [voice] approaches straightforwardly predict the first finding; [–voice] stops are naturally voiced less frequently than [+voice] stops. With regard to the second finding, in Kingston and Diehl’s (
Concrete [voice] approaches also straightforwardly account for the first finding, but not necessarily the second finding. [spread glottis] should block voicing, while unmarked stops are expected to be voiced whenever natural (i.e., intervocalically). In Beckman et al.’s (
In the end, we believe the best explanation of our acoustic results is one that relies on our existing knowledge of glottal activity in Danish stops from research by Frøkjær-Jensen et al. (
The results relating to laryngeal category can be accounted for by all three major accounts of laryngeal representation, but not all theories predicted the results equally well. Recall from Section 1.1 that an abstract [voice] account did not allow us to make any specific predictions. A concrete [voice] account only predicts the results with the added machinery of gradient phonetic interpretation of feature values. A gesture-based account predicts the results well with no additional machinery: The necessary ‘ingredients,’ so to speak, are already built into the representational grammar.
On a final note, Schachtenhaufen (
Closure voicing is to a large extent found in environments where we expect to find phonetic lenition: Its occurrence increases with speech rate; it is found more frequently in unstressed syllables, in syllables with schwa, and in affixes. On the basis of our results, it seems sensible to consider intervocalic closure voicing a lenition phenomenon in itself.
This has some interesting phonological consequences. As discussed in Section 1.1, it is often difficult to account for phonetic voicing processes with reference to spreading of a [voice] feature. In phonological representational frameworks relying on privative features, the voiceless unaspirated stop is generally considered the unmarked one, i.e., it carries no laryngeal features. Similarly, voicing is generally not considered phonologically marked for sonorant sounds (e.g.,
The question remains:
This type of lenition is not predicted from either of the featural representational accounts discussed above. If /b d ɡ/ are abstractly specified as [voice], we would not expect lenition to be a requirement for phonetic voicing—in fact, Kingston and Diehl (
In
Summary of predictions from different theoretical approaches.
Concrete [voice] | Variable voicing | √ | Little voicing | √ | ||
Abstract [voice] | All outcomes possible | All outcomes possible | ||||
Gestures | Little voicing | √ | Very little voicing | √ | Voicing in both series | √ |
More voicing in /b d ɡ/ | √ | |||||
We have already discussed the predictive power of some of our variables. Laryngeal Category was a very strong predictor of voicing, as were a number of variables associated with lenition. Particularly strong lenition variables are Local Speech Rate, Preceding Stress, and Affix Boundaries, but overall, the majority of lenition variables have a significant influence on voicing in the expected direction. It is interesting that no effect was found for morpheme-initial stops, but one was found for stops at affix boundaries. In Section 2.1.3, we hinted that this may have an exemplar-theoretic explanation: Affixes are so often encountered with closure voicing that it has seeped into the underlying representations at the morpheme level in a way that is not predictable at the phoneme level. This is obviously controversial, in large part because it is impossible to represent in many modular frameworks (where phonetic information is invisible to morphology), and it represents a quite different conception of phonological representation from those we have discussed above. The effect of affix boundaries remains an interesting problem for further research.
Many of the other variables that we expected to influence closure voicing were aerodynamic in nature, and fewer of those have an observable effect on closure voicing in our data. This may be either because these variables truly have no effect on closure voicing, or because the influence of these variables is more gradient in nature. It is possible that these variables would affect the relative duration of closure voicing within those stops that we simply categorize as ‘not fully voiced.’
We had a number of predictions for how the tongue position before and after the occlusion would affect the prevalence of closure voicing, which mostly come down to this: We expected a narrower constriction in the oral cavity before and after the occlusion to decrease the odds of closure voicing, because such sounds are sometimes taken to be less sonorous (e.g.,
Place of articulation has quite a strong effect on voicing, and this has an aerodynamic explanation. The supralaryngeal cavity is relatively small during a velar occlusion and provides little opportunity for passive expansion, and as such, velar stops are generally voiced at a lower rate. Alveolar and bilabial occlusions are more amenable to voicing, and the difference in size between the resulting cavities is negligible, which may be why they do not differ significantly in their amenability to voicing.
The influence of stød on the potential for closure voicing can also be thought of as an aerodynamic effect. The naturalness of intervocalic closure voicing crucially depends on high subglottal pressure at the time of occlusion and on vocal fold configuration being amenable to voicing. Closure voicing following stød is very rare—this was a strong effect in spite of the total number of relevant tokens being quite small—presumably because laryngeal contraction in the production of stød causes a vocal fold configuration that is less amenable to voicing than that of modally voiced vowels. Tautosyllabic stød was found to increase the chances of voicing, which is surprising, given that stød has many of the same syllable-initial cues as stress. However, one initial articulatory correlate of stød reported by Fischer-Jørgensen (
In this study, we report on the occurrence of intervocalic stop voicing in a corpus of spontaneous Danish speech. Although Danish stops are generally well-described, most of what has been previously written about voicing has been speculative. We show that intervocalic voicing is very rare in /p t k/ and occurs in less than half of the /b d ɡ/ tokens. In our modeling of the data, we controlled for a number of aerodynamically motivated predictors, most of which appear to have little influence on the occurrence of closure voicing. However, closure voicing was generally found at relatively high rates in environments where we also expect lenition, i.e., quick speech, unstressed syllables, before neutral vowels, and in morphological affixes. This supports an analysis of intervocalic voicing as a lenition phenomenon. These findings can be accounted for with reference to previous articulatory studies showing that both laryngeal series of Danish stops are produced with glottal opening gestures that counteract voicing, although these gestures differ in timing, magnitude, and functional importance. Intervocalic voicing can be modeled as the loss of this gesture, which is lost at a higher rate in /b d ɡ/, where it is shorter, of smaller magnitude, and does not serve a critical distinctive function. There is an extremely broad literature on laryngeal features and related phonological representation, and we have necessarily discussed only a few possible viewpoints here. If intervocalic voicing is indeed a lenition phenomenon, we suggest that this is best represented in a phonological representational framework with the capacity to directly incorporate the timing and magnitude of articulatory gestures, such as Articulatory Phonology.
Few corpus studies of intervocalic voicing are available, and as such, it is difficult to compare these results to those from other aspiration languages (or from true voice languages, for that matter). More studies are therefore necessary detailing how different variables influence the probabilistic occurrence of closure voicing in stops in other languages. This will help determine which effects should be associated with phonetic implementation only, and which should be considered grammatically encoded.
See Kirby and Ladd (
The cause of this pattern is disputed. According to Kingston and Diehl (
See Steriade (
Katz (
Neutral vowel here refers to schwa, to approximants that are syllabic due to schwa-assimilation, and to unstressed [i] in some morphological contexts.
In Frøkjær-Jensen, Ludvigsen, and Rischel (
[ɤ̯] is usually transcribed with [ð], in spite of the sound being highly vocalic (
At the morpheme level, primary stress is a phonological prerequisite for stød. In compounds, however, primary stress generally falls on the first member, while the second member has stød; some derivational processes also behave this way (e.g.,
An alternative would be to use Basbøll’s (
This is admittedly a rough measure of speech rate, chosen mostly out of convenience (it was easy to extract from the existing data frame). It is not unheard of, though; Bohn (
However, tapped realizations of /t ~ d/ are marked as such, i.e., with [ɾ].
Davidson (
We fitted logistic mixed effects models using the
The examples from above the age of 50 all come from a single speaker, so these can safely be ignored.
We standardized continuous variables by subtracting the mean and dividing by two standard deviations, following Gelman and Hill (
The model was fitted using the glmer() function in lme4, using bound optimization by quadratic approximation (the ‘BOBYQA’ optimizer), with the maximal number of iterations increased from the default 10⁵ to 10⁶. These low-level mechanical details should have no effect on the results, but could be important for reproducibility. See note 12 for more details on the R packages used and Puggaard-Rode et al. (
This is, of course, a direct result of the generative capacity of Articulatory Phonology being very powerful; this is an advantage here, but certainly also has its disadvantages.
The corpus used for this study is available online, but password-protected (
We would like to thank Dirk Jan Vet for invaluable help with Praat scripting, and Paul Boersma for providing thorough comments on a previous version of the statistical modeling. We would also like to thank Nicolai Pharao, Bert Botma, and Janet Grijzenhout for helpful comments on previous versions of the manuscript, as well as the audience at the 2nd Phonetics and Phonology in Denmark meeting. Finally, we would like to extend our gratitude to associate editor James Kirby and two anonymous reviewers, who made several suggestions that significantly improved the quality of the manuscript. Any remaining faults are our own.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 894936.
The authors have no competing interests to declare.