There are numerous reported cases in English dialects where apparent surface contrasts, manifested by the presence of minimal pairs, are, in fact, structurally predictable. For instance, in Scottish English, long vowels are found in open syllables, e.g., in brew, but also preceding a morphological boundary, e.g., in brew-ed. In contrast, vowels are shortened in the same segmental context when no morphological boundary intervenes, e.g., in brood (Aitken, 1981; Scobbie et al., 1999; Scobbie & Stuart-Smith, 2008). Similarly, for some accents of American English, /l/-darkening is reported to apply in canonical coda positions, but also pre-vocalically before a morphological boundary, yielding a contrast between words like hail-y and Hailey (Boersma & Hayes, 2001; Lee-Kim et al., 2013). An even more striking example, since it involves high frequency words and highly productive suffixation, involves day-s and daze in Belfast English, where the latter is pronounced with a centring diphthong, while the former has a more monophthongal quality (Harris, 1994). We will refer to cases like this as ‘fuzzy contrasts’ or ‘morphologically conditioned contrasts/differences,’ meaning segmental differences triggered by the presence of morpheme boundaries (see Hall, 2009 for discussion on other terms used in the literature in such cases).
Extant theories of sound change have formulated distinct hypotheses concerning the diachronic origins of fuzzy contrasts. Bermúdez-Otero & Trousdale (2012) argue that sound changes go through a specific life cycle which involves progression through increasing levels of grammatical complexity. According to that model, sound changes originate as phonetic, subsequently enter phonological grammar, and only later become sensitive to morphological or lexical influences. Fuzzy contrasts are introduced through a specific type of innovation, domain narrowing, whereby a previously transparent phonological rule begins to operate in a smaller morphosyntactic domain: For example, a rule that initially applied domain-finally in grammatical words begins to apply domain-finally in stems. An empirical prediction which follows from this model is that sound changes should be phonologically transparent during early stages of their phonetic development. A related prediction is that whenever morphological effects are present, they involve distinctions between categorical allophones. This is tied to a broader theory of modularity in grammar (e.g., Levelt et al., 1999), which posits that organization of grammar is strictly hierarchical. The aspects of this hierarchy that are relevant to us state that the output of the morphological component is fed into the phonology module, and the output of that becomes the input into phonetics. The mental lexicon interacts with both morphology and phonology, but it does not interact with phonetics. Similarly, there are no direct interactions between morphology and phonetics, because these two modules do not share an interface.
Since in the modular architecture fuzzy contrasts are a product of a distinct sound change, they are expected to be somewhat empirically restricted. An alternative view is that fuzzy contrasts are relatively common, although they may sometimes be too small to be readily observable. This is proposed, among others, by Bybee (2001), who argues for an exemplar-based approach, where lexically related words are conditioned to undergo similar processes in sound change (see also Johnson, 1997 and Bybee, 2006). This idea entails a different conception of fuzzy contrasts: The morphological effects are apparent, because the contrast is a reflection of lexical relationships between related words. Furthermore, there are no restrictions on when fuzzy contrasts may first appear in sound change. In fact, such contrasts may potentially be present from the very onset of change. Further adjustments to the exemplar model of lexical storage are hybrid approaches which also have capacity to model phonological category behaviour, in addition to word-specific phonetics (Pierrehumbert, 2002, 2006, 2012, 2016).
In a recent study, we analyzed the articulatory properties of the contrast between words like hula and fool-ing in Southern British English (Strycharczuk & Scobbie, 2016). Within monomorphemic words, the vowel is relatively front and the /l/ is clear. In comparison, before a morphological boundary, the vowel is more retracted, and the /l/ is relatively darker. In some cases, this morphologically-conditioned difference may create minimal pairs, e.g., ruler ([ɹʉ:lə] ‘measuring device’) and rul-er ([ɹu:ɫə], ‘leader of a country’). In Strycharczuk & Scobbie (2016), we considered whether the difference between monomorphemic and morphologically complex words in this case can be convincingly analyzed as involving allophonic oppositions, as predicted by the life-cycle model. We argued that this is not the case, since the phonetic difference between the two conditions may involve very subtle articulatory adjustments which are not categorical, and thus not unambiguously allophonic.
In the present paper, we address the question of whether a fuzzy contrast similar to the hula∼fool-ing one also appears in the context of another similar vowel, namely /ʊ/. The vowel /ʊ/ provides an interesting comparison for a number of reasons. There is phonetic similarity between /uː/ and /ʊ/, so we expect them to enter in a similar coarticulatory relationship with the following /l/. This expectation is supported by the findings in Kleber et al. (2011) that F2 is lower in /ʊ/ before a following /l/ than before a following coronal obstruent (wool vs. soot). Another aspect in which /uː/ and /ʊ/ are similar is that both vowels are currently undergoing fronting in SBE (Bauer, 1985; Hawkins & Midgley, 2005; Fabricius, 2007; McDougall & Nolan, 2007; Harrington, 2007; Harrington et al., 2008; Kleber et al., 2011). However, /ʊ/-fronting appears to be a younger change overall (Hawkins & Midgley, 2005; Harrington et al., 2011), which allows us to investigate the hypothesis that fuzzy contrasts affect only phonetically advanced changes. If this is true, as predicted by modular theories, we may expect to find no difference between /ʊl/ in monomorphemic words, such as bully, and morphologically complex words, such as pull-ing. In contrast, if we find an emerging bully∼pull-ing contrast, this would be in line with the predictions made by non-modular approaches, such as the exemplar-based ones, where fuzzy contrasts are expected to be fairly ubiquitous, because paradigmatic relationships may condition subtle effects on the phonetics.
The predictions as specified above may be complicated, depending on how we formulate the relevant phonological generalization underlying the hula∼fool-ing contrast. A categorical modular approach counter-predicts the bully∼pull-ing contrast only if we assume an analysis where the contrast between hula and fool-ing is due to direct interaction between morphology and rules governing /uː/-fronting before /l/.
An analysis along those lines is proposed by Uffmann (2012), who states that the fronting of high-back vowels in English is blocked before tautosyllabic /l/. This /uː/-as-trigger analysis can distinguish between hula and fool-ing, so long as a version of /uː/-fronting sensitive to the syllable and segmental environment applies early, before a derivationally later process of re-syllabification. This can be captured either through extrinsic rule ordering or by placing the two processes in different levels of grammar which correspond to different morpho-syntactic domains. We sketch out the relevant analysis in (1) to illustrate how two rounds of syllabification interact with the segmental processes.1
(1) The hula∼fool-ing contrast under the /uː/-as-trigger scenario.

| Level | Process | [PL[WL[SLhu:lə]]] | [PL[WL[SLfu:l][SLIŋ]]] |
|---|---|---|---|
| Stem level | Initial syllabification | hu:.lə | fu:l.Iŋ |
| | /uː/-fronting (blocked by coda /l/) | hʉ:.lə | NA |
| Word level | Re-syllabification | NA | fu:.lIŋ |
| Phrase level | /l/-darkening in codas | NA | NA |
| | /l/-darkening following back [uː] | NA | fu:.ɫIŋ |
| Surface form | | [hʉ:.lə] | [fu:.ɫIŋ] |
| | | relatively front [ʉ:], clear [l] | back [u:], dark [ɫ] |
At stem level, /l/ is an onset in hula, so /uː/-fronting applies there, whereas in fool-ing, /l/ is a coda, and so fronting is blocked. At word level, the /l/ in fool-ing re-syllabifies into an onset. This is followed by a phrase level rule of coda /l/-darkening. Notice that this does not predict /l/-darkening in fool-ing, as /l/-darkening is restricted to phrase level codas. In order to accommodate /l/-darkening in cases like this, an additional assimilation process needs to be posited, one which specifically triggers /l/-darkening following the back [uː] allophone. Finally, note that the /uː/-as-trigger scenario is vowel-specific, and makes no clear predictions about any possible bully∼pull-ing contrast when /ʊ/-fronting is added as a component of the analysis.
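The rule ordering in (1) can be made concrete with a small simulation. The following is only an illustrative sketch, not part of the original analysis: the toy syllabifier, the regular expressions and the function names are assumptions, and the transcriptions follow the ASCII-style forms used in (1) (with "I" standing for the suffix vowel).

```python
import re

V = "uʉəI"  # vowel symbols in the toy transcriptions ("I" as in the suffix "Iŋ")

def syllabify(form):
    """Toy syllabifier (an assumption): a single consonant between vowels is an onset."""
    form = form.replace(".", "")
    return re.sub(rf"(?<=[{V}:])([^{V}:.])(?=[{V}])", r".\1", form)

def derive_u_as_trigger(stems):
    # Stem level: each stem is syllabified on its own, so a stem-final /l/ is a coda.
    form = ".".join(syllabify(stem) for stem in stems)
    # Stem level: /u:/-fronting, blocked by a tautosyllabic (coda) /l/.
    form = re.sub(r"u:(?!l)", "ʉ:", form)
    # Word level: re-syllabification across the morpheme boundary.
    form = syllabify(form)
    # Phrase level: /l/-darkening in codas.
    form = re.sub(r"l(?=\.|$)", "ɫ", form)
    # Phrase level: additional assimilation, /l/ darkens after back [u:].
    form = re.sub(r"(u:\.?)l", r"\1ɫ", form)
    return form

print(derive_u_as_trigger(["hu:lə"]))       # hʉ:.lə
print(derive_u_as_trigger(["fu:l", "Iŋ"]))  # fu:.ɫIŋ
```

Removing the last substitution leaves fool-ing with a clear [l] after the back vowel, which is why the extra assimilation step has to be posited under this scenario.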
An alternative analysis is one where the morphologically-sensitive rule is /l/-darkening, rather than contextualized /uː/-fronting. We shall term this possibility the /l/-as-trigger scenario. In this case, /uː/-fronting is blocked before a coda /l/, or before an /l/ that had been a coda at some stage in the derivation. As schematized in (2), initial syllabification observes morphological boundaries, so /l/ in morphologically complex fool-ing is syllabified into the coda, unlike in monomorphemic hula, where /l/ is in the onset. Due to this difference in syllabification, /l/ in fool-ing undergoes coda /l/-darkening at stem level, whereas /l/ in hula does not. If coda /l/-darkening is analyzed as a stem level process, the difference between more front [ʉ:] in hula and back [uː] in fool-ing does not require a morphologically conditioned analysis of its own. Instead, we can generalize that /uː/-fronting applies in all environments, except before a following dark [ɫ]. This transparent phrase level rule is sufficient to derive the contrast in vowel position between [ʉ:] in hula and [uː] in fool-ing, as shown in (2).
(2) The hula∼fool-ing contrast under the /l/-as-trigger scenario.

| Level | Process | [PL[WL[SLhu:lə]]] | [PL[WL[SLfu:l][SLIŋ]]] |
|---|---|---|---|
| Stem level | Initial syllabification | hu:.lə | fu:l.Iŋ |
| | Coda /l/-darkening | NA | fu:ɫ.Iŋ |
| Word level | Re-syllabification | NA | fu:.ɫIŋ |
| Phrase level | /uː/-fronting (blocked by following dark [ɫ]) | hʉ:.lə | NA |
| Surface form | | [hʉ:.lə] | [fu:.ɫIŋ] |
| | | relatively front [ʉ:], clear [l] | back [uː], dark [ɫ] |
In addition to the hula∼fool-ing difference, the analysis in (2) predicts that morpheme-final /l/ will be different from intervocalic /l/ inside a morpheme, regardless of what the preceding vowel is. This alone may condition some degree of contrast between bully and pull-ing, and such contrast may become more robust if we also posit an additional rule that blocks /ʊ/-fronting before dark [ɫ] (e.g., in pull): Such a rule would also block /ʊ/-fronting in pull-ing. These predictions of the /l/-as-trigger scenario receive some support from work by Turton (2014), who finds clearer /l/ in helix compared to a darker one in heal-ing, though only in one of her participants, a young female speaker from Essex (South-East UK). None of the other speakers from other regions in the UK analyzed by Turton (2014) show a clear contrast between helix and heal-ing, including a young male speaker of Received Pronunciation (RP), i.e., the standard accent.2
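The /l/-as-trigger derivation, and its prediction for /ʊl/ items, can be sketched in the same spirit. This is a hypothetical illustration rather than the authors' formalization: the syllabifier, the function names and the transcriptions "bʊli" and "pʊl" are assumptions introduced here.

```python
import re

V = "uʉʊəiI"  # vowel symbols in the toy transcriptions (an assumption)

def syllabify(form):
    """Toy syllabifier (an assumption): a single consonant between vowels is an onset."""
    form = form.replace(".", "")
    return re.sub(rf"(?<=[{V}:])([^{V}:.])(?=[{V}])", r".\1", form)

def derive_l_as_trigger(stems):
    # Stem level: each stem is syllabified on its own, so a stem-final /l/ is a coda.
    form = ".".join(syllabify(stem) for stem in stems)
    # Stem level: coda /l/-darkening.
    form = re.sub(r"l(?=\.|$)", "ɫ", form)
    # Word level: re-syllabification; the dark [ɫ] keeps its quality as an onset.
    form = syllabify(form)
    # Phrase level: /u:/-fronting everywhere except before a following dark [ɫ].
    form = re.sub(r"u:(?!\.?ɫ)", "ʉ:", form)
    return form

for stems in (["hu:lə"], ["fu:l", "Iŋ"], ["bʊli"], ["pʊl", "Iŋ"]):
    print("-".join(stems), "->", derive_l_as_trigger(stems))
```

In this sketch bully surfaces with a clear [l] and pull-ing with a dark [ɫ] even though no /ʊ/-specific rule has been added, which mirrors the prediction that morpheme-final /l/ differs from morpheme-internal /l/ regardless of the preceding vowel.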
The two modular analyses in (1) and (2) illustrate that a traditional segmental approach could, in principle, capture either presence or absence of the bully∼pull-ing contrast, while the hula∼fool-ing contrast is already in place. However, one could question the rationale behind setting up either the /uː/-as-trigger or /l/-as-trigger scenario, since each is a segmental approach to a phenomenon which spans multiple segments, and which might be more sensitively approached non-segmentally. If we are to discriminate between the modular approach and the exemplar one, we must consider not only if the bully∼pull-ing contrast occurs, but also how robust this contrast is relative to the hula∼fool-ing one.
The modular approach predicts that the bully∼pull-ing contrast does not occur at all, if the /uː/-as-trigger scenario is correct. Alternatively, if we assume the /l/-as-trigger scenario, it predicts a categorical allophonic opposition between clear [l] in bully and dark [ɫ] in pull-ing. In contrast, the predictions of the non-modular approach are less restrictive, because the model allows for phonetically gradient analogical effects. If we find emergent fuzzy contrasts that are not phonetically robust, this would provide support for non-modular mapping.
The data we present in this paper are ultrasound recordings of pairs such as hula∼fool-ing, containing /uːl/ in different morphological contexts, and pairs like bully∼pull-ing, containing /ʊl/ in different morphological contexts. Here we use extensive automatic image processing of raw ultrasound data, and present a new dynamic analysis able to capture subtle and gradient intra-segmental changes in tongue shape and location throughout the entire vowel plus lateral segmental sequence. This goes beyond our previous findings based on tongue-surface shapes at single segmental measurement points, while confirming that all speakers articulate /uːl/ differently in hula and fool-ing (Strycharczuk & Scobbie, 2016). Then, extending the automated analysis method to ultrasound recordings of pairs such as bully and pull-ing, we find that /ʊl/ sequences may or may not differ as a function of the morphological structure, depending on the speaker. When morphological differences do occur, they tend to be phonetically marginal, and considerably smaller than differences between hula and fool-ing, as pronounced by the same speakers. The existence of such vowel-specific effects and phonetically marginal contrasts is difficult to capture in a strictly modular analysis. We develop this argument in Section 4, although we also consider a different possibility, that intermediate phonetic representations may result from simultaneous activation of multiple phonological forms. This possibility is offered by cascading activation models (Goldrick & Blumstein, 2006; McMillan & Corley, 2010), and it allows us, in some cases, to model phonetically gradient lexical effects without abandoning modularity.
2. Materials and method
Our data come from a production experiment with 20 speakers of SBE. We collected ultrasound and audio signal in the experiment, as detailed below.
The experimental stimuli included /l/ preceded by the vowels /uː/ and /ʊ/ in four different conditions: 1) morpheme-internal, e.g., hula, bully; 2) morpheme-final, e.g., fool-ing, pull-ing; 3) word-final pre-vocalic, e.g., fool#it, pull#it; 4) word-final pre-consonantal, e.g., fool#five, pull#five. For the word-final pre-consonantal tokens, the consonant following /l/ was part of the carrier phrase (e.g., Say ‘fool’ five times). In the same experiment, we also included 24 items of /uː/ and /ʊ/ in the context of a following coronal obstruent (e.g., food, foot). These are not analyzed in the current paper, but form a part of another investigation into quantifying the degree of /uː/ and /ʊ/ fronting in SBE. The added number of test items prevented us from also including fillers in the experimental design, as we aimed to restrict the duration of the experiment to ca. 30 mins.
For the purpose of this study, the crucial distinction is that between morpheme-internal (VlV) and morpheme-final (Vl-V) context, whereas the remaining two contexts (word-final pre-vocalic and word-final pre-consonantal) serve as baselines. The word-final pre-consonantal context (Vl#C) is expected to involve the greatest /l/-darkening and vowel retraction overall, whereas the word-final pre-vocalic context (Vl#V) shows the full extent of /l/-darkening and preceding vowel retraction, when the /l/ is followed by a vowel. Three different lexical items were used for each combination of vowel and condition. Non-lingual consonants, such as labials or /h/, were preferred preceding the /uːl/ or /ʊl/ sequence. This was done to avoid progressive coarticulatory influences on the vowel. If, due to lexical restrictions, lingual consonants had to be used, they were balanced across the set. Lexical items with yod-insertion before /uː/, such as mule, were avoided. A full list of test items is in Table 1.
| school#in here | full#in here |
Altogether, 23 speakers participated in the experiment. Data from 3 speakers had to be excluded, due to disruptions during the recording, or due to particularly poor quality of the ultrasound image. The 20 speakers whose data we present were 10 older speakers (3 males 49–66, mean = 56, 7 females 45–62, mean = 55) and 10 younger speakers (3 males 21–28, mean = 25, 7 females 20–25, mean = 22). They had all been born and had grown up in the South of England or the English Midlands. They were not aware of the purpose of the experiment. They were paid £10 for participation.
Time-synchronized articulatory and audio data were collected in the experiment. Tongue movement data were captured using a high-speed Sonix RP ultrasound system (frame rate = 121.5 fps, scanlines = 63, pixels per scanline = 412, field of vision = 134.9°, pixel offset = 51, depth = 80 mm). The ultrasonic probe was positioned under the participant’s chin and stabilized using a headset (Articulate Instruments Ltd, 2008). The audio data were captured using a lavalier Audio-Technica AT803 condenser microphone connected to a synchronization unit (Articulate Instruments Ltd, 2010). The audio data were sampled at 22 kHz. Time synchronization between ultrasound and audio data was controlled by the Articulate Assistant Advanced software version 15 (Articulate Instruments Ltd, 2013).
The stimuli were presented to the participants on a computer screen, one at a time. Altogether, the participants read four repetitions of the experimental material (96 test items). In addition, each participant was recorded swallowing water, in order to image the hard palate, and biting on a piece of plastic (a bite plate) while pushing the tongue up to make contact, in order to image the occlusal plane (Scobbie et al., 2011). We used the images of the hard palate and the occlusal plane in visual exploration of the data, and in our previous work (Strycharczuk & Scobbie, 2016), but not in the analysis reported in Section 2.4.
During the debriefing, we asked the participants whether they believe they pronounce words like ruler (‘measuring device’) and rul-er (‘political leader’) in the same way, and whether bully rhymes with wool-ly in their own pronunciation.3 Overwhelmingly, the participants did not notice any differences in their own pronunciation, even when producing a difference between ruler and rul-er that was audible to the experimenter. Speaker YM1 said that ruler and rul-er were different, but bully and wool-ly were the same, speaker OF6 said ruler and rul-er were the same, but bully and wool-ly were different, and speaker OF3 thought ruler was different from rul-er, and bully was different from wool-ly. We also asked each participant whether they could guess the purpose of the study. One of the speakers (YM2) noticed that a number of words rhymed, and 7 out of 20 speakers realized that we were interested in high-back vowels (typically they commented on the spelling, e.g., vowels spelt with u and oo).
The acoustic data were automatically segmented using the University of Pennsylvania Forced Aligner (FAVE, Rosenfelder et al., 2011). The automatic segmentation was hand-corrected by the first author. For the purpose of our analysis, we were mainly interested in extracting the initial and the final boundary of the /uːl/ or the /ʊl/ sequence. As these sequences were embedded between neighbouring obstruents in the experimental materials, the segmentation was generally robust. The boundary between the vowel and the following /l/, on the other hand, was difficult to determine reliably, which is expected especially when /l/ becomes vocalized (Turk et al., 2006). Since no reliable segmentation strategy could be established to separate the vowel from the /l/, we proceed in our analysis to approach these sequences as a unit. We also note that the vowel was always clearly audible. This is in contrast to what we might find in some dialects of American English, as pointed out to us by a reviewer, where /ʊl/ can be realized as a syllabic /l/.
In the articulatory analysis, we included the parts of the ultrasonic signal corresponding to the acoustic duration of /uːl/ or /ʊl/. We extracted these from the ultrasound recordings and submitted them to a Principal Component Analysis which was carried out using the software suite TRACTUS (Carignan, 2014; Carignan et al., 2016). This method analyzes pixel intensity data in the ultrasound image, and reduces the information to a set of orthogonal principal components (PCs) which account for the greatest amount of variance in the set (Hueber et al., 2007; Mielke & Carignan, 2013; Pouplier & Hoole, 2013; Carignan et al., 2016). Therefore, the PCA allows us to extract quantifiable information from ultrasonic images. However, the numerical information itself, expressed as the PC values, is not immediately phonetically interpretable. We therefore need to use another method to transform the PCs in a way that allows us to express meaningful information.
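To illustrate the general idea of reducing pixel intensities to orthogonal components, the following is a minimal NumPy sketch on synthetic data. It is not the TRACTUS implementation: the function name, the array sizes and the random "frames" are assumptions made purely for illustration.

```python
import numpy as np

def pca_scores(frames):
    """PCA via SVD. `frames` is an (n_frames, n_pixels) array of pixel intensities."""
    centered = frames - frames.mean(axis=0)
    # Economy-size SVD: rows of vt are the principal axes,
    # and the squared singular values are proportional to the variance explained.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u * s  # PC values for each frame, one column per component
    return scores, explained

# Synthetic stand-in for flattened ultrasound frames (hypothetical sizes).
rng = np.random.default_rng(0)
frames = rng.random((200, 63 * 40))
scores, explained = pca_scores(frames)
print(scores.shape, explained[:3])
```

Each column of `scores` is a time series of one PC across frames. As the text notes, such values are not phonetically interpretable on their own, which motivates a further transformation.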
For each speaker, we extracted a set of PCs corresponding to 80% of the variance. The median number of PCs retained per speaker based on this criterion was 49. The PCs were subsequently used in a Linear Discriminant Analysis (LDA), which was carried out using the MASS package (Venables & Ripley, 2002) in R version 3.1.2 (R Development Core Team, 2005). We trained the classifier to distinguish between the morpheme-internal condition (hula) vs. the word-final pre-consonantal condition (fool#five), hypothesizing that these two conditions represent the environment for the relatively most extreme realizations of the vowel and /l/, where the morpheme-internal condition should show the most vowel fronting and the relatively clearest /l/, while the word-final pre-consonantal condition should show least fronting and most /l/-darkening. Two separate analyses were run for the /uːl/ data, first using the first half of the frames from the /uːl/ sequence, then using the second half. The rationale was the intention to reduce some of the variance in the data associated with the dynamic transition between /uː/ and /l/.4 We expected that the analysis based on the first half of the frames would be more sensitive to the vocalic features crucial for distinguishing hula from fool#five (e.g., tongue root position), whereas the consonantal features (such as tongue tip raising), would become more prominent in the analysis based on the second half. We followed the same procedure for analyzing the /ʊl/ items: We trained a classification algorithm to distinguish monomorphemes from word-final pre-consonantal items (bully vs. pull#five). We then used the discrimination algorithm to classify data in all the /ʊl/ contexts, including data from pull-ing and pull#it. Ultrasonic frames from the first and second half of the /ʊl/ sequence were analyzed separately. Separate LDAs were run for each speaker.
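The logic of this step (keep enough PCs to reach 80% of the variance, train a discriminant on the two extreme conditions, then score the intermediate conditions on the same axis) can be sketched as follows. This is a hedged NumPy illustration on synthetic data, not the MASS-based analysis itself: the function names, the five-dimensional "PC scores" and the condition means are all assumptions, and the scale of the resulting scores need not match what MASS reports.

```python
import numpy as np

def n_components_for(explained, threshold=0.80):
    """Smallest number of leading PCs whose cumulative variance reaches the threshold."""
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

def fit_lda_two_class(x_a, x_b):
    """Two-class Fisher discriminant; returns a weight vector and an offset."""
    mean_a, mean_b = x_a.mean(axis=0), x_b.mean(axis=0)
    # Pooled within-class covariance.
    centered = np.vstack([x_a - mean_a, x_b - mean_b])
    cov = centered.T @ centered / (len(centered) - 2)
    w = np.linalg.solve(cov, mean_b - mean_a)
    offset = w @ (mean_a + mean_b) / 2  # score 0 lies midway between the class means
    return w, offset

def ld_scores(x, w, offset):
    return x @ w - offset

print(n_components_for([0.5, 0.2, 0.15, 0.1, 0.05]))  # 3

# Synthetic PC scores: conditions differ only by a hypothetical mean shift.
rng = np.random.default_rng(1)
shift = {"hula": 0.0, "fool-ing": 0.5, "fool#it": 1.0, "fool#five": 2.0}
data = {c: rng.normal(loc=s, scale=1.0, size=(300, 5)) for c, s in shift.items()}

# Train on the two extremes only, then score every condition on the same axis.
w, offset = fit_lda_two_class(data["hula"], data["fool#five"])
means = {c: ld_scores(x, w, offset).mean() for c, x in data.items()}
print(means)
```

Because the discriminant is trained only on the extremes, the scores of the intermediate conditions indicate how hula-like or fool#five-like each token is, which is the property the subsequent analysis relies on.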
We analyzed the LD values assigned by the classifier in order to investigate the morpho-syntactic effects on the LD values. If monomorphemes, like hula and bully, pattern differently from morphologically-complex words like fool-ing and pull-ing, this would indicate the presence of fuzzy contrasts. The data were analyzed dynamically, using Smoothing Spline Analysis of Variance (SS-ANOVA, Gu, 2013, 2014; Davidson, 2006). Separate SS-ANOVAs were run for the results of each LDA-based classification.
Figure 1 introduces our time-series of linear discriminant (LD) values plotted with SS-ANOVA. It shows the first half of both vowel+/l/ sequences from speaker YF9. For each vowel in each condition, we report and plot the mean LD value and 95% Bayesian confidence intervals. For both /uːl/ and /ʊl/, we find the highest (and positive) LD values in the word-final pre-consonantal contexts (fool#five and pull#five), i.e., in the contexts for maximal vowel backing and maximal /l/-darkening. Furthermore, the LD values increase with the strength of the morpho-syntactic boundary (morpheme-internal < morpheme-final < word-final pre-vocalic < word-final pre-consonantal). Each category, within each vowel context, was significantly different from all the others. Crucially, the results indicate that there is a fuzzy contrast between hula and fool-ing, and between bully and pull-ing, as evidenced by the large mean difference and non-overlapping confidence intervals. However, the distance between hula and fool-ing is relatively larger than the distance between bully and pull-ing.
The analysis based on the second half of the ultrasonic frames for this speaker (Figure 2) returns a very similar result to the analysis of the first half. The main difference in comparison to Figure 1 is that the distance between hula and fool-ing is relatively smaller, and the curves for hula, fool-ing and fool#it converge towards the end of the /l/, which is expected, considering that all these contexts include a following vowel, as opposed to fool#five which includes a following labial consonant.
The results for all speakers from the first half of the /uːl/ sequence are plotted in Figure 3. All speakers show a significant difference between hula and fool-ing, although for some speakers, such as OF4 and YF4, the difference is quite small. Furthermore, for all speakers, the difference is in the expected direction, i.e., fool-ing shows higher, more fool#five-like, LD1 values. Comparing all four contexts, most speakers show the same trend, where fool#five has the highest LD1 values, followed by fool#it, and then fool-ing and hula. For OF1 and YM3, the curves for fool#it and fool-ing overlap, and YF6 shows partial reversal of the general trend for fool#it and fool-ing at the onset of the /uːl/ sequence.
For the bully and pull-ing difference, we find more individual variation. Mean curves for these conditions based on the first half of the /ʊl/ sequence are illustrated in Figure 4. For 8 out of 20 speakers, the mean difference between bully and pull-ing is not significant, although some of those speakers, such as YF6, show a trend in the expected direction. One speaker, YF7, shows a significant difference in the unexpected direction (bully diverges from pull-ing towards pull#five). For 11 out of 20 speakers, we find a significant difference in the expected direction. However, although significant, the relevant differences are typically very small, with confidence intervals neighbouring closely, and partially overlapping in some cases.
As far as results from the second half of vowel + /l/ sequence are concerned, we generally find that they reveal a subset of contrasts compared to the first half. Some speakers show a contrast in the first (vocalic) half, but not in the second (lateral) half, but the reverse is never true. This could mean that the contrast is overall less robust in the second, lateral half of the vowel + /l/ sequence, but we suspect that our analysis is overall less successful at classifying new data based on the second half. This is likely because towards the end, forms like pull#five may differ from forms like bully in many ways: There is no coarticulation with the following vowel in pull#five, and we may also find the reduction/delay of the tongue tip gesture for pull#five (impressionistic analysis of our data confirms that some speakers vocalize the /l/ in the pre-consonantal position). However, if the LDA assigns much weight to such features, it might be less successful at detecting differences between cases like bully and pull-ing, where the /l/ is intervocalic in both cases. Since the results from the second half do not provide any information concerning additional fuzzy contrasts that are not already detected by the first-half data, we do not report them in detail.
Analysis of individual variation illustrated in Figures 3 and 4 provides some insights into apparent time effects in the development of morphology-driven contrasts for the two vowels. All speakers, older and younger, have a contrast between hula and fool-ing, whereas there is variation within both age groups as far as the bully∼pull-ing contrast is concerned. Four out of 10 older speakers show the bully∼pull-ing contrast, as do 8 out of 10 younger speakers. We followed up the individual analysis with an SS-ANOVA carried out for each vowel within each age group, in order to ascertain whether mean comparisons across the entire age group also reveal significant differences between the relevant levels, especially between bully and pull-ing. For this analysis, we used the LDA results based on the first half of the vowel + /l/ sequence.
The results of apparent-time comparison for the /uːl/ series are illustrated in Figure 5. Unsurprisingly, both older and younger speakers show a clear contrast between hula and fool-ing, where fool-ing diverges towards fool#five. The contrast between bully and pull-ing, shown in Figure 6, also comes out as significant for both age groups, but the difference is marginal for older speakers.
We recognize that it is not always appropriate to carry out SS-ANOVA comparisons spanning data from different speakers, depending on how much inter-speaker variation there is. An example of a study using such an across-speaker comparison involves dynamic formant measurements by Docherty et al. (2015). In our case, we carried out the comparison, because the values are generally similar across speakers (see Figures 3 and 4 for partial illustration of individual variation). In order to verify further the validity of the apparent-time comparison reported above, we scaled the LD1 values within each speaker and re-ran the apparent-time SS-ANOVAs, using the normalized values. We obtained similar results. Crucially, there was a significant difference between hula and fool-ing, and between bully and pull-ing, within each age group.
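One common way to scale values within speakers is z-scoring, sketched below. The text does not specify the scaling procedure, so treating it as z-scoring is an assumption, as are the function name, the speaker labels and the made-up LD1 values.

```python
from statistics import mean, stdev

def scale_within_speaker(ld1_by_speaker):
    """z-score LD1 values separately for each speaker (assumed scaling method)."""
    scaled = {}
    for speaker, values in ld1_by_speaker.items():
        m, s = mean(values), stdev(values)  # sample standard deviation
        scaled[speaker] = [(v - m) / s for v in values]
    return scaled

# Hypothetical speakers whose raw LD1 values sit on different scales.
raw = {"S1": [1.0, 2.0, 3.0], "S2": [10.0, 20.0, 30.0]}
print(scale_within_speaker(raw))
# {'S1': [-1.0, 0.0, 1.0], 'S2': [-1.0, 0.0, 1.0]}
```

After scaling, every speaker contributes values with mean 0 and unit standard deviation, so pooled comparisons are less likely to be driven by speakers whose discriminant scores happen to have a wider range.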
Although we report results in normalized time, the reader should bear in mind that there are duration differences between the different conditions. Mainly, the word-final pre-consonantal context (fool#five and pull#five) typically involves increased duration compared to the remaining three contexts (see Table 2). The durations of the vowel+lateral phase in our two key contexts are comparable (i.e., the monomorpheme vs. morpheme-final condition, in hula vs. fool-ing and bully vs. pull-ing).
The question guiding our data analysis concerned the differences in vowel and /l/ articulation between monomorphemic and morphologically complex words. The results show a very clear difference between these two morpho-syntactic conditions for words containing /uːl/ sequences, viz. for all speakers, /uːl/ is realized differently in hula than in fool-ing. For the words with /ʊl/ sequences, we find variation. Eleven out of 20 speakers show a significant difference between bully and pull-ing in the expected direction (pull-ing being more similar to pull relative to the monomorphemic bully). For all the speakers who show an effect, the size is appreciably smaller than in the case of the hula∼fool-ing difference. This observation is somewhat informal, since separate analyses were run on the items containing /uː/ and /ʊ/ vowels, and therefore the relevant values are not on the same scale, but are interpreted in terms of the relative difference between the extreme forms input to the linear discriminant analyses for each vowel. Nevertheless, the difference in the size of morphology-driven contrast between the two vowels is very robust: For the /uː/ vowel, we typically find that the normalized time LD1 curves representing hula and fool-ing are at a considerable distance from each other (Figure 3). The curves representing bully and pull-ing, on the other hand, typically have very similar means with closely neighbouring or overlapping confidence intervals (Figure 4).
As far as the temporal dimension is concerned, the difference between monomorphemes and morphologically-complex words, if present, is consistently found already at the vowel onset, as shown in Strycharczuk & Scobbie (2016) for /uː/. For some speakers, like OF5 or YF6, the hula∼fool-ing difference is greatest at the vowel onset, slowly converging towards the middle of the vowel + /l/ sequence.
In Section 1, we noted that comparing the blocking of /uː/- and /ʊ/-fronting only makes sense for a study of how fuzzy contrasts interact with sound changes at different stages of their development if we are confident that morphological constraints affect the vowels directly. An alternative is that contextual blocking of vowel fronting itself is not sensitive to morphology, but rather it is conditioned by an intermediate process of /l/-darkening. This possibility follows from a minimally redundant analysis, where only /l/-allophony is directly conditioned by the morphological structure. The allophony is encoded phonologically and acts as a trigger for other processes. Specifically, it blocks /uː/-fronting before dark /l/. We then hypothesized that the presence of a fuzzy contrast affecting /l/ in the context of other vowels could suggest that /l/ is the primary trigger. However, whilst we find that a fuzzy contrast may affect /ʊl/ for some speakers, there is no clear evidence that categorical allophony is involved. For the bully and pull-ing case we find variation, both categorical and gradient. Some speakers (7 out of 20) show a morphological effect for /uːl/, but not for /ʊl/, and most speakers (11) have a morphological effect in both cases, but the size of the effect is much larger for /uːl/.
The vowel-conditioned difference in effect sizes is crucial to consider in the context of our question of whether morphological differences only affect /l/-darkening, or whether such differences are vowel-specific. The former hypothesis (/l/-as-trigger scenario) does not necessarily predict that hula∼fool-ing and bully∼pull-ing contrasts should differ so much in size. Instead, the striking advancement of the hula∼fool-ing contrast seems more consistent with a vowel-specific phonological rule. We believe that the co-existence of the two types of contrasts we observe, a phonetically elusive bully∼pull-ing contrast and a phonetically robust hula∼fool-ing contrast, is best modelled in a hybrid exemplar approach, as developed by Pierrehumbert (2002, 2006, 2012, 2016). On the one hand, this model contains phonetically-rich lexical representations influenced by relationships between related words, such as members of lexical neighbourhoods, or paradigmatically related words. Such analogical relationships may be responsible for small phonetic changes in morphologically-complex words, such as pull-ing. On the other hand, the model also has the scope to model categorical effects which are a product of emergent generalizations that percolate directly between phonetic and morphological structure. When phonetics and morphology are able to see each other, in a way not necessarily directly mediated by categorical phonological representations, we open the way for morphosyntax-phonetics interactions to cause changes that can become phonologized. We propose this is what happened for the hula∼fool-ing contrasts: Initial analogical effects have been re-interpreted by speakers in terms of more abstract vowel-specific generalizations, whether subconsciously in stored and planned aspects of speech production, or consciously as reflected in meta-phonological awareness, or both.
A major strength of the exemplar approach is the capacity to model extremely small effect sizes, such as the differences we observe between bully and pull-ing. Roettger et al. (2014) make a case for this, looking at near-neutralization of voicing in German. Several studies find small, but systematic differences in ostensibly neutralized word-final stops in German (Port et al., 1981; Port & Crawford, 1989; Charles-Luce, 1985; Kleber et al., 2010). Similar observations concerning near-neutralization have been made for voicing in Catalan (Dinnsen & Charles-Luce, 1984) and Russian (Kharlamov, 2014). Roettger et al. revisit this phenomenon in German, paying attention to potential methodological confounds, and confirm the presence of a small, but nevertheless significant near-neutralization effect. In their analysis, Roettger et al. argue that such small differences can be accounted for in a model where paradigmatically related forms are co-activated in speech production (Collins & Loftus, 1975; Ernestus & Baayen, 2006). Importantly, since this is the feature of a production mechanism, speakers may potentially articulate differences that they cannot reliably perceive. A similar explanation can be made for the hitherto reported instances of morphology-phonetics interactions that involve extremely subtle differences that seem to be below the level of consciousness (Cho & Keating, 2001; Sugahara & Turk, 2009; Song et al., 2013; Plag et al., 2015). Speakers may be producing such small differences due to co-activation of related lexical forms.
In contrast, modelling subtle phonetic distinctions is more challenging in strongly abstractionist models which require formal phonological units like segments and features to mediate between morphology and phonetics. Consider the case of speaker YM1, who shows a relatively large difference between hula and fool-ing, but a vanishingly small one between bully and pull-ing. Let us now assume that this speaker has a categorical rule of /l/-darkening which applies morpheme-finally, i.e., in fool-ing and in pull-ing. The vowel retraction in those words can then be attributed to coarticulation, where a certain degree of lingual retraction is anticipated in the vowel. In monomorphemes, like hula and bully, the /l/-darkening rule does not apply, because the structural criteria are not met, and the vowel fronting is not limited. In such a case, monomorphemes and morphologically-complex words would be analyzed as containing categorically different /l/-allophones. However, the corresponding phonetic difference between monomorphemes like bully and complex words like pull-ing is not categorical in the sense of being clearly phonetically distinct. The same problem transpires if we assume that this speaker has separate phonological vowel-specific rules, one for /uː/ followed by /l/, and one for /ʊ/ followed by /l/. Whether we attribute the bully∼pull-ing contrast to rules controlling /ʊ/-allophony, or /l/-allophony, the problem remains that phonetically it is not clear that there are two allophones. Not all phonologists would assign equal weight to this criticism, as some may deny that allophones can be defined using phonetic criteria (see for instance Fruehwald, 2013, ch.4 for discussion on this issue). However, in the absence of independent phonetic criteria, we are left with simply assuming allophony as the phonological analysis demands it, without much independent motivation.
If we do accept that allophony is partially diagnosed by phonetic criteria, we have a case here where a phonetically subtle contrast is apparently sensitive to morphological boundaries, contra the modular prediction. This is similar to findings from other studies of morphology-phonetics interactions (see above), as well as to studies looking at how phonetics interacts with the lexicon. The line of research on phonetics-lexicon interactions is important to acknowledge, as it potentially offers a way of reconciling apparently non-modular effects with a modular analysis. We know that lexical factors, such as neighbourhood size, frequency, or lexical predictability, influence continuous phonetic dimensions, such as for instance VOT, segmental duration, or degree of coarticulation (Munson & Solomon, 2004; Scarborough, 2004; Baese-Berk & Goldrick, 2009; Arnon & Cohen Priva, 2013; Cohen Priva, 2015). These findings have not led to a unanimous rejection of modular processing, as some psycholinguistic models have the capacity to capture such gradient phonetic effects using simultaneous activation of multiple categories. In particular, cascading models propose that competition at one stage of processing may activate multiple representations at the subsequent stages. For instance, simultaneous activation of two competing phonological representations may give rise to intermediate phonetic realization, as seen for instance in speech errors (Goldrick & Blumstein, 2006; McMillan & Corley, 2010). Baese-Berk & Goldrick (2009) and Peramunage et al. (2011) extend this idea to situations of lexical competition which might simultaneously activate different phonological categories, resulting in phonetic gradience.
The cascading proposal could perhaps be applied to account for morphological effects. Suppose that the production of morphologically-complex forms such as fool-ing or pull-ing leads to simultaneous activation of two allophones: A clear [l] (due to intervocalic position), and a dark [ɫ] (due to morphological constraints or analogy to a related word like fool or pull). As both categories are activated simultaneously, both of them influence the resulting /l/, which is a phonetic blend of a dark and a clear /l/. Although fundamentally modular, this approach also takes a more gradient view of phonological categories, captured through simultaneous activation, which produces phonetic gradience down the line. Note that the proposal crucially relies on the idea that there are two allophones of /l/ in the system (clear and dark) to begin with. No morphological effect would be expected where the relevant opposition is not a priori allophonic and categorical, encoded as two different structures in the activated lexicon. This prediction is somewhat similar to that of the life cycle model (see Section 1), which proposes that fuzzy contrasts always involve categorical phonological processes. However, whilst a strictly categorical modular approach would predict that fuzzy contrasts themselves involve allophonic differences (e.g., /l/ in bully and pull-ing are categorically distinct allophones), for the cascading approach, it is sufficient if there are distinct allophones in bully and pull. The presence of allophony in this case is relatively uncontroversial, since there are robust phonetic differences between /l/ in the two cases. In contrast, smaller or less robust phonetic differences reported by other studies, for instance the duration difference between morphological marker s and lexical s, could potentially be more challenging for a modular cascading account. Such differences have been reported by Song et al. (2013) and Plag et al. (2015), although note that these two studies found effects in opposite directions, so further empirical work in this area is necessary.
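The idea of a phonetic blend under simultaneous activation can be made concrete with a deliberately simple sketch. It assumes, purely for illustration, that the realized /l/ is a weighted average of a clear and a dark articulatory target on some one-dimensional retraction scale, with the weight standing in for the relative activation of the dark allophone. The function `blend_lateral`, the linear form of the blend, and all numeric values are hypothetical and are not drawn from any of the models cited.

```python
def blend_lateral(clear_target, dark_target, dark_activation):
    """Return a blended articulatory value for /l/.

    clear_target, dark_target: hypothetical positions on an arbitrary
        tongue-retraction scale (larger = more retracted, i.e. darker).
    dark_activation: relative activation of the dark allophone, in [0, 1].
    Assumption: the blend is linear in the activation weight.
    """
    if not 0.0 <= dark_activation <= 1.0:
        raise ValueError("dark_activation must lie between 0 and 1")
    return (1.0 - dark_activation) * clear_target + dark_activation * dark_target


# Hypothetical targets: 0.0 = fully clear [l], 1.0 = fully dark [ɫ].
CLEAR, DARK = 0.0, 1.0

# No activation of the dark allophone (e.g. a monomorpheme such as bully).
monomorpheme = blend_lateral(CLEAR, DARK, 0.0)

# Weak versus strong co-activation of the dark allophone
# (hypothetical weights for a morphologically complex form).
weak = blend_lateral(CLEAR, DARK, 0.1)
strong = blend_lateral(CLEAR, DARK, 0.6)

print(monomorpheme, weak, strong)
```

Under these assumptions, a small activation weight yields a realization barely distinguishable from the clear target, while a larger weight yields a clearly intermediate one, so a single mechanism covers both subtle and robust differences.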
Finally, consider the speakers who show a contrast between hula and fool-ing, but no contrast between bully and pull-ing. One possibility to account for the absence of a bully∼pull-ing contrast would be to posit different phonologies (in terms of the categorical representations and/or the constraint or rule set used) for speakers who do and who do not show this contrast. Following the distinctions set out in Section 1, we could say that some speakers have generalized the hula∼fool-ing contrast, according to the /uː/-as-trigger scenario, which excludes a contrast in bully∼pull-ing, whereas speakers with both contrasts generalize them according to the /l/-as-trigger scenario. However, such an analysis would class the absence of a bully∼pull-ing contrast as fundamentally different from cases where this contrast is present, but is very small. The latter situation dominates in our data, and we could analyze the absence of a bully∼pull-ing contrast as part of a continuous phenomenon: Since the contrasts we find can vary in size, they can also be apparently absent/non-detectable. In an exemplar model, we might capture this through assuming that the strength of lexical relationships triggers considerably more vowel retraction and /l/ darkening in /uːl/ cases, but less so in /ʊl/ cases. In a modular-cascading approach, a similar solution is available if we propose different strength of activation for the dark /l/: Stronger activation will trigger relatively more darkening, where the strength of activation is vowel-specific or word-specific. This model permits gradient variation in the extent to which such differences can and should be said to be phonologized. We must, however, acknowledge that this is merely a sketch of how the relevant differences could be captured, and does not tackle key issues such as predicting when continuous phonetic distributions are transformed into categorical ones. It remains a fundamental challenge for phonologists to account for the emergence of clear phonological identities (e.g., features) for categories in the subconscious mind or conscious opinion of the native speaker.
Another challenging issue for future enquiry is why the hula∼fool-ing contrast increases in sound change to the extent that it does, whereas the bully∼pull-ing contrast remains, at least for the time being, small. One possibility is that the increase is related to the phonetic distance between hula and fool#five. This distance may be larger than the distance between bully and pull#five; if /uː/-fronting is an older change that has progressed further compared to /ʊ/-fronting, such a difference is expected. Phonetic distance alone, however, would predict that each and every fuzzy contrast develops in step with the phonetic advancement of the underlying phonetic change. This is unlikely, considering that reports of perceptually salient fuzzy contrasts are relatively scarce. Thus, the analogy-based accounts of fuzzy contrasts need to address a version of the same question which the life-cycle theory would phrase as: “Why do fuzzy contrasts emerge?” In a gradient approach, this becomes a question of size (“Why do fuzzy contrasts increase?”), but even under this revision, the actuation problem still stands.
In this paper, we presented new articulatory data on a recently reported ‘fuzzy contrast’ between /uː/ sequences in monomorphemic words like hula and morphologically complex words like fool-ing. We asked whether this contrast is special in the sense of being limited to the words containing the /uː/ vowel. We also considered whether fuzzy contrasts are special in general, or whether they are just an instantiation of ubiquitous effects of analogy between related words. We find that the hula∼fool-ing contrast is not unique, in the sense that morphologically conditioned contrasts are also found for /l/ preceded by other vowels, but it is special in the sense of size: It is much more robust than the contrast between bully and pull-ing. This scenario is explicitly predicted by models where both analogy and phonological abstraction play a role, but it is challenging for strongly abstractionist accounts.