1 Introduction

1.1 /s/-aspiration in Spanish

The weakening of syllable-final fricatives is a process that is common to many languages (see Solé, 2010, for an overview with a focus on the Romance languages). In Spanish, the weakening of syllable-final /s/—usually referred to as /s/-aspiration—occurs in many varieties and has been the object of numerous studies within the field of Hispanic linguistics. From a dialectological point of view, /s/-aspiration has been of interest since it allows an areal division of Spanish varieties (for European Spanish, see Samper Padilla, 2011; for American Spanish, see Canfield, 1981, or Lipski, 1994). The realization of syllable-final /s/ has also attracted a long-standing interest from the variationist community and has been shown to vary with speech style and with the social variables of the speaker (e.g., Fontanella de Weinberg, 1973, for Buenos Aires; Cedergren, 1978, for Panama City; López Morales, 1983, for San Juan de Puerto Rico; Samper Padilla, 1990, for Las Palmas de Gran Canaria; and Momcilovic, 2009, for Madrid). Syllable-final /s/ in Spanish has been studied within the theoretical framework of the functionalist hypothesis (e.g., Hernández-Campoy & Trudgill, 2002; Ma & Herasimchuk, 1971; Poplack, 1981; Terrell, 1979) since it functions as a plural marker for nouns, and in the context of resyllabification (e.g., Harris, 1983; Hualde, 1991) as a loss of syllable-final /s/ goes along with a change in syllable structure from CVC to CV.

In Spanish, the above-mentioned process of lenition is not limited to /s/ in syllable-final position, but also affects syllable-final /θ/ (e.g., capaz [kaˈpah]), and it may additionally affect syllable-initial /s/ as well as /θ/ in word-medial and word-initial position (Brown & Torres Cacoullos, 2002; Jiménez Sabater, 1975). In the latter position, however, it occurs less frequently and is often sociolinguistically stigmatized (Narbona et al., 2003, p. 205). Several factors have been shown to favour a weakening of syllable-final /s/ in Spanish, such as its occurrence in unstressed syllables (e.g., Alba, 1990; File-Muriel & Brown, 2011), a fast speech rate (File-Muriel & Brown, 2011), high lexical frequency (e.g., Brown, 2009), and word-medial position preceding consonants (e.g., Alba, 1990; Momcilovic, 2009; Samper Padilla, 2011).

As the term indicates, /s/-aspiration in Spanish generally results in aspiration,1 i.e., a glottal fricative (e.g., pasta [ˈpahta]). However, early dialectological work (Alther, 1935; Alvar, 1955) reports on the high variability of syllable-final /s/ when it comes to its phonetic realization: many different phonetic variants are possible and effects often spread to adjacent segments. /s/-lenition may, for instance, be associated with the lengthening and/or lowering of the preceding vowel (Martínez Melgar, 1994, for Eastern Andalusian Spanish; Narbona et al., 2003) and/or with lengthening of the subsequent consonant (see Alvar, 1955, for Andalusian, and Ruiz Hernández & Miyares Bermúdez, 1984, for Cuban Spanish). This variability is reflected in the various transcriptions in dialectological and some variationist studies as, for instance, [∅], [s], [h], [hs].

/s/-aspiration in Spanish is usually considered a linguistic change that is stable (Gimeno, 2008; Labov, 1994). Labov (1994, pp. 583–585) describes the variable realization of syllable-final /s/ in varieties of Spanish as the result of a historical sound change, namely the “massive weakening and deletion of final /s/”. A closer look into the synchronic variation and the differences in the production of syllable-final /s/ among several Spanish dialects therefore could enhance our understanding of the historical process of /s/-weakening itself. In order to be stable across time, the variable rule (i.e., the probability of /s/ being realized as [h] or [∅]) must be acquired by children and be transmitted from one generation to the following (Labov, 1994, pp. 583–584).

Despite its variability, the phenomenon of /s/-lenition has usually been analyzed auditorily and categorized as either full [s], lenited /s/ [h], or elision [∅] to trace the process of lenition from an alveolar fricative to its loss, [s] → [h] → [∅]. Although this method gives an idea of the distribution of variants according to the independent variables (e.g., age, gender, register, or phonological context), it gives the impression that /s/ varies categorically, and it does not account for compensatory processes of /s/-lenition such as a lengthening of the preceding vowel or the following consonant. The method routinely used in variationist studies might have contributed to the view of /s/-weakening as a case of stable variation since it does not account for more fine-grained phonetic variation and change such as the above-mentioned lengthening of adjacent segments. Only a few studies have quantified /s/-lenition in Spanish with gradual, acoustic parameters such as duration, centre of gravity, or voicing. File-Muriel and Brown (2011) showed for Caleño Spanish (Colombia) how phonological context, speaking rate, and lexical stress variably affected the gradual acoustic parameters of /s/, namely its durational, spectral, and voicing properties, suggesting that /s/-lenition is more appropriately described as a gradient rather than a categorical phenomenon (see also Erker, 2010, and Torreira & Ernestus, 2012, for similar approaches).

The current study represents a counter-example to the commonly assumed stable variation of /s/-aspiration in Spanish. It will show how, in Andalusian Spanish, syllable-final /s/ preceding voiceless stops is variably realized as aspiration preceding the stop closure (e.g., pasta [ˈpahta]), following the stop closure (pasta [ˈpatha]), or both ([ˈpahtha]), and how its realization changes in apparent time, giving rise to a new sound in this variety, a post-aspirated stop. Consistent with previous studies (O’Neill, 2010; Parrell, 2012; Ruch & Harrington, 2014; Torreira, 2007a, 2007b), the study will refer to aspiration that occurs preceding the oral stop closure as pre-aspiration, and aspiration occurring after the stop closure as post-aspiration. In this paper, pre- and post-aspiration will be used as phonetic, not phonological, terms. It should be noted that from a phonological point of view the type of pre- and post-aspiration in Andalusian Spanish is different from that which appears in languages such as Icelandic or Scottish Gaelic, which have it segmentally: in Andalusian Spanish, pre- and post-aspiration result from the debuccalization of /s/ preceding voiceless stops, and (pre-)aspiration is not limited to this position but can occur in all contexts where syllable-final /s/ is debuccalized (e.g., isla ‘island’ [ˈihla], mismo ‘the same’ [ˈmihmo], etc.).

1.2 /s/ + voiceless stop sequences in Andalusian Spanish

In Andalusian Spanish — spoken in the southern part of Spain — syllable-final /s/ is generally weakened even in formal speech situations (Carbonero Cano, 1982; Villena Ponsoda, 2008). /s/-lenition can therefore be considered as a completed sound change in this variety. However, the realisation of lenited /s/ is itself highly variable synchronically. The phonological sequences /sp, st, sk/ are of particular interest since they have recently been found to be produced with aspiration preceding the stop closure (henceforth pre-aspiration; Gerfen, 2002; O’Neill, 2010), with aspiration subsequent to the stop closure (henceforth post-aspiration; Parrell, 2012; Torreira, 2007a, 2012), and with a long stop closure (O’Neill, 2010; Torreira, 2007a, 2012).

Gerfen (2002) analyzed minimal pairs of bisyllabic words with V(s)CV, where C was a voiceless stop (e.g., casta /ˈkasta/, realized as [ˈkahta] vs. cata /ˈkata/) produced by 10 younger speakers of Eastern Andalusian Spanish (EAS). He found that /sC/- and /C/-sequences differed not only in the presence of aspiration preceding the stop closure, but also in terms of the relationship between stop closure duration and vowel duration, the latter being significantly greater in /sC/- sequences than in intervocalic stops. Torreira (2007a) investigated /s/ + voiceless stop sequences in younger speakers of Western Andalusian Spanish (WAS) and discovered that in this variety, words such as lista were produced with aspiration following a long stop closure (i.e., [ˈlitːha]). He further observed that the stop closure was longer in /s/ + voiceless stop sequences than in intervocalic stops, and longer than in /s/ + voiceless stop sequences in two other Spanish varieties with /s/-weakening, namely Buenos Aires and Puerto Rican Spanish (Torreira, 2007a). A precise comparison of the production of /s/ + voiceless stop sequences among four Andalusian varieties was carried out by O’Neill (2010). Systematic differences between the two Western (Cádiz, Seville) and the two Eastern varieties (Almería, Granada) are reported for the duration of the stop closure and the aspiration following the stop closure. Speakers from Cádiz and Seville were found to produce a shorter stop closure than speakers from Almería and Granada, and a longer voice onset time (VOT). This regional variation is interpreted as the result of a sound change in progress with the possible result that post-aspirated stops [ph, th, kh] are already phonologized in Western Andalusian varieties (O’Neill, 2010, p. 39).

The hypothesis of a sound change in progress was explicitly addressed in Ruch and Harrington (2014). In an apparent-time study with 24 speakers of an Eastern (Granada) and 24 speakers of a Western (Seville) variety, they provided evidence for a sound change in progress from pre- to post-aspiration not only in Western, but also in Eastern Andalusian /st/-sequences: younger speakers produced words with medial /st/ (e.g., estanco ‘kiosk’ /esˈtanko/) with a long post-aspiration and a short or even absent pre-aspiration ([eˈthaŋko]), while older speakers showed a short post-aspiration and a longer pre-aspiration ([eˈhthaŋko]), these age-dependent differences being more marked among Seville than among Granada speakers. For younger Sevillian speakers, the long post-aspiration was additionally associated with a shorter closure duration, although the latter was still longer in /st/-sequences than in intervocalic stops (e.g., etapa ‘stage’ /eˈtapa/). A first aim of this paper is to test if the sound change from pre- to post-aspiration also affects /sp/- and /sk/-sequences.

Different scenarios have been proposed to explain the change from pre- to post-aspiration. For example, Parrell (2012) formulated an articulatory model based on acoustic data elicited in an experiment with manipulation of speech rate. The hypothesis to be tested was that a faster speech rate would be associated with a longer VOT as the result of a reorganization of the articulatory gestures from anti-phase (in pastándola, ‘grazing it’, [pahˈtandola]) to in-phase ([paˈthandola]). Acoustic data from 20 younger Seville speakers confirmed the hypothesis of longer pre-aspiration in slow speech and longer post-aspiration in fast speech. At the same time, the results demonstrated that the phenomenon of pre- and post-aspiration in Andalusian Spanish might be of a more complex nature, since some speakers produced long post-aspiration across all speech tempi, and the oral stop closure was more variable than predicted by Parrell’s (2012) model.

Another model is proposed by Torreira (2012). Here, the influence of lexical stress and speech rate on the production of /sp, st, sk/ in Western Andalusian Spanish was investigated. He tested the idea made explicit in Kessinger and Blumstein (1997) that, if post-aspiration is an acoustic cue to /sp, st, sk/ in Western Andalusian Spanish, it should be longer in stressed than in unstressed syllables, and longer in slow than in fast speech (i.e., in hyper-articulated words). Since no systematic effects were found in the data of three speakers from Cádiz, the author concluded that post-aspiration in this variety was the result of articulatory overlap rather than intended by the speaker, and that post-aspiration is not a robust cue to /sp, st, sk/ sequences in Western Andalusian Spanish. Torreira (2012) points to the long oral closure that was consistently produced by WAS speakers, which he suggests is a compensatory lengthening for the realignment of the oral closing gesture with respect to the glottal opening (Torreira, 2012, p. 61). He points to other reports of long consonants after a lenited /s/ in Western Andalusian Spanish (e.g., isla [ˈihlːa]), and suggests that overlap between a long oral gesture and the glottal gesture may be the more general production pattern for /s/ + consonants in this variety.

Yet another model that is based on acoustic and perceptual data is proposed by Ruch and Harrington (2014). Their apparent-time data of an Eastern and a Western Andalusian variety indicated that (1) pre-aspiration fades gradually over time, (2) post-aspiration increases gradually, and (3) intervocalic /t/ and /st/ sequences are distinguished systematically by closure duration and voice termination time (VTT), and also by voice onset time (VOT) for all but older Eastern Andalusian speakers. Correlation analyses within /st/-tokens indicated that, within a speaker, pre-aspiration shortening and post-aspiration lengthening are not necessarily interrelated with each other, but instead, younger WAS speakers showed a trading relationship between closure duration and VOT. Based on these findings, the authors hypothesized that closure lengthening might have arisen prior to the emergence of post-aspiration, and that, “if the sound change of post-aspiration were to lengthen further (…), post-aspiration on its own may eventually become the primary cue for distinguishing intervocalic /st/ from intervocalic /t/ in Andalusian Spanish” (Ruch & Harrington, 2014, p. 24). This interpretation contrasts with Torreira’s (2012) view of post-aspiration not as an intended cue to /sp, st, sk/ by the speaker, but as a result of articulatory overlap. A perception experiment in Argentinian Spanish (a variety of Spanish in which syllable-final /s/ is also weakened; see Aleza Izquierdo & Enguita Utrilla, 2002) showed that, even in a non-post-aspirating variety, post-aspiration is parsed with the underlying phonological /st/-sequence and serves as a perceptual cue. Based on these findings, the authors argue that an articulatory model may not entirely explain the change from pre- to post-aspiration, but instead perceptual factors may also be involved in the sound change.

The studies by Parrell (2012) and Ruch and Harrington (2014) are based on isolated words containing /st/-sequences. Torreira (2007b, 2012) and O’Neill (2010) also considered words containing /sp/- and /sk/-sequences, but did not specifically analyze the effect of place of articulation. From previous work on the realization of syllable-final /s/ in Spanish varieties, it is evident that the aspiration resulting from /s/-lenition varies according to the phonological context. In an acoustic study on Canarian Spanish, Marrero (1990) found weakened /s/ to be realized as breathy voice preceding labial and dental stops, and as voiceless aspiration preceding velar stops. Sánchez-Muñoz (2004) found /sk/-sequences in a Castilian variety to be realized more often with aspiration than /sp/- or /st/-sequences, the latter being produced more frequently without aspiration. Similar observations are reported in Alther (1935, p. 121) who describes that /s/ preceding /p/ is reduced in the vast majority of cases, while /s/ preceding /t/ can be realized as [h], as a velar fricative, or may be assimilated to the subsequent stop.

Although the phonemic status of aspiration in Spanish is different from pre-aspiration in languages that have phonemic pre-aspirated stops (see Section 1.1), a view on the phonetics of pre-aspiration in such languages could also be enlightening in the case of Andalusian Spanish, especially when it comes to phonetic and auditory principles. Helgason (2002, p. 41) argues that “two phonetically similar sound sequences that differ only in terms of phonological interpretation should not respond in different ways to the same auditory constraint”. In the next section the paper will offer a brief review of the literature on the phonetics of pre-aspiration in languages that have it phonologically such as Scottish Gaelic and Icelandic, or as phonetic variants of, for instance, geminates, such as Italian or Swedish.

1.3 Place of articulation and pre-aspiration in other languages

Clayton (2010, p. 165) reports pre-aspiration in Scottish Gaelic and Icelandic to be longest preceding velar stops and shortest preceding bilabial stops, confirming Ní Chasaide’s (1985) findings for pre-aspiration in Irish, Scottish Gaelic, and Icelandic. Clayton (2010, p. 192) assumes this variation to be conditioned by physiological or articulatory factors rather than being language-specific. His comparison between different phoneme inventories of various Scottish Gaelic varieties (Clayton, 2010, pp. 129–130) further shows de-aspiration to occur first among bilabial stops, and only later among alveolar stops. Buccalization of pre-aspiration, on the other hand, is more frequent in the velar than in the alveolar and bilabial context.

Turning now to dialects that have pre-aspiration as a phonetic variant, similar patterns have been observed: a phonetic study on emerging pre-aspiration in Standard Swedish (Helgason & Ringen, 2008) reports pre-aspiration to be longer in the velar and dental than in the bilabial context. Stevens and Hajek (2004) found emerging pre-aspiration in Italian dialects more often preceding velar than dental geminates. This pattern may be related to articulatory constraints, as suggested by Ní Chasaide and Ó Dochartaigh (1984, p. 153) and Helgason and Ringen (2008). Helgason and Ringen (2008, pp. 623–624) argue, based on Hardcastle (1973), that the lower velocity of the tongue body movement for velars compared to coronals and bilabials might lead to a longer transition phase (i.e., pre-aspiration) between the preceding vowel and the silence due to the oral closure.

A second aim of this study is to systematically analyze the influence of place of articulation on the production of /s/ + voiceless stop sequences among older and younger speakers in two different varieties. The idea is to compare synchronic variation in the production of /sp, st, sk/ — as inferred from variation within the group of older EAS speakers — with diachronic change, which will be analyzed in terms of apparent-time comparisons between older and younger speakers in both varieties. By doing so, the paper hopes to shed light on the role of articulatory and perceptual factors in sound change actuation. The so-called actuation problem (Weinreich et al., 1968) addresses the questions of why a change takes place in a particular language at a given time, and of what factors contribute to this change (Weinreich et al., 1968, p. 102).

1.4 Sound change actuation

When relating specifically to sound change, the actuation problem has been re-defined in different ways (see Stevens & Harrington, 2014, for an overview). While some authors distinguish between sound change initiation and spread, others draw the line between variation and selection of variants on the one hand and spread of change on the other (Stevens & Harrington, 2014, p. 4). In the present paper, the term sound change actuation is used when referring to the critical moment where the listener becomes speaker, or, in other words, sound change actuation is defined as the transition from initiation to spread: Why would a listener adopt a particular new pronunciation form and use it in his own speech?

Directly related to this issue is the role of articulatory as opposed to perceptual factors in sound change: Is sound change mainly speaker-driven, or is it primarily listener-driven? Ohala’s (1993) model proposes that sound change arises from the misperception of the acoustic signal by the listener, that is, when the listener erroneously interprets the acoustic effects of coarticulation as a new sound instead of attributing them to their articulatory source. Other models, such as Articulatory Phonology (Browman & Goldstein, 1991, 1992), suggest that several types of sound change result from the variability that normally occurs in speech, namely reductions in the magnitude of articulatory gestures as well as increases in gestural overlap. When the listener fails to “correctly identify which of two overlapping gestures is the source of some aspect of the acoustic signal” (Browman & Goldstein, 1991, p. 313), sound change can arise: the listener, on becoming the speaker, will use a different gestural configuration for the same sound. Common to these and other views is the idea that sound change arises from synchronic variation in spoken language. While Ohala’s (1993) model focuses more on the perceptual consequences of variation in articulation, Browman and Goldstein’s (1991, 1992) main focus lies on the articulatory sources of variation.

How strong must the coarticulatory effects be in order to be perceivable and to be misinterpreted by the listener, and how strong may they be in order to be taken up by the listener? Why should a listener take up a pronunciation form that deviates from what he usually hears? Baker et al. (2011) suggest that sound change arises from the individual differences in coarticulation among speakers, arguing that “if every speaker produces essentially the same amount of coarticulation, then there is nothing to imitate in the first place” (Baker et al., 2011, p. 350). Only if the phonetic effect of coarticulation is large enough and perceptible can the listener interpret it as a different production target, and only then can it be imitated. In contrast, from an exemplar-theory (Pierrehumbert, 2001) point of view, we would expect phonetically similar productions to be imitated, and phonetically deviant productions not to be stored in the set of exemplars used for speech production (Garrett & Johnson, 2013, pp. 86–87). Slightly different, but still similar variants, on the other hand, are stored in memory and can gradually shift a cloud of exemplars in one direction; an exemplar-based account would therefore predict sound change to be cumulative.

Concerning the emergence of post-aspirated stops in Andalusian Spanish, the role of both articulatory (Parrell, 2012; Torreira, 2012) and perceptual factors (Ruch & Harrington, 2014) has been discussed and tested, with different and, to some extent, conflicting results (see Section 1.2). In the present study this issue is readdressed by systematically investigating the influence of the place of articulation in two varieties and two apparent-time stages of the sound change. By doing so, a comparison can be made of phonetic variation in the production of /sp, st, sk/ in groups of speakers that have undergone the sound change from pre- to post-aspiration to different degrees. The paper will acoustically compare /sp, st, sk/-sequences produced by older EAS speakers, who have not yet undergone the sound change, with those of younger WAS speakers, who appear to be the most advanced in the sound change, and with those of two intermediate groups, older WAS and younger EAS speakers (Ruch & Harrington, 2014). The influence of stop type on the duration of pre- and post-aspiration will also be analyzed. If the typical VOT pattern predicted by articulatory and aerodynamic factors is found among all four speaker groups, that is, the greatest VOT for the velar and the shortest for the bilabial context (Cho & Ladefoged, 1999), then the effect of stop type on VOT will be ascribed to universal phonetic principles (Maddieson, 1997). If a VOT pattern is found that differs from the predicted one, or if the patterns diverge between younger and older speakers, perceptual factors are more likely to be involved in the emergence of post-aspiration. This analysis will further shed light on the relationship between the fading of pre-aspiration and the emergence of post-aspiration, the contexts where the sound change might have started, and how it is generalized within the phonological system of one speaker.

Based on the literature on Andalusian Spanish and on findings for languages with phonological pre-aspiration, it is expected that pre-aspiration in Andalusian Spanish /s/ + voiceless stop sequences will be longer in velar than in dental stops, and shortest preceding bilabial stops, and VOT is expected to be longest in /sk/, and to be shortest in /sp/. At the same time, it is expected that pre-aspiration will fade faster in the bilabial than in the velar context, a hypothesis that will be tested by combining the factors age and stop type. Younger and older speakers are hypothesized to differ in pre-aspiration more clearly among bilabial than among velar stops. Concerning words with intervocalic stops, the typical VOT pattern (Cho & Ladefoged, 1999) is expected to occur, with velars showing the longest and bilabials showing the shortest VOT, and no difference among the age groups or varieties.

A third issue to be addressed in this paper is the perception of post-aspirated stops. Conflicting claims have been made about the possible phonologization of post-aspirated stops in Andalusian Spanish based on acoustic data. As discussed in Section 1.2, Torreira (2012) concluded that post-aspirated stops in Western Andalusian Spanish are the result of articulatory overlap, and that no series of phonologized post-aspirated stops exists in this variety. Parrell (2012) observed that some speakers produced a long VOT across all speech rates, suggesting that this came about because post-aspirated stops might be phonologized to a certain degree in Western Andalusian Spanish.

If listeners of Andalusian Spanish are able to distinguish a minimal pair /pasta/-/pata/ based only on the presence or absence of post-aspiration, it can be concluded that post-aspiration is interpreted as a cue to /sp, st, sk/ on its own, and that the originally phonetic effect of a long post-aspiration is interpreted as a distinct phonetic target. Following Hyman (1976, 2013) phonologization is defined as the exaggeration of a phonetic effect “beyond what can be considered universal” (Hyman, 2013, p. 6). The use of such an exaggerated phonetic effect in the perception of a phonological contrast—in the absence of pre-aspiration or [s]—would then indicate that post-aspiration in Andalusian Spanish is to a certain degree phonologized (see Baker et al., 2011; Beddor, 2009; Harrington & Stevens, 2014; Kirby, 2014, for similar accounts of phonologization).

In order to test if and to what degree listeners of Andalusian Spanish make use of post-aspiration to distinguish between /t/ and /st/, a forced-choice perception experiment was conducted with a minimal pair pata-pasta that differed only in VOT. The perception experiment was conducted with younger and older listeners of Eastern and Western Andalusian Spanish. The idea was to assess whether listeners of EAS and WAS differed in the perception of the phonological contrast, and to learn if the sound change in apparent time found for production of /st/ (Ruch & Harrington, 2014) also takes hold in perception. The hypothesis tested is that younger and WAS listeners will distinguish pata and pasta more categorically than older and EAS listeners. The results of this perception test were further correlated with the production data of the same speakers. The findings of several studies suggest a relationship between the perception and the production of a phonological contrast in a sound change in progress (e.g., Fridland & Kendall, 2012; Harrington et al., 2008; Kleber et al., 2012). In the case of Andalusian Spanish, the question is whether a speaker who realizes the contrast between /t/ and /st/ in production based on post-aspiration is also more sensitive to post-aspiration as a cue to the /t/-/st/ distinction in perception. However, empirical work on vowel mergers illustrates how perception and production can be misaligned, that is, how a phonological contrast that a speaker perceives can be merged in that same speaker’s production (Babel et al., 2013; Labov et al., 1991). Further evidence against a direct production-perception link comes from studies on compensation for coarticulation (Grosvald & Corina, 2012; Kataoka, 2011), in which the degree of perceptual compensation for coarticulation and the degree of coarticulation in production were not correlated within speakers. An analysis of the relationship between production and perception may provide further insights into the role of perceptual or articulatory factors that contribute to the sound change.

In summary, there are three aims to this study. First, to assess whether the sound change from pre- to post-aspiration for /st/ found by an apparent-time study (Ruch & Harrington, 2014) takes hold also for /sk/- and /sp/-sequences. Second, to systematically compare velar, dental, and bilabial stops in order to understand in which contexts the sound change might have started and which articulatory factors might have brought it about. And third, to tackle the hypothetical process of phonologization of post-aspirated stops in this variety of Spanish by testing if Andalusian listeners use post-aspiration as a perceptual cue to /st/-sequences.

2 Production

2.1 Method

To investigate the influence of stop type, age, and variety on the production of pre- and post-aspiration, 18 isolated words from 48 speakers were analyzed.2 All target words were trisyllabic words with either intervocalic /s/ + voiceless stop sequences (12 target words, e.g., espada, estado, escapa) or intervocalic singleton stops (6 target words, e.g., separa, etapa, secaba), embedded into an /e_a/ context. The lexical stress in all target words fell on the second syllable so that a phonological /s/ occurred in the unstressed syllable (/esˈpada/, /esˈtado/, /esˈkapa/). Every target word was produced three times, resulting in a total number of 18 (target words) × 3 (repetitions) × 48 (speakers) = 2,592 tokens. Table 1 contains a list of the target words.

Table 1

Target words in the production study.

/st/ | /sp/ | /sk/
estaba /esˈtaba/ ‘to be’ | espada /esˈpada/ ‘sword’ | pescado /pesˈkado/ ‘fish’
estado /esˈtado/ ‘state’ | espalda /esˈpalda/ ‘back’ | escama /esˈkama/ ‘fish scale’
pestaña /pesˈtaɲa/ ‘eyelash’ | españa /esˈpaɲa/ ‘Spain’ | escaso /esˈkaso/ ‘scarce’
estanco /esˈtanko/ ‘kiosk’ | espanto /esˈpanto/ ‘horror’ | escapa /esˈkapa/ ‘escape’, 3rd p. sg. pres. ind.

/t/ | /p/ | /k/
etapa /eˈtapa/ ‘stage’ | separa /seˈpaɾa/ ‘separate’, 3rd p. sg. pres. ind. | secaba /seˈkaba/ ‘to dry’, 3rd p. imp. ind.
retara /reˈtaɾa/ ‘to challenge’, 3rd p. sg. imp. subj. | repata /reˈpata/ (non-word) | secara /seˈkaɾa/ ‘to dry’, 1st/3rd p. sg. imp. subj.

All 48 subjects were native speakers of Andalusian Spanish; 24 were from Seville, the capital of Western Andalusia, and 24 from Granada, a city in Eastern Andalusia. For each variety, there was an older group (age range 55–79 years) and a younger group (age range 20–36 years; see Ruch & Harrington, 2014). These four speaker groups were equal in terms of gender, i.e., there were six women and six men in each speaker group. All but six subjects had lived for at least 20 years in Seville or in Granada. The remaining six speakers had lived for at least 20 years in the nearby surrounding area.

The recordings were carried out in spring 2011 in Seville or Granada, using the SpeechRecorder software (Draxler & Jänsch, 2004). One recording session consisted of a semi-directed interview, reading a text, and reading isolated words in which the target words of this study were embedded. The 18 target words were displayed individually and in a randomized order on a laptop monitor at a constant rate of just over 40 items per minute, together with 45 fillers and 118 words for a related study, resulting in a total number of 181 words per speaker. A laptop computer was used with a USB device (Cakewalk UA-25 EX CV 2 or M-Audio MobilePre) and a headset microphone (Beyerdynamic Opus 54.16-3), and the recordings were digitized at 44.1 kHz. Recordings were carried out in the phonetics laboratory at the University of Seville, in the radio studio of the University of Granada, or in a quiet room at the subjects’ residence or workplace. Before starting the interview, all speakers were asked to speak in their dialect, in a natural way as if they were talking to a friend. Despite these instructions, some speakers used a more formal speech style and produced some tokens with a full alveolar fricative [s] instead of with a lenited /s/. These tokens were removed using an acoustic procedure (see below) and were not considered in the statistical analysis since the focus of this study is not on whether, but on how /s/-aspiration is realized.

From the 2,592 target tokens, 219 had to be discarded because of hesitations or false starts or because the speaker had produced a different word from the one displayed on the screen. The remaining 2,373 target words were segmented automatically using the Munich Automatic Segmentation System (MAuS; Schiel, 2004) on the basis of a broad phonemic transcription. The segment boundaries were then adjusted manually for the onset of V1 (V1.Onset), the onset (Cl.Onset) and offset of oral closure (Cl.Offset), and the offset of V2 (V2.Offset; see Figure 1). The boundaries of the oral closure were set where the energy decreased clearly, as inferred from the spectrogram and the waveform. The onset of V1 was set at the beginning of the first periodic waveform. All /s/-tokens were classified auditorily as [s], corresponding to a full alveolar fricative, or [h], corresponding to a weakened /s/ realized as either [h] or elided.

Figure 1 

Waveform and spectrogram of the word /esˈtanko/, produced by a young female speaker from Granada. The solid lines represent the manually set boundaries, the dashed lines the automatically set boundaries.

The onset of pre-aspiration (i.e., V1.Offset) and the offset of post-aspiration (i.e., V2.Onset) were set by an automatic procedure using two pitch trackers, one based on ESPS/Waves and one based on Scheffers (1983). The idea of this procedure is to find the offset of voicing preceding the oral stop closure, and the onset of voicing subsequent to the stop closure. For the onset of pre-aspiration, this was done by moving from left to right, starting at V1.Onset, to find the first point in time where the pitch value was zero. When voicing ceased before the oral closure, which was the case for the majority of /sp, st, sk/ tokens, this was done based on the pitch calculated with the ESPS/Waves algorithm. When voicing extended into the oral closure (which happened in several tokens with intervocalic /p, t, k/), the pitch calculated with the Scheffers (1983) algorithm was used instead (see Ruch & Harrington, 2014, p. 15, for details on the differences between the two pitch trackers and for further reasoning about this method). To set the offset of post-aspiration, the same procedure was applied in the opposite direction, moving backwards in time from right to left and starting at V2.Offset. Tokens where no pitch could be calculated because the preceding vowel was completely voiceless or deleted were removed; this resulted in the removal of 137 tokens.
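The boundary search described above can be illustrated with a minimal sketch. It assumes a pitch track stored as parallel lists of frame times and f0 values in which unvoiced frames carry the value zero; the function name, the frame spacing, and the example values are hypothetical, and the choice between the two pitch trackers is not modelled.

```python
from typing import Optional, Sequence


def first_unvoiced_time(
    times: Sequence[float],
    f0: Sequence[float],
    start: float,
    backwards: bool = False,
) -> Optional[float]:
    """Return the time of the first frame with f0 == 0, scanning from `start`.

    Forwards (default): left to right from V1.Onset, yielding V1.Offset.
    Backwards: right to left from V2.Offset, yielding V2.Onset.
    """
    frames = list(zip(times, f0))
    if backwards:
        frames = [fr for fr in reversed(frames) if fr[0] <= start]
    else:
        frames = [fr for fr in frames if fr[0] >= start]
    for time, pitch in frames:
        if pitch == 0:
            return time
    return None  # no unvoiced frame found in the scanned stretch


# Hypothetical 10 ms frames: voiced vowel, voiceless stretch, voiced vowel.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
f0 = [200.0, 210.0, 0.0, 0.0, 0.0, 190.0, 180.0]
v1_offset = first_unvoiced_time(times, f0, start=0.00)
v2_onset = first_unvoiced_time(times, f0, start=0.06, backwards=True)
```

In this toy track the forward scan stops at the first voiceless frame after the vowel onset and the backward scan at the last voiceless frame before the second vowel.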

The semi-automatically calculated interval between V1.Offset and Cl.Onset was defined as voice termination time (VTT), which is henceforth used to measure the duration of pre-aspiration. Accordingly, voice onset time (VOT), the interval between Cl.Offset and V2.Onset, is used to measure post-aspiration duration. It has to be kept in mind that in perception, Andalusian listeners might also rely on other acoustic details such as the voiced transition between the preceding vowel and the voiceless pre-aspiration (see Ní Chasaide, 1985, for a perception experiment with Icelandic listeners).
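As a small illustration of these two definitions, the following sketch computes VTT and VOT from the four boundary times. The class name, field names, and the example values (in ms) are hypothetical; the sign convention follows the text, so that VTT is positive when voicing ends before the oral closure and negative when it extends into it.

```python
from dataclasses import dataclass


@dataclass
class Boundaries:
    """Segment boundaries of one token, all in ms (hypothetical container)."""
    v1_offset: float  # end of voicing in the preceding vowel
    cl_onset: float   # onset of oral closure
    cl_offset: float  # offset (release) of oral closure
    v2_onset: float   # onset of voicing in the following vowel


def vtt_ms(b: Boundaries) -> float:
    """Voice termination time: interval from V1.Offset to Cl.Onset."""
    return b.cl_onset - b.v1_offset


def vot_ms(b: Boundaries) -> float:
    """Voice onset time: interval from Cl.Offset to V2.Onset."""
    return b.v2_onset - b.cl_offset


# Invented values: a pre-aspirated token and one with voicing into the closure.
pre_aspirated = Boundaries(v1_offset=100.0, cl_onset=130.0, cl_offset=210.0, v2_onset=225.0)
voiced_into_closure = Boundaries(v1_offset=120.0, cl_onset=100.0, cl_offset=170.0, v2_onset=185.0)
```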

In a second step, all /sC/-tokens that had been produced with a full alveolar fricative [s], i.e., where /s/ was not weakened, were removed. This was achieved by separating the /sC/-tokens into two groups, [s] and [h], based entirely on acoustic parameters (cf. Ruch & Harrington, 2014, p. 15). First, all /sC/-tokens for which the automatically detected /s/-duration was shorter than 5 ms were assigned to the category [h] (lenited tokens). For the remaining /sC/-tokens, k-means clustering (Hartigan & Wong, 1979) was applied to the mean zero-crossing rate calculated over the automatically detected interval between V1.Offset and Cl.Onset. The group with the higher zero-crossing density was assigned to the [s]-group (unlenited tokens); the other group was assigned to the [h]-group (lenited tokens). The choice of this parameter was motivated by the fact that alveolar fricatives usually have energy concentrated in a higher frequency range than glottal fricatives, which is reflected in a higher zero-crossing rate in alveolar fricatives (see Ruch & Harrington, 2014, p. 15). This procedure was verified by a comparison between the auditory and the acoustic labels of /sC/-tokens: there was agreement between the acoustically and the auditorily labelled tokens in 92.2% of cases. With the auditory procedure, 255 out of 1,467 tokens were categorized as [s]; with the automatic procedure using k-means clustering, only 221 tokens were assigned to this category. These 221 tokens were removed from further analysis. The remaining 1,246 /sC/-tokens with lenited /s/ (henceforth hC-tokens) together with the 769 C-tokens were subjected to acoustic and statistical analysis using the Emu/R-interface (Harrington, 2010b). The distribution of the /sC/-tokens according to the automatically assigned [s]/[h] category, stop type, variety, and age group is displayed in Table 2.
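The two-stage labelling can be sketched as follows. This is an illustration only: it swaps in a plain Lloyd-style two-means procedure on scalar values rather than the Hartigan and Wong (1979) algorithm cited above, and the function names and example values are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence


def two_means_1d(values: Sequence[float], max_iter: int = 100) -> List[int]:
    """Lloyd-style 2-means on scalars; 1 marks the higher-valued cluster.

    Assumes `values` is non-empty. This is NOT Hartigan-Wong.
    """
    low, high = min(values), max(values)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [1 if abs(v - high) < abs(v - low) else 0 for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        lows = [v for v, lab in zip(values, labels) if lab == 0]
        highs = [v for v, lab in zip(values, labels) if lab == 1]
        if lows:
            low = mean(lows)
        if highs:
            high = mean(highs)
    return labels


def classify_tokens(
    s_durations_ms: Sequence[float],
    zero_crossing_rates: Sequence[float],
    min_s_ms: float = 5.0,
) -> List[Optional[str]]:
    """Label each /sC/-token as 's' (unlenited) or 'h' (lenited)."""
    # Stage 1: very short detected /s/ intervals count as lenited.
    labels: List[Optional[str]] = [
        "h" if d < min_s_ms else None for d in s_durations_ms
    ]
    # Stage 2: cluster the rest on zero-crossing rate; higher cluster is [s].
    rest = [i for i, lab in enumerate(labels) if lab is None]
    if rest:
        clusters = two_means_1d([zero_crossing_rates[i] for i in rest])
        for i, cluster in zip(rest, clusters):
            labels[i] = "s" if cluster == 1 else "h"
    return labels


# Invented durations (ms) and zero-crossing rates for five tokens.
example = classify_tokens([2, 40, 45, 50, 60], [9000, 1500, 1800, 6000, 6500])
```

Note that the first token is labelled [h] by the duration criterion alone, regardless of its zero-crossing rate.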

Table 2

Recorded, discarded, and analyzed tokens in the production study.

                                 Eastern Andalusian     Western Andalusian
                                 younger    older       younger    older       total

/sC/-tokens                      432        432         432        432         1728
Hesitations; incorrectly read    18         53          30         32          133
No voicing in V1                 21         44          33         30          128
Discarded [sC]-tokens            54         54          31         82          221
Analyzed hC-tokens               339        281         338        288         1246

/C/-tokens                       216        216         216        216         864
Hesitations; incorrectly read    15         24          18         29          86
No voicing in V1                 0          5           4          0           9
Analyzed C-tokens                201        187         194        187         769
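The token bookkeeping in Table 2 can be retraced with a short consistency check. The counts are copied from the table; the dictionary layout and the function name are assumptions made for this sketch.

```python
# Column order: EAS younger, EAS older, WAS younger, WAS older (as in Table 2).
S_STOP = {
    "recorded": (432, 432, 432, 432),
    "hesitations": (18, 53, 30, 32),
    "no_voicing_v1": (21, 44, 33, 30),
    "discarded_s": (54, 54, 31, 82),
}
SINGLETON = {
    "recorded": (216, 216, 216, 216),
    "hesitations": (15, 24, 18, 29),
    "no_voicing_v1": (0, 5, 4, 0),
}


def analyzed_per_group(table):
    """Recorded tokens minus all exclusion rows, per speaker group."""
    exclusions = [row for name, row in table.items() if name != "recorded"]
    return [
        recorded - sum(excluded)
        for recorded, excluded in zip(table["recorded"], zip(*exclusions))
    ]


analyzed_hc = analyzed_per_group(S_STOP)
analyzed_c = analyzed_per_group(SINGLETON)
total_recorded = sum(S_STOP["recorded"]) + sum(SINGLETON["recorded"])
total_hesitations = sum(S_STOP["hesitations"]) + sum(SINGLETON["hesitations"])
total_no_voicing = sum(S_STOP["no_voicing_v1"]) + sum(SINGLETON["no_voicing_v1"])
```

The per-group results reproduce the "Analyzed" rows, and the totals match the 2,592 recorded, 219 discarded, and 137 voiceless-vowel tokens reported in the text.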

2.2 Results

2.2.1 The production of pre- and post-aspiration in Andalusian Spanish

2.2.1.1 Voice termination time

Figure 2 shows the mean values for voice termination time (VTT) as a function of age, variety, and place of articulation, separately for hC-words (e.g., estado, escapa, espada) and C-words (e.g., etapa, secaba, separa). As is apparent from Figure 2, VTT mean values per speaker for hC-words are mostly positive, which means that voicing ends prior to the oral closure, i.e., the stops are pre-aspirated. VTT values for C-words, on the other hand, are mostly negative, i.e., voicing extends into the oral closure (see Torreira & Ernestus, 2011, for similar results on conversational Madrid Spanish). A mixed model with VTT as the dependent variable, Sequence (hC-words vs. C-words), Age (older vs. younger speakers), and Variety (EAS vs. WAS speakers) as fixed factors, and Speaker and Word as random factors showed a significant three-way interaction (χ²[1] = 23.2, p < 0.001). Post-hoc Tukey tests showed that younger and older speakers did not differ significantly in C-words, while younger and older speakers did differ significantly in hC-words in WAS (p < 0.001), but not in EAS (although the same trend is observable from Figure 2 also for this variety). This means that younger speakers of Western Andalusian Spanish produced a shorter pre-aspiration than older speakers in hC-words. The difference between hC- and C-words was significant in all four speaker groups (p < 0.001).

Figure 2 

Voice termination time (VTT) in hC-words (green) and C-words (yellow), separately for variety (EAS vs. WAS), age group (older vs. younger), and place of articulation (/p/ vs. /t/ vs. /k/). Each boxplot contains one mean value per speaker.

Furthermore, Figure 2 shows that VTT varies with age and place of articulation: VTT is longest before velar stops, and longer for older than for younger speakers. Because of the three-way interaction mentioned above between Age, Variety, and Sequence, further statistical tests were conducted separately for hC- and C-words.

A mixed model on the hC-data with VTT as the dependent variable, with Age, Variety, and Stop Type as fixed factors, and Speaker and Word as random factors showed a significant effect of Age (χ²[4] = 13.1, p < 0.05) and Stop Type (χ²[6] = 39.4, p < 0.001) on VTT. There was no significant difference in VTT between Eastern and Western Andalusian speakers. The results of post-hoc Tukey tests showed that among hC-words, VTT differed significantly between velar and bilabial stops (EAS: p < 0.001; WAS: p < 0.001) as well as between velar and dental stops (EAS: p < 0.001; WAS: p < 0.001) among younger and older speakers of both varieties. The difference between dental and bilabial stops was not significant.

2.2.1.2 Voice onset time

It has been shown for numerous languages that voice onset time varies with place of articulation (Cho & Ladefoged, 1999). If we find the same VOT pattern in hC- and in C-words, it could very likely be attributed to articulatory or aerodynamic principles. A different VOT pattern in hC- compared to C-words on the other hand will improve understanding of which factors favour the emergence of post-aspiration, and how it spreads from one place of articulation to the other. As is apparent from Figure 3, we find the typical VOT pattern in C-words and in hC-words: bilabial stops show the shortest voice onset time, velar stops the longest. Among younger WAS speakers, however, VOT in dental stops exceeds that of velar stops in hC-words. At the same time, VOT is generally longer in hC-words than in C-words, with younger and WAS speakers showing the longest VOT.

Figure 3 

Voice onset time (VOT) in hC-words (green) and C-words (yellow), separately for variety, age group, and place of articulation. Each boxplot contains one mean value per speaker.

A mixed model with VOT as the dependent variable, Sequence, Age, and Variety as fixed factors, and Speaker and Word as random factors showed a significant three-way interaction (χ²[1] = 30.4, p < 0.001). Post-hoc Tukey tests confirmed that hC- and C-words differed in terms of VOT among younger (WAS: p < 0.001; EAS: p < 0.05), but not among older speakers. This means that younger speakers of both varieties produced hC-words with a longer post-aspiration than C-words, while for older speakers of both varieties VOT did not differ between hC- and C-words.

In order to test the influence of Age, Variety, and Place of Articulation, a second mixed model was applied to hC-words. Among hC-words there was a significant three-way interaction (χ²[2] = 10.2, p < 0.01) among Place of Articulation, Age, and Variety, so that post-hoc Tukey tests were conducted. Age turned out to be significant among all three stop types in WAS (hp: p < 0.001; ht: p < 0.001; hk: p < 0.001) and in EAS among ht (p < 0.001), but not among hp- and hk-words. Thus, in EAS, younger speakers produced a longer VOT duration than older speakers only in one particular stop type (ht-sequences). The above-mentioned observation that VOT is longer in velar than in dental stops was confirmed statistically: there was a significant difference in VOT between velar and bilabial stops (p < 0.001), between dental and bilabial stops (young WAS: p < 0.001; old WAS: p < 0.001; young EAS: p < 0.001; old EAS: p < 0.05), and between velar and dental stops in all speaker groups (old WAS: p < 0.001; young EAS: p < 0.05; old EAS: p < 0.001) except young WAS speakers. This is in line with what is evident from Figure 3, namely that the VOT of velar and dental hC-sequences does not differ in young WAS speakers.

Although the number of words presented per second was controlled in the experimental setting, variation in speech rate cannot be excluded. To ascertain that the observed differences in VTT and VOT are not a result of a variation in speech rate between the varieties and age groups, an analysis of total word duration was carried out. The average word durations per speaker and stop sequences are displayed in Figure 4. Younger WAS speakers exhibit slightly shorter word durations (i.e., a faster speech rate) than the other three speaker groups. However, a mixed model with Word Duration as the dependent variable, Age and Variety as fixed factors, and Word and Speaker as random factors revealed no significant effects. If the age- and variety-dependent differences in VTT and VOT are due to differences in speech rate, then similar patterns should be found for word duration, VTT, and VOT. This was not the case: older and younger speakers displayed different VTT in both EAS and WAS, although in EAS there was no difference in word duration. Young WAS speakers showed a longer VOT, but shorter word duration.

Figure 4 

Word duration in hC- and C-words according to variety and age group (one mean value per speaker).

To exclude the possibility that the effect of stop type on VOT and VTT was due to differences in lexical frequency of the target words, an additional test was conducted. If the sound change is favoured by lexical frequency, more frequent target words should show a longer VOT and a shorter VTT than less frequent words.

The lexical frequency of target words was classified as “high” or “low” according to the frequency table based on the movie subtitles corpus for Spanish (Cuetos et al., 2011). A token frequency count higher than 25 per million words was categorized as “high”, a count lower than 25 as “low”. There was no effect of lexical frequency on VOT or VTT for any of the varieties and age groups.
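The binary frequency coding can be written down as a one-line rule. The sketch below is illustrative only; the function name is hypothetical, and because the text does not state how a count of exactly 25 per million would be treated, that case is deliberately left unclassified here.

```python
from typing import Optional

THRESHOLD_PER_MILLION = 25.0  # cut-off stated in the text


def frequency_class(count_per_million: float) -> Optional[str]:
    """Return 'high' or 'low'; None for a count exactly at the threshold."""
    if count_per_million > THRESHOLD_PER_MILLION:
        return "high"
    if count_per_million < THRESHOLD_PER_MILLION:
        return "low"
    return None  # boundary case not specified in the source
```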

2.2.1.3 Discussion

hC-sequences and intervocalic stops clearly differed in terms of voice termination time, the former displaying mostly positive, the latter mostly negative values. Overall, younger speakers produced hC-sequences with a shorter VTT than older speakers, but this difference was significant only in WAS, not in EAS, suggesting that in WAS pre-aspiration is fading in apparent-time. Within hC-sequences, VTT was significantly longer in the velar than in the bilabial and the dental context. There was no interaction among place of articulation, age, and variety indicating that pre-aspiration is shortening to an equal degree across stop types. The finding of a longer VTT preceding velar than preceding bilabial/dental stops is consistent with dialectological studies on Andalusian Spanish as well as with phonetic studies of pre-aspiration in languages that have it segmentally such as Icelandic or Scottish Gaelic (see Section 1.3).

Voice onset time was much more variable across phonological sequences, varieties, and age groups. VOT in hC-sequences was longest for young WAS speakers, and generally longer for WAS and younger than for EAS and older speakers. These results indicate that in Andalusian /s/ + voiceless stop sequences VOT is lengthening in apparent-time and therefore confirm the findings of a previous study (Ruch & Harrington, 2014) for a larger data set. A subsequent analysis of place of articulation, age, and variety indicated that younger and older speakers differed in terms of VOT for all three stop types /sp, st, sk/ in WAS, and only for /st/ in EAS. This pattern suggests that the sound change is more advanced in WAS than in EAS, and is in line with previous studies that found long post-aspiration in Western Andalusian varieties (e.g., Parrell, 2012; Torreira, 2012), and differences between WAS and EAS varieties in terms of VOT (O’Neill, 2010). Furthermore, the observed pattern suggests that, at least in EAS, the sound change first affects /st/-sequences, and only later might generalize to /sp, sk/-sequences. As expected from phonetic principles and VOT data from many different languages (Cho & Ladefoged, 1999), velar stops displayed the longest, and bilabial stops the shortest VOT in C- as well as in hC-sequences on average. Interestingly, this pattern did not hold for hC-sequences in young WAS speakers, where /st/ displayed a post-aspiration duration as long as /sk/. It seems very unlikely, though, that long post-aspiration in /st/ is the result of universal principles or coarticulatory overlap in the speech of younger WAS speakers. The observed pattern instead suggests that a long post-aspiration is intended by the speakers as an articulatory target for /st/-sequences. Although not significantly so, the measured VOT in hC-sequences in older speakers of both varieties slightly exceeded the VOT in singleton stops. It is possible that this extended VOT arises as a result of coarticulation or a looser coupling between the glottal adduction and the oral release of the stop. This idea will be discussed in further detail in Section 5.

In the next section, three more durational parameters of C- and hC-sequences will be investigated in our production data: the total duration of the voiceless interval, the duration of the preceding vowel, and the duration of the oral stop closure.

2.2.2 Temporal coordination of glottal and oral gestures

The aim of these subsequent analyses is to understand how glottal and oral events are coordinated temporally, and what coordination mechanism might have given rise to the diachronic change from pre-aspiration to post-aspiration. Is the diachronic change mainly due to a change in the duration and timing of the oral closure gesture, or is the timing of the glottal opening gesture itself changing over time?

In order to shed light on these questions, three additional analyses are carried out. First, the total duration of the voiceless interval in hC-sequences will be compared across age groups and varieties to see if the duration of the glottal opening changes with the sound change. Second, the timing of the onset of pre-aspiration will be analyzed relative to an anchor point, in this case, the onset of voicing in the preceding vowel. This will be done to further understand if the fading of pre-aspiration and the emergence of post-aspiration came about due to a rightwards shift of the glottal opening, and whether the sound change is associated with a change in the duration of the preceding vowel. Third, oral stop closure duration will be looked at within (h)C-sequences in both varieties and age groups and at the different places of articulation. The aim here is to understand if the fading of pre-aspiration and the lengthening of post-aspiration are associated with a change in the oral closure duration.

The total duration of the voiceless interval in hC-sequences will be quantified by summing voice termination time, duration of the oral stop closure, and voice onset time. This procedure is based on the idea that one underlying glottal gesture is present for each aspirated /s/ + stop sequence. Although no physiological data for Andalusian Spanish are so far available, the idea of a single glottal opening gesture is supported by acoustic data provided by Torreira (2012), who found a consistent co-variation between the sum of pre-aspiration and closure duration and VOT, as well as by Parrell (2012), where pre-aspiration duration co-varied with post-aspiration duration.
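The sum can be made explicit with a short sketch based on the boundary labels introduced in Section 2.1. The function name and the example values (in ms) are hypothetical. Given the definitions of VTT and VOT, the three terms telescope, so that the total is simply the interval from V1.Offset to V2.Onset.

```python
def total_voiceless_interval(
    v1_offset: float, cl_onset: float, cl_offset: float, v2_onset: float
) -> float:
    """Sum of VTT, closure duration, and VOT (all boundary times in ms)."""
    vtt = cl_onset - v1_offset       # voice termination time
    closure = cl_offset - cl_onset   # oral stop closure
    vot = v2_onset - cl_offset       # voice onset time
    return vtt + closure + vot


# Invented boundaries: VTT = 30, closure = 80, VOT = 15.
total = total_voiceless_interval(100.0, 130.0, 210.0, 225.0)
```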

Figure 5 displays the total duration of the voiceless interval in hC- and C-sequences. The values appear to be very stable not only across variety and age group, but also across place of articulation. It is further apparent that the total duration of the voiceless interval is greater in hC-sequences than in C-sequences in all four speaker groups and among all three places of articulation. A mixed model with the total duration of the voiceless interval as the dependent variable, with Age, Variety, and Sequence as fixed factors, and with Word and Speaker as random factors showed a significant three-way interaction (χ²[1] = 4.0, p < 0.05). The interaction might have come about because younger WAS speakers displayed a slightly shorter total duration for the intervocalic stop /t/, while the other three speaker groups showed slightly longer total durations in this context. Post-hoc Tukey tests confirmed what is evident from Figure 5, namely that C- and hC-sequences clearly differed in all four speaker groups (p < 0.001), and that there was no influence of Age or Variety on this measure. These results show that speakers of Eastern and Western Andalusian Spanish distinguish C- from hC-sequences in production, the latter being produced with a very long voiceless interval (mean = 155.8 ms), which is almost twice as long as in intervocalic stops (mean = 82.4 ms).

Figure 5 

Total duration of the voiceless interval in intervocalic stops (yellow) and in hC-sequences (green) according to place of articulation, variety, and age group.

The next step is to analyze the duration of the preceding vowel in hC-sequences. If the gradual shift from pre- to post-aspiration came about because of a rightwards shift of the glottal opening, i.e., because the glottal opening takes place later in time, we would expect younger speakers to produce a longer preceding vowel than older speakers. Indeed, some studies (Carlson, 2012; Figueroa, 2000; Resnick & Hammond, 1975) suggest that lengthening of the preceding vowel compensates for /s/-weakening (in their case, /s/-deletion) in some varieties of Spanish. The question in the present study is if the fading of pre-aspiration (not the debuccalisation of /s/) is compensated for by vowel lengthening. The preceding vowel duration will be quantified by measuring the interval between the onset of the preceding vowel (V1.Onset) and the offset of voicing (V1.Offset). C-sequences are taken into account by way of comparison.

In 120 hC-tokens (9.6%) and in 189 C-tokens (24.6%), VTT was negative, meaning that voicing extended into the oral closure. In these cases, the interval between V1.Onset and Cl.Onset was used instead to assess vowel duration. The duration of the preceding vowel is shown in Figure 6. Vowels preceding C-sequences are slightly longer than those preceding hC-sequences. This difference is, however, not very consistent: vowel duration preceding hC-sequences appears to be longer in Eastern than in Western Andalusian Spanish, and in Western Andalusian Spanish slightly longer for older than for younger speakers. Taken together, vowels preceding C- or hC-sequences seem to be shorter when produced by young WAS speakers than when produced by older WAS speakers, which may be attributed to the observed differences in speech rate (see Figure 4).
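The measurement rule for the preceding vowel, including the fallback for tokens with negative VTT, can be summarized in a few lines. The function name and the example values (in ms) are hypothetical.

```python
def preceding_vowel_duration(
    v1_onset: float, v1_offset: float, cl_onset: float
) -> float:
    """Vowel duration in ms, ending at V1.Offset or, if VTT < 0, at Cl.Onset."""
    vtt = cl_onset - v1_offset
    # Negative VTT: voicing extends into the closure, so the closure onset
    # is used as the end of the vowel instead of the offset of voicing.
    vowel_end = cl_onset if vtt < 0 else v1_offset
    return vowel_end - v1_onset
```

Under this rule the vowel always ends at whichever of the two boundaries comes first.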

Figure 6 

Duration of the vowel preceding singleton stops (yellow) and hC-sequences (green). Each boxplot contains one mean value per speaker.

In a first step the influence of phonological sequence (hC vs. C), age, and variety on vowel duration was tested using a mixed model in which Speaker and Word were included as random factors. This model showed a significant effect of Sequence (χ²[3] = 27.0, p < 0.001) and Age (χ²[3] = 22.1, p < 0.001) on Vowel Duration. Since there was a two-way interaction between Age and Sequence (χ²[1] = 19.6, p < 0.001), post-hoc Tukey tests were conducted and indicated that Vowel Duration preceding /C/ and /sC/ differed only in older (p < 0.001), but not in younger speakers. There was no significant difference in Vowel Duration between older and younger speakers in any of the phonological sequences.

In a second step the influence of age and variety on vowel duration was tested for hC-tokens with a positive VTT only. A mixed model with Vowel Duration as the dependent variable, Age and Variety as fixed factors, and Word and Speaker as random factors revealed no significant effects. This suggests that younger and WAS speakers do not realize the glottal abduction (devoicing gesture) later in time than older and EAS speakers; in other words, there is no rightwards shift of the devoicing gesture in apparent-time.

As a last step, closure duration in C- and hC-words will be looked at. Figure 7 suggests that closure duration is longer in hC- than in C-sequences, a tendency that seems to be more marked in EAS than in WAS. A mixed model with Closure Duration as the dependent variable, Age, Variety, and Sequence (hC- vs. C-words) as fixed factors, and Speaker and Word as random factors confirmed that Closure Duration was significantly longer in hC-sequences than in singleton stops (χ²[3] = 55.5, p < 0.001) in both varieties. Furthermore, there was a significant interaction between Sequence and Age (χ²[1] = 7.7, p < 0.01) as well as between Sequence and Variety (χ²[1] = 30.5, p < 0.001). The results of post-hoc Tukey tests displayed highly significant differences between C- and hC-sequences for all four speaker groups (p < 0.001), and a significant difference in Closure Duration of hC-sequences between young WAS speakers and the two EAS speaker groups (p < 0.05). This means that younger WAS speakers produced shorter closure durations. There was no significant effect of Age or Variety on the closure duration in C-words.

Figure 7 

Duration of the oral closure in intervocalic stops (yellow) and hC-sequences (green). Each boxplot contains one mean value per speaker.

The results for closure duration have to be interpreted with caution because of the slightly faster speech rate of young WAS speakers (see Section 2.2.1). It might be the case that the reduction processes due to a faster speech rate affect the long closure duration in hC-sequences to a greater extent than in singleton stops. If this is the case, then the smaller difference between C- and hC-sequences in young WAS speakers should be attributed not to the sound change, but to speech reduction processes that affect long segments to a greater degree than short segments.

2.3 Discussion

In this section further durational parameters have been analyzed that might be related to the change from pre- to post-aspiration in Andalusian Spanish: the total duration of the voiceless interval in (h)C-sequences, the duration of the preceding vowel, and the duration of the stop closure. The aim of these analyses was to shed light on the stability and variation of these durational parameters, and to investigate whether the sound change is associated with a change in one of these parameters.

The total duration of the voiceless interval was significantly longer in hC-sequences than in intervocalic stops for both varieties and age groups. The analysis further demonstrated that the total duration of hC-sequences—as inferred from the total duration of the voiceless interval—is very stable across varieties, place of articulation, and age groups, and is not affected by the sound change.

There was no effect of age or variety on the duration of the preceding vowel, suggesting on the one hand that, relative to the onset of the preceding vowel, the glottal opening gesture does not take place later in time in younger than in older speakers. At the same time, this finding does not support compensatory vowel lengthening that has been suggested as taking place in Eastern Andalusian (Carlson, 2012), in Cuban (Resnick & Hammond, 1975), and in Puerto Rican Spanish (Figueroa, 2000). By contrast, vowels were slightly shorter when followed by a phonological /s/ + voiceless stop than when preceding intervocalic stops. This finding, again, does not support the assumption of vowel lengthening as a compensation for /s/-lenition. An inter-dialectal comparison at this point, however, is difficult because of the different methods used to measure pre-aspiration and vowel duration. In an auditory or manual segmentation procedure, for instance, the breathy part of the vowel (voiced aspiration) can be treated either as part of the vowel or as pre-aspiration (see Gerfen, 2002, for a discussion of this issue).

Closure duration appeared to be an important parameter for distinguishing C- and hC-sequences in production, displaying significantly longer durations in hC- than C-sequences in all four speaker groups. This effect, however, appeared to be less marked in younger Western Andalusian speakers, who also produced the longest post-aspiration (see Section 2.2.1). The long stop closures that were found not only for younger, but also for older speakers in both varieties suggest that long stop closures did not arise to compensate for the shortening of pre-aspiration, but might have arisen prior to the sound change to compensate for /s/-weakening (see Gerfen, 2002; Ruch & Harrington, 2014).

Taken together, the results of the closure duration, the duration of the voiceless interval, and the previous vowel suggest that the shortening of pre-aspiration came about because the oral closure is formed earlier in time, while the timing of the glottal abduction relative to the preceding vowel remains relatively constant across age, variety, and place of articulation.

3 Perception

In order to test whether the sound change from pre-aspiration to post-aspiration also affects perception, a forced-choice perception experiment was conducted. A VOT-continuum was synthesized between the minimal pair pasta [ˈpatʰa] – pata [ˈpata], with pata having a short VOT (15 ms) and pasta having a long VOT (55 ms). If post-aspiration is used as a cue to /st/, then listeners of Andalusian Spanish are expected to distinguish the stimuli of the VOT-continuum in a categorical manner. If the sound change also affects perception and if there is a relationship between the production and perception of post-aspiration, then younger and Western Andalusian listeners should be more sensitive to VOT as a cue for /st/ than older and Eastern Andalusian listeners. That is, younger and WAS listeners are expected to need a shorter VOT to perceive pasta, and they should differentiate more consistently between the two words (i.e., show a more categorical perception).

3.1 Method

As a baseline for the VOT-continuum, the utterances Digo pasta ‘I say paste’ [ˈdiɣo ˈpatʰa] and Digo pata ‘I say paw’ [ˈdiɣo ˈpata], produced by a 31-year-old female speaker from Seville, were used. The VOT in the originally produced tokens was 55 ms for [ˈpatʰa], and 13 ms for [ˈpata]. Measures of the preceding and the following vowel V1 and V2 and the oral closure are summarized in Table 3. The pasta-continuum was generated by shortening the long VOT of pasta (55 ms) in eight equal steps (5 ms each) to 15 ms using the Akustyk plugin in Praat (Boersma & Weenink, 2011). Thus, all nine stimuli of the pasta-continuum showed exactly the same acoustic properties except for VOT. The pata-continuum was generated by replacing the short original VOT of pata (13 ms) by the long VOT of pasta (55 ms). The long VOT was then shortened using the same procedure as in the pasta-continuum, generating nine stimuli that differed only in VOT. The reason for creating two continua was that pata and pasta differ not only in VOT, but also in other acoustic parameters such as the duration of stop closure and the duration of the preceding vowel (see Table 3). If listeners use closure duration as a cue to the minimal pair pata-pasta, then they might answer pasta to all stimuli within the continuum. In the original pasta token, the C:V1 ratio turned out to be greater than in pata. These differences will be taken into account in the analysis and the discussion of the results. Each of the 18 stimuli of the two continua was repeated ten times, resulting in 9 (steps) × 2 (continua) × 10 (repetitions) = 180 stimuli, which were embedded in a randomized order in an online perception experiment using Percy (Draxler, 2011).
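The design of the stimulus set can be illustrated with a minimal sketch that generates the nine VOT values and a randomized trial list. It covers only the bookkeeping, not the acoustic manipulation carried out in Praat; the function names and the fixed random seed are assumptions made for reproducibility of the example.

```python
import random
from collections import Counter
from typing import List, Tuple


def vot_continuum(long_ms: int = 55, short_ms: int = 15, step_ms: int = 5) -> List[int]:
    """VOT values from the long to the short endpoint in equal steps."""
    return list(range(long_ms, short_ms - 1, -step_ms))


def trial_list(repetitions: int = 10, seed: int = 0) -> List[Tuple[str, int]]:
    """All (continuum, VOT) stimuli, each repeated, in shuffled order."""
    stimuli = [(c, vot) for c in ("pasta", "pata") for vot in vot_continuum()]
    trials = stimuli * repetitions
    random.Random(seed).shuffle(trials)  # seeded only to make the sketch repeatable
    return trials


steps = vot_continuum()
trials = trial_list()
counts = Counter(trials)
```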

Table 3

Measurements of the stimuli in the forced-choice perception experiment.

Stimulus            Duration (ms)                                  Ratio
                    V1       V2       Closure   VOT      C         C:V1

Pata-continuum
  Pata1             92.7     77.0     82.5      15.0     97.5      1.05
  Pata9             92.7     77.0     82.5      55.0     137.5     1.48
Pasta-continuum
  Patha1            76.6     78.0     88.4      15.0     103.4     1.35
  Patha9            76.6     78.0     88.4      55.0     143.4     1.87
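The derived columns of Table 3 can be retraced numerically. The values reported there are consistent with C being the sum of closure duration and VOT, and with C:V1 being C divided by the duration of V1; this reading is inferred from the numbers rather than stated in the text, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# (V1, closure, VOT) in ms, copied from Table 3.
STIMULI: Dict[str, Tuple[float, float, float]] = {
    "Pata1": (92.7, 82.5, 15.0),
    "Pata9": (92.7, 82.5, 55.0),
    "Patha1": (76.6, 88.4, 15.0),
    "Patha9": (76.6, 88.4, 55.0),
}


def consonant_measures(v1: float, closure: float, vot: float) -> Tuple[float, float]:
    """Return (C, C:V1), assuming C = closure + VOT (inferred from Table 3)."""
    c = closure + vot
    return round(c, 1), round(c / v1, 2)


derived = {name: consonant_measures(*values) for name, values in STIMULI.items()}
```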

Listeners were told that they would hear an Andalusian woman saying the word “pata” or “pasta”. They were asked to judge for each stimulus if they heard pata or pasta and mark the corresponding box with the orthographic form on the computer screen. They could listen once to every example, and there were no training tokens. All listeners used Beyerdynamic DT-770 Pro Studio headphones and ran the experiment in a quiet room. The whole experiment took between 10 and 30 minutes.

Seventy-nine listeners participated in the perception experiment. Because of technical problems, 5 subjects judged fewer than 170 out of 180 stimuli, so we removed their data. The remaining 74 subjects completed between 94 and 100% of the experiment (9 or 10 repetitions of each stimulus). As in the production study, there was an approximately equal number of subjects from Seville (39; 14 women and 25 men) and Granada (35; 16 women and 19 men). Of the 74 listeners, 48 also took part in the production study described above, which was conducted prior to the perception experiment. The listeners were between 18 and 86 years old. Forty-three of them had lived their whole lives in Seville or Granada; the remaining 31 subjects had lived for at least 20 years in Seville/Granada or the respective province. Of the 74 participants, 65 reported having no hearing impairment, 7 reported having bilateral hearing impairment, and 2 having unilateral hearing impairment; all of those reporting hearing impairment were part of the older group.

3.2 Results

For data analysis, the remaining 74 subjects were again divided into an older (> 50 years; 36 subjects, mean age 67.2 years) and a younger (< 50 years; 38 subjects, mean age 26.8 years) group. Men and women were approximately equally distributed among the two dialect and age groups (see Table 4).

Table 4

Distribution of the participants in the forced-choice perception experiment.

           Western Andalusian     Eastern Andalusian
           Women     Men          Women     Men

Younger    8         12           7         11
Older      6         13           9         8
Total      39                     35

A generalized linear mixed model in R was used to calculate the slope and the intercept individually for each listener. The Listeners’ Judgement was set as the dependent variable (two levels: pata/pasta), the VOT Step as fixed factor (9 levels), and the Listener (74 levels) and the Continuum (two levels: pasta-/pata-continuum) as random factors.3 The speaker-specific slope m and intercept k were then used to calculate a psychometric curve for each listener.4 For five listeners, the cross-over point (i.e., the 50% decision boundary) was situated outside the stimulus range, meaning that their answers were not consistently influenced by VOT. They were all over 50 years of age; one was from Granada and four were from Seville, of whom one reported having hearing problems. The data of these five listeners were excluded from further analysis.
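The relationship between slope, intercept, and cross-over point can be illustrated as follows. The sketch assumes a logistic (logit-link) model in which VOT enters as a numeric predictor in ms; both points are assumptions, since the exact formula is not given in the running text. The function names and the example values of m and k are hypothetical and are not taken from the data.

```python
import math


def p_pasta(vot_ms, m, k):
    """Probability of a 'pasta' response under an assumed logistic model."""
    return 1.0 / (1.0 + math.exp(-(m * vot_ms + k)))


def cross_over_point(m, k):
    """VOT at which p_pasta equals 0.5, i.e. where m * vot + k = 0."""
    return -k / m


def within_stimulus_range(vot_ms, lowest=15.0, highest=55.0):
    """True if the value lies inside the VOT range of the continua (15-55 ms)."""
    return lowest <= vot_ms <= highest


# Hypothetical listener with a clear category boundary in the middle of the range.
m_a, k_a = 0.25, -8.75
boundary_a = cross_over_point(m_a, k_a)

# Hypothetical listener whose responses barely depend on VOT.
m_b, k_b = 0.01, -1.0
boundary_b = cross_over_point(m_b, k_b)

print(boundary_a, within_stimulus_range(boundary_a))
print(boundary_b, within_stimulus_range(boundary_b))
```

Under these assumptions, a listener with a very shallow slope ends up with a cross-over point far outside the 15 to 55 ms range, which corresponds to the exclusion criterion described above.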

Figure 8 displays the decision boundaries for all four listener groups. As expected, older listeners needed a longer VOT to perceive pasta than younger subjects, that is, they show a higher value for the cross-over point. No difference in the cross-over point can be seen between EAS and WAS listeners. An ANOVA with the Cross-over Point as the dependent variable and Age and Variety as between-subject factors showed a highly significant effect of Age (F[1,65] = 37.6, p < 0.001) on the dependent variable; the effect of Variety was not significant and there was no interaction between the factors.

Figure 8 

Cross-over point (50% decision boundary) in the perception experiment (one value per listener), separately for old (dark grey) and young (white) speakers of EAS (left two boxplots) and WAS (right two boxplots).

Figure 9 shows the psychometric curves separately for age and variety. The endpoints of the VOT-continua were identified consistently as /pata/ or /pasta/ by all four listener groups. The mean psychometric curves of the younger listeners appear to be steeper than those of the older listeners. At the same time, Eastern Andalusian subjects show a flatter psychometric curve than Western Andalusian subjects, this difference being more marked for younger than for older subjects. An ANOVA with the speaker-specific Slope as the dependent variable and Age and Variety as between-subject factors showed a significant effect of Age (F[1,65] = 33.4, p < 0.001) and Variety (F[1,65] = 19.6, p < 0.001) and an interaction between them (F[1,65] = 6.3, p < 0.05). In a post-hoc Tukey test, Age turned out to be significant only among WAS (padj < 0.001), but not among EAS listeners, and there was no significant effect of Variety among the older listeners.

Figure 9 

The mean psychometric curves in the perception experiment. Dashed lines refer to young speakers, solid lines to old speakers; black lines refer to WAS and grey lines to EAS speakers.

As explained in Section 3.1, the stimuli for this perception experiment were generated based on two different productions: /pata/ [ˈpata] and /pasta/ [ˈpatha]. As displayed in Table 3, the first of the two naturally produced baseline tokens contains additional cues to /pata/, the latter additional cues to /pasta/. Although the main focus of the perception experiment lies on the role of VOT, a comparison between the two continua (the pata- and the pasta-continuum) may give some indication of whether other acoustic cues besides post-aspiration potentially play a role in the perceptual distinction between pata and pasta. The psychometric curves were again calculated for all 74 listeners, but this time separately for each of the two continua. The data of 8 listeners had to be removed because their psychometric curves did not converge in one or in both continua. Figure 10 shows the psychometric curves separately for EAS and WAS listeners, and separately for the two continua. In both varieties, the curves of the pasta-continuum are slightly left-shifted, meaning that the participants gave fewer pata answers than when the continuum was generated based on [ˈpata]. This trend seems to be more pronounced for older than for younger listeners. A repeated-measures ANOVA with Cross-over Point as the dependent variable, Continuum as within-subject factor, and Age and Variety as between-subject factors showed a highly significant effect of Continuum (F[1,62] = 115.3, p < 0.001) and a significant interaction between Age and Continuum (F[1,62] = 32.3, p < 0.001). In post-hoc t-tests with Bonferroni correction, Continuum was significant in both older (padj < 0.001) and younger listeners (padj < 0.001). Age turned out to be significant only in the pata- (padj < 0.001), but not in the pasta-continuum. There was no influence of Continuum on the slope of the psychometric curve, and no significant interaction.

Figure 10 

Psychometric curves showing the perception of VOT according to continuum, variety, and age group.

3.3 Discussion

The results showed that younger and older listeners of Eastern and Western Andalusian Spanish were able to distinguish the minimal pair pata-pasta that differed only in VOT of /(s)t/. The age-dependent differences in the cross-over point suggest that younger listeners are more sensitive to post-aspiration as a cue for /st/ than older listeners. The differences between EAS and WAS listeners in the slope of the psychometric curves, on the other hand, indicate that the latter have a more categorical perception of post-aspiration in this minimal pair, i.e., they distinguish more consistently between stimuli containing either a short or a long VOT.

Stimuli that had been resynthesized based on an original pasta token were more likely to be perceived as pasta. When the cross-over point was calculated separately for each continuum, age appeared to be significant only in the pata-continuum. This finding underlines the importance of additional cues to /st/. It is possible that older listeners based their judgements primarily on the C:V1-ratio, and not on VOT. The measurements in Table 3 demonstrate that the C:V1-ratio was greater in the stimuli of the pasta- than in those of the pata-continuum (1.05 for VOT-step 1 in the pata-continuum, 1.35 for VOT-step 1 in the pasta-continuum).

As hearing is known to decline with age (Boenninghaus & Lenarz, 2005, p. 109), the question arises whether the reported age-dependent differences in the perception of post-aspiration are due to hearing loss in the elderly subjects. Additional analyses were conducted in order to address this question. The 9 subjects with self-reported hearing problems (all of them older than 50 years) and all other 16 subjects older than 65 years5 were excluded for these tests. The same statistical tests as described above were performed on the data of the remaining 48 listeners. For this reduced dataset the same significant effects of Age on the Cross-over Point (F[1,44] = 14.9, p < 0.001) and on the Slope of the psychometric curve (F[1,44] = 7.7, p < 0.01) were found.

The fact that Continuum had a significant effect among older listeners as well suggests that older listeners did perceive the phonetic details that distinguished the two continua. It thus supports the interpretation that the differences between older and younger listeners in the cross-over point and the slope of the psychometric curve need not be attributed to hearing loss, but can be interpreted as a sound change in progress. In order to fully exclude hearing degradation as a confound of listener age, future perception experiments with Age as an independent variable should include a control continuum (e.g., /x/-/f/) or a same-different test on the stimuli.

With the caveat that the effect of hearing loss may not be entirely excluded, the results of the forced-choice perception experiment suggest that the sound change in progress from pre- to post-aspiration also affects perception.

Except for 5 out of 74 listeners, the endpoint of the continua with a VOT of 55 ms was identified very consistently as /pasta/ even by the older EAS subjects. This result is surprising when it is considered that older EAS subjects in the production task did not use post-aspiration for distinguishing hC- and intervocalic C-sequences.

Still, on the whole, there seems to be a relationship between production and perception of post-aspiration such that the group of younger Western Andalusians displaying the longest VOT in production also showed a more categorical perception of post-aspiration in the forced-choice perception experiment, and older Eastern Andalusians producing the shortest VOT were also the least sensitive to post-aspiration in the perception experiment. The more abrupt psychometric curves of the younger WAS listeners suggest that the post-aspirated stop [th] is for this group, to a certain degree, phonologized. The next section tests whether this group relationship also holds true at an individual speaker-listener level.

4 The relationship between production and perception of post-aspiration

If there is a relationship between perception and production of post-aspiration, then speakers who produced /s/ + voiceless stops with a long post-aspiration should also be more sensitive to post-aspiration as a cue to /st/ in perception. This means that there should be a correlation between VOT in production and the cross-over point and the slope of the psychometric curve in perception, respectively. Since only a minimal pair of /t/ vs. /st/ was tested in the perception experiment, the production-perception comparison will be based only on ht-words. The analysis in this section is based on the data of those 48 listeners who also participated in the production task.

The VOT-difference between ht- and t-words was calculated for each speaker-listener by subtracting the speaker-wise mean value of VOT in t-words from the mean value in ht-words. This procedure was chosen in order to get per speaker one value that discloses to what extent a speaker uses VOT for distinguishing between ht- and t-words.6 The scatterplots in Figure 11 display the VOT-difference and its relationship to the cross-over point (a) and the psychometric curve (b) for each speaker-listener. As is apparent from Figure 11, there is a negative relationship between the VOT-difference and the perception data: Speakers who distinguished ht- from t-words by VOT in production displayed a lower cross-over point and a steeper psychometric curve (i.e., higher slope values).
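The per-speaker VOT-difference described above amounts to a difference of two means. The following Python sketch shows the computation on invented token values; the speaker labels, the data layout, and the function name are hypothetical and serve only to illustrate the procedure.

```python
from statistics import mean


def vot_difference(tokens):
    """Per speaker: mean VOT in ht-words minus mean VOT in t-words (ms).

    `tokens` is an iterable of (speaker, word_type, vot_ms) tuples,
    where word_type is either "ht" or "t".
    """
    by_speaker = {}
    for speaker, word_type, vot_ms in tokens:
        groups = by_speaker.setdefault(speaker, {"ht": [], "t": []})
        groups[word_type].append(vot_ms)
    return {
        speaker: mean(groups["ht"]) - mean(groups["t"])
        for speaker, groups in by_speaker.items()
    }


# Invented example tokens, not taken from the study.
example_tokens = [
    ("S1", "ht", 40.0), ("S1", "ht", 50.0), ("S1", "t", 10.0), ("S1", "t", 14.0),
    ("S2", "ht", 16.0), ("S2", "ht", 18.0), ("S2", "t", 12.0), ("S2", "t", 14.0),
]

differences = vot_difference(example_tokens)
print(differences)
```

A large positive value, as for the hypothetical speaker S1, indicates that the speaker separates ht- from t-words by VOT, whereas a value close to zero indicates little use of this cue.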

Figure 11 

Scatterplot to show the relationship between the production data (x-axis; VOT-difference between ht- and t-words) and the perception data (y-axis; a. cross-over point, b. slope of the psychometric curve, cf. Figures 8 and 9) within one speaker-listener. Circles stand for younger, dots for older speaker-listeners; black for WAS, grey for EAS subjects.

Two linear regression models in R were then applied to test whether (1) the individual cross-over point and (2) the slope of the psychometric curve can be predicted from the VOT difference in production. In both models Age and Variety and the interaction between them were included as fixed factors in order to test whether the relationship between production and perception differed across age and variety. The first model showed a significant effect of VOT-difference (F[1,34] = 13.8, p < 0.001) and a significant effect of Age on the Cross-over Point (F[1,34] = 4.3, p < 0.05). There was no significant interaction effect.

The second model showed, similarly to the first, a significant effect of VOT-difference on the Slope of the psychometric curve (F[1,34] = 9.4, p < 0.01). This indicates that there is a relationship between the production and the perception data. Age did not significantly influence the Slope and there was no interaction effect.

At first glance, these results might seem counterintuitive, since it would be expected that speakers who produced a longer VOT should also use a longer VOT in perception to distinguish the minimal pair pata-pasta. However, taking into account that a lower cross-over point and a steeper slope in perception indicate that the subjects were more sensitive and showed a more categorical perception of post-aspiration, the results are compatible with the predictions.

5 General discussion

This paper investigated the production and perception of /s/ + voiceless stops in two varieties and two age groups of Andalusian Spanish. One of the aims of this study was to assess if the finding of a sound change in progress for /st/-sequences of a previous study (Ruch & Harrington, 2014) holds true for /sp/- and /sk/-sequences as well. This apparent-time study with 48 speakers of Andalusian Spanish, 24 of an Eastern (Granada) and 24 of a Western variety (Seville Spanish) showed that younger WAS speakers produced /sp, st, sk/ with a longer post-aspiration and a shorter pre-aspiration than older WAS speakers. Pre-aspiration and post-aspiration duration were inferred by a semi-automatic procedure (Ruch & Harrington, 2014) that acoustically measures voice termination time (VTT) and voice onset time (VOT), that is, the interval between the offset/onset of voicing in the preceding/following vowel. The results confirmed the hypothesis of a sound change in progress in Andalusian Spanish (O’Neill, 2010; Parrell, 2012; Torreira, 2007a) not only for /st/-, but also for /sp/- and /sk/-sequences: /s/ + voiceless stop sequences are increasingly produced with a long post-aspiration and a very short pre-aspiration in young speakers. This tendency for a longer post- and a shorter pre-aspiration was found for Eastern Andalusian Spanish as well, where younger and older speakers, however, differed significantly only in post-aspiration duration in /st/-sequences, but not in /sp, sk/ and not in pre-aspiration duration. The findings provide evidence that in Andalusian Spanish, aspiration resulting from debuccalisation of /s/ can variably be produced (1) preceding, (2) following, or (3) preceding and following the oral stop closure of a plosive, and it therefore cannot be captured by the commonly used categorization in variationist studies of [s], [h], or [∅]. 
Our findings of a sound change in Andalusian /sp, st, sk/-sequences present a counterexample to the commonly held view of /s/-aspiration as a case of stable variation (Gimeno, 2008; Labov, 1994).

Another aim of this study has been to investigate the effect of place of articulation on the production of pre- and post-aspiration in the two varieties and age groups in order to tackle the articulatory or perceptual factors that might have given rise to the sound change. VTT was longest preceding velar, and shortest preceding bilabial and dental stops. This pattern is consistent with dialectological (e.g., Alther, 1935) and phonetic studies (e.g., Marrero, 1990; Sánchez Muñoz, 2004) that describe how aspiration resulting from /s/-weakening sometimes disappears preceding /p/ and /t/, but rarely so preceding /k/. It is also in line with findings for languages that have pre-aspiration segmentally such as Scottish Gaelic (Clayton, 2010; Nance & Stuart-Smith, 2013; Ní Chasaide, 1985) or dialects of Swedish (Helgason & Ringen, 2008), where pre-aspiration is reported to be longer in the velar than in the bilabial context. The longer pre-aspiration duration preceding velar stops is likely to be due to articulatory factors, that is, to a slower movement of the tongue back as opposed to the tongue tip and the lips in dental and bilabial stops (Helgason & Ringen, 2008). The lack of interactions between place of articulation, age, and variety in our study suggests that pre-aspiration is fading to equal degrees across stop types.

VOT appeared to be more variable when compared among places of articulation, varieties, and age groups. Intervocalic stops /p, t, k/ exhibited the expected VOT-pattern with the velar displaying the longest, and the bilabial stop displaying the shortest VOT. The same gradation of VOT was found for hC-sequences in older WAS and in EAS participants. In younger WAS speakers, however, post-aspiration appeared to have the same length in /st/- as in /sk/-sequences and therefore deviated from the VOT-pattern that has been attributed to phonetic universals and has been found for many languages (Cho & Ladefoged, 1999). The very long post-aspiration in /st/-sequences in young WAS speakers, and the fact that younger EAS speakers produced a longer VOT than older EAS speakers only in /st/, suggest that the upcoming post-aspiration cannot entirely be explained by articulatory factors. The finding that in EAS only /st/-sequences show the change supports the idea that the sound change was actuated in the dental context. This interpretation is based on the assumption that, up to this point, the trajectory of the change was the same in EAS and WAS. Due to aerodynamic and perceptual factors, a slightly longer VOT might be particularly prone to imitation and further lengthening in the dental, but not in the bilabial and the velar context: There is evidence that the stop release of a [th] contains more energy in the high-frequency range than that of a [ph] (Harrington, 2010a, p. 104). The lesser auditory salience of the stop release of [p], due to the “lowest amplitude and spectrally most diffuse burst of any of the voiceless stops” has been suggested to account for the tendency of [p] to become voiced (Ohala, 1983, p. 195, based on Stevens, 1980). Although the distinction between [kh] and [th] is less clear, [th] shows a rise of the spectral energy towards the following vowel, while [kh] has its spectral moment in the mid-frequencies (Harrington, 2010a, pp. 104–106).7 As far as perception is concerned, there are experiments indicating that the voicing contrast in English is more salient for alveolar than for bilabial stops (Silbert, 2014). If it is assumed that, due to aerodynamic factors, post-aspiration in /st/-sequences is perceptually more prominent than in /sp/- and /sk/-sequences, then it is possible that listeners first start to imitate the long VOT in /st/-words, and only later generalize it to the velar and the bilabial context.

A slightly longer VOT in hC-sequences than in intervocalic stops is likely to exist as synchronic variation also in speakers that have not taken part in the sound change, that is, speakers who produce mostly short VOT, but a long closure duration and pre-aspiration. This assumption is supported by the slightly longer VOT in hC- than in C-words found for older EAS speakers (see Figure 2), and by Torreira’s (2007a) inter-dialectal comparison of /sp, st, sk/ where a similar tendency was observed for Puerto Rican /st, sk/ and Buenos Aires Spanish /sk/. The explanation of how this longer VOT arises has to remain an issue for future studies. One possible explanation is that the long stop closure (in hC- as opposed to C-sequences) results in a higher intra-oral pressure, the latter leading to more prominent stop release (Ruch & Harrington, 2014). Another possible explanation is that the coordination between the glottal and the oral gesture is more variable in the offset of the consonant sequence, as was observed for German fricative + stop clusters (Hoole, 2006, p. 145). Such a looser coupling at the offset of the /s/ + stop sequence would permit an earlier release of the oral stop, with the consequence of a greater voice onset time.

This brings up the question of sound change actuation: why should a listener imitate a slightly different production target instead of compensating and filtering the deviant token out? In the case of slightly post-aspirated stops — as they are likely to exist at the beginning of the sound change — a longer VOT does not necessarily have to be perceived as a deviant token. A perception experiment with listeners of Argentinian Spanish (Ruch & Harrington, 2014) provided evidence that a slightly longer VOT favours the perception of an ambiguous stimulus [ˈpahtha] as pasta, and not as pata in a forced-choice perception experiment. The authors synthesized two continua between pata [ˈpata] and pasta [ˈpahta] by manipulating the duration of pre-aspiration. One of the two continua was generated with a slightly longer VOT (29 ms instead of 12 ms). In this continuum, the listeners were more inclined to answer pasta than in the continuum with short VOT. The results suggest that post-aspiration may enhance the cues of the underlying phonological /s/.

At least two pre-conditions would have to be met for post-aspiration to be imitated and to spread within the speech community or, in other words, for the sound change to be actuated: First, the longer VOT needs to be acoustically distinct enough to be perceived as a different production target (see Baker et al., 2011). Second, post-aspiration should not be perceived as a deviant /sC/-token because then, post-aspiration would be filtered out and would not be imitated by the listener when the listener becomes a speaker (Garrett & Johnson, 2013). If a listener-speaker then imitates the slightly shifted production target and, eventually, exaggerates the target, these subtle shifts are accumulated and, as suggested by Garrett and Johnson (2013), can lead to a gradual sound change. Related to this scenario is the question of whether the sound change originated in the younger or in the older speaker group. The trend towards a longer VOT in hC- but not in C-sequences in older speakers (see Figure 3) could reflect a very early stage of the sound change or it could be the result of older speakers accommodating to younger speakers. In the latter case, however, and according to the model presented above, the VOT-difference between hC and C should be especially marked in /st/. The data of the current study do not allow a conclusive answer to this question. Longitudinal studies of individual Andalusian speakers (following Harrington et al., 2000, for British English) or studies on accommodation between younger and older speakers could shed light on this issue.

In both varieties and age groups, C- and hC-sequences very clearly differed in terms of VTT, duration of the oral stop closure, and the total duration of the voiceless interval: hC-sequences displayed a longer and mostly positive VTT, a longer stop closure, and a total voiceless interval almost double the length of the intervocalic stops /p, t, k/. As the comparison between the two varieties and age groups suggests, the total duration of the voiceless interval is furthermore very stable in apparent-time in both varieties (see Figure 5). In addition to the above-mentioned parameters, young WAS participants displayed significantly longer post-aspiration in hC- than in C-sequences, and young EAS speakers in /st/ than in /t/. The preceding vowel was slightly longer when followed by an intervocalic stop than when followed by an hC-sequence (see Figure 6). This finding runs counter to assumptions formulated for Puerto Rican (Figueroa, 2000; Resnick & Hammond, 1975) and Eastern Andalusian Spanish (Carlson, 2012), where vowel lengthening has been assumed as compensating for /s/-lenition.

These findings further demonstrate that multiple cues are used to distinguish between intervocalic voiceless stops and hC-sequences in production. Within the multiple acoustic parameters that are used by Andalusian speakers to distinguish C- and hC-sequences in production, VOT is becoming more prominent, and VTT and closure duration are becoming less prominent. From a phonological point of view, the finding of post-aspirated stops in a variety of Spanish is striking because Spanish does not have (post-)aspirated stops, either phonologically or phonetically.8 Voiceless intervocalic stops are realized without aspiration, and, as the results of the present study and previous studies (e.g., O’Neill, 2010; Torreira & Ernestus, 2011) have shown, they often exhibit a partly voiced stop closure (see Figures 2 and 3).

The aim of the forced-choice perception experiment was to test whether post-aspiration is used as an acoustic cue for /st/ when distinguishing a minimal pair such as /pata/-/pasta/. With the exception of 5 out of 74 listeners, all participants were able to distinguish the minimal pair, which differed only in VOT. This finding challenges Torreira’s (2012) assumption (based on acoustic data) that Western Andalusian post-aspirated stops are the result of coarticulatory overlap and are not intended by the speakers. Although almost all listeners were able to distinguish the minimal pair, there were differences between the two varieties and age groups in the perception of post-aspiration. Younger listeners needed a shorter VOT to perceive pasta, indicating that they were more sensitive to this cue. At the same time, Western Andalusian subjects displayed a steeper psychometric curve, pointing to a more categorical perception of the VOT-difference.

It has to be kept in mind that the participants might have based their judgements on cues other than post-aspiration. There is evidence that phoneme distinction is made based on several, often co-varying, acoustic cues (Best et al., 1981; Dorman et al., 1977; Raphael, 2005). In the forced-choice perception experiment in this study, an increased VOT in the stimulus is associated with an increased total duration of the phonological /st/ sequence and, consequently, a greater C:V1 ratio (see Table 3). This could explain why not only younger and WAS listeners, but also older participants were able to distinguish between the two words of the minimal pair (although in a less consistent way). Further research is needed to understand cue-weighting in the present sound change in progress, and to investigate to what degree the same speaker-listener uses these cues also in speech production.

Despite these caveats, the results of the perception study mirror to a certain degree the results of the production study and point to a relationship between production and perception, which was confirmed by intra-subject comparison: speakers who produced a longer post-aspiration in the production task were also more sensitive to this acoustic parameter in the perception experiment. The categorical distinction of a minimal pair based on VOT indicates that speaker-listeners of Andalusian Spanish use post-aspiration as a cue to /st/, and that post-aspirated stops are likely to be phonologized to at least some degree. Parrell (2012, p. 45) assumes this greater or lesser degree of phonologization to be the reason why in his study some WAS speakers showed consistently long VOT values across different speech rates, instead of switches from pre- to post-aspiration as speech rate increased.

In conclusion, the gradual sound change from pre- to post-aspiration might have arisen from the variably released /s/ + voiceless stop closures that are sometimes slightly post-aspirated. If social factors are favourable, the aspirated stop release is imitated by other speakers in the dental context, where it is perceptually more prominent. The imitation by other speakers and the concomitant propagation of the new feature, i.e., sound change actuation, is possible because a slightly post-aspirated stop is not perceived as a deviant production variant of /sC/, but as an additional cue to the underlying phonological /s/. The greater VOT is no longer automatic variation or a result of coarticulation, but is increasingly used as a cue to /sC/ in production and perception. If the total duration of the glottal opening remains constant, the process of phonologization likely leads to a further lengthening of post-aspiration and the concomitant shortening of pre-aspiration and/or closure duration. To shed light on the use and weighting of different acoustic cues in a sound change in progress, perception tests that variably manipulate closure duration, VTT, VOT, or the C:V-ratio could be conducted. Physiological investigations into the coordination between the oral closure and the glottal opening gesture are needed to understand how the slightly longer VOT in /s/ + voiceless stops arises. Of particular interest for the question of sound change actuation would be a comparison between Andalusian Spanish and another variety with /s/-lenition, since coarticulation and gestural coordination are known to be language- and dialect-specific (Browman & Goldstein, 1992; Garrett & Johnson, 2013; Solé, 2007). Such an investigation could shed light on the question of whether post-aspirated stops in Andalusian Spanish are due to a dialect-specific coordination between oral and glottal gestures (as suggested by Torreira, 2012), or whether such a process is likely to take place in other varieties of Spanish as well.

Competing Interests

The authors declare that they have no competing interests.