Production and perception across three Hong Kong Cantonese consonant mergers: Community- and individual-level perspectives

Individual variation is key to understanding phenomena in phonetic variation and change, including the production-perception link. To test the generalizability of this relationship, this study compares community- and individual-level variation across three long-standing consonant mergers in Hong Kong Cantonese speakers: [n]→[l], [ŋ̩]→[m̩], and [ŋ]↔Ø. Concurrently, we document these understudied mergers in a community that has undergone rapid social change in recent decades. Younger (college-aged) and older (middle-aged) Hong Kongers completed a reading production task followed by a forced-choice lexical identification perception task. Group-level results suggest mismatching production and perception: While the community overall distinguished merger pairs in production, younger listeners were more perceptually categorical than older listeners. However, aggregate results obscure the fact that individuals vary substantially in the extent of merging in both perception and production, including many who exhibit complete merger, and that individual-level production-perception correlations were found for [n]→[l] and [ŋ̩]→[m̩], though not [ŋ]↔Ø. Results are discussed in the context of previous research. We find that (i) these mergers have diverged from predicted trajectories of completion, and (ii) overall, prior findings on the production-perception link are generalizable to these consonant mergers.


Introduction
While variability within speech communities has long been acknowledged, phonetic variation and change has historically been studied at the level of the community or macro-social demographic groups. Recently, a more concentrated focus has been placed on the characteristics of individual speaker-listeners and their role in innovating and driving change (for overviews, see Coetzee, 2018; Yu & Zellou, 2019), including both social (e.g., Sharma & Sankaran, 2011; Lev-Ari, 2017; Garrett & Johnson, 2013) and cognitive (e.g., Yu, 2010; Yu, Abrego-Collier, & Sonderegger, 2013; Dimov, Katseff, & Johnson, 2012; Kong & Edwards, 2016) dimensions. As Yu and Zellou (2019) state, "the study of individual differences can shed light on the nature of the cognitive representations and mechanisms involved in phonological processing," which in turn can "provide insight into long-standing issues in linguistic variation and change" (p. 131). In other words, bringing together community-level (macro) patterns with individual-level (micro) considerations helps build a fuller, more accurate picture of how speech is produced, perceived, and processed, feeding into our collective understanding of how sound systems vary and change over time (see also Yao & Chang, 2016; Fridland & Kendall, 2012; Voeten, 2020).
The example of focus here is how perception and production systems relate in the context of sound change. As this relationship holds implications for mechanisms behind the initiation and propagation of sound change, a more complete picture is crucial for understanding both the spread of sound change from individuals to entire communities and the progression of community-level change over time. Though many have probed this question, consistent results have been hard to come by, both with regard to whether production-perception systems are linked in individuals (e.g., Schultz, Francis, & Llanos, 2012; Schertz, Cho, Lotto, & Warner, 2015; Beddor, Coetzee, Styler, McGowan, & Boland, 2018) and with regard to which domain precedes the other during a change-in-progress (e.g., Harrington, Kleber, & Reubold, 2008; Kuang & Cui, 2018; Pinget, Kager, & Van de Velde, 2020). Insofar as the nature of the production-perception relationship has yet to be fully uncovered, attempts to answer this question strongly benefit from an approach comparing patterns at the group and individual levels.
In line with this perspective, the present study examines production, perception, and production-perception alignment, at both community and individual levels, in an understudied case of phonetic variation and change. We conduct an apparent-time investigation of three consonant mergers in Hong Kong Cantonese with distinct profiles and trajectories, namely [n]→[l], [ŋ̩]→[m̩], and [ŋ]↔Ø. While documenting the recent state of these mergers, we test hypotheses of the production-perception relationship, seeking to extend previous findings to a novel set of sound changes of a type (consonant mergers) that has thus far lacked study (with the notable exception of Pinget et al., 2020). In our focus on these particular consonant mergers in Hong Kong Cantonese, we find ourselves in a complex situation involving long-standing changes-in-progress, style-shifting, and sociolinguistic ideologies. Although previous research has largely presented the variants as a subset of mergers-in-progress within a larger set of merging changes (e.g., Bauer, 1986; Zee, 1999a; To, McLeod, & Cheung, 2015), scholarship also indicates the presence of formality-based stylistic variation (e.g., Bauer, 1982a; Bourgerie, 1990) and stigmatization (e.g., Chen, 2018). Nevertheless, regardless of whether these mergers have stabilized, completed, or continue to change, they remain situated in the context of diachronic change. Accordingly, we primarily approach these mergers as historical sound change and compare them to the sound change literature while also discussing the socially-structured variation and ideology in individual experiences.

Production and Perception: Is there a link?
Theories of sound change have long assumed that an individual's perception and production systems must be connected in some way for an individual to perceive a change within their speech community and implement it in their own production repertoire. This is a necessary condition if we posit that changes propagate from individual to individual, though the extent to which this is true (or, under which circumstances) had not been directly explored until recently (see Beddor, 2015). In establishing groundwork for theories about phonetic and sound change, Beddor et al. (2018) examine the time course of speaker-listeners' coarticulatory vowel nasalization in production (from aerodynamic measures) and perceptual patterns (with an eyetracking paradigm) in American English. They demonstrate within-individual correlations across production and perception for this stable variation not only in the use of a cue but in the timecourse of use: Those who produce early nasal flow in a vowel use temporally early nasalization to identify words.
One common suggestion put forth to account for inconsistent findings is that the experimental tasks and measures used to assess a link between production and perception were not necessarily comparable, such that production and perception tasks may have in fact been measuring different constructs (see Zellou, 2017; Kim & Clayards, 2019; Grosvald & Corina, 2012). A related concern is that previous measures may not have been sensitive enough to capture similarities across domains (e.g., static versus dynamic measures, Beddor et al., 2018). These concerns point to a need for careful selection of tasks and variables for comparison. They also link to methodological concerns about validity and reliability in terms of how phonetic variables are assessed (e.g., acoustic measurements versus auditory coding; trained versus naive listeners; speakers versus non-speakers; Evans, Munson, & Edwards, 2018).
From a theoretical perspective, some posit that the context of variation, such as the stability of variation patterns in the community, has consequences for production-perception alignment. Beddor et al. (2018), for example, suggest that an individual's link between perception and production is likely at its weakest during change when variants are in flux and socially-stratified, in comparison to stable, non-socially-indexed variation (see also Harrington, 2012). However, current empirical evidence does not obviously support this position. In terms of stable variation, while Beddor and colleagues do find a link between production and perception at the individual level for vowel nasalization (as does Zellou, 2017), a number of scholars investigating different types of coarticulatory variation failed to do so (e.g., Schultz et al., 2012 on VOT-f0 cue weighting in English; Schertz et al., 2015 on Korean L2 English VOT versus f0; Grosvald & Corina, 2012 on vowel-to-vowel coarticulation), though these direct comparisons may be inconclusive due to the aforementioned methodological differences.
Importantly, many recent studies of sound change do find evidence of individual-level links between domains, despite community-level mismatch (Kuang & Cui, 2018; Pinget et al., 2020; Voeten, 2020). On the surface, these findings run counter to the above prediction as individual-level production-perception relationships can indeed be found during changes-in-progress. Yet, it remains possible that the size of correlation is smaller or that a larger number of individuals demonstrate a lack of relationship, both compatible with an interpretation that the relationship is weaker during change than during stability. Overall, this highlights the need for more systematic, controlled comparison of individual production and perception patterns for variation in different contexts, crucially using comparable measures.

Production-perception alignment: What matters?
Beyond the simple existence of a production-perception connection, a co-occurring question pertains to the nature or direction of alignment. In studies of community change, alignment direction has been demonstrated to vary such that, at times, perception appears to lead in sound change while at other times, production appears more advanced. Some have called for attention to the trajectory and stage of change when examining production and perception, arguing that alignment patterns may depend on whether the change is in early or late stages. A slew of recent findings converge to support the notion that direction of misalignment may be yoked to the stage of the change. Kuang and Cui (2018) documented two vowel changes in Southern Yi where phonation cues are shifting to formant cues. For the incipient /u/ change, all speaker groups were generally aligned: Phonation was the most important cue in both domains, but F1 was increasingly relied upon in perception. For the ongoing /e/ change, however, older speakers were misaligned, using phonation as the primary cue in production but F1 as the novel primary cue in perception.
In contrast, younger speakers were categorized as 'realigned,' using the novel F1 cue in both domains. In sum, misalignment occurred during intermediate stages of change and perception led production for both early- and mid-stage change. Pinget et al. (2020) show comparable results for regional bilabial stop and labiodental fricative devoicing mergers in Dutch, where perception tends to precede production for change in initial stages. Interestingly, their results indicate a reversal during later stages of change such that perception remained comparatively less merged for individuals whose change is near completion in production. In an investigation of a VOT-f0 cue weighting shift in Afrikaans, Coetzee et al. (2018) similarly found that younger speakers were generally 'realigned' at the individual level, but if they were misaligned, production was more advanced. This late-stage perception lag is consistent with research on vowel mergers showing that, based on expectations of the talker, listeners can utilize cues in perception that they do not distinguish in production (e.g., Hay, Warren, & Drager, 2006; Koops, Gentry, & Pantos, 2008).
Factors relating to the source of sound change may separately influence the manifestation of production-perception misalignment. Voeten (2020), for example, finds that sociolinguistic migrants from Belgium who moved to the Netherlands were variable in their approximation of Netherlandic vowels; in general, although production and perception were linked in this case, production seemed to lead perception. This suggests that differences may arise in production-perception mismatch between cases of internally-driven versus contact-driven changes. In other words, studies of second dialect acquisition (e.g., Voeten, 2020; Evans & Iverson, 2007) find that production appears to lead perception, at least in early stages. In other cases of presumably internal changes occurring across generations, perception leads at first (e.g., Kuang & Cui, 2018; Pinget et al., 2020) until the change reaches late stages.
Finally, the type of change may matter in the coordination of production and perception. In the current study, we highlight mergers as an interesting case of change, unique because they represent a total loss of phonological contrast rather than simply a shifting of cues that maintain contrast. This comparatively drastic phonological change could lead to differences in behaviour surrounding production and perception. To the best of our knowledge, only one study of production and perception has been published on segmental mergers (Pinget et al., 2020), for which reported results agree with those of other studies. Prior studies on tone mergers in Hong Kong Cantonese also report a link between reduced distinction in production and slower responses in perception (Mok, Zuo, & Wong, 2013; Ou & Law, 2017). Here, we present an investigation of a set of consonant mergers in Hong Kong Cantonese, each with its own socio-historical profile, to test the robustness of the perception-production link across different types of sound changes.

Consonant mergers in Hong Kong Cantonese
Hong Kong Cantonese (HKC) provides an interesting opportunity to study mergers, as a large set of consonantal (e.g., Wong, 1941; Zee, 1999a; To et al., 2015) and tonal mergers (e.g., Bauer, Cheung, & Cheung, 2003; Mok et al., 2013; Zhang, 2019) have emerged over the course of the last century. Moreover, the complex and ever-changing socio-political landscape surrounding Hong Kong has introduced a multitude of external influences on HKC, with consequences for the trajectory of these mergers. The last few decades, in particular, have seen rapid changes due to the transition of Hong Kong's political status from a British colony to a Special Administrative Region under Chinese rule in 1997 ('the handover'). While already a highly multilingual environment, one notable change is the increasing presence and influence of Mandarin in the city, alongside Cantonese and English, especially for younger generations (e.g., in the education system; Lee & Leung, 2012; Wang & Kirkpatrick, 2015; Wong, A., 2019). Whether contact with Mandarin has ultimately affected the HKC sound changes is unclear, though both convergence, such as reversing changes that are dissimilar to Mandarin (Bauer & Benedict, 1997), and divergence, for example maintaining HKC-specific changes that are linked to local identity (Whelpton, 1999), are conceivable. This latter possibility is particularly plausible given that many locals continue to identify as 'Hong Kongers' (as opposed to 'Chinese' or 'Hong Kong-Chinese') and specifically view Cantonese as central to this identity (Lai, 2001; Lai, 2011).
Another layer of complexity is that linguistic ideologies (e.g., the public media campaigns and school curriculum changes spearheaded by Professor Richard Ho Man-Wui, a prominent public scholar and professor of Chinese literature) have led to stigmatization of the innovative variants.
The use of mergers is termed 懶音 laan5 jam1, which translates roughly to 'lazy pronunciation' (also variably denoted as 'lazy accent,' 'lazy articulation,' and 'lazy sounds'). This notion is recognized by Hong Kongers in the present day (Chen, 2018), and is perhaps particularly salient to younger generations due to the introduction of Cantonese pronunciation in a standardized exam in 2007. This prompted official promotion of 'proper Cantonese pronunciation' on TV and radio, as well as in school activities (Lee & Leung, 2012). As a result of the prevailing prescriptivist approach linking conservative variants with standardness, clarity, and formality, a certain level of explicit awareness and attitudes about the various mergers is available to individuals; this may in turn legitimize hypercorrection to the 'proper' variant.
In Hong Kong, these external factors could have plausibly come together to stall or reverse the progression of the mergers, leading to stable variation or preservation of the conservative form in formal registers (e.g., Labov, 1994; Hickey, 2012). Though HKC consonant mergers have been the focus of a sizable number of sociolinguistic and phonetic production studies in the past (e.g., Bauer, 1982a; Zee, 1999a; To et al., 2015), little to no research appears to have revisited these consonant mergers in recent years. The most recent comprehensive study on the status of HKC consonant mergers is To et al. (2015), which reports on data collected in the early 2000s when the earliest post-handover generation were still children. While large in its sample of speakers, this study was limited in scope and too early to assess post-handover outcomes.
Against this background, the present investigation focuses on three consonant mergers. Despite surface similarities, they present a mix of phonological and socio-historical profiles, including differences in phonetic biases, functional load, cross-language correspondences, and metalinguistic awareness (for a detailed review of the historical trajectories, see To et al., 2015).
This variation provides an opportunity to extend previous research on the production-perception link to a set of contrasting mergers in a single speech community, testing the extent to which predictions of production and perception hold across different types of sound change in the same individuals while using the same tasks and measures.

Table of example words for each merger. Columns: Merger; Chinese Character; Jyutping Romanization; English Translation; Historical; Innovative.

[n]→[l] merger
The [n]→[l] onset merger represents a prototypical case of contrast loss. These syllable-initial phonemes, described by Zee (1999b) as an apico-laminal denti-alveolar nasal and apical (denti-) lateral approximant, historically differed in both nasality and tongue blade involvement. There is also an asymmetry in frequency: [l]-onsets occur more often than [n] by both measures of type frequency (approximately double) and token frequency (approximately six times; Leung, Law, & Fung, 2004).
Since the 1940s or earlier (Wong, 1941 as cited by To et al., 2015; see also Bourgerie, 1990), [n] has been documented to be steadily merging towards [l] in progressively younger generations (Hashimoto, 1972; Yeung, 1980; Bourgerie, 1990; Zee, 1999a). Most recently, To et al. (2015) reported that nearly all of their participants, regardless of age group and gender, produced the innovative form [l] in place of the historical /n/ (94.2% of children versus 94.6% of adults). All available evidence thus points to (near-)completion of the merger by the start of the 21st century.
Because the innovative [l] has become so prevalent, Chen (2018) notes that this merger appears to be comparatively less stigmatized than other pairs such that "most users do not consider them wrong" (p. 7). Nevertheless, hypercorrection from historical [l] to [n] has been described-albeit with some debate-in the literature. In support, Pan (1981) reports [n] realizations of historical /l/ in a word list reading task. Zee (1999a)  is associated with formality (Pan, 1981) and careful speech (Zee, 1999a). This is supported by production data, as Bourgerie (1990) found that impromptu, conversational speech included the most [l] realizations (80.5%), trailed by interview speech (58.2%) and public speech (38.6%).
Thus, even while casual speech may mostly comprise innovative variants, HKC speakers also have access and exposure to speech registers that include the conservative [n] variant. The presence of stylistic variation complicates the interpretation of merger completion in this speech community across sporadic research reports with varying methodologies.

[ŋ̩]→[m̩] merger
The [ŋ̩]→[m̩] merger involves two historical phonemes, the syllabic velar and bilabial nasal consonants, that were largely non-contrastive. [ŋ̩] historically occurred in relatively few lexical items, limited to the three low tones. Of these, Bauer (1982a) contends that only four occur with any frequency in spoken HKC: two common words (五 ng5 'five,' 午 ng5 'noon') and two surnames (吳 ng4 and 伍 ng5). On the other hand, [m̩] historically occurred only as a single, albeit highly frequent, morpheme: the negation marker 唔 m4 'no, not.' Both categories therefore contained few types but still maintained a phonemic contrast. The single morpheme for /m̩/ combined with the rather limited inventory of /ŋ̩/ words suggests that contrastiveness has been largely uncompromised over the course of the merger.
Though this merger was first documented in the 1980s (Bauer, 1982a; 1982b), evidence suggests that the [ŋ̩]→[m̩] change was well underway by the late 1970s and likely initiated around the 1940s from labial assimilation for the highly frequent word 'five' (Bauer, 1986).
Since then, To et al. (2015) reported that nearly all children (born 1993-1994) and adults (born 1960-1987)  and 1981, while a Chinese dictionary listed 'five' with both velar and bilabial pronunciations (Lau, 1977, cited by Bauer, 1982a). As one of the mergers identified by proper pronunciation campaigns, it can be assumed that awareness has been maintained, though the degree of salience

[ŋ]↔Ø merger
Unlike the former two mergers, the [ŋ]↔Ø merger involves the syllable-initial velar nasal and its null-initial (Ø) counterpart (also referred to as zero-initial, phonetically either a vowel or glottal stop onset; Bourgerie, 1990; Chen, 2018) and is described as historically allophonic, such that [ŋ] onset occurred with low tones while null-initial occurred with high tones. While this merger is not technically a merger of phonemic categories, it is a merger of lexical classes and is discussed as a merger within the Cantonese literature (e.g., To et al., 2015). To complicate matters, this merger has also undergone a reversal in direction over time, leading to a loss of the historical pattern of complementary distribution.
Since the start of the 20th century, both historical [ŋ] and Ø have shown evidence of merging towards the other (note some exceptions of [ŋ] onset occurring with mid to high tones; Ball, 1907 as cited in To et al., 2015). Specifically, reports suggest that Ø→[ŋ] took place in the first half of the century (Wong, 1941 as cited in To et al., 2015), which was subsequently supplanted by a [ŋ]→Ø change in the second half (Yeung, 1980; Bourgerie, 1990). Along with the change in merger direction, younger speakers (roughly born since the 1960s-70s) used null-initial more often than not for both historical /ŋ/ and Ø (Yeung, 1980; Bourgerie, 1990; Zee, 1999a).
Likewise, the children in To et al. (2015) produced more null-initial variants for both historical /ŋ/ (65.0%) and historical Ø (94.2%), in contrast to the adult proportions of Ø production for historical /ŋ/ (37.5%) and historical Ø (68.7%). However, both groups differentiated the historical /ŋ/ and Ø categories, using null-initial more for historically Ø items. Thus, in spite of the bidirectional merger and variation in both historical classes, the two categories remain distinct to a degree from a community standpoint, although the original tone-based complementary distribution does appear to be lost.
A further complexity arises when we compare the results of To et al.'s (2015) survey to Bourgerie (1990). In To et al. (2015), children, as compared to adults, produced significantly more Ø for /ŋ/ (65.0% versus 37.5%) and significantly less [ŋ] for Ø (5.8% versus 31.3%). These group values are conspicuously similar to Bourgerie's (1990)  One potential explanation is that stigmatization of the null-initial as 'lazy' led to age-based patterning across the community, such that adult populations tend to produce the more 'proper' and 'prestigious' [ŋ] variant for both historical classes. Indeed, according to Chen (2018), this merger is more socially stigmatized compared to mergers like [n]→[l], perhaps due to its relative rarity. As with the other mergers, style-shifting is reported to occur for [ŋ]→Ø (Bourgerie, 1990) such that impromptu speech featured the highest rate of null-initials for historical [ŋ] words (35.6%), followed by interview (21.8%) and public speech (8.7%). Further investigation, particularly of speakers born in the 1990s in more recent years, is necessary before coming to a conclusion about the status of the [ŋ]↔Ø variation.

The present study
The present study investigates the production and perception of the The first goal is to test how production-perception patterns generalize across sound changes with varying attributes. In doing so, we seek to clarify the factors that modulate the existence and nature of the production-perception link, for which evidence has been inconsistent.
1 In a study on the frequencies of all Cantonese word initial sounds, Ng and Kwok (2004) do report that, comparing to an earlier survey from the 1970s (Fok, 1979) Mandarin and reinforcement of 'proper pronunciation' ideologies that brand the mergers as 'lazy' (Chen, 2018). What can we glean about the status of the mergers across generations, taking into account these shifts in the sociolinguistic landscape? Because this aspect of the study is descriptive in nature, we seek to generally explore the patterning of variant use across demographic groups. Nonetheless, because of the historical context of these mergers as sound change, we place special interest in the extent to which age co-varies with usage

Participants
As part of a larger project, data were collected from Cantonese speakers in both Hong Kong and Vancouver, Canada, but only the Hong Kong data are discussed in this paper. This study Cantonese as a first language and all had lived in Hong Kong since birth, except one young man who had moved to Hong Kong before age three. Due to recording errors, production data from two participants were excluded; as such, only 49 participants were included in the production and production-perception analyses. Along with Cantonese, all participants reported some level of ability to speak and/or understand English and Mandarin, with variation in reported proficiency across domains. Summary demographics for the participants are provided in Tables 2 and 3.

Perception stimuli
Perception stimuli consisted of three 13-step continua generated between Cantonese real-word minimal pairs, one per merger as listed in Table 4. To create these continua, the target minimal pairs were produced by a 33-year-old Cantonese speaker and trained linguist who grew up in Hong Kong. Stimuli were digitally recorded at a sampling frequency of 44.1 kHz in a sound-attenuated booth using an AKG C520 headset mic with a USBPre 2 Pre-Amp. The speaker read from transcriptions of each target word, accompanied by the Chinese orthography, to elicit maximally contrastive pronunciations. She was asked to produce each pair naturally but as similarly as she could (aside from the target contrast) to facilitate more natural-sounding synthesis. These six natural productions were then used as endpoints to create three word-pair continua using tandem-straight in Matlab (Kawahara et al., 2008). The entire word forms were used as the endpoints, which, given tandem-straight's global synthesis methods involving acoustic decomposition and generation, allows for natural-sounding resynthesis that retains redundant co-variation. Twenty-five acoustically-equidistant steps were synthesized between the duration of the target words, with each pair time-aligned at acoustic landmarks (e.g., phone boundaries) to facilitate morphing. From these, every odd-numbered step was selected to result in thirteen equidistant steps. The resulting continuum steps gradually morph from one endpoint token to the other, thus including in the synthesis an array of phonetic differences across the tokens. The natural endpoints were carefully selected to be as similar as possible in terms of non-contrastive elements. As such, the main perceptible change across continua is between the target sounds in each pair. The continua are available at https://osf.io/mk2v4/?view_only=51cfc7a974f345678e226944023dcb39.
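The step selection described above (25 synthesized steps reduced to every odd-numbered step, giving 13) can be illustrated with a minimal sketch. The function names are hypothetical, and the assumption that a step's position between the two endpoints is linear in its index is ours, made only for illustration; the actual morphing was carried out with tandem-straight and is not reproduced here.

```python
from fractions import Fraction


def select_continuum_steps(n_synthesized=25):
    """Return the odd-numbered steps (1, 3, ..., n) retained as stimuli."""
    return [step for step in range(1, n_synthesized + 1) if step % 2 == 1]


def morph_proportion(step, n_synthesized=25):
    """Relative position of a step between the two endpoints.

    0 is the first endpoint and 1 is the second. Linear spacing in the
    step index is an illustrative assumption, not a documented property
    of the synthesis.
    """
    return Fraction(step - 1, n_synthesized - 1)


steps = select_continuum_steps()
proportions = [morph_proportion(s) for s in steps]
```

Under these assumptions, both natural endpoints (steps 1 and 25) are retained, and the 13 selected steps remain evenly spaced, each separated by one twelfth of the distance between the endpoints.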
Figure 1 illustrates the transition between the initial sound in the aak1-ngaak1 continuum (i.e., the beginning of the vowel [aː] and the entire duration of [ŋ]) with waveforms and spectrograms of the midpoint and endpoint recordings.
Judgments from the first author and two other Cantonese-speaking colleagues were used to select the most natural-sounding continuum per merger. In all, the 13 variants from each continuum totalled 39 tokens.

Table 4. Target minimal pairs used for the perception continua. Columns: Target Merger; IPA; Jyutping Romanization; Chinese orthography; English translation.

Post-task awareness interview
An exit interview was administered to examine metalinguistic awareness about the mergers in the participant's own speech and experiences. The interview was conducted in English or Mandarin, supplemented by Cantonese when necessary to ensure comprehension. 3 Participants were presented with a list of minimal or near-minimal word pairs designed to historically contrast the target sounds. For each pair, they were asked to say the words as they would in casual speech, explain any differences between their productions, and name the prescribed pronunciation if they were aware of any. At the end, participants were asked if they were familiar with the term 懶音 laan5 jam1 'lazy pronunciation,' what they thought it meant, and whether they believe (or have been told by others) that they use it in their own speech.

Experiment
Participants were seated alone at a computer in a sound-attenuated booth where they were presented with the production task followed by the perception task. Instructions were provided visually in English on the screen. Upon completion, the experimenter returned to the room to answer any questions while participants filled out the language questionnaire, then conducted the exit interview.
To reduce the chance that participants would change their speech behaviour due to realizing the purpose of the experiment (i.e., our interest in the target sounds), the production task, which included fillers, was ordered prior to the perception task, which involved only the target sounds.
In the self-paced production task, Chinese characters and the English translation were visually presented using E-Prime 2.0 software (Schneider, Eschman, & Zuccolotto, 2007) in three randomized blocks, where each word appeared once per block (192 trials in total). Participants were asked to read the Chinese characters out loud. Productions were digitally recorded using an AKG C520 headset microphone connected to a UR22MKII interface. Participants pressed the zero key on the keyboard to begin a two-second recording for each trial. They were allowed to re-record as many times as they wished before moving on to the next word by pressing the spacebar. Participants spent approximately 20-30 minutes on this task.
3 Interviews were not conducted fully in Cantonese mainly due to limitations of available personnel, but an added advantage is that we minimize the potential for an interviewer's pronunciation of Cantonese (e.g., if they used merged variants) to have influenced participants' responses to minimal pair comparisons and 'lazy pronunciation'-related questions. In addition, we note that because participants only interacted with the experimenter outside the main task, and because the post-task interview was conducted only for additional context, there is no reason to expect this choice to impact the study results in any meaningful way.
Perception stimuli were auditorily presented in a two-alternative forced choice lexical identification task. To reduce influences of explicit knowledge about 'proper' pronunciation, participants were instructed to respond as quickly as possible and not overthink their response.
Using E-Prime 2.0, the synthesized Cantonese words were played over AKG K77 Perception headphones at a comfortable listening level, accompanied by a visual display of the appropriate minimal pair words in Cantonese orthography and the English translation. For each trial, participants heard the audio stimulus once and saw two word choices labelled with '1' or '5' (on the left or right side of the screen). They were asked to press the button on the button box (i.e., '1' or '5') that corresponded to the word they heard. If no response was registered within three seconds, the next trial began automatically. Each token was repeated three times throughout the experiment, randomly presented over three blocks (117 trials in total).

Production coding
To assess production, two phonetically-trained Cantonese speakers who grew up in Hong Kong coded the onset of each item, or in the case of the syllabic nasals, the sole segment comprising the word. Items were blocked by talker, randomized, and presented to the coders blind, without knowledge of the intended lexical items. Coders categorized the onset from a closed set, presented orthographically as six options: 'l,' 'm,' 'n,' 'ng,' 'a vowel,' or 'other.' The 204 items identified by both transcribers as 'other' were coded by the first author. These items either did not include a usable production (e.g., recording errors where the word was cut off or missing) or included a mispronounced production (e.g., saam1 'clothing' instead of lau1 'coat').
The coders agreed on 4933 trials. Inter-rater reliability for the two coders was calculated in R (R Core Team, 2020) as the unweighted Kappa statistic for two raters, using the kappa2() function from the irr package (Gamer, Lemon, Fellows, & Singh, 2019), from the perspective of whether there was agreement on the transcription of each item's coded onset. These data from the two coders are reported in Table 5. Kappa scores ranged from near-perfect agreement for historical /l/- and /n/-initial words to moderate agreement for the historical syllabic nasals. To resolve disagreement, all items (n = 725) on which the two coders disagreed were presented in a similar format to the first author: blinded as to their lexical identity and organized by talker. Sixteen of these disagreement items were removed due to recording errors or mispronunciations. The ultimate coding for each item was then determined as whichever category two out of the three coders agreed on.

The model output is summarized in Table 6. There was a significant intercept, and a significant effect of Historical Pattern. These results indicate that [l] productions are overall more likely, and that [l] productions are even more likely on words that are historically pronounced with /l/. These results are visualized in Figure 2, separated by the non-significant factors of Talker Age and Gender in order to transparently present the individual variation that is present.
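As an illustration of the production coding procedure described above (unweighted Kappa for two coders, then a two-out-of-three resolution of disagreements), the following is a minimal Python sketch. It is not the authors' R code: the function names cohen_kappa and resolve are hypothetical, and the labels are toy values rather than the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def resolve(code_a, code_b, code_c):
    """Return the label chosen by at least two of three coders, else None."""
    label, votes = Counter([code_a, code_b, code_c]).most_common(1)[0]
    return label if votes >= 2 else None


# Toy onset labels (hypothetical, for illustration only).
coder_1 = ["l", "l", "n", "n", "l", "n"]
coder_2 = ["l", "n", "n", "n", "l", "l"]
kappa = cohen_kappa(coder_1, coder_2)
final_code = resolve("l", "n", "n")  # third coder breaks the tie
```

In this sketch an item on which all three coders differ returns None; how such items were handled in the study is not stated in the text.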

[n]→[l] merger
The figures present a barplot to indicate the group patterns, along with individual data points for each individual; these individual data points are somewhat transparent so that individual overlap is observable. Lines connect an individual's data points for the two word classes. While the group averages indicate that both historically /l/ and /n/ words are usually pronounced with [l], Figure 2 illustrates how there is both group-level separation between historical categories (confirming the model result) and considerable variation on the individual level. To better visualize the individual-level data, we also present each individual's mean proportion of the innovative variant for the two historical categories as a scatter plot in

The model output is summarized in Table 8. Vowel-initial productions are more likely on historically vowel-initial words, but these are still, overall, unlikely. Some clearly prefer vowel-initial onsets, however; individuals from three of the four demographic groups use vowel-initial onsets 100% of the time for both historically vowel-initial and [ŋ]-initial word classes.

[ŋ]↔Ø merger
These results can be seen more clearly in the individual scatterplot (Figure 7).

substantially across individuals, in none of the three mergers is it predicted by demographic factors like age or gender (i.e., no significant interactions were found), which suggests that these mergers should not be characterized as changes-in-progress.

[n]→[l] merger
Null responses were removed, accounting for just over 1.5% of the data. The remaining data (n = 3265) were fit to a logistic mixed effects regression model predicting the likelihood of a historical /l/ word response (/l/ = 1, /n/ = 0). Continuum step was centered and scaled, and Talker Age and Gender were sum coded (Age: Older = 1, Younger = -1; Talker Gender: Male = 1, Female = -1). Subject was a random effect with Step as a by-subject random slope. 5
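The predictor coding described above can be sketched in Python as follows. This is an illustration only: the seven-step continuum is a hypothetical placeholder (the number of steps is not given in this section), center_and_scale assumes a z-score with the sample standard deviation, and the model fitting itself is not shown.

```python
from statistics import mean, stdev


def center_and_scale(values):
    """Z-score a predictor: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Sum (deviation) coding as given in the text: the two levels of each
# factor are coded +1 and -1, so the intercept reflects the grand mean.
SUM_CODES = {
    "age": {"Older": 1, "Younger": -1},
    "gender": {"Male": 1, "Female": -1},
}

# Hypothetical 7-step continuum, used only to show the transformation.
steps = [1, 2, 3, 4, 5, 6, 7]
scaled_steps = center_and_scale(steps)

# One example trial row as it might enter the model (hypothetical values).
trial = {
    "step": scaled_steps[0],
    "age": SUM_CODES["age"]["Younger"],
    "gender": SUM_CODES["gender"]["Female"],
}
```

With sum coding, a main effect of Step is interpreted at the average of the age and gender groups rather than at a reference level.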

                           B          Standard Error   z-value   p-value
Intercept                  -0.44226   0.11441          -3.866    0.000111
Step : Age Group : Gender  -0.1741    0.22868          -0.761    0.446471

The model output is reported in Table 9, and the results are visualized in Figure 8.
There was a significant intercept, indicating that /l/ word responses were more likely, and main effects of Step and Age. The interaction between Step and Age was also significant.
While, overall, participant response functions varied predictably by continuum step with more [n]-like realizations receiving more /n/ word responses, younger listeners present comparatively more extreme responses at the continuum endpoints, resulting in a steeper response function. To explore the individual differences in the discreteness of these lexical items, individual values were quantified using the by-subject contrast coefficient slope (CCS), following the methods provided by Casillas (2021). This value, which can be taken as a measure of category 'crispness' (Morrison, 2007;Casillas & Simonet, 2016), is an estimate of the slope of each individual's sigmoid at its steepest point. More extreme values indicate more crisp or discrete perceptual categories, while values closer to zero are taken to indicate a less discrete contrast, which we interpret as evidence of a merger in this case. As a concept, this crispness value acknowledges that phonological categories can exist on a gradient in terms of their categorical contrasts (see, for example, Hall, 2013). These category crispness values were estimated by fitting new logistic regression models with continuum Step (centered and scaled) as the only fixed effect. The left panel in Figure 9 shows
The model output is reported in Table 10, and results are visualized in Figure 11. There was a significant intercept, indicating that /m̩ / responses were more likely, and there were main effects of Step, a two-way interaction between Age and Gender, and a three-way interaction

Individual differences in recognition performance for [ŋ̩]→[m̩] were also analyzed in terms of category crispness. The middle panel in Figure 9 presents the distribution of these values.

[ŋ]↔Ø merger
Lastly, the responses to the [ŋ]↔Ø continua were analyzed in a similar manner. Null responses were removed, accounting for just over 1% of the data. The remaining data (n = 3277) were fit to a logistic mixed effects regression model predicting the likelihood of a historical vowel-initial word response (Ø = 1, /ŋ/ = 0). All other model specifications were the same as above.
The model output is reported in Table 11, and the results are visualized in Figure 13. There was a significant intercept, indicating that [ŋ] responses were more likely, and there was a main effect of Step. As the two-way interaction between Step and Age (p = 0.05912) approached the threshold for statistical significance, we include it in Figure 13 for consideration. Younger listeners trended towards more categoricity in their response patterns.

Individual differences for [ŋ]↔Ø were also quantified in terms of category crispness. While the mean (Mean = 0.029) and median (Median = 0.009) both are positive, the range of values (-0.22, 0.31) spans 0 to a greater degree than those for the previous two continua. Individuals' data for these continua are shown in the right panel of Figure 9.

Production-Perception
To understand the relationship between perception and production at the individual level for the three mergers under investigation, we quantified the degree of mergedness for perception and production. Mergers in perception were quantified using the absolute value of the by-subject crispness values described above. Higher values indicate more discrete perceptual categories, while lower values are taken to indicate a merger of perceptual categories.
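To make the perceptual measure concrete, the sketch below fits a separate logistic curve to each listener's identification responses and takes the absolute value of the Step coefficient as that listener's crispness. This is a simplification under stated assumptions: the study derives by-subject values following Casillas (2021), whereas here each listener is fit independently with a hand-written Newton-Raphson routine, and the function fit_logistic, the step values, and the response counts are all hypothetical. For a logistic curve with Step coefficient b, the slope at the steepest point (where the response probability is 0.5) is b/4, so the coefficient and the maximal slope carry the same information.

```python
import math


def fit_logistic(xs, successes, trials, iterations=100, tol=1e-10):
    """Fit p = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    xs: predictor value per continuum step (already centered and scaled)
    successes: count of target-word responses at each step
    trials: number of responses collected at each step
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(xs, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += y - n * p            # gradient, intercept
            g1 += x * (y - n * p)      # gradient, slope
            h00 += w                   # information matrix entries
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical listeners: 10 responses at each of 7 scaled steps.
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
trials = [10] * 7
categorical_listener = [0, 0, 1, 5, 9, 10, 10]  # sharp category boundary
merged_listener = [4, 4, 5, 5, 5, 6, 6]         # nearly flat responses

_, slope_categorical = fit_logistic(xs, categorical_listener, trials)
_, slope_merged = fit_logistic(xs, merged_listener, trials)

crispness_categorical = abs(slope_categorical)
crispness_merged = abs(slope_merged)
```

The absolute value matters because the sign of the coefficient only reflects which response was coded as 1; a listener with a near-zero value in either direction is treated as perceptually merged.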
Mergers in production were quantified as the absolute value of the difference between the proportion of the more novel pronunciation (i.e., [l], [m̩], null) for historical /l/, /m̩/, and null-onset words and the proportion of the more novel pronunciation for historical /n/, /ŋ̩/, and [ŋ]-onset words. Under this quantification, individuals who produce a full merger 100% of the time (e.g., 100% [l] for historical /l/ and /n/ words) receive the same merger score as those who exclusively show hypercorrection (e.g., 0% [l] for historical /n/ and /l/ words; thus 100% [n] for both lexical classes) and those who exhibit a variable mix of pronunciations for both lexical sets (e.g., 50% [l] for historical /n/ and /l/ words). Crucially, participants of all these types show no reliable difference in pronunciation variants between the lexical sets, which we take as an indication of a merger of these categories at the lexical level.
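The production measure and the three equivalent cases named above can be worked through in a few lines of Python. The helper names novel_rate and production_contrast are hypothetical, and the token lists are invented for illustration.

```python
def novel_rate(tokens, novel_variant):
    """Proportion of a speaker's tokens realised with the novel variant."""
    return sum(t == novel_variant for t in tokens) / len(tokens)


def production_contrast(rate_class_a, rate_class_b):
    """Absolute difference in novel-variant rates between two lexical sets.

    0 means no reliable difference (merged); 1 means fully contrastive.
    """
    return abs(rate_class_a - rate_class_b)


# The three cases from the text all receive a score of 0.
full_merger = production_contrast(1.0, 1.0)       # 100% [l] for /l/ and /n/
hypercorrection = production_contrast(0.0, 0.0)   # 0% [l] for /l/ and /n/
variable_mix = production_contrast(0.5, 0.5)      # 50% [l] for /l/ and /n/

# A fully contrastive speaker: [l] for every /l/ word, [n] for every /n/ word.
contrastive = production_contrast(
    novel_rate(["l", "l", "l", "l"], "l"),
    novel_rate(["n", "n", "n", "n"], "l"),
)
```

Because the score discards the direction of the difference, it measures how reliably the two lexical sets are kept apart rather than how often the innovative variant is used.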
Importantly, for the production-perception analysis, we removed the two perceptual crispness outliers (see Figure 9), leaving 47 individuals for consideration. We did so because, first, their values were so exceptionally different (values less than -5 compared to an overall CCS median of -0.021 and IQR of 0.081) that it was neither possible to conduct any reasonable comparison between them and the remaining data, nor within the remaining data due to the expanded range. Second, based on visual inspection of individual response curves, the extreme jump in numerical value appears not to be linearly reflective of actual change in categoricity given that the next highest values within a reasonable range (approximately 0.6) looked similar and highly categorical; 7 as a result, although treating the next highest values as maximum crispness is a simplification, using them to represent strongly categorical perceptual responses is a fair representation of the data and exclusion of the outliers is not expected to alter conclusions.

To assess the direction of misalignment within an individual, we calculate the difference between scaled production and perception values per merger (DiffPP, following Pinget et al., 2020). Zero indicates completely aligned production and perception, regardless of whether individuals are fully merged, fully contrastive, or at an intermediate value. Scores above zero represent individuals whose production is more contrastive (less merged) than perception while scores below zero represent individuals whose perception is more contrastive (less merged) than production. Visualized in Figure 16, this provides another way to look at the data, in essence representing the direction and distance of each individual from the hypothetical perfect correlation line in Figure 15.
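A DiffPP score of this kind can be sketched as below. The scaling step is an assumption: the text says only that production and perception values were scaled, so this sketch divides each measure by its observed maximum to place both on a 0 to 1 range. The function names and the three example individuals are hypothetical.

```python
def scale_to_unit(values):
    """Divide by the maximum so the largest observed value becomes 1.

    Assumed scaling; the exact procedure is not specified in the text.
    """
    top = max(values)
    if top == 0:
        return [0.0 for _ in values]
    return [v / top for v in values]


def diff_pp(production, perception):
    """Scaled production contrast minus scaled perception contrast.

    > 0: production more contrastive (less merged) than perception
    < 0: perception more contrastive (less merged) than production
    = 0: production and perception aligned
    """
    prod = scale_to_unit(production)
    perc = scale_to_unit(perception)
    return [p - q for p, q in zip(prod, perc)]


# Three hypothetical individuals: production contrast (0 to 1) and
# absolute perceptual crispness.
production = [0.0, 0.5, 1.0]
perception = [0.3, 0.1, 0.632]
scores = diff_pp(production, perception)
```

In this toy example the first individual produces no contrast yet retains some in perception (negative score), the second shows more contrast in production than in perception (positive score), and the third is aligned at the fully contrastive end (zero).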
7 Because the distance was too great, attempts to apply transformations for skewed data were not successful in creating a reasonable distribution; thus, we ultimately chose to remove the outliers.
8 Due to the removal of the two outliers prior to scaling and visualizations, the maximum scaled value in Figures 12 and 13 represents the next highest CCS value, which was 0.632.
9 To be conservative, the correlation provided in text includes the two extreme values. Removing those two data points from the correlation does not affect the direction or interpretation of the results [t(45) = 4.54, r = 0.56, p < 0.001].

Figure 15: Degree of merger in perception (y-axis) versus production (x-axis) for the three mergers. Merger in perception is the absolute value of category crispness and the merger in production is quantified as the absolute value of the difference in the proportions of the novel pronunciation for the two lexical sets for each merger. The range of the x-axis is from 1 (fully contrastive) to 0 (fully merged) to reflect the time course of change. The solid lines represent fitted lines to the data while the dashed lines represent a hypothetical fitted line if production and perception were perfectly correlated.

Figure 16: DiffPP scores, calculated as the difference of production and perception measures (y-axis) versus production (x-axis, reversed) for the three mergers. DiffPP scores above zero represent individuals whose production is more contrastive (less merged) than perception while scores below zero represent individuals whose perception is more contrastive (less merged) than production.

The left panel in Figure 15 illustrates the reliable correlation between merger in /n/ and /l/ production and perceptual category crispness in a forced choice task of word selection.
Individuals who make smaller differences in their rates of [l] production for historical /l/ and /n/ categories are more likely to be merged in perception; this is seen as the clustering in the bottom right corner. At the same time, the subset of participants who show a difference in [l] pronunciation rates for historical /n/ and /l/ words are more likely to exhibit a crisper perceptual contrast. Nevertheless, the relationship between production and perception is far from perfect. As seen in the left panel of Figure 16, many individuals show production-perception misalignment by falling above or below the zero line. A number of observations are notable. First, multiple individuals who produce little to no contrast (on the right side of the plot) fall below zero, demonstrating contrast in perception despite none in production.
Conversely, misaligned individuals who produce some contrast (roughly greater than 0.2) tend to fall above zero, demonstrating more contrast in production than perception. In other words, perception appears to lead production in merger, but when merger is (nearly) complete in production, perception appears to lag (i.e., retain contrast). Finally, age appears to play a role such that younger individuals appear more likely to show lower DiffPP scores relative to older individuals with similar production patterns. In particular, three younger women defy the general trend, showing intermediate production contrast but with comparatively greater perceptual categoricity than production.
The middle panel in Figure 15 illustrates the reliable correlation between merger in /m̩/ and /ŋ̩/ production and perceptual category crispness in a forced choice task of word selection. Specifically, individuals who exhibit a smaller difference in their rates of production of the more innovative [m̩] for historically /m̩/ and /ŋ̩/ words are more likely to be merged in perception, while those producing larger contrasts were more likely to show crisper perceptual distinction.

The distribution of perceptual crispness values is contracted compared to those values for the [n]→[l] merger, indicating that, overall, this contrast is less perceptually robust. 10 Again, while there is a significant correlation, the relationship is not isomorphic: There is substantial variability in the alignment of production of these variants and listeners' ability to systematically distinguish the lexical items by historical category in perception. Taking the maximum perceptual crispness value for [n]→[l] as the upper limit, 11 the overall pattern in the middle panel of Figure 16 is that individuals who are not fully merged show more contrastiveness in production than perception. Nearly all participants fall roughly at or above zero; a few individuals do show a small degree of misalignment towards perception, but otherwise, if not aligned, perception leads production. In addition, younger participants tend to show lower DiffPP scores, indicating less alignment skew towards production contrast.
10 It's worth noting that regardless of its participation in a merger, the auditory-acoustic contrast between syllabic nasals compared to /n/ and /l/ is less robust.
11 While this may not be a foolproof approach due to the less perceptually robust nature of nasal place of articulation contrasts, the use of the upper boundary of [n]→[l] crispness values serves as an estimate, given that 'maximum' perceptual crispness values are unknown for this particular contrast.
Lastly, the third panel of Figure 15 illustrates the lack of correlation between the merger of historically [ŋ] and vowel-initial items in production and perceptual category crispness in a forced choice task of word selection (of aak1 'shake' and ngaak1 'deceive,' specifically). The range of perceptual crispness values is similar to that for [ŋ̩]→[m̩], but the production range is more limited. This reaffirms the production results where individuals either mainly produce [ŋ] or null-initial. In Figure 16, the vast majority of participants merge [ŋ] and null-initial in both production and perception but those who do not follow a similar pattern to [n]→[l]: Those who produce little to no contrast fall below zero (perception lags behind production) while misaligned individuals who produce some contrast (roughly greater than 0.2) fall above zero (perception leads production). Further, there appears to be an age-based pattern whereby younger individuals make up the majority of production-merged individuals who fall below zero, showing a contrast in perception despite none in production.

Summary of production-perception
Production-perception analyses, which correlate an individual's degree of merger in production to their degree of merger in perception, reveal a mixed bag of results. We find moderate production- contrast in perception despite little to none in production. Finally, age-based trends suggest that younger individuals are less misaligned than older individuals in the direction of production contrast but more misaligned in the direction of perception contrast.

Discussion
The two aims of the study were to (1) test generalizations of the production-perception link and (2) describe apparent-time patterning of the merger variants in production and perception. We first discuss the descriptive data relative to previous documentation of these mergers (Sections 4.1 and 4.2). As this study was not designed to provide a definitive or comprehensive overview of the completion status of the mergers, we discuss various possible interpretations consistent with the data but do not diagnose the situation further. Then, we discuss the production-perception results situated in the sound change literature (Sections 4.3 and 4.4). Tables 12-14 summarize the full set of findings for each merger.

Past versus present in production
In To et al. (2015), the [n]→[l] merger appeared near-complete in both adults and children (mean rates of [l] for /n/ around 94%), but in the current data, /n/ is realized as [l] between 54% (younger women) and 71% (older women) of the time. Within each group, speakers also varied considerably as to whether they realized /n/ as [l], ranging from 0% to 100%; in fact, 13 of the 49 speakers were fully merged. Further, no age or gender pattern was evident for speakers who were mainly contrastive. This suggests that the linguistic situation is more complicated than a simple story of change-in-progress where the contrast is maintained by the older generation.
In addition to variation in /n/, there was variation in the amount of [l] produced for /l/, which has been generally ignored in previous literature due to the assumption that /l/ was not changing. That some speakers produce a significant amount of [n] for /l/ (up to 100% in one speaker's case) confirms that [l]→[n] 'hypercorrection' does exist, contrary to Bourgerie's (1990) assertion. For at least these individuals, it seems that the social meaning of 'properness' indexed to [n] (Pan, 1981) can be applied to both historical /n/ and /l/. Altogether, this suggests that the two variants [l] and [n] are dually-mapped to the same lexical items regardless of historical lexical class (Samuel & Larraza, 2015); this is indicative of a merger at the lexical level (Soo & Babel, 2020). Listeners have implicit knowledge (and possibly explicit awareness, given the pronunciation campaigns and social meaning attributed to the variants) that /n/ and /l/ are merged, but they lack the knowledge of which items were historically /l/, allowing for both [n] and [l] pronunciation variants for both lexical classes.

What has changed?
Several factors could plausibly lead to the discrepancies between current results and prior predictions about the trajectory of these consonant mergers. We consider here four possible sources: methodological differences, style-shifting, age-grading, and community-wide change.
In terms of methodology, we used several words per merger pair as opposed to the single word used in To et al. (2015). Rather than representing a true difference in norms over time, the larger set of lexical items could have simply provided a more accurate depiction of variant usage across the lexicon. However, by-word examination of the current results rules out this explanation: Production of the same words used in To et al. (2015)

Another methodological difference between our study and the previous one raises the possibility that speakers may have been style-shifting to a formal register in the current data given our lab-based reading task, but would have produced more innovative variants in a more casual setting without the influence of orthography (e.g., the picture naming task in To et al., 2015). This interpretation could be congruent with the across-the-board increases in [n] and [ŋ]. Given the prescriptivist norms in the local context, task-based style-shifting is plausible and cannot be ruled out as an explanation for our diverging results (a point we return to below).
A third possible scenario is that of age-grading 13 due to linguistic marketplace pressure, wherein individuals change their production patterns over different life stages, linked to the social value of 'standard' variants (for reviews of age-grading and lifespan changes, see Wagner, 2012; Cheshire, 2006). If age-preferential language use were the explanation in the current data, the generation studied in To et al. (2015) as children may have begun to use more of the 'formal' prestige variants, namely [n] and [ŋ], now that they are young adults and likely in more professional settings. Like style-shifting, this could explain the increase in conservative variants in the younger generation across all three mergers. On the other hand, interpretation of the results as age-grading is less clearly motivated for older adults based on these data. For one, the adult group in To et al. (2015) cannot be directly matched to our older adult sample, and two, it is not fully clear what an age-grading explanation would predict for this generation, which spans middle-aged to retirement-aged individuals. Future data collection specifically targeting age-grading will be necessary to adjudicate such an interpretation.
12 A direct comparison could not be made for the historical null-initial word, as the word uk1 'house' used by To et al. (2015) was not used in the current study.
13 See Wagner (2012) for discussion on the multiple definitions or criteria of the term 'age-grading' (community stability, repeating patterns across generations, marketplace pressure) and why there isn't a clear, consensus differentiation between 'age-grading' and 'lifespan change.' We reference a particular approach to age-grading described by Wagner (2012), but we recognize it is not unanimously accepted.
Lastly, production norms may have changed due to reversal of the phonological merger in younger generations. Given that older men appear to be merging [ŋ̩]→[m̩] at a similar rate as previously reported, the current results could be consistent with an incipient reversal. For [n]→[l] and [ŋ]↔Ø, however, younger speakers did not make a larger contrast relative to the older generation. Perhaps these age-undifferentiated results reflect a community-wide change of language ideologies due to increase in awareness of the 'proper' social meaning of [n] and [ŋ] since the advent of the 'proper pronunciation' campaigns in the late 2000s. Notably these events occurred after To et al. (2015) conducted their study. This social shift may have led to more prestige-related style-shifting in the present day as compared to the past. 14 If changes in production were indeed driven by an ideological shift, the age-related results for [ŋ̩]→[m̩] could be seen as representing an effect of register that is particularly pronounced in younger speakers; that is, the social association of [ŋ̩] with 'properness' may not exist as strongly for older speakers (or, alternatively, older speakers may be less invested in abiding by newer social norms). Moreover, women were more likely to use high rates of [ŋ̩] for both categories, including historical /m̩/. Given that women have been suggested to (a) be more aware of social stigma and (b) use more prestige variants (e.g., Labov, 1990), this is compatible with the interpretation that prestige motivates the maintenance of contrast or overall increased use of the conservative variant. The sharp shift in [ŋ]↔Ø norms from majority null-initial in To et al. (2015) to majority [ŋ] in the current study could also stem from drastically increased salience of the social meaning of [ŋ] as 'proper,' aligning with Chen's (2018) assertion that null-initial is particularly stigmatized as 'lazy.'
14 Merger reversal across contexts and context-based style-shifting cannot be distinguished in the current data, as we do not compare contexts. As well, they may not be mutually exclusive: More prestige-related style-shifting could co-occur with and/or precede incipient reversals in production outside of formal contexts, both motivated by increased awareness and attitudes towards the social meanings of the merger variants. As we propose these outcomes to have the same source, we discuss them together.
It is clear that, to fully understand the patterns found in these data, speech style and attitudes must be accounted for. Thus, limitations of the current study include a lack of consideration for multiple registers in the production task and attitudes about 'lazy pronunciation.' Accounting for variable styles would have allowed for more conclusive interpretation of descriptive results, especially in comparison to previous literature. At the same time, while we were limited in our methodological choices due to constraints of the stimuli, the formal style elicited in our task does not invalidate our findings, including those of the production-perception analyses. Speakers have a repertoire at their disposal, and we tapped into a particular style of production, which may well have been influenced by our methodological choices; regardless, we need to account for the linguistic knowledge presented to us. Moreover, our choice of an isolated, context-free, single word production task accompanied by orthography is well matched with the perception task, which is characterized by all those features as well. This well-matched formality allows for interpretable (mis)alignment patterns across production and perception.
With respect to 'lazy pronunciation,' although the exit interviews confirm that all participants knew what 'lazy pronunciation' was, we did not assess awareness or attitudes for each particular merger. Attitudes can vary substantially, potentially influencing patterns of both production and perception. For example, although many participants characterized the mergers as 'incorrect' and 'lazy,' one young man expressed more neutral or positive sentiments, describing the pronunciations as 'more convenient' (see also Lin, Yao, & Luo, 2021 on both positive and negative attitudes about the younger generation's accent). Knowledge of the degree of stigmatization for each merger, along with individual attitudes towards the stigmatization, could help contextualize variation across individuals as well as any potential community-level reversals that are motivated by group orientation towards or away from the existing social meanings of the merger variants (see, e.g., Labov, Rosenfelder, & Fruehwald, 2013;Tamminga, 2019). Continued research is needed to document whether merger reversals eventually emerge in production across the Hong Kong community; this work would benefit from comparison of multiple speech styles, including more naturalistic speech, as well as inclusion of awareness and attitude assessments to form a clearer picture of production norms across both age and context.

Perception versus production in the community
Unlike production, a general theme in perception is that younger participants, particularly women, responded to historical lexical classes more categorically than older participants, though the degree of categoricity was objectively low for mergers involving [ŋ]. 15 Considering that age effects are not robustly supported by evidence in production, how can the perceptual age effects be interpreted? Younger listeners have had more exposure to the historical contrasts due to growing up with 'proper pronunciation' prescriptivism and language contact with Mandarin.
One possibility is that, due to these experiences, they may have adapted to flexibly recognize the contrast in perception, regardless of whether they make a contrast between the two in their own productions (see Samuel & Larraza, 2015 on adaptation to variable pronunciations or 'errors'; Sumner & Samuel, 2009 on covert experience with pronunciation variants on word recognition timing). Cognitively, this would be a dual-mapping of two pronunciation variants to a single phonological or lexical category for a style or register difference. This accommodation to phonetic variation is akin to interpretations of late-stage sound changes where perception 'lags' behind production due to the more advanced (e.g., younger) listeners' need to maintain these representations to cope with speech produced by less advanced (e.g., older) speakers (see Pinget et al., 2020;Coetzee et al., 2018). In other words, they are flexible in perception while showing stability in production.
15 We acknowledge the possibility that an age-based effect of hearing sensitivity (or cognitive processing abilities) may have influenced perceptual results to some extent and note that we did not specifically control for this confounding factor. At the same time, because younger men often patterned with older individuals (most noticeably for [ŋ̩]→[m̩]), we take the position that the age trend cannot simply be reduced to a difference in auditory-perceptual abilities.
Another possibility is that, rather than simply demonstrating perceptual flexibility, younger speaker-listeners are in fact beginning to reverse the mergers in their recognition of lexical items, prior to showing clear signs of reversal in production. That is, a phonological change across generations may truly be occurring, where perception leads production in the opposite direction to the historical change. As suggested earlier, this explanation could align with

That is, differences may have arisen due to younger listeners perceiving the model speaker as a peer while older listeners perceived the same speaker as younger (Hay et al., 2006;Koops et al., 2008). Specifically, if there is an association between the younger generation and increased 'lazy pronunciation' held by older speakers (see discussion in Lin et al., 2021), we might expect that younger listeners (without such an association) perceive the stimuli more veridically while older listeners perceive the stimuli with less distinction. This possibility cannot be teased apart from other possible interpretations and must be left to future research.

Individual evidence for the production-perception link
Turning to the production-perception link, our individual-level results reveal a link between production and perception for two consonantal mergers, corroborating recent positive findings from the sound change literature. In particular, this result allows us to generalize more confidently the finding of Pinget et al. (2020) that a production-perception link can be found for consonantal mergers, just as for the more frequently studied vowel-related types of sound change. Given that mergers have rather different properties from shifting-type changes (e.g., loss of contrast), the consistent detection of an individual-level production-perception relationship is notable and strengthens the claim of an underlying mechanistic link between systems that drives sound change.
In contrast, no production-perception correlation was found for the [ŋ]↔Ø merger. Since the other two merger pairs demonstrated a correlation, the type of change (i.e., merger rather than cue shifting, for example) cannot be the constraining factor, which provides evidence against the hypothesis that production-perception relationships vary by type of change. Still, it is possible that the specific nature of this change is relevant. That is, the other two mergers are 'standard' contrast-loss mergers, unlike [ŋ]↔Ø, which was originally allophonic and exhibited bidirectional merging. The nature of this change (including the changes in direction) may have contributed to the apparently unlinked representations. Given this, it may be beneficial for future research to consider more fine-grained variability among types of change and how it may influence the existence of a production-perception link. We additionally note that these findings could have resulted from the choice of perceptual stimuli and task, which involved lexical identification between items in a purported minimal pair that did not follow historical tone allophony patterns (i.e., involved an exception). With the current data, it is not possible to tease apart whether this null effect is due to the nature of this particular merger or to aspects of the experiment. Future work would benefit from examining the [ŋ]↔Ø merger further in perception.

Direction of production-perception misalignment
Despite correlation differences, similar patterns of misalignment were uncovered across all three merger pairs. In general, individuals making a contrast in production showed comparatively less perceptual contrast, indicating that perceptual merger is more advanced than production merger. On the other hand, many individuals with nearly or fully merged production for [n]→[l] and [ŋ]↔Ø (if not 'realigned' through full merger in both production and perception) in fact show evidence of categoricity in perception; this suggests that at these late stages of production merger, perception is less advanced, or lagging. Given the data at hand, these results fall in line with previous reports of production-perception misalignment with regard to stage of change (Pinget et al., 2020; Coetzee et al., 2018). However, the switch from perception-led to production-led change was reported in Pinget et al. (2020) to occur roughly at the halfway point of merger, while here only those who were nearly or fully merged exhibited production leading perception. Whether this difference relates to the fact that the present mergers appear to have reached a state of socially conditioned variation rather than continued merger, or whether such a pattern is dependent on the particular situation, requires further investigation.
In addition, younger participants on the whole tend towards less misalignment in the direction of production contrast and/or more in the direction of perceptual contrast. This confirms that the community-level perceptual and production results, where younger individuals showed more contrast in perception, can be traced to the individual level. A final observation is noteworthy. Unlike the other two mergers, few individuals showed misalignment in the direction of perceptual contrast for the [ŋ̩]→[m̩] merger, even among those merged in production. Why does this difference arise? While speculative, it seems conceivable that [ŋ̩]→[m̩] operates on a reduced perceptual scale due to the lack of acoustic-perceptual salience for nasals (Harnsberger, 2000; Narayan, 2008); if so, a different approach to assessing misalignment may be needed in future work that attempts to compare across multiple cases of variation and change.
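To make the notions of individual-level correlation and direction of misalignment concrete, the following Python sketch shows one way such quantities could be computed. It is an illustration only and not the measures or analysis used in this study: it assumes hypothetical per-individual contrast scores on a common 0-1 scale (0 = fully merged, 1 = fully distinct), an arbitrary tolerance of 0.1 for treating the two scores as aligned, and invented toy values rather than participant data.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    # Mean product of population z-scores equals Pearson's r.
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


def classify(production, perception, tolerance=0.1):
    """Label the direction of misalignment for one individual.

    Scores are hypothetical contrast indices (0 = merged, 1 = distinct).
    The tolerance is an assumption chosen for illustration.
    """
    diff = perception - production
    if diff > tolerance:
        return "production more merged"   # perception lags production
    if diff < -tolerance:
        return "perception more merged"   # perception leads production
    return "aligned"


# Invented toy scores for five hypothetical individuals (not study data).
production = [0.9, 0.8, 0.1, 0.0, 0.5]
perception = [0.6, 0.5, 0.4, 0.0, 0.5]

r = pearson_r(production, perception)
labels = [classify(p, q) for p, q in zip(production, perception)]

for p, q, label in zip(production, perception, labels):
    print(f"production={p:.1f} perception={q:.1f} -> {label}")
print(f"individual-level correlation r = {r:.2f}")
```

Under these assumptions, an individual with a clear production contrast but a weaker perceptual contrast is labeled "perception more merged", whereas an individual who is merged in production but still perceptually categorical is labeled "production more merged". A merger with reduced perceptual salience might call for a different scale or tolerance, which is one reason the threshold is left as a parameter.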

This study investigates production and perception of the [n]→[l], [ŋ̩]→[m̩], and [ŋ]↔Ø mergers in Hong Kong Cantonese, on the one hand documenting the present-day status of these long-standing sound changes and on the other hand extending hypotheses about the production-perception relationship in the context of sound change. Production results demonstrate substantial amounts of individual variability but a lack of clear age- or gender-related variation, indicating that the picture for each merger is less straightforward than continued or completed merger. Perceptual results show that younger speaker-listeners are less merged than the older generation, potentially due to adaptation to high phonetic variability. While this project did not collect the data necessary to fully diagnose the current stage of the mergers, the results do suggest that [n]→[l] and [ŋ]↔Ø no longer appear to be changes-in-progress, though incipient reversal is a possibility for [ŋ̩]→[m̩]; instead, usage of the formal variants appears inflated compared to previous records, suggesting that 'proper pronunciation' ideologies are at play. Another possibility is that the mergers have stabilized into context-conditioned style-shifting, a scenario that Labov (2001) proposes may historically be more common than changes going to completion (p. 75). Future research can disentangle these possibilities.
A bird's-eye view of our results on the production-perception link reveals that, on the whole, they parallel the literature: Some level of linkage evidently exists between production and perception systems, though it is not always strong and not always found. We show here that, in the same population using the same methodology, a moderate production-perception correlation can be found for two mergers yet none at all for another. Our finding of production-perception coupling for two additional consonant mergers specifically bolsters the conclusion that convergent results in the literature are generalizable beyond certain types of sound change.
However, while we aimed to use comparable measures for production and perception, we also acknowledge that our experimental choices may still have impacted results. Ultimately, we do not solve the problem of methodological inconsistency, but we urge future research to take these issues into careful consideration.
Interestingly, when correlations were detected, they were of very similar magnitude, even though these mergers are characterized by rather different profiles (e.g., functional load), trajectories, and perceptual salience. This direct comparison of magnitudes indicates that there is a modicum of consistency in the production-perception relationship across cases; notably, such a comparison is difficult to make across studies due to variability in context, methodology, participants, and more (see Pinget et al., 2020, who nevertheless report that previous studies often find moderate correlations).
That production and perception systems constrain but do not determine each other is sensible, as we expect flexibility in perception beyond the variability exhibited in production (e.g., we understand and accept a much larger range of speech than we tend to produce; see Samuel & Larraza, 2015). It cannot simply be that all exposure in perception affects production, as effects of exposure on stored representations are most likely mediated by encoding (see attention-weighting approaches to exemplar-theoretic models of speech perception, Sumner et al., 2014; Drager & Kirtley, 2016; Johnson, 1997). In the same way, production is mediated by other factors, such as speaker agency, and does not reflect purely linguistic aspects of experience. If this is the case, however, what is the expected extent of transfer between perception and production, and what cognitive factors influence the degree of coupling?
One route forward is to move beyond a search for evidence of the existence of a production-perception link to a more nuanced examination of when and to what degree we are able to detect a relationship. In this vein, we encourage researchers to apply a controlled comparative approach with a larger range of case studies in order to build a broader typology of sound change and of the extent of coupling between production and perception systems. By focusing attention on theorizing why there would not be a link for any particular case, what mechanisms underlie such connections, and what regulates the strength of the relationship across controlled groups, variables, or tasks, we can move closer to understanding the persistently mixed results of production-perception studies and the underlying cognitive processes.