<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">1868-6354</journal-id>
<journal-title-group>
<journal-title>Laboratory Phonology: Journal of the Association for Laboratory Phonology</journal-title>
</journal-title-group>
<issn pub-type="epub">1868-6354</issn>
<publisher>
<publisher-name>Open Library of Humanities</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.16995/labphon.6455</article-id>
<article-categories>
<subj-group>
<subject>Journal article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Identifying generalizable knowledge from the distribution of tonotactic accidental gaps in Mandarin</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Jin</surname>
<given-names>Shao-Jie</given-names>
</name>
<email>shaojiejin.c@nycu.edu.tw</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Sheng-Fu</given-names>
</name>
<email>sftwang@gate.sinica.edu.tw</email>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Lu</surname>
<given-names>Yu-An</given-names>
</name>
<email>yuanlu@nycu.edu.tw</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Department of Foreign Languages and Literatures, National Yang Ming Chiao Tung University, Hsinchu, Taiwan</aff>
<aff id="aff-2"><label>2</label>Institute of Linguistics, Academia Sinica, Taipei, Taiwan</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2023-07-26">
<day>26</day>
<month>07</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>42</lpage>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2023 The Author(s)</copyright-statement>
<copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.journal-labphon.org/articles/10.16995/labphon.6455/"/>
<abstract>
<p>This study investigates tonotactic accidental gaps (unattested syllable-tone combinations) in Mandarin Chinese. In a corpus study, we found that, independent of syllable type, T2 (rising) and T3 (falling-rising) gaps were over-represented, whereas T1 (high level) and T4 (falling) gaps were under-represented. We also observed fewer T1 gaps with voiceless onsets and more T2 and T3 gaps with voiceless onsets, a pattern that is consistent with cross-linguistic observations. While these trends were generally reflected in a wordlikeness rating experiment by Mandarin listeners, their judgements of these gaps, similar to those of real words, were also guided by neighborhood density. Furthermore, T2 gaps with real-word T3 counterparts were rated as more wordlike, a result attributed to the T3 sandhi in Mandarin Chinese. Finally, we used harmonic scores generated from the UCLA Phonotactic Learner to explicitly test the role of lexical knowledge and markedness constraints in modeling speakers&#8217; tonotactic knowledge reflected in the wordlikeness ratings. We found that grammars induced from lexical data were the most successful at predicting wordlikeness ratings of gaps and lexical syllables combined. However, when focused on the ratings of tonotactic gaps, grammars with markedness constraints informed by cross-linguistic observations were more successful even without the constraints being weighted on lexical data. The results show how lexical knowledge and universal markedness, which is not entirely learnable from the lexicon, may account for some tonotactic generalizations.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>1. Introduction</title>
<p>One of the central goals of phonology is to describe what structures are and are not possible in a given language (<xref ref-type="bibr" rid="B18">Fischer-J&#248;rgensen, 1952</xref>; <xref ref-type="bibr" rid="B27">Halle, 1962</xref>). However, relatively less research has considered what seems to be possible yet does not exist. These unattested &#8220;accidental gaps&#8221; have traditionally been dismissed, considered possible and thus left unexplained, as opposed to systematic gaps, which violate systematic phonotactic constraints and thus are deemed impossible. Previous studies have shown that speakers&#8217; acceptance of these possible yet unattested forms is generally gradient and based on grammatical principles, such as <italic>markedness</italic> (e.g., <xref ref-type="bibr" rid="B21">Frisch, Pierrehumbert, &amp; Broe, 2004</xref>; <xref ref-type="bibr" rid="B71">Zuraw, 2000</xref>, <xref ref-type="bibr" rid="B72">2002</xref>), or <italic>lexical statistics</italic>, such as neighborhood density and the probability or frequency of attested forms (e.g., <xref ref-type="bibr" rid="B1">Albright &amp; Hayes, 2003</xref>; <xref ref-type="bibr" rid="B10">Coleman &amp; Pierrehumbert, 1997</xref>; <xref ref-type="bibr" rid="B21">Frisch et al., 2004</xref>; <xref ref-type="bibr" rid="B25">Gong &amp; Zhang, 2021</xref>; <xref ref-type="bibr" rid="B54">Myers &amp; Tsay, 2004</xref>). These studies, however, mainly focus on unattested segmental combinations. 
This study, on the other hand, explores unattested forms involving syllable-tone combinations, or &#8220;tonotactic accidental gaps&#8221; (<xref ref-type="bibr" rid="B25">Gong &amp; Zhang, 2021</xref>; <xref ref-type="bibr" rid="B43">Lai, 2003</xref>; <xref ref-type="bibr" rid="B61">Wang, 1998</xref>), in Mandarin Chinese.<xref ref-type="fn" rid="n1">1</xref> Since the processing of tone is distinct from that of segmental information (e.g., <xref ref-type="bibr" rid="B11">Cutler &amp; Chen, 1997</xref>; <xref ref-type="bibr" rid="B44">Lee, 2007</xref>; <xref ref-type="bibr" rid="B64">Wiener &amp; Turnbull, 2016</xref>), we investigate if the aforementioned generalizations (i.e., cross-linguistic grammatical principles and lexical statistics) drawn from unattested segmental combinations can also be applied to tonotactic accidental gaps. Using a corpus study and a wordlikeness rating experiment, we investigate whether the patterns observed in the corpus for gaps are reflected in Mandarin speakers&#8217; judgments of wordlikeness. Furthermore, given the finding that the phonetic naturalness of onset-tone interactions and lexical statistics both predicted speakers&#8217; judgments, we used computational modeling analysis to investigate their relationship further. Specifically, we asked whether and to what extent relevant knowledge of tone and onset-tone markedness is learnable from the lexicon using constraint-based learning simulation.</p>
<p>Standard Mandarin is generally described as having five vowels (/i, y, u, &#601;, a/) and 25 consonants with the maximum syllable structure (C)(G)V(G)/(C) (<xref ref-type="bibr" rid="B48">Lin, 2007</xref>). It has four phonemic tones: high-level Tone 1 (55), rising Tone 2 (35), falling-rising Tone 3 (214), and falling Tone 4 (51) (<xref ref-type="bibr" rid="B16">Duanmu, 2007</xref>; <xref ref-type="bibr" rid="B48">Lin, 2007</xref>). Tone numbers here indicate relative pitch height: the higher the number, the higher the relative pitch. Though this view is less mainstream, some phonologists do not consider Mandarin to be a four-toneme language, treating the neutral tone as lexically specified (<xref ref-type="bibr" rid="B8">Chen &amp; Xu, 2006</xref>; <xref ref-type="bibr" rid="B35">K. Huang, 2012</xref>). Not all tones, however, can be combined with every possible syllable. For example, the syllable [ts&#688;u] can be combined with T1 ([ts&#688;u]<sup>55</sup> &#8220;coarse&#8221;), T2 ([ts&#688;u]<sup>35</sup> &#8220;die&#8221; in Classical or Literary Chinese), and T4 ([ts&#688;u]<sup>51</sup> &#8220;vinegar&#8221;), but not with T3. The syllable-tone combination *[ts&#688;u]<sup>214</sup> does not violate any obvious phonotactic constraints in Mandarin, yet it does not appear in any dictionary; this is an example of a tonotactic accidental gap (<xref ref-type="bibr" rid="B17">Duanmu, 2011</xref>; <xref ref-type="bibr" rid="B43">Lai, 2003</xref>). Because their occurrence seems random, tonotactic accidental gaps have been all but ignored in the literature. Note that, unlike the segmental gaps in the aforementioned studies, for which the number of possible unattested forms is hard to define, the number of tonotactic gaps can be easily calculated because the set of allowable syllables in Mandarin Chinese is well defined. 
Thus, the tonotactic gaps reported in this study can also be understood as the inverse of actual Mandarin Chinese syllables (e.g., more T2 gaps mean fewer actual T2 syllables). This study focuses on tonotactic accidental gaps to reveal relevant grammatical properties of the linguistic system. Motivated by studies demonstrating the importance of a priori grammar states and analytic biases independent of lexical statistics (e.g., <xref ref-type="bibr" rid="B6">Berent, Wilson, Marcus, &amp; Bemis, 2012</xref>; <xref ref-type="bibr" rid="B5">Becker, Nevins, &amp; Levine, 2012</xref>) in modeling native speakers&#8217; phonotactic knowledge, we aimed to investigate whether speakers&#8217; differential preferences for unattested forms require knowledge that is supplied either by some a priori state (markedness informed by cross-linguistic observations) or by the attested lexicon (lexical statistics).</p>
<p>Among a handful of studies that have examined this issue, Wang (<xref ref-type="bibr" rid="B61">1998</xref>) asked native Taiwan Mandarin speakers to rate the wordlikeness of target syllables on a scale from 0 to 10, 0 indicating that the target syllable was very close to a real Mandarin word and 10 indicating that the target syllable was completely unlike a real word. The target syllables included tonotactic accidental gaps, phonotactic accidental gaps (phonotactically legal syllables that fail to exist), systematic gaps (phonotactically illegal syllables), and existing words. The results showed a clear distinction between existing and non-existing words, suggesting that tonotactic accidental gaps generally pattern together with phonotactically illegal syllables. However, Wang also noted that, among the non-existing words, accidental gaps (both tonotactic and phonotactic) were more readily accepted by native speakers compared to systematic gaps.</p>
<p>On the other hand, Myers and Tsay (<xref ref-type="bibr" rid="B54">2004</xref>) showed that, in a wordlikeness rating experiment, native Taiwan Mandarin speakers judged tonotactic accidental gaps differently from phonotactically legal syllables and similarly to systematic gaps. They concluded that phonotactics affects the judgment of both real words and non-words, whereas frequency and neighborhood density only affect words. In an investigation of gap distribution and native speakers&#8217; judgments, however, Lai (<xref ref-type="bibr" rid="B43">2003</xref>) showed that there was an effect of tone frequency on non-words. In Lai&#8217;s study, tonotactic gaps with T2 were shown to be more common than those with the other tones. Moreover, T2 combined with closed syllables and [p, t, k, t&#597;, t&#642;, ts] onsets and T1 with [m, n, l, &#656;] onsets accounted for a large proportion of gaps. Lai further conducted rating and preference experiments investigating native Taiwan Mandarin speakers&#8217; judgments of tonotactic accidental gaps and found that T4 gaps were more readily accepted as real Mandarin words than T2 gaps, and that T2 gaps with [p, t, k, t&#597;, t&#642;, ts] onsets were generally disfavored. These results were attributed to tone frequency: There are more real words with T4 than with T2, and there are more T2 gaps with those particular onsets.</p>
<p>In a more recent study, Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>) collected native Mandarin speakers&#8217; well-formedness judgments of five types of T1 syllables&#8212;real words, tonotactic gaps, allophonic gaps (gaps that only violate allophonic rules; e.g., vowel backness depending on the place of the nasal codas), phonotactic accidental gaps, and systematic gaps. They found that the five different types of stimuli were rated gradiently: Real words were considered more well-formed than tonotactic gaps, followed by allophonic gaps, phonotactic accidental gaps and finally systematic gaps. Furthermore, the judgments were positively correlated with neighborhood density, and this effect was found to be stronger for gaps than for real words.</p>
<p>The results from such behavioral experiments with gradient measurements of phonotactic probability or well-formedness can be computationally modeled. The UCLA Phonotactic Learner (<xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>), which induces grammars consisting of weighted constraints based on the principle of Maximum entropy (<xref ref-type="bibr" rid="B14">Della Pietra, Della Pietra, &amp; Lafferty, 1997</xref>; <xref ref-type="bibr" rid="B23">Goldwater &amp; Johnson, 2003</xref>; <xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>; <xref ref-type="bibr" rid="B73">Zuraw &amp; Hayes, 2017</xref>), has been extensively used for this purpose (e.g., <xref ref-type="bibr" rid="B6">Berent et al., 2012</xref>; <xref ref-type="bibr" rid="B12">Daland, Hayes, White, Garellek, Davis, &amp; Norrmann, 2011</xref>; <xref ref-type="bibr" rid="B22">Gallagher, Gouskova, &amp; Camacho Rios, 2019</xref>; <xref ref-type="bibr" rid="B23">Goldwater &amp; Johnson, 2003</xref>; Hayes &amp; White, 2003; <xref ref-type="bibr" rid="B65">Wilson &amp; Gallagher, 2018</xref>). Gong (<xref ref-type="bibr" rid="B24">2017</xref>), for example, used this method to model visual lexical decisions on segmental combinations in Mandarin Chinese. Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>) also used the learner to model the wordlikeness ratings of Mandarin word forms from different lexicality categories. Alternatives to the UCLA Phonotactic Learner in modeling lexical judgements in Mandarin include the probability of segmental strings (<xref ref-type="bibr" rid="B55">Myers &amp; Tsay, 2005</xref>) and Bayesian probabilities (<xref ref-type="bibr" rid="B15">Do &amp; Lai, 2020</xref>). 
In this study, we complement these works by modeling tonotactic generalizations with the UCLA Phonotactic Learner to compare grammars with different tonotactic constraints, namely <italic>inductive constraints</italic> with different levels of fit to the lexicon and <italic>typologically-motivated markedness constraints</italic> with or without access to the lexicon (i.e., if the weights are informed by learning simulations using the lexicon). These comparisons allow us to examine to what extent the effects observed in the behavioral data are learnable from the lexical data.</p>
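<p>To make the scoring scheme concrete, the following minimal Python sketch computes a harmonic score as a weighted sum of constraint violations and converts it to a maxent value, exp(-score), following the general formulation in <xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>. The constraint names, the weights, and the three toy forms are hypothetical and chosen only for illustration; they are not the constraints or weights examined in this study, and the sketch does not reproduce the UCLA Phonotactic Learner itself.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

# Hypothetical constraints and weights, for illustration only; they are
# not the constraints or weights induced or tested in this study.
WEIGHTS = {
    "*T3": 1.2,          # penalizes the falling-rising tone
    "*T2": 0.8,          # penalizes the rising tone
    "*[-voice]T2": 0.5,  # penalizes a voiceless onset combined with T2
}


def violations(form):
    """Count violations of each hypothetical constraint for one syllable."""
    tone = form["tone"]
    voiceless = not form["voiced_onset"]
    return {
        "*T3": int(tone == 3),
        "*T2": int(tone == 2),
        "*[-voice]T2": int(tone == 2 and voiceless),
    }


def harmony(form):
    """Weighted sum of violations; a higher score marks a worse form."""
    counts = violations(form)
    return sum(WEIGHTS[name] * counts[name] for name in WEIGHTS)


def maxent_value(form):
    """Unnormalized maxent value, exp(-harmony)."""
    return math.exp(-harmony(form))


def probabilities(forms):
    """Normalize maxent values over a finite candidate set."""
    z = sum(maxent_value(f) for f in forms)
    return [maxent_value(f) / z for f in forms]


# Three toy forms (hypothetical), described only by tone and onset voicing.
forms = [
    {"tone": 1, "voiced_onset": True},
    {"tone": 2, "voiced_onset": False},
    {"tone": 3, "voiced_onset": True},
]
probs = probabilities(forms)
```
]]></preformat>
<p>Under these illustrative weights, the T1 form violates no constraint and therefore receives the largest share of probability, while the voiceless-onset T2 form is penalized twice and receives the smallest.</p>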
<p>In the following sections, we give a comprehensive description of Mandarin tonotactic accidental gaps. First, we conduct a corpus study to examine the distribution of all lexical segmental syllables that, when combined with the four lexical tones, yield non-lexical syllables. The results show that, independent of syllable type, T2 (rising) gaps are over-represented. Since T2 and T3 are intrinsically more marked than T1 and T4 in terms of contour complexity and aerodynamics (see Section 2), and the phonetic realization of T2 and T3 contours requires a longer duration (<xref ref-type="bibr" rid="B70">Zhang, 2001</xref>), we further investigate to what extent the T2 over-representation and T2/T3 markedness are reflected in speakers&#8217; wordlikeness ratings. The results reveal that T2 is not disfavored, whereas T3 is generally disfavored, independent of syllable structure. Furthermore, native speakers&#8217; wordlikeness ratings of gaps are gradient and heavily guided by neighborhood density. While speakers&#8217; wordlikeness judgments do reflect the markedness of T3 with a falling-rising contour, the over-representation of T2 gaps in the lexicon is not similarly evident. Motivated by the mismatches between the statistical properties of the lexicon and speakers&#8217; judgments, we use the UCLA Phonotactic Learner as a computational tool to incorporate and compare different degrees of lexical access in modeling wordlikeness ratings. We find that while tonotactic constraints induced from the lexical data can successfully model the results overall, typologically-motivated markedness constraints are better at predicting which gaps receive higher ratings, and their success could largely be achieved independently of the lexicon. The modeling results suggest that speakers&#8217; tonotactic knowledge may be dissociated from statistical patterns in the lexicon.</p>
</sec>
<sec>
<title>2. Corpus study of Mandarin tonotactic accidental gaps</title>
<p>To examine if the possible yet unattested Mandarin tonotactic accidental gaps follow any particular pattern, we first investigated the distribution of these gaps by compiling a corpus of gaps, which we named the &#8216;Mandarin Accidental Gap Corpus&#8217;. The corpus included the 398 allowable Mandarin syllables (taken from <xref ref-type="bibr" rid="B48">Lin, 2007, p. 283</xref>). Two definitions of accidental gaps were employed: (1) a narrow view, in which a syllable-tone combination does not exist as a lexical syllable in the <italic>Revised Mandarin Chinese Dictionary</italic>, compiled by the Ministry of Education, Taiwan (<ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://dict.revised.moe.edu.tw/">https://dict.revised.moe.edu.tw/</ext-link>), and (2) a broad view, in which a syllable-tone combination <italic>does</italic> form a lexical syllable but has zero frequency in the <italic>Taiwan Mandarin Conversational Corpus</italic> (TMC corpus) (<xref ref-type="bibr" rid="B60">Tseng, 2019</xref>).<xref ref-type="fn" rid="n2">2</xref> For example, the syllable [ts&#688;u] in T2 ([ts&#688;u]<sup>35</sup> &#8220;die&#8221; in Classical or Literary Chinese) historically exists as a lexical syllable and is known by Mandarin speakers through poetry but is not attested in the TMC corpus. As such, this lexical syllable might be considered a gap by native Mandarin speakers because it is rarely, if ever, used in spoken Mandarin. This lexical syllable was thus counted as a gap in the broad view but not in the narrow view. Note that we use &#8220;lexical syllable&#8221; here and throughout this work instead of &#8220;word&#8221; because, while Mandarin morphemes are mostly monosyllabic, around 72% of the lexicon is made up of disyllabic words (<xref ref-type="bibr" rid="B47">Li, 2013</xref>).</p>
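<p>The two definitions can be sketched as a simple set computation. The Python fragment below is a minimal illustration rather than the procedure actually used to build the corpus: the function name, the data structures, and the toy frequencies are hypothetical, and it assumes that the broad view also includes every narrow-view gap.</p>
<preformat preformat-type="code"><![CDATA[
```python
TONES = (1, 2, 3, 4)


def find_gaps(syllables, dictionary, corpus_freq):
    """Return (narrow, broad) sets of (syllable, tone) gaps.

    narrow: combinations that are absent from the dictionary.
    broad:  narrow gaps plus dictionary entries whose corpus frequency
            is zero (assumption: the broad view subsumes the narrow one).
    """
    narrow, broad = set(), set()
    for syllable in syllables:
        for tone in TONES:
            combo = (syllable, tone)
            if combo not in dictionary:
                narrow.add(combo)
                broad.add(combo)
            elif corpus_freq.get(combo, 0) == 0:
                broad.add(combo)
    return narrow, broad


# Toy data echoing the [tshu] example in the text; the frequency
# counts are invented for illustration.
syllables = ["tshu"]
dictionary = {("tshu", 1), ("tshu", 2), ("tshu", 4)}
corpus_freq = {("tshu", 1): 12, ("tshu", 4): 30}  # T2 never occurs

narrow, broad = find_gaps(syllables, dictionary, corpus_freq)
print(sorted(narrow))
print(sorted(broad))
```
]]></preformat>
<p>With this toy input, the T3 combination is a gap under both views, whereas the T2 combination is a gap only under the broad view, mirroring the example above.</p>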
<p>Our investigation of the corpus data revealed that accidental gaps were not evenly distributed across the four tones, as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. In the narrow view, a one-way chi-square test showed that T2 gaps were over-represented while T4 gaps were under-represented (<italic>&#967;</italic><sup>2</sup>(3) = 68.8, <italic>p</italic> &lt; .001). Another one-way chi-square test revealed that T2 gaps were over-represented in the broad view (<italic>&#967;</italic><sup>2</sup>(3) = 25.01, <italic>p</italic> &lt; .001). In the aforementioned study, Lai (<xref ref-type="bibr" rid="B43">2003</xref>) made a similar observation that T2 gaps outnumbered gaps with the other tones.</p>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>Numbers of Mandarin accidental gaps as a function of tone in the narrow and broad views. The horizontal lines indicate the predicted numbers.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g1.png"/>
</fig>
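<p>For readers who wish to see how a one-way chi-square test of this kind is computed, the following Python sketch may help. It assumes equal expected counts across the four tones, which is our reading of the &#8220;predicted numbers&#8221; in Figure 1 and may differ from the exact expectation used in the analysis. The gap counts are hypothetical, and the closed-form tail probability applies only to three degrees of freedom.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def chi_square_gof(observed):
    """One-way chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with 3 df."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))


# Hypothetical gap counts for T1 to T4; these are not the study's data.
counts = [60, 110, 80, 50]
stat, df = chi_square_gof(counts)
print(f"chi2({df}) = {stat:.2f}, p = {p_value_df3(stat):.2g}")
```
]]></preformat>
<p>Passing the statistics reported above (68.8 and 25.01) to the tail function yields probabilities well below .001, in line with the reported significance levels.</p>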
<p>There are several possible explanations for the asymmetrical distribution of accidental gaps. First, it could be attributed to <italic>Zhu&#243; Sh&#462;ng Bi&#224;n Q&#249;</italic> (voiced <italic>sh&#462;ng</italic> tone entering <italic>q&#249;</italic> tone), a historical tone merging process in which a number of voiced <italic>sh&#462;ng</italic> tones (i.e., T3) merged into <italic>q&#249;</italic> tones (i.e., T4) in Middle Chinese (<xref ref-type="bibr" rid="B51">Mei, 1970</xref>, <xref ref-type="bibr" rid="B52">1977</xref>; <xref ref-type="bibr" rid="B62">Wang, 1972</xref>). This may account for the lower number of T4 gaps. Second, the large percentage of T2 gaps could be attributed to the markedness of rising tones (i.e., T2) in comparison with level and falling tones (i.e., T1 and T4). Specifically, among simple contour tones, falls are much more common than rises, presumably due to the physiological difficulty associated with the production of rising contours against natural airflow dynamics (<xref ref-type="bibr" rid="B70">Zhang, 2001</xref>). Although T3 (falling-rising) involves the most complex contour, we did not observe any obvious over-representation of T3 gaps, apart from the greater number of T3 gaps in the broad view than in the narrow view. This increase may be attributed to the marked status of the complex tonal contour or to its high confusability with T2, which stems from phonetic similarity and a morphophonemic alternation involving T3 sandhi (<xref ref-type="bibr" rid="B28">Hao, 2012</xref>; <xref ref-type="bibr" rid="B36">T. Huang, 2001</xref>; <xref ref-type="bibr" rid="B37">Huang &amp; Johnson, 2010</xref>; <xref ref-type="bibr" rid="B38">Hume &amp; Johnson, 2003</xref>; <xref ref-type="bibr" rid="B52">Mei, 1977</xref>). This speculation is not without grounding, as T3 sandhi has indeed emerged within the past few centuries (<xref ref-type="bibr" rid="B52">Mei, 1977</xref>).</p>
<p>It should be noted that, cross-linguistically, contour tones are generally preferred in longer rimes, presumably because they provide a duration long enough to realize the complex tone targets (<xref ref-type="bibr" rid="B69">Zhang, 2000</xref>, <xref ref-type="bibr" rid="B70">2001</xref>). Studies have shown that Mandarin CGVN syllables are indeed longer than other syllable types (i.e., CV, CVN, CGV) (<xref ref-type="bibr" rid="B66">Wu &amp; Kenstowicz, 2015</xref>) and that T2 and T3 are longer than T1 and T4 (<xref ref-type="bibr" rid="B49">Lu &amp; Lee-Kim, 2021</xref>; <xref ref-type="bibr" rid="B66">Wu &amp; Kenstowicz, 2015</xref>). We thus divided the syllables into different types (CV/CGV, CVG/CGVG, CVN/CGVN) to examine if there were any tone-syllable type dependencies. Here we follow a conventional definition of syllable structure in Mandarin, assuming that on-glides are grouped with the onset and thus do not contribute to syllable weight (<xref ref-type="bibr" rid="B16">Duanmu, 2007</xref>). However, as the results in <xref ref-type="fig" rid="F2">Figure 2</xref> show, we did not observe any tendencies in this direction (narrow view: <italic>&#967;</italic><sup>2</sup>(6) = 8.05, <italic>p</italic> = .24; broad view: <italic>&#967;</italic><sup>2</sup>(6) = 4.11, <italic>p</italic> = .66).</p>
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>Proportions of Mandarin tonotactic accidental gaps as a function of syllable types in the narrow and broad views. The horizontal lines indicate the predicted proportions.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g2.png"/>
</fig>
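<p>The six degrees of freedom reported above are what a test of independence yields for a table of four tones by three syllable types, (4 - 1) &#215; (3 - 1) = 6. The Python sketch below illustrates such a test under that assumption. The table of counts is hypothetical, and the closed-form tail probability used here is valid only when the degrees of freedom are even.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_even_df(stat, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical gap counts: rows are T1 to T4, columns are the three
# syllable types (CV/CGV, CVG/CGVG, CVN/CGVN). Not the study's data.
table = [
    [20, 15, 25],
    [35, 30, 45],
    [25, 20, 35],
    [15, 12, 23],
]
stat, df = chi_square_independence(table)
print(f"chi2({df}) = {stat:.2f}, p = {p_value_even_df(stat, df):.2f}")
```
]]></preformat>
<p>Applied to the broad-view statistic reported above, the tail function returns approximately .66 for a value of 4.11 with six degrees of freedom. Small departures from published values can arise when the input statistic has itself been rounded.</p>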
<p>Cross-linguistic and diachronic studies have observed that high-<italic>f0</italic> tones are more compatible with voiceless onsets while low-<italic>f0</italic> tones are more compatible with voiced onsets (e.g., <xref ref-type="bibr" rid="B34">Hsieh &amp; Kenstowicz, 2008</xref>; <xref ref-type="bibr" rid="B39">Kenstowicz &amp; Suchato, 2006</xref>; <xref ref-type="bibr" rid="B56">Ohala, 1978</xref>; <xref ref-type="bibr" rid="B57">Sagart, 1999</xref>; <xref ref-type="bibr" rid="B68">Yip, 2002</xref>). This can be explained by the aerodynamics involved in articulation, in that voiceless consonants exert a pitch-raising effect on the following tone (<xref ref-type="bibr" rid="B32">Hombert, Ohala, &amp; Ewan, 1979</xref>; <xref ref-type="bibr" rid="B56">Ohala, 1978</xref>). It is generally agreed that some Chinese tones originated via a similar mechanism. Sagart (<xref ref-type="bibr" rid="B57">1999</xref>) reports a clear correspondence between onset voicing in Middle Chinese and tones in Modern Chinese: voiced onsets, both obstruents and sonorants, induced a tone lowering of the following vowel, resulting in high and low allotones that eventually phonologized into different tonal contrasts. We thus examined if T1 and T4, tones with an initially high pitch, were more likely to appear with voiceless onsets (i.e., having fewer gaps), and if T2 and T3, tones with an initially low pitch, were more likely to appear with voiced onsets. In other words, there should be fewer T1 and T4 gaps with voiceless onsets and fewer T2 and T3 gaps with voiced onsets. In Mandarin, the contrast between voiced and voiceless onsets is essentially one between sonorants and 
obstruents, since Mandarin obstruents lack a voicing contrast.<xref ref-type="fn" rid="n3">3</xref> The results (<xref ref-type="fig" rid="F3">Figure 3</xref>) showed that the distribution of gaps mostly conformed to this trend: More gaps with voiced onsets were observed in T1 than in T2 and T3, in which gaps with voiceless onsets dominated (narrow view: <italic>&#967;</italic><sup>2</sup>(3) = 80.37, <italic>p</italic> &lt; .001; broad view: <italic>&#967;</italic><sup>2</sup>(3) = 64.52, <italic>p</italic> &lt; .001). However, T4 gaps did not pattern as predicted. Although the pitch of T4 is initially high, there were still more T4 gaps with voiceless onsets. This may be attributed to the historical tone merging process mentioned earlier, whereby T3, presumably with more voiced onsets, merged into T4, disrupting the connection between onset voicing and tone.</p>
<fig id="F3">
<label>Figure 3</label>
<caption>
<p>Proportions of Mandarin tonotactic accidental gaps as a function of onset voicing in the narrow and broad views. The horizontal lines indicate the predicted proportions.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g3.png"/>
</fig>
<p>Recall that Lai (<xref ref-type="bibr" rid="B43">2003</xref>) reported that T2 closed syllables with [p, t, k, t&#597;, t&#642;, ts] onsets and T1 with [m, n, l, &#656;] onsets accounted for a large proportion of gaps. When we focused our analysis on individual onsets, we found that onset voicing, again, is a better indicator of the general pattern, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<p>The analyses of our corpus data were aimed at determining if the accidental gaps in Mandarin follow any particular pattern. Our findings suggest that the occurrence of gaps is not completely random. We found that T2 gaps are over-represented while the most marked T3 gaps are not. We also found more gaps with voiced onsets in T1 than in all other tones, in which gaps with voiceless onsets dominated, a pattern that is partially observed cross-linguistically and diachronically.</p>
<p>In the next section, we investigate Mandarin speakers&#8217; judgments of these accidental gaps. Specifically, we conducted a wordlikeness judgment experiment to explore whether their judgments of accidental gaps would follow the same tendencies that we observed in our corpus study and/or be guided by grammatical principles that were absent from the corpus.</p>
</sec>
<sec>
<title>3. Wordlikeness judgment experiment</title>
<p>We conducted a wordlikeness judgement experiment to investigate Mandarin listeners&#8217; perception of tonotactic accidental gaps and to determine if their perceptual tendencies reflect what has been observed both cross-linguistically and in our corpus study. The factors being examined along with our predictions are summarized in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>Factors that may affect the wordlikeness judgments of Mandarin tonotactic accidental gaps.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Observation</bold></td>
<td align="left" valign="top"><bold>Possible explanations</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">a. More T2 gaps than T1/T4 gaps observed in the corpus<break/><break/>&#10132; Are T2 gaps judged as less wordlike?<break/>&#10132; Is the most marked T3 judged as less wordlike?</td>
<td align="left" valign="top">T2/T3 are more marked than T1/T4.</td>
</tr>
<tr>
<td align="left" valign="top">b. T2/T3: More gaps with voiceless onset<break/>T1: Fewer gaps with voiceless onset<break/><break/>Note: T4 does not conform to this pattern due to a historical tone merging process.</td>
<td align="left" valign="top">High-<italic>f0</italic> is more compatible with voiceless segments while low-<italic>f0</italic> is more compatible with voiced segments.</td>
</tr>
<tr>
<td align="left" valign="top" colspan="2">&#10132; Are T2/T3 gaps with voiceless onset and T1 gaps with voiced onset judged as less wordlike?</td>
</tr>
<tr>
<td align="left" valign="top">c. Lexical statistics: Neighborhood density, frequency, and phonotactic probability</td>
<td align="left" valign="top">Previous studies have found effects of frequency, neighborhood density, and phonotactic probability on unattested forms (<xref ref-type="bibr" rid="B1">Albright &amp; Hayes, 2003</xref>; <xref ref-type="bibr" rid="B10">Coleman &amp; Pierrehumbert, 1997</xref>; <xref ref-type="bibr" rid="B20">Frisch, Large, &amp; Pisoni, 2000</xref>; <xref ref-type="bibr" rid="B25">Gong &amp; Zhang, 2021</xref>; <xref ref-type="bibr" rid="B43">Lai, 2003</xref>; <xref ref-type="bibr" rid="B55">Myers &amp; Tsay, 2005</xref>).</td>
</tr>
<tr>
<td align="left" valign="top" colspan="2">&#10132; Do lexical statistics have a gradient effect on the wordlikeness judgment of gaps?</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec sec-type="methods">
<title>3.1. Methodology</title>
<sec>
<title>3.1.1. Participants</title>
<p>Thirty-seven Taiwan Mandarin native speakers (10 male, 27 female; aged 20&#8211;37, <italic>M</italic> = 21.68) were recruited from National Yang Ming Chiao Tung University. These participants were all Mandarin-dominant speakers with some exposure to other dialects of Chinese spoken in Taiwan (Taiwanese Southern Min and Hakka). None of the participants reported hearing or speaking deficiencies. The study was conducted in accordance with the ethical guidelines approved by the Research Ethics Committee for Human Subject Protection, National Yang Ming Chiao Tung University. All participants were compensated monetarily for their time.</p>
</sec>
<sec>
<title>3.1.2. Materials</title>
<p>To examine whether the patterns observed in the corpus and cross-linguistically (<xref ref-type="table" rid="T1">Table 1</xref>) are reflected in native Mandarin speakers&#8217; judgments of accidental gaps, 96 Mandarin accidental gaps (as defined by both the narrow and broad views) were selected. The gaps were counterbalanced across the four Mandarin tones and different syllable types (open: CV, CGV; closed: CVN, CGVN). Another 48 Mandarin lexical syllables, referred to as &#8220;words&#8221; in the figures for the sake of brevity, fulfilling the same criteria were also selected. These stimuli were selected such that they represented the distribution in the corpus in terms of onset voicing and tone combinations (e.g., more voiced-onset T1 gaps, more voiceless-onset T2 gaps).<xref ref-type="fn" rid="n4">4</xref> The 144 stimuli (see Appendix I) were produced by a male native speaker of Taiwan Mandarin. Though previous studies have shown that including real words can de-sensitize participants&#8217; ratings of non-words and is more likely to activate lexical neighbors than when all stimuli are non-words (<xref ref-type="bibr" rid="B2">Albright, 2009</xref>), we included lexical syllables to enable a comparison between the gaps and lexical syllables.</p>
<p>The phonetic realization of the tonal contours of these naturally produced stimuli was checked to ensure that the gap and lexical tokens were comparable. Time-normalized <italic>f0</italic> contours of these tokens (excluding obstruent onsets, if any) were obtained using ProsodyPro (<xref ref-type="bibr" rid="B67">Xu, 2013</xref>). As seen in <xref ref-type="fig" rid="F4">Figure 4</xref>, the tonal contours were comparable between the gap and lexical tokens. Note that the final rise of T3 (falling-rising) contours fell short of the final rise target, a well-known characteristic of T3 production in Taiwan Mandarin (<xref ref-type="bibr" rid="B19">Fon &amp; Chiang, 1999</xref>; <xref ref-type="bibr" rid="B41">Kubler, 1985</xref>). Despite the final-rise undershoot of T3, Lu and Lee-Kim (<xref ref-type="bibr" rid="B49">2021</xref>) showed that this tone is still perceived as having a complex fall-rise contour. All else being equal, Taiwan Mandarin speakers perceived T3 tokens without a final rise as the longest among the four lexical tones. Furthermore, when asked to imitate T3 with a final-rise undershoot, these speakers implemented a final rise, similar to that in T3 with a full concave contour.</p>
<fig id="F4">
<label>Figure 4</label>
<caption>
<p>Time-normalized <italic>f0</italic> as a function of tone, paneled by lexicality.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g4.png"/>
</fig>
<p>The durations of gap syllables were longer (<italic>M</italic> = 470.33 ms, <italic>SD</italic> = 111.15 ms) than those of lexical syllables (<italic>M</italic> = 441.47 ms, <italic>SD</italic> = 106.89 ms). Since this difference in the naturally produced stimuli could have confounded the wordlikeness ratings, the stimuli were further resynthesized into two durations, 300 ms and 500 ms, reflecting the range of Mandarin syllable duration (<xref ref-type="bibr" rid="B49">Lu &amp; Lee-Kim, 2021</xref>; <xref ref-type="bibr" rid="B66">Wu &amp; Kenstowicz, 2015</xref>), using the Pitch Synchronous Overlap and Add (PSOLA) algorithm in Praat (<xref ref-type="bibr" rid="B7">Boersma &amp; Weenink, 2017</xref>). The manipulation ensured that any differences in the gap and word ratings would be unlikely to be due to any acoustic artifacts of the stimuli.</p>
</sec>
<sec>
<title>3.1.3. Procedure</title>
<p>The 288 stimuli ([96 accidental gaps + 48 lexical syllables] &#215; 2 durations) were randomized for each participant and presented auditorily in three blocks using E-Prime software (<xref ref-type="bibr" rid="B58">Schneider, Eschman, &amp; Zuccolotto, 2012</xref>). Written instructions on a computer screen (<italic>Q&#464;ngw&#232;n n&#237;n t&#299;ngd&#224;o de z&#236; y&#466;u du&#333; xi&#224;ng zh&#333;ngw&#233;n?</italic> &#8220;How Mandarin-like is the word you just heard?&#8221;) asked the participants to rate each word on a 7-point scale, with 7 being the most wordlike and 1 the least wordlike.<xref ref-type="fn" rid="n5">5</xref> Nine practice trials were presented before the experiment to familiarize participants with the task. Participants were tested individually in a sound-attenuated booth using AKG K240 headphones, and their responses were recorded using E-Prime. The experiment took around 15 minutes in total.</p>
</sec>
</sec>
<sec>
<title>3.2. Results</title>
<p>Linear mixed-effects regression models were fitted in R using the <italic>lme4</italic> package (<xref ref-type="bibr" rid="B4">Bates, Maechler, Bolker, &amp; Walker, 2015</xref>) and <italic>p</italic>-values were obtained using the <italic>lmerTest</italic> package (<xref ref-type="bibr" rid="B42">Kuznetsova et al., 2016</xref>). The visualizations were plotted using the <italic>ggplot2</italic> package (<xref ref-type="bibr" rid="B63">Wickham, 2009</xref>). Models were fitted with the participants&#8217; wordlikeness ratings on the 7-point scale converted into <italic>z</italic>-scores for each speaker as the dependent variable. For our analyses, the experimental variables of interest included <italic>Tone</italic> (4 levels), <italic>SyllableType</italic> (open vs. closed), <italic>OnsetVoicing</italic> (voiced vs. voiceless), and <italic>Lexicality</italic> (tonotactic gap vs. lexical syllables). A set of variables on lexical statistics was also included. For a balanced comparison of the tonotactic gaps and lexical syllables, we used <italic>SyllableFrequency</italic> and <italic>SyllableGapFrequency</italic> as the indices to calculate the effect of frequency, if any.</p>
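<p>As an illustration of the per-participant standardization described above, the following is a minimal Python sketch rather than the analysis code actually used (the models were fitted in R). The function name, the toy ratings, and the use of the sample standard deviation are assumptions made for the example.</p>
<preformat>
```python
from statistics import mean, stdev


def zscore_by_participant(ratings):
    """Convert raw 1-7 ratings to z-scores separately for each participant.

    ratings: dict mapping a participant ID to that participant's raw ratings.
    Assumption: the sample standard deviation (n - 1) is used.
    """
    standardized = {}
    for participant, values in ratings.items():
        m = mean(values)
        s = stdev(values)
        standardized[participant] = [(v - m) / s for v in values]
    return standardized


# Hypothetical toy data: two participants who use the scale differently.
RAW = {"P01": [1, 4, 7], "P02": [5, 6, 7]}
Z = zscore_by_participant(RAW)
print(Z)
```
</preformat>
<p>In this toy case, both participants end up with the same standardized profile even though one of them uses only the upper end of the scale, which is the motivation for standardizing within participant.</p>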
<p>The calculation of <italic>SyllableFrequency</italic> was straightforward; we calculated the overall token frequency of each syllable regardless of tone and morphemes using the TMC corpus (<xref ref-type="bibr" rid="B60">Tseng, 2019</xref>). We grouped homophonic morphemes together and only considered syllable token frequency since the experiment was auditory. <italic>SyllableGapFrequency</italic> is the inverse of tonal neighborhood density, calculated from the number of lexical syllables differing from the test item only in tone. That is, a syllable with only one tonal gap (&#8220;1&#8221;) would be considered more frequent, while a syllable with three gaps (&#8220;3&#8221;) would be considered less frequent. Note that there is no &#8220;4&#8221; because in this case all tone-syllable combinations would be impossible.</p>
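<p>One way to read the <italic>SyllableGapFrequency</italic> index is as a count of the unattested tones for a given segmental syllable. The sketch below follows that reading; the function name, the transliterated syllable labels, and the toy lexicon are hypothetical and are not taken from the TMC corpus.</p>
<preformat>
```python
TONES = (1, 2, 3, 4)


def syllable_gap_frequency(syllable, lexicon):
    """Count how many of the four tones are unattested for a segmental syllable.

    lexicon: set of (syllable, tone) pairs that exist as lexical syllables.
    Assumption: the index is the raw number of tonal gaps (0 to 3 in practice).
    """
    return sum(1 for tone in TONES if (syllable, tone) not in lexicon)


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {
    ("ma", 1), ("ma", 2), ("ma", 3), ("ma", 4),  # no tonal gap
    ("tsung", 1), ("tsung", 3), ("tsung", 4),    # T2 is a gap
    ("nin", 2),                                  # T1, T3, and T4 are gaps
}

GAPS = {s: syllable_gap_frequency(s, TOY_LEXICON) for s in ("ma", "tsung", "nin")}
print(GAPS)
```
</preformat>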
<p><italic>NeighborhoodDensity</italic> and <italic>PhonotacticProbability</italic> were included to provide additional quantification of possible lexical influence. <italic>NeighborhoodDensity</italic> was calculated as the summed frequency of the words generated by adding, deleting, or substituting a single phoneme. In this calculation, we treated diphthongs as sequences of two phonemes (e.g., [a], [i], and [ei] as neighbors of [ai]). Note that we used a tone-blind <italic>NeighborhoodDensity</italic> since the previously mentioned <italic>SyllableGapFrequency</italic> variable already reflected the number of syllables differing in tone. Finally, <italic>PhonotacticProbability</italic> was defined by onset-rime transitional probability (<xref ref-type="bibr" rid="B60">Tseng, 2019</xref>).</p>
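<p>The frequency-weighted neighborhood calculation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used for the study: syllables are represented as tuples of phoneme symbols, the frequencies are invented, and the helper names are hypothetical.</p>
<preformat>
```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one addition,
    deletion, or substitution of a single phoneme."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(1 for x, y in zip(a, b) if x != y) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # Lengths differ by one: removing one phoneme must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(target, frequencies):
    """Sum the token frequencies of all neighbors of target (tone-blind)."""
    return sum(f for form, f in frequencies.items() if is_neighbor(target, form))


# Hypothetical frequencies; diphthongs are treated as two phonemes.
TOY_FREQ = {("a",): 10, ("i",): 20, ("e", "i"): 5, ("u",): 7, ("a", "i"): 99}
DENSITY_AI = neighborhood_density(("a", "i"), TOY_FREQ)
print(DENSITY_AI)  # 35: [a] + [i] + [ei]; [u] and [ai] itself are excluded
```
</preformat>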
<p>In addition to the lexical statistics variables (<italic>SyllableFrequency, NeighborhoodDensity, SyllableGapFrequency</italic> and <italic>PhonotacticProbability</italic>), the model also included the <italic>Tone</italic>*<italic>OnsetVoicing</italic> interaction to examine if there was any correlation between <italic>Tone</italic> and the two factors (<xref ref-type="table" rid="T1">Table 1(a, b)</xref>) as well as the <italic>Tone</italic>*<italic>Lexicality</italic> interaction to determine if gaps, like lexical syllables, were rated based on lexical statistics (<xref ref-type="table" rid="T1">Table 1(c)</xref>). The model also included the random intercepts for <italic>Participant</italic> and <italic>Item</italic> as well as by-participant random slopes for <italic>Tone, OnsetVoicing</italic>, and <italic>NeighborhoodDensity</italic>. Models including other by-participant random slopes failed to converge.</p>
<p>Descriptions of each variable and how they were coded are listed in <xref ref-type="table" rid="T2">Table 2</xref>. T2, which yielded intermediate wordlikeness ratings, was set as the reference level to facilitate the interpretation of the results. The binary variables <italic>SyllableType, OnsetVoicing</italic>, and <italic>Lexicality</italic> were contrast coded so that the weights of the two levels summed to 0, allowing us to interpret the results as main effects (<xref ref-type="bibr" rid="B13">Davis, 2010</xref>).</p>
<table-wrap id="T2">
<label>Table 2</label>
<caption>
<p>Variables considered for analysis in wordlikeness rating experiment.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Variable</bold></td>
<td align="left" valign="top"><bold>Description</bold></td>
<td align="left" valign="top"><bold>Coding</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">WordlikenessRating</td>
<td align="left" valign="top">1&#8211;7 rating scale transformed into z-score</td>
<td align="left" valign="top">Numerical</td>
</tr>
<tr>
<td align="left" valign="top">Tone</td>
<td align="left" valign="top">T1, T2, T3, T4</td>
<td align="left" valign="top">4 levels: T2 as reference</td>
</tr>
<tr>
<td align="left" valign="top">SyllableType</td>
<td align="left" valign="top">Open vs. closed</td>
<td align="left" valign="top">2 levels: &#8211;1 vs. 1</td>
</tr>
<tr>
<td align="left" valign="top">OnsetVoicing</td>
<td align="left" valign="top">Voiceless vs. voiced</td>
<td align="left" valign="top">2 levels: &#8211;1 vs. 1</td>
</tr>
<tr>
<td align="left" valign="top">Lexicality</td>
<td align="left" valign="top">Gap vs. lexical syllable</td>
<td align="left" valign="top">2 levels: &#8211;1 vs. 1</td>
</tr>
<tr>
<td align="left" valign="top">SyllableFrequency</td>
<td align="left" valign="top">Z-scored log transformed token frequency of each syllable regardless of tone</td>
<td align="left" valign="top">Numerical</td>
</tr>
<tr>
<td align="left" valign="top">SyllableGapFrequency</td>
<td align="left" valign="top">The inverse of tonal neighborhood density</td>
<td align="left" valign="top">Numerical</td>
</tr>
<tr>
<td align="left" valign="top">NeighborhoodDensity</td>
<td align="left" valign="top">Z-scored summed frequency of the words generated by adding, deleting, or substituting a single phoneme</td>
<td align="left" valign="top">Numerical</td>
</tr>
<tr>
<td align="left" valign="top">PhonotacticProbability</td>
<td align="left" valign="top">Z-scored onset-rime transitional probability</td>
<td align="left" valign="top">Numerical</td>
</tr>
<tr>
<td align="left" valign="top">Participant</td>
<td align="left" valign="top">Participant ID</td>
<td align="left" valign="top">Factorial</td>
</tr>
<tr>
<td align="left" valign="top">Item</td>
<td align="left" valign="top">Test item</td>
<td align="left" valign="top">Factorial</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The statistical model is summarized in <xref ref-type="table" rid="T3">Table 3</xref>. As would be expected, Mandarin speakers generally rated lexical syllables as more wordlike than gaps (<italic>Lexicality</italic>: <italic>p</italic> &lt; .0001). In the following, we discuss each of the factors and interactions that were relevant to patterns reported in the corpus study and previous cross-linguistic observations.</p>
<table-wrap id="T3">
<label>Table 3</label>
<caption>
<p>Summary of the statistical model for the wordlikeness judgment experiment.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top" rowspan="2"></td>
<td align="left" valign="top" colspan="4"><bold>R<sup>2</sup> = .55</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>B</bold></td>
<td align="left" valign="top"><bold>SE</bold></td>
<td align="left" valign="top"><bold>t</bold></td>
<td align="left" valign="top"><bold>p</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">(Intercept)</td>
<td align="left" valign="top">0.29</td>
<td align="left" valign="top">0.11</td>
<td align="left" valign="top">2.75</td>
<td align="left" valign="top">.007</td>
</tr>
<tr>
<td align="left" valign="top">T1</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">1.02</td>
<td align="left" valign="top">.311</td>
</tr>
<tr>
<td align="left" valign="top"><bold>T3</bold></td>
<td align="left" valign="top"><bold>&#8211;0.22</bold></td>
<td align="left" valign="top"><bold>0.08</bold></td>
<td align="left" valign="top"><bold>&#8211;2.74</bold></td>
<td align="left" valign="top"><bold>.007</bold></td>
</tr>
<tr>
<td align="left" valign="top">T4</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">0.77</td>
<td align="left" valign="top">.445</td>
</tr>
<tr>
<td align="left" valign="top">OnsetVoicing</td>
<td align="left" valign="top">&#8211;0.03</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">&#8211;0.57</td>
<td align="left" valign="top">.567</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Lexicality</bold></td>
<td align="left" valign="top"><bold>0.44</bold></td>
<td align="left" valign="top"><bold>0.09</bold></td>
<td align="left" valign="top"><bold>4.69</bold></td>
<td align="left" valign="top"><bold>&lt;.0001</bold></td>
</tr>
<tr>
<td align="left" valign="top">SyllableFrequency</td>
<td align="left" valign="top">0.06</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">1.34</td>
<td align="left" valign="top">.184</td>
</tr>
<tr>
<td align="left" valign="top">SyllableGapFrequency</td>
<td align="left" valign="top">&#8211;0.03</td>
<td align="left" valign="top">0.04</td>
<td align="left" valign="top">&#8211;0.81</td>
<td align="left" valign="top">.418</td>
</tr>
<tr>
<td align="left" valign="top">SyllableType</td>
<td align="left" valign="top">0.02</td>
<td align="left" valign="top">0.02</td>
<td align="left" valign="top">0.90</td>
<td align="left" valign="top">.371</td>
</tr>
<tr>
<td align="left" valign="top"><bold>NeighborhoodDensity</bold></td>
<td align="left" valign="top"><bold>0.11</bold></td>
<td align="left" valign="top"><bold>0.05</bold></td>
<td align="left" valign="top"><bold>2.24</bold></td>
<td align="left" valign="top"><bold>.027</bold></td>
</tr>
<tr>
<td align="left" valign="top">PhonotacticProbability</td>
<td align="left" valign="top">0.02</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">0.25</td>
<td align="left" valign="top">.802</td>
</tr>
<tr>
<td align="left" valign="top"><bold>T1:OnsetVoicing</bold></td>
<td align="left" valign="top"><bold>0.18</bold></td>
<td align="left" valign="top"><bold>0.08</bold></td>
<td align="left" valign="top"><bold>2.27</bold></td>
<td align="left" valign="top"><bold>.025</bold></td>
</tr>
<tr>
<td align="left" valign="top">T3:OnsetVoicing</td>
<td align="left" valign="top">0.03</td>
<td align="left" valign="top">0.08</td>
<td align="left" valign="top">0.45</td>
<td align="left" valign="top">.657</td>
</tr>
<tr>
<td align="left" valign="top">T4:OnsetVoicing</td>
<td align="left" valign="top">0.04</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">0.59</td>
<td align="left" valign="top">.558</td>
</tr>
<tr>
<td align="left" valign="top">T1:Lexicality</td>
<td align="left" valign="top">&#8211;0.10</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;1.37</td>
<td align="left" valign="top">.174</td>
</tr>
<tr>
<td align="left" valign="top">T3:Lexicality</td>
<td align="left" valign="top">&#8211;0.04</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;0.53</td>
<td align="left" valign="top">.594</td>
</tr>
<tr>
<td align="left" valign="top">T4:Lexicality</td>
<td align="left" valign="top">&#8211;0.07</td>
<td align="left" valign="top">0.07</td>
<td align="left" valign="top">&#8211;1.03</td>
<td align="left" valign="top">.306</td>
</tr>
<tr>
<td align="left" valign="top">Lexicality:SyllableFrequency</td>
<td align="left" valign="top">0.00</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">&#8211;0.04</td>
<td align="left" valign="top">.967</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Lexicality:SyllableGapFrequency</bold></td>
<td align="left" valign="top"><bold>0.10</bold></td>
<td align="left" valign="top"><bold>0.04</bold></td>
<td align="left" valign="top"><bold>2.67</bold></td>
<td align="left" valign="top"><bold>.008</bold></td>
</tr>
<tr>
<td align="left" valign="top">Lexicality:NeighborhoodDensity</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">0.05</td>
<td align="left" valign="top">1.10</td>
<td align="left" valign="top">.273</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Model: Wordlikeness rating &#126; Tone * OnsetVoicing + Tone * Lexicality + Lexicality * SyllableFrequency + Lexicality * SyllableGapFrequency + Lexicality * NeighborhoodDensity + PhonotacticProbability + SyllableType + (1 + Tone + OnsetVoicing + NeighborhoodDensity &#124; Participant) + (1 &#124; Item).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<sec>
<title>3.2.1. Corpus observation: More T2 and T3 gaps than T1 and T4 gaps</title>
<p>One of the main observations from our corpus study was that there were more T2 gaps than gaps of other tones (see <xref ref-type="fig" rid="F1">Figure 1</xref>). One goal of this experiment was to determine if this pattern would also be observed in Mandarin native speakers&#8217; wordlikeness ratings. That is, would Mandarin speakers rate T2 gaps as less wordlike than the T1, T3, and T4 gaps? The results are graphed in <xref ref-type="fig" rid="F5">Figure 5</xref>, which shows that T3 syllables, instead of T2 syllables, were rated as the least wordlike among both gaps and real words, as indicated by the significant T3 effect (<italic>p</italic> = .007; <xref ref-type="table" rid="T3">Table 3</xref>). Post-hoc tests using the <italic>emmeans</italic> package (<xref ref-type="bibr" rid="B46">Lenth, Singmann, Love, Buerkner, &amp; Herve, 2019</xref>) showed that, though T2 was rated numerically lower than T1 and T4, the ratings for T2, T1, and T4 did not significantly differ. These patterns held true in both words and gaps, as indicated by the lack of a <italic>Tone</italic>*<italic>Lexicality</italic> interaction (all <italic>p</italic> &gt; .05).</p>
<fig id="F5">
<label>Figure 5</label>
<caption>
<p>Standardized wordlikeness ratings as a function of tone and lexicality.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g5.png"/>
</fig>
<p>These findings diverge from the patterns observed in the corpus in the following ways. First, T2 gaps, rather than T3 gaps, were found to be over-represented in the corpus. If the wordlikeness ratings strictly followed the pattern observed in our corpus study, we should have seen T2 gaps judged as the least wordlike. Instead, T3 gaps were rated as the least wordlike. Second, Mandarin speakers also rated T3 <italic>lexical syllables</italic> as less wordlike than lexical syllables with other tones, a pattern that also diverged from the Mandarin speakers&#8217; linguistic experience, as it was T2, not T3, that had the fewest lexical syllables in the corpus (<xref ref-type="fig" rid="F6">Figure 6</xref>). Furthermore, T2 is the least frequent tone for lexical syllables, as indicated by a calculation of token and type frequency (again, based on syllables, not morphemes) of Mandarin tones using the TMC corpus (<xref ref-type="bibr" rid="B60">Tseng, 2019</xref>) (<xref ref-type="table" rid="T4">Table 4</xref>). As such, the aversion to T3 lexical syllables cannot be attributed to the linguistic experience of the Mandarin native speakers. One possible explanation is that T3, being a complex contour tone, is more marked than the other tones in Mandarin, which may lead to it seeming less wordlike.</p>
<fig id="F6">
<label>Figure 6</label>
<caption>
<p>Number of existing syllables as a function of tone from the corpus study.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g6.png"/>
</fig>
<table-wrap id="T4">
<label>Table 4</label>
<caption>
<p>Tone frequency in real words from TMC corpus (<xref ref-type="bibr" rid="B60">Tseng, 2019</xref>).</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Token frequency</bold></td>
<td align="left" valign="top"><bold>Type frequency</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">T1</td>
<td align="left" valign="top">105168</td>
<td align="left" valign="top">272</td>
</tr>
<tr>
<td align="left" valign="top">T2</td>
<td align="left" valign="top">96586</td>
<td align="left" valign="top">220</td>
</tr>
<tr>
<td align="left" valign="top">T3</td>
<td align="left" valign="top">129505</td>
<td align="left" valign="top">250</td>
</tr>
<tr>
<td align="left" valign="top">T4</td>
<td align="left" valign="top">228182</td>
<td align="left" valign="top">301</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In our corpus study, we also found more T3 gaps, relative to the other tones, as the definition of gaps was shifted from the broad to the narrow view. We speculated that the large number of T3 gaps might have arisen from avoiding the confusability between T2 and T3, as a T3 becomes a T2 before another T3 in a tone sandhi process. To explore this idea in terms of wordlikeness judgments, we compiled a subset of T2 and T3 gaps from the data and compared how Mandarin speakers rated T2 and T3 gaps whose T3 or T2 counterparts were <italic>not</italic> gaps as opposed to the T3 and T2 gaps with counterparts that were also gaps. For example, T3 [tsu&#331;]<sup>214</sup> &#8216;always&#8217; is a lexical syllable, but T2 *[tsu&#331;]<sup>35</sup> is a gap; however, Mandarin speakers would still have experience with T2 *[tsu&#331;]<sup>35</sup> as a sandhi form of T3 [tsu&#331;]<sup>214</sup>. In contrast, both T2 *[&#635;&#633;&#809;]<sup>35</sup> and T3 *[&#635;&#633;&#809;]<sup>214</sup> are gaps, so Mandarin speakers would not have been exposed to either form. <xref ref-type="fig" rid="F7">Figure 7</xref> shows the results of this analysis, which indicates that there was indeed a general tendency for T2 gaps with T3 lexical syllable counterparts to be given higher wordlikeness ratings. This suggests that T2 gaps may have been interpreted as a sandhi-ed T3, thereby improving their wordlikeness ratings. This trend, however, was not observed with T3 gaps with T2 lexical syllable counterparts, since surface T3 cannot be derived from T2 by any sandhi process in Mandarin. Chien et al. (<xref ref-type="bibr" rid="B9">2017</xref>) observed a similar asymmetrical pattern: Presenting a T3 prime facilitated a lexical decision for a T2 (underlying T3)-T3 disyllabic word, while presenting a surface T2 prime did not facilitate a T2 (underlying T3)-T3 disyllabic word. Our analysis of these subset data suggests that there is a close relationship between T2 and T3 in a direction that can be predicted by the T3 sandhi in Mandarin Chinese. The higher wordlikeness ratings of T2 gaps may partially explain why T3 words and gaps were rated as less wordlike overall (cf. Section 3.2.1).</p>
<fig id="F7">
<label>Figure 7</label>
<caption>
<p>Standardized wordlikeness ratings on T2/T3 gaps in which the T3/T2 counterparts are either gaps or real words.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g7.png"/>
</fig>
</sec>
<sec>
<title>3.2.2. Corpus observation: More T2 and T3 gaps with voiceless onsets but fewer T1 gaps with voiceless onsets</title>
<p>We found in our corpus study, and others have observed diachronically and cross-linguistically, that T2 and T3 syllables, with initially low <italic>f0</italic>, are less compatible with voiceless onsets, giving rise to more T2 and T3 gaps with voiceless onsets. In contrast, T1, with an initially high <italic>f0</italic>, is more compatible with a voiceless onset, and thus there are fewer T1 gaps with voiceless onsets. In our wordlikeness rating experiment, we found a significant <italic>Tone</italic>*<italic>OnsetVoicing</italic> interaction driven by the higher wordlikeness ratings of T1 gaps with voiceless onsets (<italic>T1</italic>*<italic>OnsetVoicing, p</italic> = .025; <xref ref-type="table" rid="T3">Table 3</xref>), as shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. Post-hoc tests using the <italic>emmeans</italic> package (<xref ref-type="bibr" rid="B46">Lenth et al., 2019</xref>) confirmed no other <italic>Tone*OnsetVoicing</italic> interactions. In particular, no comparable advantage was observed for T2 and T3 gaps with voiced onsets, despite the compatibility of these tones with voiced onsets. This finding could be attributed to the general disfavoring of T2 and T3.</p>
<fig id="F8">
<label>Figure 8</label>
<caption>
<p>Standardized wordlikeness ratings as a function of onset voicing.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g8.png"/>
</fig>
</sec>
<sec>
<title>3.2.3. The effect of lexical statistics</title>
<p>Previous studies have demonstrated effects of lexical statistics for real words and unattested forms. Here, we aimed to determine if such effects would be present in the wordlikeness judgments of both gaps and lexical syllables. We found a significant <italic>NeighborhoodDensity</italic> effect (<italic>p</italic> = .027; <xref ref-type="fig" rid="F9">Figure 9</xref>), indicating that the more neighbors a syllable had, the more wordlike it was judged to be. This effect was comparable for lexical syllables and gaps, as suggested by the lack of an interaction with <italic>Lexicality</italic>. These findings are in line with those in Lai (<xref ref-type="bibr" rid="B43">2003</xref>) and Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>). A <italic>SyllableGapFrequency</italic>*<italic>Lexicality</italic> interaction (<italic>p</italic> = .008; <xref ref-type="fig" rid="F10">Figure 10</xref>) was also found: only gap syllables were judged as more wordlike the more tonal neighbors they had. No <italic>SyllableFrequency</italic> or <italic>PhonotacticProbability</italic> effect was found.</p>
<fig id="F9">
<label>Figure 9</label>
<caption>
<p>Standardized wordlikeness ratings as a function of standardized neighborhood density.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g9.png"/>
</fig>
<fig id="F10">
<label>Figure 10</label>
<caption>
<p>Standardized wordlikeness ratings as a function of Syllable Gap Frequency.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g10.png"/>
</fig>
</sec>
</sec>
<sec>
<title>3.3. Summary</title>
<p>Based on the results of the wordlikeness judgment task, the patterns found in Mandarin speakers&#8217; perception of accidental gaps did not entirely match those found in our corpus study or other cross-linguistic studies. The Mandarin speakers rated T3 as less wordlike than T2, T1, and T4, both for gaps and lexical syllables, a result that was not predicted based on the patterns observed in our corpus study alone. We attributed this to the marked complex contour of T3. Furthermore, T2 and T3 are confusable due to the aforementioned tone sandhi process. Mandarin listeners&#8217; experience with T3 sandhi may have caused T2 gaps to be considered more acceptable to some degree, particularly when their T3 counterparts were real words. Mandarin listeners&#8217; judgments were not affected by syllable type; gaps with different syllable types were rated comparably. We did, however, find that T1 gaps with voiceless onsets were judged as more acceptable, a pattern that was also observed in our corpus study and cross-linguistically.</p>
<p>We also found that, similar to lexical syllables, gap syllables were affected by neighborhood density&#8212;the more neighbors the syllable had, the higher the ratings.</p>
</sec>
</sec>
<sec>
<title>4. Modeling wordlikeness with phonotactic grammars</title>
<p>In this section, we model the results of the wordlikeness experiment with constraint-based grammars using the UCLA Phonotactic Learner (<xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>) to explore the extent to which the Mandarin lexicon is useful in establishing a phonotactic grammar that models the speakers&#8217; tonotactic knowledge. To answer this question, we generated tonotactic constraints in three settings according to the degree of lexical access. In the first setting, we built the tonotactic grammar with the learner <italic>entirely</italic> from the inductive process, reflecting the view that the lexicon itself is sufficient as the source of phonotactic knowledge. In the second setting, we used the lexicon and the inductive process to select and weight a small set of typologically motivated constraints, reflecting the view that phonotactic knowledge is built based on a smaller hypothesis space in which inductive learning also plays a role. In the third setting, the tonotactic grammar also contains typologically motivated constraints but <italic>makes no reference</italic> to the lexicon and the inductive process, reflecting the view that innate constraints are separate from lexical knowledge. These settings enable us to more accurately evaluate the role of lexical statistics in shaping native speakers&#8217; tonotactic knowledge as revealed through the wordlikeness ratings.</p>
<p>The UCLA Phonotactic Learner is an inductive learning tool that takes a wordlist as input and yields a constraint-based phonotactic grammar based on the principle of Maximum Entropy (<xref ref-type="bibr" rid="B14">Della Pietra et al., 1997</xref>; <xref ref-type="bibr" rid="B23">Goldwater &amp; Johnson, 2003</xref>; <xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>; <xref ref-type="bibr" rid="B73">Zuraw &amp; Hayes, 2017</xref>). The induced grammar contains surface constraints that are weighted as in the framework of Harmonic Grammar (<xref ref-type="bibr" rid="B45">Legendre, Miyata, &amp; Smolensky, 1990</xref>; <xref ref-type="bibr" rid="B59">Smolensky &amp; Legendre, 2006</xref>). The constraints refer to sequences of under-attested natural classes in the wordlist. The natural classes are defined by a feature matrix provided by the user prior to a learning simulation. The learner assumes a probability space shared by all possible word forms based on the segments that are provided. Unattested and under-attested forms in the provided lexicon are penalized, which can be translated into lower probabilities of those forms; meanwhile, the learner increases the probabilities that it assigns to the attested forms, especially over-attested ones, in the lexicon. Maximum Entropy here thus refers to the fact that the weights in the grammar are induced in a way that maximizes the probabilities of the possible word forms that the training lexicon is drawn from, not just the lexicon itself. The induced constraints target sequences of natural classes. For example, a constraint that penalizes consonant clusters bears the form of *[+consonantal][+consonantal].</p>
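<p>To make the link between weighted constraint violations and probability concrete, the sketch below computes Maximum Entropy probabilities for a toy candidate set: each form receives a score equal to the exponential of its negated weighted violation sum, and scores are normalized. It is a simplified illustration, not the UCLA Phonotactic Learner itself; the single constraint, its weight, the two forms, and the normalization over only these two forms are all assumptions made for the example.</p>
<preformat>
```python
import math


def harmony(violations, weights):
    """Weighted sum of constraint violations (higher means more penalized)."""
    return sum(w * v for w, v in zip(weights, violations))


def maxent_probabilities(candidates, weights):
    """Map each form to exp(-harmony), normalized over the candidate set.

    candidates: dict mapping a form to a tuple of violation counts,
    one count per constraint, in the same order as weights.
    """
    scores = {form: math.exp(-harmony(v, weights)) for form, v in candidates.items()}
    total = sum(scores.values())
    return {form: score / total for form, score in scores.items()}


# Hypothetical grammar with one constraint, *[+consonantal][+consonantal],
# weighted 2.0; "pta" violates it once and "pa" does not.
WEIGHTS = (2.0,)
CANDIDATES = {"pa": (0,), "pta": (1,)}
PROBS = maxent_probabilities(CANDIDATES, WEIGHTS)
print(PROBS)
```
</preformat>
<p>In the toy output, the form that violates the cluster constraint receives the lower probability, and raising the constraint weight would lower it further.</p>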
<p>A few parameters control how the learner induces a grammar. The adjusted observed-over-expected (O/E) threshold restricts the learner to inducing only constraints that refer to co-occurrences of natural classes whose adjusted O/E ratio is below a specified number. The O/E ratio is the actual number of occurrences of a given combination (O) divided by the number expected (E) under random and unrestricted combination. For example, given a strictly CV language with only five vowels /a, i, y, o, u/ and three onset consonants /p, b, t/, we would expect six occurrences of [+labial][+round] syllables (i.e., /po/, /bo/, /pu/, /bu/, /py/, /by/). If we only see /po/ in the actual lexicon, the O/E of [+labial][+round] would be 1/6 &#8776; 0.167. A smaller number indicates that a particular sequence is under-attested. The UCLA Learner uses the statistical &#8220;upper confidence limit&#8221; (<xref ref-type="bibr" rid="B53">Mikheev, 1997</xref>; Albright &amp; Hayes, 2002, <xref ref-type="bibr" rid="B3">2003</xref>) to adjust O/E. This adjustment method has the effect of treating generalizations with a larger E as stronger. For example, under this method, O/E values of 0/10 and 0/1000 would be adjusted to 0.22 and 0.002, respectively.</p>
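<p>The toy example can be written out step by step. The following is a simplified sketch that counts possible syllable types exactly as in the example above; the notation <italic>n</italic> for the size of a natural class is introduced here only for illustration, and the sketch does not reproduce the learner&#8217;s internal computation of E or the confidence-limit adjustment.</p>
<disp-formula><tex-math><![CDATA[
E = n_{[+\mathrm{labial}]} \times n_{[+\mathrm{round}]} = |\{p, b\}| \times |\{o, u, y\}| = 2 \times 3 = 6,
\qquad
\frac{O}{E} = \frac{1}{6} \approx 0.167
]]></tex-math></disp-formula>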
<p>The user can also specify the number of constraints that the learner should aim to induce. Beyond affecting the size of the induced grammar, varying the targeted number of constraints also alters the nature of the learned constraints. The learner prioritizes inducing constraints with a lower adjusted O/E value. This is referred to as the &#8220;accuracy&#8221; heuristic. Given the same level of accuracy, the learner prioritizes constraints that describe smaller n-grams (e.g., bigrams preferred over trigrams) and constraints with natural classes that cover more segments. This is referred to as the &#8220;generality&#8221; heuristic. With these two heuristics, constraints that are induced earlier, or that are induced when a simulation aims for fewer constraints, tend to be more general and accurate (i.e., exceptionless or with a larger E).</p>
<p>Finally, the user can specify the maximum n-gram length that a constraint may span. A larger <italic>n</italic> increases the number of possible constraints for the learner to consider, especially given a larger number of natural classes. For example, with 400 natural classes, there would be 160,000 possible bigram constraints, 64 million possible trigram constraints, 26 billion possible 4-gram constraints, and 10 trillion possible 5-gram constraints; thus, bigrams and trigrams are preferred for segmental constraints (<xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>).</p>
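<p>These counts follow from filling each of the <italic>n</italic> positions of a constraint with one of <italic>N</italic> natural classes, which gives <italic>N</italic><sup><italic>n</italic></sup> candidate constraints (the symbol <italic>N</italic> is introduced here for illustration). As a rough check with <italic>N</italic> = 400, the rounded figures in the text correspond to:</p>
<disp-formula><tex-math><![CDATA[
\begin{aligned}
400^{2} &= 1.6 \times 10^{5} && \text{(160,000 bigrams)} \\
400^{3} &= 6.4 \times 10^{7} && \text{(64 million trigrams)} \\
400^{4} &= 2.56 \times 10^{10} && \text{(about 26 billion 4-grams)} \\
400^{5} &= 1.024 \times 10^{13} && \text{(about 10 trillion 5-grams)}
\end{aligned}
]]></tex-math></disp-formula>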
<p>The grammar induced by the learner can then be used to assign harmonic scores to word forms, and the scores can be compared with results of behavioral experiments, such as wordlikeness and lexical decision tasks (<xref ref-type="bibr" rid="B6">Berent et al., 2012</xref>; <xref ref-type="bibr" rid="B12">Daland et al., 2011</xref>; <xref ref-type="bibr" rid="B22">Gallagher et al., 2019</xref>; <xref ref-type="bibr" rid="B23">Goldwater &amp; Johnson, 2003</xref>; <xref ref-type="bibr" rid="B30">Hayes &amp; White, 2013</xref>; <xref ref-type="bibr" rid="B65">Wilson &amp; Gallagher, 2018</xref>). The inductive process can also start with a set of pre-written constraints. In such cases, the learner can be used to add more constraints to the grammar or simply to determine the weights of the pre-written constraints based on a provided list of word forms.</p>
<p>In the current study, we extend this methodology to the modeling of wordlikeness ratings of syllable-tone gaps. This provides an opportunity to employ a computational learner to model possible syllable-tone phonotactics, which has not yet been fully explored in the literature. A few recent studies have employed similar approaches in using phonotactic well-formedness to measure experimental results in tone languages. Gong (<xref ref-type="bibr" rid="B24">2017</xref>) also used the UCLA Phonotactic Learner, but with two crucial differences. First, his data came from a visual lexical decision experiment, where stimuli were represented with <italic>Bopomofo</italic>, a phonetic alphabet used in Taiwan (<xref ref-type="bibr" rid="B55">Myers &amp; Tsay, 2005</xref>). Second, and more importantly, he focused solely on segmental patterns without modeling syllable-tone combinations and found that unattested segmental combinations with higher harmonic scores elicited a higher proportion of wordlike responses and shorter response times. In a study similar to ours, Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>) compared wordlikeness ratings and harmonic scores from the UCLA Phonotactic Learner. Their approach also compared handwritten and induced grammars; however, the focus of their modeling was on attested and unattested forms in Mandarin. Moreover, their stimuli only included syllables with a high tone, without a specific focus on tonotactic gaps as in this study.</p>
<p>In a similar line of research, Do and Lai (<xref ref-type="bibr" rid="B15">2020</xref>) modeled nonce syllables in Cantonese, including both unattested segmental combinations and syllable-tone gaps. To estimate the probability of a syllable-tone combination, they used the probability of the tone given all or part of the segmental string. For example, the probability of /pit/<sup>55</sup> was estimated by the likelihood of /pit/ having the high level tone (i.e., P(X<sup>55</sup> &#124; pit)), the likelihood of a syllable with the vowel /i/ having the high level tone (i.e., P(X<sup>55</sup> &#124; _i_)), the likelihood of a syllable with the onset /p/ having the high level tone (i.e., P(X<sup>55</sup> &#124; p_)), the likelihood of a syllable with the coda /t/ having the high level tone (i.e., P(X<sup>55</sup> &#124; _t)), and the likelihood of a syllable with the rime /it/ having the high level tone (i.e., P(X<sup>55</sup> &#124; _it)). These probabilities were estimated by multinomial logistic regression analyses. Their Bayesian statistical analysis showed that phonotactic probabilities calculated in this way affected how wordlike a nonword syllable was judged to be, but in cases where the stimuli were judged to be absolutely unwordlike, there was no effect of phonotactic probability.</p>
<p>This study complements previous studies by testing a range of phonotactic grammars with varying degrees of access to different types of lexical data on how well they model wordlikeness ratings. More importantly, by testing tonotactic constraints and weights that reflect statistics gleaned from the lexicon as well as those that do not, we aim to tease apart statistical patterns of gaps and universal markedness and how they may account for native speakers&#8217; tonotactic knowledge.</p>
<p>The rest of this section is organized as follows: Section 4.1 describes the different settings for building the constraint-based phonotactic grammars. Section 4.2 discusses the induced phonotactic constraints, particularly whether they capture generalizations similar to those expressed by typologically-motivated markedness constraints. Section 4.3 examines the correlation between the phonotactic grammars&#8217; well-formedness scores (MaxEnt scores) and the wordlikeness ratings from our behavioral experiment, with a focus on whether the inductive learning process from the lexicon is necessary for building a grammar that predicts the behavioral results. Finally, Section 4.4 summarizes our analysis of the tonotactic grammars.</p>
<sec>
<title>4.1. Building the phonotactic grammars</title>
<p>The set of phonetic symbols used in our analysis is shown in <xref ref-type="table" rid="T5">Table 5</xref>, with a breakdown of the prosodic positions in which these sounds may occur. The symbol set was similar to that in Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>), treating [e] and [o] as separate sounds from [&#601;] despite a possible phonemic analysis that treats them as the same underlying phoneme. The learner was thus expected to induce phonotactic constraints that describe the complementary distribution of [e] and [o]. In the other two instances of complementary distribution, our symbol set took an allophonic analysis. First, the apical vowels after sibilants were transcribed as /i/ (similar to Gong and Zhang). Second, unlike Gong and Zhang, who separated /a/ and /&#593;/, we transcribed the low vowel as /a/ regardless of where it occurred. We also added the vowel /&#602;/, which was not included in Gong and Zhang&#8217;s study. All segments were annotated with the distinctive features required by the learner. Following Gong and Zhang, we used binary place features.</p>
<table-wrap id="T5">
<label>Table 5</label>
<caption>
<p>Segmental inventories in different prosodic positions in the learning simulation.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top">C</td>
<td align="left" valign="top">p p&#688; m f t t&#688; n l t&#597; t&#597;&#688; &#597; ts ts&#688; s t&#642; t&#642;&#688; &#642; &#656; k k&#688; x</td>
</tr>
<tr>
<td align="left" valign="top">G</td>
<td align="left" valign="top">j w &#613;</td>
</tr>
<tr>
<td align="left" valign="top">V</td>
<td align="left" valign="top">a &#601; e o i u y &#602;</td>
</tr>
<tr>
<td align="left" valign="top">X</td>
<td align="left" valign="top">i u n &#331;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We applied Hayes and Wilson&#8217;s (<xref ref-type="bibr" rid="B31">2008</xref>) treatment of lexical stress to the four lexical tones. Specifically, vowels with different tones were treated as separate vowel types (<xref ref-type="bibr" rid="B40">Kirby &amp; Yu, 2007</xref>). For example, /a/ with different tones was transcribed as /a/<sup>55</sup>, /a/<sup>35</sup>, /a/<sup>214</sup>, and /a/<sup>51</sup> in the lexicon. Following Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>), we used privative tonal features to refer to these four tones. This treatment represents a null hypothesis concerning how the tones may be grouped into natural classes; that is, we did not force tones to be grouped into [+high tones] or [+rising tones] and thereby bias the learner towards adopting such generalizations. Under this treatment, a constraint penalizing voiced onsets in T1 syllables would take the form *[+voice,+consonantal][T1]. Note that this approach can also handle the interaction between tones and vowels, penalizing certain tone-vowel combinations. For example, *[+low, T3] penalizes low vowels in T3. In this way, under-attested combinations of classes of consonants, vowels, and tones could all be formulated and induced as constraints by the learner.</p>
<p>In addition to the provided segmental and tonal features, the learner adds a [syllable boundary] feature (abbreviated as [sb] henceforth). The feature describes the contrast between the syllable boundary ([+sb]) and non-boundary tokens ([&#8211;sb]), the latter of which essentially covers all segments. This is how the learner expresses constraints that refer to syllable boundaries. For instance, *[+consonantal][+sb] is a constraint that penalizes codas. The feature also allows the learner to describe a constraint that refers to &#8220;all segments&#8221; in a particular position. For example, *[+consonantal][&#8211;sb][+syllabic] penalizes any segment between a consonant and a vowel.</p>
<p>The learner&#8217;s inductive process required a list of word forms as the training data. We used a word list with 16,684 distinct lexical items from the TMC corpus. During training, these lexical items were broken down into 36,623 monosyllabic units (i.e., individual tokens of syllable-tone combinations). Similar types of training data have been used for inducing phonotactic constraints (e.g., <xref ref-type="bibr" rid="B25">Gong &amp; Zhang, 2021</xref>; <xref ref-type="bibr" rid="B26">Gouskova &amp; Gallagher, 2020</xref>; <xref ref-type="bibr" rid="B31">Hayes &amp; Wilson, 2008</xref>). Since the same syllable-tone combination may occur in more than one lexical item, the word list did include some information about the type frequency of distinct syllable-tone combinations in the lexicon. For example, the training data reflected that the syllable-tone combination /t&#688;a<sup>55</sup>/ occurs in 24 distinct words in the TMC corpus. Note that this type of frequency count only refers to how often a particular syllable-tone combination occurs across unique lexical items and makes no reference to lexical token frequency in the TMC corpus, which was used as the Frequency variable in our experimental analysis.</p>
<p>Since our goal was to see the extent to which the lexicon is needed for inducing tonotactic constraints that best model speakers&#8217; wordlikeness ratings, we ran the learning simulation in three different settings: the Strong Induction setting, the Weak Induction setting, and the No Induction setting, each of which is described in greater detail below.</p>
<p>The <bold>Strong Induction</bold> setting: We ran learning simulations that aimed to induce 300 constraints. The maximum constraint length was set to trigrams. Out of the first 75, 150, and all 300 constraints<xref ref-type="fn" rid="n6">6</xref>, we selected constraints referring to segment-tone interactions, resulting in increasing sets of 25, 78, and 210 inductive tonotactic constraints (hereafter, the Small, Medium, and Large Strong Induction grammars). As mentioned earlier, constraints that are learned later in the simulation tend to be less accurate and less general. Grammars with more constraints learned later in the simulation and grammars aiming to learn more constraints are more likely to overfit to the lexicon by capturing statistical patterns that describe accidental gaps instead of describing more general phonotactic knowledge. On the other hand, grammars with only a few constraints induced in an early stage might not have captured relevant phonotactic knowledge, as only strong generalizations that target larger natural classes would be included. By varying the number of tonotactic constraints in this setting, we tested the extent to which the levels of statistical fit needed to be leveraged to induce a grammar that best models speakers&#8217; wordlikeness ratings. Finally, since we are only interested in tonotactic constraints, we only took the induced constraints that refer to segment-tone interactions from the simulations.</p>
<p>The <bold>Weak Induction</bold> setting: In this setting, we did not ask the learner to induce novel tonotactic constraints. Instead, we made it reweight a smaller set of ten typologically-motivated constraints, shown in (1) below. The constraints *[T3] and *[T2] are motivated by the typological markedness of contour and rising tones (e.g., <xref ref-type="bibr" rid="B68">Yip, 2002</xref>; <xref ref-type="bibr" rid="B70">Zhang, 2001</xref>). The other eight constraints refer to the incompatibility of high tones with voiced onsets, and low tones with voiceless onsets (e.g., <xref ref-type="bibr" rid="B34">Hsieh &amp; Kenstowicz, 2008</xref>; <xref ref-type="bibr" rid="B39">Kenstowicz &amp; Suchato, 2006</xref>; <xref ref-type="bibr" rid="B56">Ohala, 1978</xref>; <xref ref-type="bibr" rid="B57">Sagart, 1999</xref>; <xref ref-type="bibr" rid="B68">Yip, 2002</xref>). It is worth noting that since tones are carried by vowels in our learning simulation, the onset-tone interaction in syllables with an onglide has to be captured in a separate series of constraints (e.g., *[+voice][&#8211;consonantal][T1] and *[&#8211;voice][&#8211;consonantal][T2]). This Weak Induction setting represents a scenario where the learner has a smaller hypothesis space concerning what the relevant and important tonotactic constraints are in this language.</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(1)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Typologically-motivated tonotactic constraints:</p></list-item>
<list-item><p>Markedness of contour and rising tones: *[T3], *[T2]</p></list-item>
<list-item><p>Markedness of voiced onsets with high tones: *[+voice][T1], *[+voice][T4],</p></list-item>
<list-item><p>*[+voice][&#8211;consonantal][T1], *[+voice][&#8211;consonantal][T4]</p></list-item>
<list-item><p>Markedness of voiceless onsets with low tones: *[&#8211;voice][T2], *[&#8211;voice][T3],</p></list-item>
<list-item><p>*[&#8211;voice][&#8211;consonantal][T2], *[&#8211;voice][&#8211;consonantal][T3]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<p>The <bold>No Induction</bold> setting: Phonotactic grammars in this setting also contain typologically-motivated tonotactic constraints. The difference from the Weak Induction setting is that the weights of these constraints were set independently of the lexicon. This represents a hypothesis under which tonotactic constraints do not need to be informed by lexical knowledge; we also refer to it as the baseline setting.</p>
<p>Since we are interested in tonotactic constraints, in all three settings, we used Gong and Zhang&#8217;s (<xref ref-type="bibr" rid="B25">2021</xref>) 38 handwritten segmental constraints (Appendix II) along with the induced and typologically-motivated tonotactic constraints. These handwritten constraints refer to systematic, allophonic, and accidental segmental gaps in the Mandarin lexicon, and have been shown to be much more effective in modeling native speakers&#8217; behavioral results than inductive segmental constraints. These segmental constraints served as the baseline grammar for accounting for variance in wordlikeness ratings that is not related to segment-tone interactions.</p>
<p>As part of this process, in the Strong Induction setting, the induced tonotactic constraints were added to the handwritten segmental constraints before undergoing another round of reweighting. In the Weak Induction setting, the typologically-motivated tonotactic constraints were reweighted along with the handwritten segmental constraints. In the No Induction setting, the handwritten segmental constraints were themselves weighted by the lexicon and used alongside the typologically-motivated tonotactic constraints. In other words, in all three induction conditions, the handwritten segmental constraints were weighted by the lexicon. This was a methodological choice made to simplify the non-tonal part of the grammar induction process. However, we acknowledge the possibility that, similar to tonotactic constraints, there may be other methods for assigning weights to segmental constraints that better account for native speakers&#8217; knowledge of segmental phonotactics. The procedure for generating the phonotactic grammars is summarized in <xref ref-type="table" rid="T6">Table 6</xref>.</p>
<table-wrap id="T6">
<label>Table 6</label>
<caption>
<p>Generating phonotactic grammars with different tonotactic constraints and weights.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Setting</bold></td>
<td align="left" valign="top"><bold>Procedure</bold></td>
<td align="left" valign="top"><bold>Number of resulting grammars</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Strong Induction</td>
<td align="left" valign="top">Pick the first 75, 150, and all 300 constraints from a simulation<break/>&#10132; pick tonotactic constraints (25, 78, 210)<break/>&#10132; reweight these tonotactic constraints along with 38 handwritten segmental constraints</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">Weak Induction</td>
<td align="left" valign="top">Weight 10 typologically motivated constraints along with 38 handwritten segmental constraints</td>
<td align="left" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">No Induction (baseline)</td>
<td align="left" valign="top">Weight 38 handwritten segmental constraints<break/>&#10132; use the weighted constraints along with 10 typologically motivated constraints with baseline weights (a weight of 3 for all 10 constraints)</td>
<td align="left" valign="top">1</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>4.2. The induced tonotactic constraints</title>
<p>In this section, we discuss the content of the inductive tonotactic constraints, with a focus on whether they refer to the markedness of T3 and T2, or to the interaction between onset voicing and tone.</p>
<p>Of the 25 induced tonotactic constraints in the Small Strong Induction grammar, six referred to the interaction between onset voicing and tone in the same direction as the typologically-motivated constraints, as shown in <xref ref-type="table" rid="T7">Table 7</xref>. Two of these penalized sequences of voiced onsets before a vowel with T1, even though they targeted smaller natural classes (e.g., non-/a/ vowels and non-labial nasals) instead of all voiced onsets and all vowels with T1. Other constraints consistent with the hypothesized direction targeted much smaller natural classes: Non-labial nasals before /o, &#601;, &#602;/ with T4; /s, &#597;/ before non-high vowels with T2; and the aspirated /t&#688;, ts&#688;, t&#597;&#688;/ before mid vowels with T3. No constraints could be interpreted as a general restriction against T2 or T3 syllables, even though there were slightly more constraints referring to sequences with T3 and T2 (seven and eight constraints, respectively) than with T1 and T4 (four and six constraints, respectively). See Appendix III for the full list of inductive tonotactic constraints in the Small Strong Induction grammar.</p>
<table-wrap id="T7">
<label>Table 7</label>
<caption>
<p>Constraints from the Small Strong Induction grammar that targeted the interaction between onset voicing and tones in the expected directions given typological observations.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Constraint</bold></td>
<td align="left" valign="top"><bold>Weight</bold></td>
<td align="left" valign="top"><bold>Penalized sequences</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">*[+cons,+voice][&#8211;low,T1]</td>
<td align="left" valign="top">2.164</td>
<td align="left" valign="top">Voiced onset and non-/a/ vowels in T1</td>
</tr>
<tr>
<td align="left" valign="top">*[+nasal,&#8211;labial][T1]</td>
<td align="left" valign="top">2.753</td>
<td align="left" valign="top">Onset /n/ in T1</td>
</tr>
<tr>
<td align="left" valign="top">*[+nasal,&#8211;labial][&#8211;high,&#8211;low,&#8211;front,T4]</td>
<td align="left" valign="top">3.772</td>
<td align="left" valign="top">/n/ before /o, &#601;, &#602;/ in T4</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;voice,+cont,+ant][&#8211;high,T2]</td>
<td align="left" valign="top">1.512</td>
<td align="left" valign="top">/s, &#597;/ before non-high vowels in T2</td>
</tr>
<tr>
<td align="left" valign="top">*[+aspirated,+anterior][&#8211;high,&#8211;low,T3]</td>
<td align="left" valign="top">1.673</td>
<td align="left" valign="top">/t&#688;, ts&#688;, t&#597;&#688;/ before /e, o, &#601;, &#602;/ in T3</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;voice,+coronal][&#8211;high,+front,T3]</td>
<td align="left" valign="top">2.296</td>
<td align="left" valign="top">Voiceless coronals before /e/ in T3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the Medium Strong Induction grammar, there were a few constraints that penalized different subsets of the interaction between voiced onsets and T1. These are shown in <xref ref-type="table" rid="T8">Table 8</xref>. Other than the constraint *[+voice][&#8211;sb][T1], which penalized voiced onsets in a T1 syllable with an onglide, the other four constraints again targeted smaller natural classes than the typologically-motivated constraints.<xref ref-type="fn" rid="n7">7</xref> It is also worth noting that there were no such constraints for T4.</p>
<table-wrap id="T8">
<label>Table 8</label>
<caption>
<p>Induced constraints in the Medium Strong Induction grammar that refer to incompatibility between voiced onsets and T1/T4.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Constraint</bold></td>
<td align="left" valign="top"><bold>Weight</bold></td>
<td align="left" valign="top"><bold>Penalized sequences</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">*[+voice][&#8211;front][&#8211;back,T1]</td>
<td align="left" valign="top">1.919</td>
<td align="left" valign="top">Voiced onsets before non-back vowels in a T1 syllable with the onglide /w/</td>
</tr>
<tr>
<td align="left" valign="top">*[+cons,+voice][+low,T1][&#8211;labial]</td>
<td align="left" valign="top">1.98</td>
<td align="left" valign="top">Voiced onsets before /a/ in a T1 syllable with nasal coda</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice,+delayed][T1]</td>
<td align="left" valign="top">2.269</td>
<td align="left" valign="top">The onset /&#656;/ in a T1 syllable</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice][&#8211;sb][T1]</td>
<td align="left" valign="top">4.124</td>
<td align="left" valign="top">Voiced onsets in a T1 syllable with an onglide ([&#8211;sb] refers to the natural class of all &#8220;non-boundary&#8221; segments.)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Similar to the Small Strong Induction grammar, constraints that penalized voiceless onsets and T2/T3 mostly targeted smaller natural classes down to two consonants and one vowel. Some of these are shown in <xref ref-type="table" rid="T9">Table 9</xref>. Again, there were no constraints that targeted T2 and T3 in general, though slightly more constraints targeted T2 and T3 (26 and 21 constraints) than T1 and T4 (18 and 13 constraints).</p>
<table-wrap id="T9">
<label>Table 9</label>
<caption>
<p>Induced constraints in the Medium Strong Induction grammar that referred to incompatibility between voiceless onsets and T2/T3.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Constraint</bold></td>
<td align="left" valign="top"><bold>Weight</bold></td>
<td align="left" valign="top"><bold>Penalized sequences</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">*[&#8211;aspirated][T2][+consonantal]</td>
<td align="left" valign="top">3.982</td>
<td align="left" valign="top">Unaspirated onsets in T2 syllables with codas</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;voice,+cont,&#8211;labial,&#8211;dorsal][+high,+round,T2]</td>
<td align="left" valign="top">2.024</td>
<td align="left" valign="top">/s, &#642;/ before /u, y/ in T2</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;aspirated,&#8211;anterior][&#8211;high,+back,T3]</td>
<td align="left" valign="top">1.954</td>
<td align="left" valign="top">Unaspirated onsets before /o/ in T3 syllables</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>A similar trend was observed when we examined the additional constraints in the Large Strong Induction grammar: There were constraints that penalized voiced onsets with T1/T4 (e.g., *[+voice,&#8211;coronal][&#8211;sb][&#8211;back,T1], *[+nasal][&#8211;front][&#8211;back,T4]) and voiceless onsets with T2/T3 (e.g., *[+aspirated,+labial][&#8211;front][+round,T2], *[&#8211;voice,&#8211;dorsal][&#8211;back][+round,T3]), in the same direction as the typologically-motivated constraints. As the additional constraints tended to target sequences of smaller natural classes, there were again no constraints targeting T2 and T3 in general, but there were still more constraints targeting subsets of T3 and T2 syllables (64 and 59 constraints) than subsets of T1 and T4 syllables (44 and 43 constraints).</p>
<p>Beyond these constraints that could be interpreted as being relevant for onset-tone interactions in the expected directions, many more constraints targeted other aspects of syllable-tone interactions such as dorsal onsets in certain vowel-tone configurations (e.g., *[+consonantal,&#8211;labial,&#8211;coronal][+front,T2]), certain groups of onsetless vowels in certain tones (*[+sb][&#8211;high,+back,T2]), or the /&#602;/ vowel with T1. They could also target specific tonotactic gaps; for example, *[+sb][+low,T3] describes the absence of onsetless /a/ in T3.</p>
<p>Overall, among all the inductive onset-tone interaction constraints, those involving voiced onsets and T1 tended to be more general, targeting broader natural classes. While constraints penalizing voiced-T4 and voiceless-T2/T3 combinations were also induced, they mostly targeted smaller natural classes. The completely bottom-up inductive method also failed to identify constraints that targeted a larger proportion of T2 and T3 syllables, although there were more constraints that targeted subsets of the interaction between T2/T3 syllables and the natural classes referring to onsets and codas.</p>
</sec>
<sec>
<title>4.3. Correlation between harmonic scores and wordlikeness ratings</title>
<p>Grammars generated in different settings were used to assign harmonic scores <italic>H</italic>(<italic>x</italic>) to each stimulus. The correlation between wordlikeness ratings and these scores was then used to evaluate how well different grammars predict the experimental results. For each stimulus <italic>x</italic>, a grammar assigned a harmonic score based on the summed weights of the constraints violated by <italic>x</italic>, as shown in (2a). In this study, we follow Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>) in using MaxEnt scores calculated by raising <italic>e</italic> to the negative power of the harmonic score <italic>H</italic>(<italic>x</italic>), as shown in (2b). The MaxEnt scores ranged from 0 (least well-formed) to 1 (most well-formed).</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(2)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Harmonic score and MaxEnt score</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>&#160;</p></list-item>
</list>
<list list-type="wordfirst">
<list-item><p>a.</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p><italic>H</italic>(<italic>x</italic>) = &#8721;<sub><italic>i</italic></sub> <italic>w</italic><sub><italic>i</italic></sub> <italic>C</italic><sub><italic>i</italic></sub>(<italic>x</italic>)</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>&#160;</p></list-item>
</list>
<list list-type="wordfirst">
<list-item><p>b.</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p><italic>MaxEnt</italic>(<italic>x</italic>) = <italic>e</italic><sup>&#8211;<italic>H</italic>(<italic>x</italic>)</sup></p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
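<p>As a worked illustration of the two definitions above, consider three hypothetical stimuli (labeled <italic>x</italic><sub>0</sub>, <italic>x</italic><sub>1</sub>, and <italic>x</italic><sub>2</sub> here purely for exposition; they are not items from the experiment). Assume that <italic>x</italic><sub>0</sub> violates no constraint, that <italic>x</italic><sub>1</sub> violates one tonotactic constraint once, and that <italic>x</italic><sub>2</sub> violates two, with each constraint carrying the baseline weight of 3 used in the No Induction setting and with no segmental violations:</p>
<disp-formula><tex-math><![CDATA[
\begin{aligned}
H(x_0) &= 0, & \mathit{MaxEnt}(x_0) &= e^{0} = 1 \\
H(x_1) &= 3 \times 1 = 3, & \mathit{MaxEnt}(x_1) &= e^{-3} \approx 0.0498 \\
H(x_2) &= 3 \times 1 + 3 \times 1 = 6, & \mathit{MaxEnt}(x_2) &= e^{-6} \approx 0.0025
\end{aligned}
]]></tex-math></disp-formula>
<p>Because the score is an exponential of a negated sum, each additional violation multiplies the MaxEnt score by a factor smaller than 1, so scores approach 0 as violations accumulate and reach 1 only for forms that violate no weighted constraint.</p>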
<p><xref ref-type="table" rid="T10">Table 10</xref> shows the correlation between the wordlikeness ratings and the MaxEnt scores assigned by the different grammars. Following Gong and Zhang (<xref ref-type="bibr" rid="B25">2021</xref>), we report the correlation between MaxEnt scores and wordlikeness ratings of all stimuli, including both the gaps and the lexical syllables (&#8220;all ratings&#8221;). Additionally, we report correlation results specifically for gaps, as we are particularly interested in how different phonotactic grammars account for participants&#8217; ratings of items outside of the lexicon. However, we have chosen not to report or discuss the correlation between ratings and MaxEnt scores for lexical syllables, as the high ratings for these syllables may create a strong ceiling effect that could reduce the informativeness of the phonotactic effect.</p>
<table-wrap id="T10">
<label>Table 10</label>
<caption>
<p>Comparison of correlation between wordlikeness ratings and MaxEnt scores from different grammars.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Grammar</bold></td>
<td align="left" valign="top"><bold>Number of tonotactic constraints</bold></td>
<td align="left" valign="top"><bold>Correlation with all ratings</bold></td>
<td align="left" valign="top"><bold>Correlation with gap ratings</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">No Induction baseline</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">0.229</td>
<td align="left" valign="top">0.372</td>
</tr>
<tr>
<td align="left" valign="top">Weak Induction</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">0.273</td>
<td align="left" valign="top">0.302</td>
</tr>
<tr>
<td align="left" valign="top">Strong Induction (small)</td>
<td align="left" valign="top">25</td>
<td align="left" valign="top">0.232</td>
<td align="left" valign="top">0.059</td>
</tr>
<tr>
<td align="left" valign="top">Strong Induction (medium)</td>
<td align="left" valign="top">78</td>
<td align="left" valign="top">0.468</td>
<td align="left" valign="top">0.117</td>
</tr>
<tr>
<td align="left" valign="top">Strong Induction (large)</td>
<td align="left" valign="top">210</td>
<td align="left" valign="top">0.558</td>
<td align="left" valign="top">0.116</td>
</tr>
</tbody>
</table>
</table-wrap>
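<p>The two correlation columns in <xref ref-type="table" rid="T10">Table 10</xref> differ only in which items enter the computation. The Python sketch below illustrates this subsetting with hypothetical scores and ratings. It assumes Pearson&#8217;s <italic>r</italic>, which is an assumption made here for illustration, and the function names and item tuples are likewise illustrative.</p>
<code language="python">
```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical items (not the study's data):
# (MaxEnt score, mean wordlikeness rating, is_gap).
items = [
    (1.00, 6.5, False),
    (1.00, 6.0, False),
    (0.05, 5.5, False),
    (1.00, 4.0, True),
    (0.05, 2.0, True),
    (0.05, 3.0, True),
]

def correlation(rows):
    """Correlate MaxEnt scores with ratings for the given rows."""
    scores = [row[0] for row in rows]
    ratings = [row[1] for row in rows]
    return pearson_r(scores, ratings)

r_all = correlation(items)                             # all stimuli
r_gaps = correlation([row for row in items if row[2]]) # gaps only
print(r_all, r_gaps)
```
</code>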
<p>The correlations show that having more inductive tonotactic constraints allowed the grammars&#8217; harmonic scores to better predict the overall wordlikeness ratings. However, when we focus on the ratings for gaps, the grammars with typologically motivated tonotactic constraints (i.e., those with the Weak and No Induction settings) vastly outperformed the grammars with inductive tonotactic constraints (i.e., those with the Strong Induction settings). Having more inductive tonotactic constraints in the Strong Induction setting helped predict overall wordlikeness ratings but did not improve predictions on the ratings for gaps only, suggesting that the additional inductive tonotactic constraints mainly improved the grammars&#8217; ability to differentiate between lexical syllables and gaps. This contrast suggests that while learning more nuanced segment-tone interactions in the lexicon helps model the difference between lexical syllables and gaps, it was not very helpful in modeling how native speakers judged certain gaps to be more wordlike than others. Another finding worth noting is that, for gap ratings, the baseline grammar with the No Induction setting (0.372) outperformed the grammar with the Weak Induction setting (0.302), in which the constraints were weighted by the lexicon.</p>
<p><xref ref-type="fig" rid="F11">Figure 11</xref> shows the correlations between the wordlikeness ratings and the MaxEnt scores from the grammars with inductive constraints (the Strong Induction condition). With only a small number of inductive tonotactic constraints, very few stimuli were marked as less well-formed. When more tonotactic constraints were induced, gradient differences in well-formedness started to emerge, and more gaps began receiving low MaxEnt scores, which explains why the correlation between the MaxEnt scores and wordlikeness ratings increased as the number of inductive constraints grew. However, this trend did not strengthen the correlation between the MaxEnt scores and the ratings for gaps.</p>
<fig id="F11">
<label>Figure 11</label>
<caption>
<p>Correlation between the MaxEnt scores and wordlikeness ratings from the Strong Induction grammar. Light dots refer to lexical syllables, and dark dots refer to gaps. Horizontal jittering with a range of 0.02 was applied to the dots to reduce overlaps. The solid lines indicate slopes for MaxEnt scores predicting ratings for all items, and the dotted lines indicate slopes for predicting ratings for gaps only. The shading represents the 95% confidence interval for the slopes.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g11.png"/>
</fig>
<p><xref ref-type="fig" rid="F12">Figure 12</xref> shows the correlation between the MaxEnt scores and wordlikeness ratings for the No Induction and Weak Induction grammars, both of which consisted of only typologically-motivated tonotactic constraints. For the No Induction grammar with baseline weights, the MaxEnt score of a stimulus decreased if it bore T3 or T2, had a voiced onset with T1/T4, or had a voiceless onset with T2/T3, which explains the almost binary horizontal distribution. There were also a large number of lexical syllables with very low MaxEnt scores, which explains the relatively poor correlation between the MaxEnt scores and wordlikeness ratings in this condition. The fact that the No Induction baseline grammar failed to distinguish lexical syllables from gaps also suggests that it may simulate a cognitive module for tonotactics independent of the lexicon.</p>
<fig id="F12">
<label>Figure 12</label>
<caption>
<p>Correlation between the No Induction baseline (left) and Weak Induction (right) grammars&#8217; MaxEnt scores and wordlikeness ratings. Light dots refer to lexical syllables, and dark dots refer to gaps. Horizontal jittering with a range of 0.02 was applied to the dots to reduce overlaps. The solid lines indicate slopes for MaxEnt scores predicting ratings for all items, and the dotted lines indicate slopes for predicting ratings for gaps only. The shading represents the 95% confidence interval for the slopes.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g12.png"/>
</fig>
<p>The Weak Induction grammar, on the other hand, only penalized items with T3 and T2 as well as items with a voiced onset and T1 (<xref ref-type="table" rid="T11">Table 11</xref>). In other words, based on the lexicon, the learner only found statistical support for the dispreference for T3, T2, and syllables with voiced onsets and T1. By incorporating these weights, the number of lexical syllables with low MaxEnt scores was greatly reduced, explaining the advantage of this grammar over the No Induction baseline grammar in terms of predicting ratings for all stimuli.</p>
<table-wrap id="T11">
<label>Table 11</label>
<caption>
<p>Comparison of constraint weights in the No Induction baseline and the Weak Induction grammars.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Constraint</bold></td>
<td align="left" valign="top"><bold>Weights in Weak Induction grammar</bold></td>
<td align="left" valign="top"><bold>Weights in No Induction baseline grammar</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">*[T3]</td>
<td align="left" valign="top">0.905</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">*[T2]</td>
<td align="left" valign="top">0.664</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice][T1]</td>
<td align="left" valign="top">2.852/2.259</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice][T4]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;voice][T2]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8211;voice][T3]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">Correlation with overall ratings</td>
<td align="left" valign="top">0.273</td>
<td align="left" valign="top">0.229</td>
</tr>
<tr>
<td align="left" valign="top">Correlation with gap ratings</td>
<td align="left" valign="top">0.302</td>
<td align="left" valign="top">0.372</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>However, when we focus on gaps only, it is the No Induction grammar that best predicts the ratings. In other words, blindly penalizing certain types of stimuli (i.e., those with T3, T2, voiced onset-T1/T4, voiceless onset-T2/T3) can better explain which gaps were rated as more wordlike by the native speakers.</p>
<p>To explore the optimal weights for the typologically-motivated constraints, we varied the weights of the constraints in the No Induction grammar from the fixed baseline with a grid search: we tested the No Induction grammar with all combinations of five weights (0, 1.5, 3, 4.5, 6) for the typologically-motivated constraints. We simplified the process by giving the constraints with and without reference to glides (e.g., *[+voice][T1] and *[+voice][&#8211;consonantal][T1]) the same weight in each combination, yielding a total of 15,625 (5<sup>6</sup>) combinations of weights. We view this as an effort to locate an optimal configuration for the cognitive module independent of lexical knowledge when it comes to modeling the behavioral results.</p>
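<p>The size of this grid can be checked with a short sketch. The Python snippet below enumerates every assignment of the five candidate weights to the six constraint groups. The constraint labels are shorthand, the function name is hypothetical, and the scoring step (computing MaxEnt scores and correlating them with the ratings) is omitted because it depends on the stimuli and the rating data.</p>
<code language="python">
```python
from itertools import product

# The five candidate weights tested for each constraint group.
CANDIDATE_WEIGHTS = (0, 1.5, 3, 4.5, 6)

# Six constraint groups; variants with and without reference to glides
# share one weight, so each group is listed once.
CONSTRAINTS = ("*[T3]", "*[T2]", "*[+voice][T1]", "*[+voice][T4]",
               "*[-voice][T2]", "*[-voice][T3]")

def weight_grid():
    """Yield one {constraint: weight} dict per grid combination."""
    for combo in product(CANDIDATE_WEIGHTS, repeat=len(CONSTRAINTS)):
        yield dict(zip(CONSTRAINTS, combo))

grid = list(weight_grid())
print(len(grid))  # 15625, i.e. 5 ** 6
```
</code>
<p>The fixed baseline, in which every constraint has a weight of 3, is one of the points on this grid.</p>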
<p><xref ref-type="table" rid="T12">Table 12</xref> shows the weights for the typologically-motivated constraints in two of the 15,625 grammars. The No Induction baseline and Weak Induction grammars (<xref ref-type="table" rid="T11">Table 11</xref>) are also listed for reference. The grammar that best predicted gap ratings weighted *[T3] heavily while placing smaller weights on *[+voice][T1] and *[+voice][T4] and no weight on the other constraints. In other words, having a strong markedness constraint against the complex contour tone and modest constraints against voiced onsets combined with T1 and T4 helped predict the wordlikeness ratings for the gaps. <xref ref-type="table" rid="T12">Table 12</xref> also lists the weights for the grammar that best predicted the wordlikeness ratings for all test items. This grammar weighted the *[+voice][T1] constraint heavily while assigning a smaller weight to *[T3].</p>
<table-wrap id="T12">
<label>Table 12</label>
<caption>
<p>Comparison of constraint weights for the No Induction grammars that best predicted the ratings for gaps and for all items in the grid search. The No Induction baseline and Weak Induction grammars are listed for reference.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Constraint</bold></td>
<td align="left" valign="top"><bold>No Induction: Optimized for gap prediction</bold></td>
<td align="left" valign="top"><bold>No Induction: Optimized for overall prediction</bold></td>
<td align="left" valign="top"><bold>No Induction: Baseline</bold></td>
<td align="left" valign="top"><bold>Weak Induction</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">*[T3]</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">1.5</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.905</td>
</tr>
<tr>
<td align="left" valign="top">*[T2]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0.664</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice][T1]</td>
<td align="left" valign="top">1.5</td>
<td align="left" valign="top">6</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">2.852/2.259</td>
</tr>
<tr>
<td align="left" valign="top">*[+voice][T4]</td>
<td align="left" valign="top">1.5</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8722;voice][T2]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">*[&#8722;voice][T3]</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0</td>
</tr>
<tr>
<td align="left" valign="top">Correlation with all ratings</td>
<td align="left" valign="top">0.241</td>
<td align="left" valign="top">0.265</td>
<td align="left" valign="top">0.229</td>
<td align="left" valign="top">0.273</td>
</tr>
<tr>
<td align="left" valign="top">Correlation with gap ratings</td>
<td align="left" valign="top">0.428</td>
<td align="left" valign="top">0.369</td>
<td align="left" valign="top">0.372</td>
<td align="left" valign="top">0.302</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The correlation between the MaxEnt scores of these &#8220;optimized&#8221; grammars and the wordlikeness ratings is shown in <xref ref-type="fig" rid="F13">Figure 13</xref>. Compared with the No Induction baseline grammar, the lower end of the MaxEnt scores shows greater gradience. The difference between the overall-optimized and the gap-optimized versions is that the former penalizes lexical syllables less than the latter.</p>
<fig id="F13">
<label>Figure 13</label>
<caption>
<p>Correlation between the No Induction grammars&#8217; MaxEnt scores and wordlikeness ratings when the weights were the most successful at predicting all items (left) and gaps only (right) in the grid search. Light dots refer to lexical syllables, and dark dots refer to gaps. Horizontal jittering with a range of 0.02 was applied to the dots to reduce overlaps. The solid lines indicate slopes for MaxEnt scores predicting ratings for all items, and the dotted lines indicate slopes for predicting ratings for gaps only. The shading represents the 95% confidence interval for the slopes.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g13.png"/>
</fig>
<p>To further evaluate to what extent we can find the optimal weights for typologically-motivated tonotactic constraints, we ran 10,000 iterations of random weight assignments to *[+voice]([&#8211;cons])[T1], *[+voice]([&#8211;cons])[T4], *[&#8211;voice]([&#8211;cons])[T2], *[&#8211;voice]([&#8211;cons])[T3], *[T2], and *[T3]. As in the grid search, we assigned the same weights to the constraints with and without reference to glides. The random weights followed a uniform distribution between 0 and 6. <xref ref-type="fig" rid="F14">Figure 14</xref> shows the distribution of grammars with typologically-motivated tonotactic constraints with random weights in terms of their correlation with the wordlikeness ratings for all stimuli (left) and gaps only (right).</p>
<fig id="F14">
<label>Figure 14</label>
<caption>
<p>Distribution of the No Induction grammars with random weights in terms of their correlation with the overall wordlikeness ratings (left) and ratings for gaps only (right). Horizontal lines indicate the mean correlation (red) and the correlations for the Weak Induction (blue) and No Induction baseline (yellow) settings.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="labphon-14-6455-g14.png"/>
</fig>
<p>Comparing the Weak Induction grammar to the No Induction grammar with random weights shows that weighting typologically-motivated tonotactic constraints based on the lexicon made the harmonic scores correlate with the overall ratings almost as well as they possibly could given these constraints. On the other hand, when the focus was solely on the wordlikeness of gaps, weights based on the lexicon resulted in a very poor correlation between harmonic scores and the ratings relative to what the random weights could do. These differences were confirmed by statistical tests. For all stimuli, the MaxEnt scores from the No Induction grammars with randomized weights had a weaker correlation with wordlikeness ratings than those from the Weak Induction grammar, according to a one-sample <italic>t</italic> test (<italic>t</italic>(9999) = &#8211;186, <italic>p</italic> &lt; .0001). For gaps only, in contrast, the MaxEnt scores from the No Induction grammars with randomized weights had a stronger correlation with the wordlikeness ratings than those from the Weak Induction grammar, according to a one-sample <italic>t</italic> test (<italic>t</italic>(9999) = 148.17, <italic>p</italic> &lt; .0001). In other words, while modeling the wordlikeness of all stimuli can benefit from the knowledge of statistical patterns in the lexicon, modeling the wordlikeness of tonotactic gaps does not need to be informed by the lexicon at all.</p>
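<p>The random-weight procedure and the one-sample <italic>t</italic> test can be sketched as follows. This is a minimal illustration under stated assumptions rather than the scripts used in this study: the function names are hypothetical, the constraint labels are shorthand, and the toy correlations and reference value at the end are invented solely to exercise the test statistic.</p>
<code language="python">
```python
import math
import random
import statistics

CONSTRAINTS = ("*[T3]", "*[T2]", "*[+voice][T1]", "*[+voice][T4]",
               "*[-voice][T2]", "*[-voice][T3]")

def random_weights(rng):
    """Draw one weight per constraint group, uniform between 0 and 6."""
    return {name: rng.uniform(0.0, 6.0) for name in CONSTRAINTS}

def one_sample_t(values, reference):
    """Return (t, df) for testing mean(values) against a fixed reference."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - reference) / standard_error
    return t_stat, n - 1

rng = random.Random(0)  # fixed seed, used here only for reproducibility
grammars = [random_weights(rng) for _ in range(10000)]

# In a full analysis, each grammar would be scored against the ratings
# and the resulting correlations compared with a reference correlation.
# Toy values (hypothetical, not the study's data):
toy_correlations = [0.1, 0.2, 0.3, 0.4, 0.5]
t_stat, df = one_sample_t(toy_correlations, 0.2)
print(t_stat, df)
```
</code>
<p>With 10,000 sampled grammars, such a test has 9,999 degrees of freedom, which matches the <italic>t</italic>(9999) reported above.</p>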
</sec>
<sec>
<title>4.4. Summary</title>
<p>In this section, we explored whether statistical information from the Mandarin lexicon is sufficient or necessary to build tonotactic constraints that account for the wordlikeness ratings in our behavioral experiment. This was done in two steps. We first examined the constraints induced from the lexicon. We then examined the correlation between the wordlikeness ratings and the MaxEnt scores of the different grammars. We observed some general constraints against voiced onsets in T1 syllables, which was consistent with one of our typologically-motivated constraints. Even though we saw induced constraints that penalized voiceless onsets with T2/T3 and voiced onsets with T4, they mostly targeted much smaller natural classes and could be better viewed as constraints against specific accidental gaps. This potential &#8220;overfit&#8221; to accidental gaps in the constraint induction became more likely as the learner induced more constraints.</p>
<p>Our analysis of the correlation between the MaxEnt scores and wordlikeness ratings showed that speakers&#8217; different ratings for lexical syllables and tonotactic gaps could be successfully modeled by tonotactic constraints directly induced from the lexicon (the Strong Induction grammar) by the UCLA Phonotactic Learner. For modeling the ratings for tonotactic gaps, however, grammars with a small number of typologically-motivated constraints (i.e., *[T3], *[T2], *[+voice][T1/T4], *[&#8211;voice][T2/T3]) and arbitrary or random weights almost always outperformed both the grammar with these constraints weighted from the lexicon and the grammars with inductive tonotactic constraints. In other words, nearly all potential weight combinations for this limited set of typologically-motivated tonotactic constraints yielded better results than the lexicon-informed grammars when modeling the wordlikeness ratings of gaps. Among the possible configurations, we found that a large weight for *[T3] and a relatively smaller weight for *[+voice][T1/T4] could make the grammar assign harmonic scores that best correlated with the wordlikeness ratings of gaps. The finding on the importance of *[+voice][T4] is particularly interesting since it was not supported by the lexicon at all (i.e., the inductive process assigned no weight to this constraint).</p>
<p>In short, results with the UCLA Phonotactic Learner suggested that learning constraints from the lexicon is neither sufficient nor necessary when it comes to predicting the wordlikeness ratings of tonotactic gaps, which is consistent with a view that phonological grammar for tonotactics is potentially a module independent of lexical memory. Replications using other phonotactic modeling tools (e.g., neural network-based models as reported in <xref ref-type="bibr" rid="B50">Mayer &amp; Nelson, 2020</xref>) are needed to provide further support to this view, particularly in a similar setup where a priori and lexically-informed knowledge are compared.</p>
</sec>
</sec>
<sec>
<title>5. General discussion</title>
<p>This study investigated Mandarin tonotactic accidental gaps by looking for patterns in corpus data and comparing the findings to wordlikeness ratings and to the harmonic scores generated by the UCLA Phonotactic Learner. Our corpus study revealed certain trends in the occurrence of accidental gaps in Mandarin. Specifically, we found that T2 gaps were over-represented, followed by T3 gaps, and both tended to occur with closed syllables. T4 gaps were the least common, a result that could be attributed to a historical tonal merging process. We also found more T1 gaps with voiced onsets than T2 and T3 gaps, which were more likely to occur with voiceless onsets, a pattern that has also been observed cross-linguistically. In listeners&#8217; wordlikeness ratings, however, T2, the tone with the most gaps, was not rated as the least wordlike. Instead, the listeners rated T3 gaps as the least wordlike, a result that could be attributed to the markedness of the T3 contour. Furthermore, we found T1 gaps with voiced onsets were also rated as less wordlike, a pattern that was also observed in our corpus study. Although there was a significant difference in wordlikeness ratings between gaps and lexical syllables, they were both gradiently accepted as wordlike based on neighborhood density. Our findings across the corpus analysis, the wordlikeness rating experiment, and the modeling analyses are summarized in <xref ref-type="table" rid="T13">Table 13</xref>.</p>
<table-wrap id="T13">
<label>Table 13</label>
<caption>
<p>Summary of the findings across the corpus analysis, the wordlikeness rating experiment, and modeling analyses.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Corpus analysis</bold></td>
<td align="left" valign="top"><bold>Wordlikeness rating</bold></td>
<td align="left" valign="top"><bold>Modeling with phonotactic grammars</bold></td>
<td align="left" valign="top"><bold>What it means</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">More T2 gaps</td>
<td align="left" valign="top">T3 least wordlike</td>
<td align="left" valign="top">*T2 is learnable from the lexical data but is not necessary for predicting wordlikeness ratings.<break/>The *T3 constraint is effective in modeling ratings especially for gaps. *T3 is learnable from the lexical data, but the induced weight was not as effective in predicting the ratings for gaps.</td>
<td align="left" valign="top">The higher number of T2 gaps is accidental, while the higher number of T3 gaps may reflect an effect of typological markedness.</td>
</tr>
<tr>
<td align="left" valign="top">More gaps with voiced onsets for T1 vs. T2/T3/T4</td>
<td align="left" valign="top">Voiced onset less wordlike for T1</td>
<td align="left" valign="top">*[+voice][T1] is learnable from the lexical data, but the induced weight is more useful for modeling all items than for modeling the gaps.<break/>*[+voice][T4] was not learnable from the lexical data, but it helped increase the correlation between the MaxEnt scores and ratings, especially for gaps.</td>
<td align="left" valign="top">There are potentially real phonotactic constraints that are not entirely learnable from the lexicon.</td>
</tr>
<tr>
<td align="left" valign="top">More gaps with closed syllables</td>
<td align="left" valign="top">No effect</td>
<td align="left" valign="top">Not explored.</td>
<td align="left" valign="top">Not real phonotactic constraints</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Taken together, not all patterns observed in the corpus were reflected in the wordlikeness ratings. For instance, contrary to the findings in the corpus, T2 gaps were not treated as the least wordlike by native speakers. Instead, T3 was rated the least wordlike. This pattern was not induced by the phonotactic learner, nor was the fact that T4, the tonal category with the fewest gaps, was rated as more wordlike. We attributed the speakers&#8217; general aversion to T3 to the universal markedness of its complex tonal contour.</p>
<p>Some patterns, however, were robustly observed throughout this study. More T1 gaps were found with voiced onsets in the corpus, while more T2, T3, and T4 gaps were found with voiceless onsets. Among these patterns, we found T1 syllables with voiced onsets were rated as less wordlike, which suggests that T1 gaps with voiced onsets may not be as accidental as previously assumed; they also reflect a psychologically real phonotactic restriction on T1 with voiced onsets. While the lack of a T2-<italic>OnsetVoicing</italic> effect in the wordlikeness experiment may suggest that the pattern was purely accidental, it could be the case that the overall low scores for T2 prevented the interaction from showing.</p>
<p>With the help of the UCLA Phonotactic Learner, we examined to what extent the lexical data could help induce constraints or weight handwritten constraints that best account for the wordlikeness rating results. The comparison showed the statistics from the lexicon were beneficial in inducing a grammar with tonotactic constraints that predicts the ratings for all stimuli. However, in predicting the ratings for tonotactic gaps only, typologically-motivated constraints without lexical access outperformed grammars with tonotactic constraints induced from the lexical data. Although important markedness constraints, such as *T3 and *[+voice][T1], were assigned weights when trained on the lexical data, grammars with these weights had a weaker correlation with wordlikeness ratings of gaps compared to grammars with arbitrary weights for these constraints. Crucially, iterations with random weight assignments revealed that lexically-informed weights were significantly worse than random weights in modeling gaps&#8217; ratings. It is important to note that, despite our exploration of various constraint induction and weighting setups, the results we have obtained may still be limited by the UCLA Phonotactic Learner, the training data, and the simulation setups we have used. Therefore, it is uncertain whether our findings are indicative of limitations in grammar induction from the lexicon alone. If this finding can be replicated using other simulation tools and settings that compare a priori and lexically-informed phonotactic knowledge, it may suggest that universal markedness is more relevant than patterns in the lexicon for modeling the wordlikeness of tonotactic gaps.</p>
<p>In their explanation of similar dissociations between phonotactic knowledge and statistical generalizations in the lexicon in modeling nonword perception and production, Becker et al. (2011) proposed that UG serves as a filter on possible generalizations that humans can make (see also Davidson, 2006; Moreton, 2002), which may, in turn, facilitate the (over)-learning of phonetically motivated patterns. The lack of such filters explains why inductive statistical models fail to model behavioral results, since these models are prone to learning accidental statistical patterns (the &#8220;surfeit-of-the-stimulus&#8221; effect). Our findings are consistent with the predictions made by the proposal that only a subset of generalizations are possible, as shown by the success of typologically-motivated tonotactic constraints over inductive ones in modeling nonword tonotactics. We further showed that there is little evidence for an association between statistical patterns in the lexicon and the exact configurations of the possible constraints (i.e., constraint weights) other than the fact that information from the lexicon is helpful in modeling the separation between gaps and lexical syllables. Since *T3 and *[+voice][T1/T4] both have phonetic motivations, innateness and phonetic naturalness of these constraints are both potential sources of such tonotactic knowledge. The findings regarding the roles of *T3 and *[+voice][T4] are particularly interesting, as they are not as strongly supported by statistical patterns in the lexicon as *[+voice][T1]. This suggests that *T3 and *[+voice][T4] may have been overlearned from the lexicon. On the other hand, despite *T2 being supported by the lexicon, it was not relevant in modeling nonword tonotactics. This indicates that the speakers may have underlearned this statistical pattern.</p>
<p>This study contributes to the general understanding of unattested forms, especially involving tone-segment combinations, and extends the modeling of phonotactic well-formedness that has been previously restricted to segmental combinations to tone-syllable combinations.</p>
</sec>
</body>
<back>
<sec>
<title>Additional files</title>
<p>The additional files for this article can be found as follows:</p>
<list list-type="bullet">
<list-item><p><bold>Appendix I.</bold> Stimuli for Wordlikeness Rating Experiment. DOI: <uri>https://doi.org/10.16995/labphon.6455.s1</uri></p></list-item>
<list-item><p><bold>Appendix II.</bold> Handwritten Segmental Constraints adopted from Gong &amp; Zhang (<xref ref-type="bibr" rid="B25">2021</xref>). DOI: <uri>https://doi.org/10.16995/labphon.6455.s2</uri></p></list-item>
<list-item><p><bold>Appendix III.</bold> Tonotactic constraints in the Small Strong Inductive grammar. DOI: <uri>https://doi.org/10.16995/labphon.6455.s3</uri></p></list-item>
<list-item><p><bold>Corpus-data.</bold> The &#8216;Mandarin Accidental Gap Corpus&#8217; (Section 2) includes a calculation of 398 allowable Mandarin syllables with all possible tonal combinations, whether they exist or not. DOI: <uri>https://doi.org/10.16995/labphon.6455.s4</uri></p></list-item>
<list-item><p><bold>Wordlikeness rating data.</bold> Wordlikeness ratings (Section 3) were obtained from thirty-seven Taiwan Mandarin native speakers for 288 stimuli. DOI: <uri>https://doi.org/10.16995/labphon.6455.s5</uri></p></list-item>
</list>
</sec>
<fn-group>
<fn id="n1"><p>By using &#8220;tone-syllable&#8221; combination, we did not make any claims on the status of &#8220;tone&#8221; being separable from the rest of the syllable.</p></fn>
<fn id="n2"><p>We referred to a Taiwan-based conversational corpus in this study because Taiwan Mandarin participants were recruited for our wordlikeness rating experiment. We did not assume a substantial difference between the generalizations drawn here and those drawn from Putonghua.</p></fn>
<fn id="n3"><p>Note that the Mandarin rhotic is variably treated as a voiced obstruent [&#656;] or an approximant [&#633;] (<xref ref-type="bibr" rid="B16">Duanmu, 2007</xref>; <xref ref-type="bibr" rid="B48">Lin, 2007</xref>).</p></fn>
<fn id="n4"><p>The distribution of onset voicing according to tone is listed here: T1 voiced = 13, voiceless = 11; T2 voiced = 6, voiceless = 18; T3 voiced = 6, voiceless = 18; T4 voiced = 6, voiceless = 18.</p></fn>
<fn id="n5"><p>One might question the possibility that the reading of &#8220;zh&#333;ngw&#233;n&#8221;, literally meaning &#8220;Chinese&#8221;, could refer to a Chinese dialect other than Mandarin. This reading is unlikely due to the fact that these participants were Mandarin-dominant and that in Taiwan, Taiwanese Southern Min is referred to as &#8220;t&#225;i y&#468;&#8221; and Hakka as &#8220;k&#232; y&#468;&#8221;. The term &#8220;zh&#333;ngw&#233;n&#8221; almost exclusively refers to Mandarin in Taiwan.</p></fn>
<fn id="n6"><p>Targeting between 100 and 200 inductive constraints is common in recent works (e.g., <xref ref-type="bibr" rid="B26">Gouskova &amp; Gallagher, 2020</xref>; <xref ref-type="bibr" rid="B22">Gallagher et al., 2019</xref>). We took the midpoint of this range, as well as the half and the doubled numbers to explore the potential effects of underfit and overfit to the lexicon.</p></fn>
<fn id="n7"><p>The learner used the binary feature [sb] to label syllable boundaries ([+sb]) and all non-boundary symbols (i.e., all segments ([&#8722;sb])).</p></fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>We would like to thank Sang-Im Lee-Kim, Tsung-Ying Chen, the editors and reviewers of <italic>Laboratory Phonology</italic>, and the participants in ICPhS 2019 for their insights and suggestions. We especially thank Dr. Bruce Hayes for directing us to include a computational modeling component in this study. Any remaining errors are ours.</p>
</ack>
<sec>
<title>Competing interests</title>
<p>The authors have no competing interests to declare.</p>
</sec>
<ref-list>
<ref id="B1"><label>1</label><mixed-citation publication-type="book"><string-name><surname>Albright</surname>, <given-names>A.</given-names></string-name> (<year>2003</year>). <chapter-title>A quantitative study of Spanish paradigm gaps</chapter-title>. <source>Paper presented at the 22nd West Coast Conference on Formal Linguistics</source>, <publisher-name>University of California</publisher-name>, <publisher-loc>San Diego</publisher-loc>.</mixed-citation></ref>
<ref id="B2"><label>2</label><mixed-citation publication-type="journal"><string-name><surname>Albright</surname>, <given-names>A.</given-names></string-name> (<year>2009</year>). <article-title>Feature-based generalisation as a source of gradient acceptability</article-title>. <source>Phonology</source>, <volume>26</volume>(<issue>1</issue>), <fpage>9</fpage>&#8211;<lpage>41</lpage>. DOI: <pub-id pub-id-type="doi">10.1017/S0952675709001705</pub-id></mixed-citation></ref>
<ref id="B3"><label>3</label><mixed-citation publication-type="journal"><string-name><surname>Albright</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Hayes</surname>, <given-names>B.</given-names></string-name> (<year>2003</year>). <article-title>Rules vs. analogy in English past tenses: A computational/experimental study</article-title>. <source>Cognition</source>, <volume>90</volume>(<issue>2</issue>), <fpage>119</fpage>&#8211;<lpage>161</lpage>. DOI: <pub-id pub-id-type="doi">10.1016/S0010-0277(03)00146-X</pub-id></mixed-citation></ref>
<ref id="B4"><label>4</label><mixed-citation publication-type="journal"><string-name><surname>Bates</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Maechler</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bolker</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Walker</surname>, <given-names>S.</given-names></string-name> (<year>2015</year>). <article-title>Fitting linear mixed-effects models using lme4</article-title>. <source>Journal of Statistical Software</source>, <volume>67</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>48</lpage>. DOI: <pub-id pub-id-type="doi">10.18637/jss.v067.i01</pub-id></mixed-citation></ref>
<ref id="B5"><label>5</label><mixed-citation publication-type="journal"><string-name><surname>Becker</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Nevins</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Levine</surname>, <given-names>J.</given-names></string-name> (<year>2012</year>). <article-title>Asymmetries in generalizing alternations to and from initial syllables</article-title>. <source>Language</source>, <volume>88</volume>(<issue>2</issue>), <fpage>231</fpage>&#8211;<lpage>268</lpage>. DOI: <pub-id pub-id-type="doi">10.1353/lan.2012.0049</pub-id></mixed-citation></ref>
<ref id="B6"><label>6</label><mixed-citation publication-type="journal"><string-name><surname>Berent</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Wilson</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Marcus</surname>, <given-names>G. F.</given-names></string-name>, &amp; <string-name><surname>Bemis</surname>, <given-names>D. K.</given-names></string-name> (<year>2012</year>). <article-title>On the role of variables in phonology: Remarks on Hayes and Wilson 2008</article-title>. <source>Linguistic Inquiry</source>, <volume>43</volume>(<issue>1</issue>), <fpage>97</fpage>&#8211;<lpage>119</lpage>. DOI: <pub-id pub-id-type="doi">10.1162/LING_a_00075</pub-id></mixed-citation></ref>
<ref id="B7"><label>7</label><mixed-citation publication-type="webpage"><string-name><surname>Boersma</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Weenink</surname>, <given-names>D.</given-names></string-name> (<year>2017</year>). <article-title>Praat: Doing phonetics by computer (Version 6.0.26)</article-title>. Retrieved from <uri>www.praat.org</uri>.</mixed-citation></ref>
<ref id="B8"><label>8</label><mixed-citation publication-type="journal"><string-name><surname>Chen</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name> (<year>2006</year>). <article-title>Production of weak elements in speech&#8211;evidence from f&#8320; patterns of neutral tone in Standard Chinese</article-title>. <source>Phonetica</source>, <volume>63</volume>(<issue>1</issue>), <fpage>47</fpage>&#8211;<lpage>75</lpage>. DOI: <pub-id pub-id-type="doi">10.1159/000091406</pub-id></mixed-citation></ref>
<ref id="B9"><label>9</label><mixed-citation publication-type="journal"><string-name><surname>Chien</surname>, <given-names>Y.-F.</given-names></string-name>, <string-name><surname>Sereno</surname>, <given-names>J. A.</given-names></string-name>, &amp; <string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name> (<year>2017</year>). <article-title>What&#8217;s in a word: Observing the contribution of underlying and surface representations</article-title>. <source>Language and Speech</source>, <volume>60</volume>(<issue>4</issue>), <fpage>643</fpage>&#8211;<lpage>657</lpage>. DOI: <pub-id pub-id-type="doi">10.1177/0023830917690419</pub-id></mixed-citation></ref>
<ref id="B10"><label>10</label><mixed-citation publication-type="journal"><string-name><surname>Coleman</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Pierrehumbert</surname>, <given-names>J.</given-names></string-name> (<year>1997</year>). <article-title>Stochastic phonological grammars and acceptability</article-title>. arXiv preprint cmp-lg/9707017.</mixed-citation></ref>
<ref id="B11"><label>11</label><mixed-citation publication-type="journal"><string-name><surname>Cutler</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Chen</surname>, <given-names>H.-C.</given-names></string-name> (<year>1997</year>). <article-title>Lexical tone in Cantonese spoken-word processing</article-title>. <source>Perception and Psychophysics</source>, <volume>59</volume>(<issue>2</issue>), <fpage>165</fpage>&#8211;<lpage>179</lpage>. DOI: <pub-id pub-id-type="doi">10.3758/BF03211886</pub-id></mixed-citation></ref>
<ref id="B12"><label>12</label><mixed-citation publication-type="journal"><string-name><surname>Daland</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Hayes</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>White</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Garellek</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Davis</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Norrmann</surname>, <given-names>I.</given-names></string-name> (<year>2011</year>). <article-title>Explaining sonority projection effects</article-title>. <source>Phonology</source>, <volume>28</volume>, <fpage>197</fpage>&#8211;<lpage>234</lpage>. DOI: <pub-id pub-id-type="doi">10.1017/S0952675711000145</pub-id></mixed-citation></ref>
<ref id="B13"><label>13</label><mixed-citation publication-type="journal"><string-name><surname>Davis</surname>, <given-names>M. J.</given-names></string-name> (<year>2010</year>). <article-title>Contrast coding in multiple regression analysis: Strengths, weaknesses, and utility of popular coding structures</article-title>. <source>Journal of Data Science</source>, <volume>8</volume>(<issue>1</issue>), <fpage>61</fpage>&#8211;<lpage>73</lpage>. DOI: <pub-id pub-id-type="doi">10.6339/JDS.2010.08(1).563</pub-id></mixed-citation></ref>
<ref id="B14"><label>14</label><mixed-citation publication-type="journal"><string-name><surname>Della Pietra</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Della Pietra</surname>, <given-names>V.</given-names></string-name>, &amp; <string-name><surname>Lafferty</surname>, <given-names>J.</given-names></string-name> (<year>1997</year>). <article-title>Inducing features of random fields</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>19</volume>(<issue>4</issue>), <fpage>380</fpage>&#8211;<lpage>393</lpage>. DOI: <pub-id pub-id-type="doi">10.1109/34.588021</pub-id></mixed-citation></ref>
<ref id="B15"><label>15</label><mixed-citation publication-type="journal"><string-name><surname>Do</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name><surname>Lai</surname>, <given-names>R. K. Y.</given-names></string-name> (<year>2020</year>). <article-title>Incorporating tone in the modelling of wordlikeness judgements</article-title>. <source>Phonology</source>, <volume>37</volume>(<issue>4</issue>), <fpage>577</fpage>&#8211;<lpage>615</lpage>. DOI: <pub-id pub-id-type="doi">10.1017/S0952675720000287</pub-id></mixed-citation></ref>
<ref id="B16"><label>16</label><mixed-citation publication-type="book"><string-name><surname>Duanmu</surname>, <given-names>S.</given-names></string-name> (<year>2007</year>). <source>The Phonology of Standard Chinese</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</mixed-citation></ref>
<ref id="B17"><label>17</label><mixed-citation publication-type="journal"><string-name><surname>Duanmu</surname>, <given-names>S.</given-names></string-name> (<year>2011</year>). <article-title>Chinese syllable structure</article-title>. <source>The Blackwell Companion to Phonology</source>, <fpage>1</fpage>&#8211;<lpage>24</lpage>. DOI: <pub-id pub-id-type="doi">10.1002/9781444335262.wbctp0115</pub-id></mixed-citation></ref>
<ref id="B18"><label>18</label><mixed-citation publication-type="journal"><string-name><surname>Fischer-J&#248;rgensen</surname>, <given-names>E.</given-names></string-name> (<year>1952</year>). <article-title>On the definition of phoneme categories on a distributional basis</article-title>. <source>Acta Linguistica</source>, <volume>7</volume>(<issue>1&#8211;2</issue>), <fpage>8</fpage>&#8211;<lpage>39</lpage>. DOI: <pub-id pub-id-type="doi">10.1080/03740463.1952.10415400</pub-id></mixed-citation></ref>
<ref id="B19"><label>19</label><mixed-citation publication-type="journal"><string-name><surname>Fon</surname>, <given-names>Janice</given-names></string-name>, &amp; <string-name><surname>Chiang</surname>, <given-names>Wen-Yu</given-names></string-name>. (<year>1999</year>). <article-title>What does Chao have to say about tones? A case study of Taiwan Mandarin/&#36213;&#27663;&#22768;&#35843;&#31995;&#32479;&#19982;&#22768;&#23398;&#20043;&#32852;&#32467;&#21450;&#37327;&#21270;&#8211;&#20197;&#21488;&#28286;&#22320;&#21306;&#22269;&#35821;&#20026;&#20363;</article-title>. <source>Journal of Chinese Linguistics</source>, <volume>27</volume>(<issue>1</issue>), <fpage>13</fpage>&#8211;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="B20"><label>20</label><mixed-citation publication-type="journal"><string-name><surname>Frisch</surname>, <given-names>S. A.</given-names></string-name>, <string-name><surname>Large</surname>, <given-names>N. R.</given-names></string-name>, &amp; <string-name><surname>Pisoni</surname>, <given-names>D. B.</given-names></string-name> (<year>2000</year>). <article-title>Perception of wordlikeness: Effects of segment probability and length on the processing of nonwords</article-title>. <source>Journal of Memory and Language</source>, <volume>42</volume>(<issue>4</issue>), <fpage>481</fpage>&#8211;<lpage>496</lpage>. DOI: <pub-id pub-id-type="doi">10.1006/jmla.1999.2692</pub-id></mixed-citation></ref>
<ref id="B21"><label>21</label><mixed-citation publication-type="journal"><string-name><surname>Frisch</surname>, <given-names>S. A.</given-names></string-name>, <string-name><surname>Pierrehumbert</surname>, <given-names>J. B.</given-names></string-name>, &amp; <string-name><surname>Broe</surname>, <given-names>M. B.</given-names></string-name> (<year>2004</year>). <article-title>Similarity avoidance and the OCP</article-title>. <source>Natural Language &amp; Linguistic Theory</source>, <volume>22</volume>(<issue>1</issue>), <fpage>179</fpage>&#8211;<lpage>228</lpage>. DOI: <pub-id pub-id-type="doi">10.1023/B:NALA.0000005557.78535.3c</pub-id></mixed-citation></ref>
<ref id="B22"><label>22</label><mixed-citation publication-type="journal"><string-name><surname>Gallagher</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Gouskova</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Camacho Rios</surname>, <given-names>G.</given-names></string-name> (<year>2019</year>). <article-title>Phonotactic restrictions and morphology in Aymara</article-title>. <source>Glossa: A Journal of General Linguistics</source>, <volume>4</volume>(<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>48</lpage>. DOI: <pub-id pub-id-type="doi">10.5334/gjgl.826</pub-id></mixed-citation></ref>
<ref id="B23"><label>23</label><mixed-citation publication-type="journal"><string-name><surname>Goldwater</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Johnson</surname>, <given-names>M.</given-names></string-name> (<year>2003</year>). <article-title>Learning OT constraint rankings using a maximum entropy model</article-title>. <source>Paper presented at the Proceedings of the Stockholm workshop on variation within Optimality Theory</source>.</mixed-citation></ref>
<ref id="B24"><label>24</label><mixed-citation publication-type="journal"><string-name><surname>Gong</surname>, <given-names>S.</given-names></string-name> (<year>2017</year>). <article-title>Grammaticality and lexical statistics in Chinese unnatural phonotactics</article-title>. <source>UCL Working Papers in Linguistics</source>, <volume>29</volume>, <fpage>1</fpage>&#8211;<lpage>23</lpage>.</mixed-citation></ref>
<ref id="B25"><label>25</label><mixed-citation publication-type="journal"><string-name><surname>Gong</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name> (<year>2021</year>). <article-title>Modelling Mandarin speakers&#8217; phonotactic knowledge</article-title>. <source>Phonology</source>, <volume>38</volume>(<issue>2</issue>), <fpage>241</fpage>&#8211;<lpage>275</lpage>. DOI: <pub-id pub-id-type="doi">10.1017/S0952675721000166</pub-id></mixed-citation></ref>
<ref id="B26"><label>26</label><mixed-citation publication-type="journal"><string-name><surname>Gouskova</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Gallagher</surname>, <given-names>G.</given-names></string-name> (<year>2020</year>). <article-title>Inducing nonlocal constraints from baseline phonotactics</article-title>. <source>Natural Language &amp; Linguistic Theory</source>, <volume>38</volume>(<issue>1</issue>), <fpage>77</fpage>&#8211;<lpage>116</lpage>. DOI: <pub-id pub-id-type="doi">10.1007/s11049-019-09446-x</pub-id></mixed-citation></ref>
<ref id="B27"><label>27</label><mixed-citation publication-type="journal"><string-name><surname>Halle</surname>, <given-names>M.</given-names></string-name> (<year>1962</year>). <article-title>Phonology in generative grammar</article-title>. <source>Word</source>, <volume>18</volume>(<issue>1&#8211;3</issue>), <fpage>54</fpage>&#8211;<lpage>72</lpage>. DOI: <pub-id pub-id-type="doi">10.1080/00437956.1962.11659765</pub-id></mixed-citation></ref>
<ref id="B28"><label>28</label><mixed-citation publication-type="journal"><string-name><surname>Hao</surname>, <given-names>Y.-C.</given-names></string-name> (<year>2012</year>). <article-title>Second language acquisition of Mandarin Chinese tones by tonal and non-tonal language speakers</article-title>. <source>Journal of Phonetics</source>, <volume>40</volume>(<issue>2</issue>), <fpage>269</fpage>&#8211;<lpage>279</lpage>. DOI: <pub-id pub-id-type="doi">10.1016/j.wocn.2011.11.001</pub-id></mixed-citation></ref>
<ref id="B30"><label>30</label><mixed-citation publication-type="journal"><string-name><surname>Hayes</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>White</surname>, <given-names>J.</given-names></string-name> (<year>2013</year>). <article-title>Phonological naturalness and phonotactic learning</article-title>. <source>Linguistic Inquiry</source>, <volume>44</volume>(<issue>1</issue>), <fpage>45</fpage>&#8211;<lpage>75</lpage>. DOI: <pub-id pub-id-type="doi">10.1162/LING_a_00119</pub-id></mixed-citation></ref>
<ref id="B31"><label>31</label><mixed-citation publication-type="journal"><string-name><surname>Hayes</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Wilson</surname>, <given-names>C.</given-names></string-name> (<year>2008</year>). <article-title>A maximum entropy model of phonotactics and phonotactic learning</article-title>. <source>Linguistic Inquiry</source>, <volume>39</volume>(<issue>3</issue>), <fpage>379</fpage>&#8211;<lpage>440</lpage>. DOI: <pub-id pub-id-type="doi">10.1162/ling.2008.39.3.379</pub-id></mixed-citation></ref>
<ref id="B32"><label>32</label><mixed-citation publication-type="journal"><string-name><surname>Hombert</surname>, <given-names>J.-M.</given-names></string-name>, <string-name><surname>Ohala</surname>, <given-names>J. J.</given-names></string-name>, &amp; <string-name><surname>Ewan</surname>, <given-names>W. G.</given-names></string-name> (<year>1979</year>). <article-title>Phonetic explanations for the development of tones</article-title>. <source>Language</source>, <volume>55</volume>(<issue>1</issue>), <fpage>37</fpage>&#8211;<lpage>58</lpage>. DOI: <pub-id pub-id-type="doi">10.2307/412518</pub-id></mixed-citation></ref>
<ref id="B34"><label>34</label><mixed-citation publication-type="journal"><string-name><surname>Hsieh</surname>, <given-names>F.-F.</given-names></string-name>, &amp; <string-name><surname>Kenstowicz</surname>, <given-names>M. J.</given-names></string-name> (<year>2008</year>). <article-title>Phonetic knowledge in tonal adaptation: Mandarin and English loanwords in Lhasa Tibetan</article-title>. <source>Journal of East Asian Linguistics</source>, <volume>17</volume>, <fpage>279</fpage>&#8211;<lpage>297</lpage>. DOI: <pub-id pub-id-type="doi">10.1007/s10831-008-9027-7</pub-id></mixed-citation></ref>
<ref id="B35"><label>35</label><mixed-citation publication-type="book"><string-name><surname>Huang</surname>, <given-names>K.</given-names></string-name> (<year>2012</year>). <source>A study of neutral-tone syllables in Taiwan Mandarin</source>. <publisher-loc>Honolulu</publisher-loc>: <publisher-name>University of Hawaii at Manoa</publisher-name>.</mixed-citation></ref>
<ref id="B36"><label>36</label><mixed-citation publication-type="book"><string-name><surname>Huang</surname>, <given-names>T.</given-names></string-name> (<year>2001</year>). <chapter-title>The interplay of perception and phonology in tone 3 sandhi in Chinese Putonghua</chapter-title>. In <string-name><surname>Hume</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Johnson</surname>, <given-names>K.</given-names></string-name>, (Eds.), <source>Studies on the Interplay of Speech Perception and Phonology</source>, <volume>55</volume>, <fpage>23</fpage>&#8211;<lpage>42</lpage>. <publisher-name>Ohio State University</publisher-name>.</mixed-citation></ref>
<ref id="B37"><label>37</label><mixed-citation publication-type="journal"><string-name><surname>Huang</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name><surname>Johnson</surname>, <given-names>K.</given-names></string-name> (<year>2010</year>). <article-title>Language specificity in speech perception: Perception of Mandarin tones by native and nonnative listeners</article-title>. <source>Phonetica</source>, <volume>67</volume>(<issue>4</issue>), <fpage>243</fpage>&#8211;<lpage>267</lpage>. DOI: <pub-id pub-id-type="doi">10.1159/000327392</pub-id></mixed-citation></ref>
<ref id="B38"><label>38</label><mixed-citation publication-type="journal"><string-name><surname>Hume</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name><surname>Johnson</surname>, <given-names>K.</given-names></string-name> (<year>2003</year>). <article-title>The impact of partial phonological contrast on speech perception</article-title>. <source>Paper presented at the Proceedings of the fifteenth international congress of phonetic sciences</source>.</mixed-citation></ref>
<ref id="B39"><label>39</label><mixed-citation publication-type="journal"><string-name><surname>Kenstowicz</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name><surname>Suchato</surname>, <given-names>A.</given-names></string-name> (<year>2006</year>). <article-title>Issues in loanword adaptations: A case study from Thai</article-title>. <source>Lingua</source>, <volume>116</volume>(<issue>7</issue>), <fpage>921</fpage>&#8211;<lpage>949</lpage>. DOI: <pub-id pub-id-type="doi">10.1016/j.lingua.2005.05.006</pub-id></mixed-citation></ref>
<ref id="B40"><label>40</label><mixed-citation publication-type="journal"><string-name><surname>Kirby</surname>, <given-names>J. P.</given-names></string-name>, &amp; <string-name><surname>Yu</surname>, <given-names>A. C. L.</given-names></string-name> (<year>2007</year>). <article-title>Lexical and phonotactic effects on wordlikeness judgments in Cantonese</article-title>. <source>Paper presented at the Proceedings of the International Congress of the Phonetic Sciences XVI</source>.</mixed-citation></ref>
<ref id="B41"><label>41</label><mixed-citation publication-type="journal"><string-name><surname>Kubler</surname>, <given-names>C. C.</given-names></string-name> (<year>1985</year>). <article-title>The influence of Southern Min on the Mandarin of Taiwan</article-title>. <source>Anthropological Linguistics</source>, <volume>27</volume>(<issue>2</issue>), <fpage>156</fpage>&#8211;<lpage>176</lpage>.</mixed-citation></ref>
<ref id="B42"><label>42</label><mixed-citation publication-type="journal"><string-name><surname>Kuznetsova</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Brockhoff</surname>, <given-names>P. B.</given-names></string-name>, &amp; <string-name><surname>Christensen</surname>, <given-names>R. H. B.</given-names></string-name> (<year>2016</year>). <article-title>lmerTest: Tests in linear mixed effects models: R package version 2.0-33</article-title>.</mixed-citation></ref>
<ref id="B43"><label>43</label><mixed-citation publication-type="book"><string-name><surname>Lai</surname>, <given-names>Y. C.</given-names></string-name> (<year>2003</year>). <source>A Perceptual Investigation on Mandarin Tonotactic Gaps</source>. (M.A.), <publisher-name>National Tsing Hua University</publisher-name>, <publisher-loc>Hsinchu, Taiwan</publisher-loc>.</mixed-citation></ref>
<ref id="B44"><label>44</label><mixed-citation publication-type="journal"><string-name><surname>Lee</surname>, <given-names>C.-Y.</given-names></string-name> (<year>2007</year>). <article-title>Does Horse Activate Mother? Processing lexical tone in form priming</article-title>. <source>Language and Speech</source>, <volume>50</volume>(<issue>1</issue>), <fpage>101</fpage>&#8211;<lpage>123</lpage>. DOI: <pub-id pub-id-type="doi">10.1177/00238309070500010501</pub-id></mixed-citation></ref>
<ref id="B45"><label>45</label><mixed-citation publication-type="book"><string-name><surname>Legendre</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Miyata</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name><surname>Smolensky</surname>, <given-names>P.</given-names></string-name> (<year>1990</year>). <chapter-title>Harmonic Grammar&#8212;A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations</chapter-title>. <source>Paper presented at the Proceedings of the twelfth annual conference of the Cognitive Science Society</source>, <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Erlbaum</publisher-name>.</mixed-citation></ref>
<ref id="B46"><label>46</label><mixed-citation publication-type="book"><string-name><surname>Lenth</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Singmann</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Love</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Buerkner</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Herve</surname>, <given-names>M.</given-names></string-name> (<year>2019</year>). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.3.2.</mixed-citation></ref>
<ref id="B47"><label>47</label><mixed-citation publication-type="book"><string-name><surname>Li</surname>, <given-names>Jian</given-names></string-name>. (<year>2013</year>). <source>The rise of disyllables in Old Chinese: The role of lianmian words</source>. <publisher-name>City University of New York</publisher-name>.</mixed-citation></ref>
<ref id="B48"><label>48</label><mixed-citation publication-type="book"><string-name><surname>Lin</surname>, <given-names>Y.-H.</given-names></string-name> (<year>2007</year>). <source>The Sounds of Chinese</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</mixed-citation></ref>
<ref id="B49"><label>49</label><mixed-citation publication-type="journal"><string-name><surname>Lu</surname>, <given-names>Y.-A.</given-names></string-name>, &amp; <string-name><surname>Lee-Kim</surname>, <given-names>S.-I.</given-names></string-name> (<year>2021</year>). <article-title>The effect of linguistic experience on perceived vowel duration: Evidence from Taiwan Mandarin speakers</article-title>. <source>Journal of Phonetics</source>, <volume>86</volume>, <elocation-id>101049</elocation-id>. DOI: <pub-id pub-id-type="doi">10.1016/j.wocn.2021.101049</pub-id></mixed-citation></ref>
<ref id="B50"><label>50</label><mixed-citation publication-type="journal"><string-name><surname>Mayer</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Nelson</surname>, <given-names>M.</given-names></string-name> (<year>2020</year>). <article-title>Phonotactic learning with neural language models</article-title>. <source>Proceedings of the Society for Computation in Linguistics</source>, <volume>3</volume>(<issue>1</issue>), <fpage>149</fpage>&#8211;<lpage>159</lpage>.</mixed-citation></ref>
<ref id="B51"><label>51</label><mixed-citation publication-type="journal"><string-name><surname>Mei</surname>, <given-names>T.-L.</given-names></string-name> (<year>1970</year>). <article-title>Tones and prosody in Middle Chinese and the origin of the rising tone</article-title>. <source>Harvard Journal of Asiatic Studies</source>, <volume>30</volume>, <fpage>86</fpage>&#8211;<lpage>110</lpage>. DOI: <pub-id pub-id-type="doi">10.2307/2718766</pub-id></mixed-citation></ref>
<ref id="B52"><label>52</label><mixed-citation publication-type="journal"><string-name><surname>Mei</surname>, <given-names>T.-L.</given-names></string-name> (<year>1977</year>). <article-title>Tones and tone sandhi in 16th century Mandarin</article-title>. <source>Journal of Chinese Linguistics</source>, <volume>5</volume>(<issue>2</issue>), <fpage>237</fpage>&#8211;<lpage>260</lpage>.</mixed-citation></ref>
<ref id="B53"><label>53</label><mixed-citation publication-type="journal"><string-name><surname>Mikheev</surname>, <given-names>A.</given-names></string-name> (<year>1997</year>). <article-title>Automatic rule induction for unknown word guessing</article-title>. <source>Computational Linguistics</source>, <volume>23</volume>, <fpage>405</fpage>&#8211;<lpage>423</lpage>.</mixed-citation></ref>
<ref id="B54"><label>54</label><mixed-citation publication-type="journal"><string-name><surname>Myers</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Tsay</surname>, <given-names>J.</given-names></string-name> (<year>2004</year>). <article-title>Exploring performance-based predictors of phonological judgments in Mandarin</article-title>. <source>Poster presented at Laboratory Phonology</source>, <volume>9</volume>.</mixed-citation></ref>
<ref id="B55"><label>55</label><mixed-citation publication-type="journal"><string-name><surname>Myers</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Tsay</surname>, <given-names>J.</given-names></string-name> (<year>2005</year>). <article-title>The processing of phonological acceptability judgments</article-title>. <source>Paper presented at the Proceedings of symposium on 90&#8211;92 NSC projects</source>.</mixed-citation></ref>
<ref id="B56"><label>56</label><mixed-citation publication-type="book"><string-name><surname>Ohala</surname>, <given-names>J. J.</given-names></string-name> (<year>1978</year>). <chapter-title>Production of tone</chapter-title>. In <source>Tone: A Linguistic Survey</source> (pp. <fpage>5</fpage>&#8211;<lpage>39</lpage>): <publisher-name>Elsevier</publisher-name>. DOI: <pub-id pub-id-type="doi">10.1016/B978-0-12-267350-4.50006-6</pub-id></mixed-citation></ref>
<ref id="B57"><label>57</label><mixed-citation publication-type="book"><string-name><surname>Sagart</surname>, <given-names>L.</given-names></string-name> (<year>1999</year>). <chapter-title>The origin of Chinese tones</chapter-title>. <source>Paper presented at the Proceedings of the Symposium/Cross-Linguistic Studies of Tonal Phenomena/Tonogenesis</source>, <publisher-name>Typology and Related Topics</publisher-name>.</mixed-citation></ref>
<ref id="B58"><label>58</label><mixed-citation publication-type="book"><string-name><surname>Schneider</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Eschman</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Zuccolotto</surname>, <given-names>A.</given-names></string-name> (<year>2012</year>). <source>E-Prime User&#8217;s Guide</source>. <publisher-loc>Pittsburgh</publisher-loc>: <publisher-name>Psychology Software Tools Inc</publisher-name>.</mixed-citation></ref>
<ref id="B59"><label>59</label><mixed-citation publication-type="book"><string-name><surname>Smolensky</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Legendre</surname>, <given-names>G.</given-names></string-name> (<year>2006</year>). <source>The harmonic mind: From neural computation to optimality-theoretic grammar (Cognitive architecture)</source>, Vol. <volume>1</volume>. <publisher-name>MIT Press</publisher-name>.</mixed-citation></ref>
<ref id="B60"><label>60</label><mixed-citation publication-type="book"><string-name><surname>Tseng</surname>, <given-names>S.-C.</given-names></string-name> (<year>2019</year>). <chapter-title>ILAS Chinese Spoken Language Resources</chapter-title>. <source>Paper presented at the Proceedings of LPSS 2019&#8211;the third International Symposium on Linguistic Patterns in Spontaneous Speech</source>, <publisher-loc>Taipei</publisher-loc>.</mixed-citation></ref>
<ref id="B61"><label>61</label><mixed-citation publication-type="journal"><string-name><surname>Wang</surname>, <given-names>H. S.</given-names></string-name> (<year>1998</year>). <article-title>An experimental study on the phonotactic constraints of Mandarin Chinese</article-title>. <source>Studia Linguistica Serica</source>, <fpage>259</fpage>&#8211;<lpage>268</lpage>.</mixed-citation></ref>
<ref id="B62"><label>62</label><mixed-citation publication-type="book"><string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name> (<year>1972</year>). <source>Chinese Phonology [H&#224;ny&#468; Y&#299;ny&#249;n Xu&#233;]</source>. <publisher-loc>Hong Kong</publisher-loc>: <publisher-name>Zhong Hua Shuju</publisher-name>.</mixed-citation></ref>
<ref id="B63"><label>63</label><mixed-citation publication-type="book"><string-name><surname>Wickham</surname>, <given-names>H.</given-names></string-name> (<year>2009</year>). <chapter-title>ggplot2: Elegant Graphics for Data Analysis</chapter-title>. <publisher-loc>New York</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>. DOI: <pub-id pub-id-type="doi">10.1007/978-0-387-98141-3</pub-id></mixed-citation></ref>
<ref id="B64"><label>64</label><mixed-citation publication-type="journal"><string-name><surname>Wiener</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Turnbull</surname>, <given-names>R.</given-names></string-name> (<year>2016</year>). <article-title>Constraints of tones, vowels and consonants on lexical selection in Mandarin Chinese</article-title>. <source>Language and Speech</source>, <volume>59</volume>(<issue>1</issue>), <fpage>59</fpage>&#8211;<lpage>82</lpage>. DOI: <pub-id pub-id-type="doi">10.1177/0023830915578000</pub-id></mixed-citation></ref>
<ref id="B65"><label>65</label><mixed-citation publication-type="journal"><string-name><surname>Wilson</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name><surname>Gallagher</surname>, <given-names>G.</given-names></string-name> (<year>2018</year>). <article-title>Accidental gaps and surface-based phonotactic learning: A case study of South Bolivian Quechua</article-title>. <source>Linguistic Inquiry</source>, <volume>49</volume>(<issue>3</issue>), <fpage>610</fpage>&#8211;<lpage>623</lpage>. DOI: <pub-id pub-id-type="doi">10.1162/ling_a_00285</pub-id></mixed-citation></ref>
<ref id="B66"><label>66</label><mixed-citation publication-type="journal"><string-name><surname>Wu</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Kenstowicz</surname>, <given-names>M.</given-names></string-name> (<year>2015</year>). <article-title>Duration reflexes of syllable structure in Mandarin</article-title>. <source>Lingua</source>, <volume>164</volume>, <fpage>87</fpage>&#8211;<lpage>99</lpage>. DOI: <pub-id pub-id-type="doi">10.1016/j.lingua.2015.06.010</pub-id></mixed-citation></ref>
<ref id="B67"><label>67</label><mixed-citation publication-type="book"><string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name> (<year>2013</year>). <chapter-title>ProsodyPro&#8212;A tool for large-scale systematic prosody analysis</chapter-title>. In: <source>Tools and Resources for the Analysis of Speech Prosody</source> (pp. <fpage>7</fpage>&#8211;<lpage>10</lpage>). <publisher-loc>Aix-en-Provence, France</publisher-loc>: <publisher-name>Laboratoire Parole et Langage</publisher-name>.</mixed-citation></ref>
<ref id="B68"><label>68</label><mixed-citation publication-type="book"><string-name><surname>Yip</surname>, <given-names>M.</given-names></string-name> (<year>2002</year>). <source>Tone</source>. <publisher-name>Cambridge University Press</publisher-name>. DOI: <pub-id pub-id-type="doi">10.1017/CBO9781139164559</pub-id></mixed-citation></ref>
<ref id="B69"><label>69</label><mixed-citation publication-type="book"><string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name> (<year>2000</year>). <chapter-title>Phonetic duration effects on contour tone distribution</chapter-title>. <source>Proceedings of NELS</source>.</mixed-citation></ref>
<ref id="B70"><label>70</label><mixed-citation publication-type="thesis"><string-name><surname>Zhang</surname>, <given-names>J.</given-names></string-name> (<year>2001</year>). <source>The effects of duration and sonority on contour tone distribution&#8211;typological survey and formal analysis</source>. (Ph.D. dissertation), <publisher-name>UCLA</publisher-name>.</mixed-citation></ref>
<ref id="B71"><label>71</label><mixed-citation publication-type="thesis"><string-name><surname>Zuraw</surname>, <given-names>K.</given-names></string-name> (<year>2000</year>). <source>Patterned exceptions in phonology</source>. (Ph.D. dissertation), <publisher-name>UCLA</publisher-name>.</mixed-citation></ref>
<ref id="B72"><label>72</label><mixed-citation publication-type="journal"><string-name><surname>Zuraw</surname>, <given-names>K.</given-names></string-name> (<year>2002</year>). <article-title>Aggressive reduplication</article-title>. <source>Phonology</source>, <volume>19</volume>(<issue>3</issue>), <fpage>395</fpage>&#8211;<lpage>439</lpage>. DOI: <pub-id pub-id-type="doi">10.1017/S095267570300441X</pub-id></mixed-citation></ref>
<ref id="B73"><label>73</label><mixed-citation publication-type="journal"><string-name><surname>Zuraw</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name><surname>Hayes</surname>, <given-names>B.</given-names></string-name> (<year>2017</year>). <article-title>Intersecting constraint families: An argument for Harmonic Grammar</article-title>. <source>Language</source>, <volume>93</volume>(<issue>3</issue>), <fpage>497</fpage>&#8211;<lpage>548</lpage>. DOI: <pub-id pub-id-type="doi">10.1353/lan.2017.0035</pub-id></mixed-citation></ref>
</ref-list>
</back>
</article>