1. Introduction

It is widely acknowledged that adult second language (L2) learners experience difficulties when acquiring non-native sound systems. The L1 phonological system interferes with the accurate perception of novel sounds (Best, 1995; Escudero, 2005; Flege, 1995; Kuhl, 2000), and as a result, listeners may struggle to perceive some L2 sounds accurately. Such a ‘perceptual accent’ affects L2 word recognition and L2 listening, especially in adverse conditions (e.g., Lecumberri, Cooke, & Cutler, 2010), impeding communication and affecting L2 pronunciation (Tourville & Guenther, 2011).

One of the most influential and detailed models of L2 perception is Best’s Perceptual Assimilation Model (PAM/PAM-L2: Best, 1993, 1994, 1995; Best & Tyler, 2007). According to this model, naïve listeners assimilate L2 sounds to the L1 sounds that they perceive as most similar. If the two sounds of an L2 contrast are perceptually assimilated to two different native categories (two-category assimilation), discrimination is predicted to be excellent; if they are assimilated to the same L1 category (single-category assimilation), discrimination will likely be poor. The intermediate situation, in which one member of an L2 contrast is assimilated as a good version and the other as a poor version of a native category, is called category-goodness assimilation. In this case, the perceptual difficulty depends on the degree of difference in category goodness between the two L2 sounds, and discrimination might be moderate to good. It is also possible that one L2 sound is categorized while the other is not; this assimilation pattern is called uncategorized-categorized, and such contrasts are predicted to be discriminated well. It should be noted that, over time, exposure to L2 input might lead to native phonetic categories splitting and the assimilation patterns being reorganized towards a more native-like perception (Escudero, 2005).

Previous research has shown that individuals differ in the way they process and acquire novel sound systems (Golestani & Zatorre, 2009; Kartushina & Frauenfelder, 2013; Kartushina et al., 2016; Perrachione et al., 2011). PAM does not address individual differences in non-native perception, assuming instead that all listeners with a shared L1 map non-native categories onto native ones in a similar manner. However, individuals demonstrate different perceptual assimilation patterns. Mayr and Escudero (2010) investigated whether native English learners of L2 German vary systematically from each other in the way they map L2 sounds to native categories. They demonstrated that even at the initial stage of L2 acquisition there was substantial variability among learners: Some individuals perceived the tested vowel contrast in terms of a single native category, whereas other individuals perceived the same contrast as two or more native categories. The authors suggested that, among other factors, differences in L1 dialect might influence non-native perception. Flege (2016) and Flege and Bohn (2021) also observed that individuals might develop different L1 phonetic categories as a function of different L1 experiences, which leads to differences in L1-L2 mappings.

An explanation of how listeners with the same L1 might differ in the way they perceive sounds of their L1 has been offered by exemplar-based theories. According to these theories, phonetic categories are defined as clusters of experienced instances of speech sounds, or exemplars. Because individuals process and store speech sounds together with the acoustic, lexical, and social contexts in which the sounds occur, phonetic categories are not abstract invariant entities but rather clusters of specific tokens (Coleman, 2003; Ettlinger & Johnson, 2009; Pierrehumbert, 2001). Since phonetic categories result from the interplay between an individual’s cognitive, affective, and social differences and experiences, speakers of the same L1 have dissimilar L1 phonetic categories that lead to dissimilar perceptual profiles. Thus, naïve and L2 speech perception would depend on 1) the perceptual similarity of a new exemplar to the existing cluster of exemplars; 2) the presence of neighboring, competing clusters of exemplars; and 3) the density of the clusters, with more exemplars in a cluster attracting new tokens more strongly (Anderson, Morgan, & White, 2003). In this regard, exemplar-based theories support feature-based theories in demonstrating that non-native languages are perceived through the lens of the L1. However, exemplar-based theories do not treat the L1 as a homogeneous entity and may therefore explain how individual differences in the same L1 influence L2 perception. It should be noted that the studies that test the assumptions of exemplar-based models in application to L2 speech acquisition are relatively new, and there is currently no L2 speech perception model that would empirically demonstrate the exemplar-based approach to L2 perception. Therefore, in this paper, we consider exemplar-based theories only as an extension of the existing L2 speech models, with a primary focus on PAM.

If L1 categories vary across individuals as a result of different linguistic experiences and processing mechanisms, this variability would be reflected in the size, shape, and placement of each phonetic category in the psychoacoustic space. The combination of these parameters makes up an L1 perceptual profile that would be unique for every individual. Attempts to measure the size or the distribution of tokens within an L1 category (category compactness or precision) have been made before for speech production. Perkell et al. (2004) examined the perception and production of L1 English vowels and showed that relatively little within-vowel variability in production is associated with greater discrimination ability. Franken et al. (2017) demonstrated similar results with L1 Dutch. Kartushina and Frauenfelder (2013, 2014) measured several L1 Spanish vowels and related individual compactness indices to the perceptual accuracy of similar L2 French vowels. Their results showed that speakers with broadly distributed L1 categories were less successful at perceiving L2 vowels than speakers with more compact L1 categories. The effect of L1 category compactness seems to be evident even at the earliest stages of L2 speech training, as demonstrated in another study by Kartushina et al. (2016). Most recently, Flege and Bohn (2021) have proposed the revised Speech Learning Model (SLM-r), where a similar idea of category compactness is described as ‘category precision.’ The L2 category precision hypothesis states that more precisely defined L1 categories facilitate the perception of the phonetic differences between L1 and L2 sounds at the onset of L2 acquisition. Category precision is operationalized as “the variability of acoustic dimensions measured in multiple productions of a phonetic category” (p. 36).

Building on this research, the present study aims to further explore the concept of category compactness, but rather than measuring category variability in production, we measure the compactness or precision of an L1 category in perception. We then relate this measure to L2 perception, more specifically to listeners’ ability to distinguish a novel non-native vocalic contrast. Avoiding production measures altogether might enable us to obtain a more direct representation of individual differences in L1 perception without the potential bias of individual differences in articulatory skill.

1.1 The Russian contrast /i – ɨ/

To understand how L1 phonetic category compactness influences the ability to perceive the acoustic distance between two non-native sounds, we focused on measuring the L1 Spanish category /i/ and the non-native Russian contrast /i – ɨ/ (Figure 1).

Figure 1

The vowel inventory of Spanish (Ladefoged & Johnson, 2010) and Russian (Yanushevskaya & Bunčić, 2015).

L2 learners of Russian whose native language does not have the /i – ɨ/ contrast typically struggle with it in both perception and production (Andryushina, 2014; Shutova & Orekhova, 2018). This pattern can be observed with Spanish learners as well. Based on classroom observation, Klimova, Yurchenko, Cherkashina, and Kulik (2017) report that even at the higher levels of language proficiency, both Russian /i/ and /ɨ/ are often assimilated to Spanish /i/. This observation is consistent with the theoretical explanation offered by the L2 perception models, which predict difficulty with non-native sounds that are similar to L1 sounds. In terms of acoustics, Russian /i/ overlaps with Spanish /i/ (Figure 2). Russian /ɨ/ is acoustically close to Spanish /i/ but less similar to it than Russian /i/ is. Russian /ɨ/ can also be described as what Flege (1995) calls a complex sound for acquisition: 1) /ɨ/ is not commonly found in the world’s languages (only 16% of the world’s languages have it, compared to 92% for /i/; Moran & McCloy, 2019), and 2) it is a difficult segment for monolingual children to acquire (Zharkova, 2004; Maryutina, 2021). Russian /ɨ/ is distinct from Russian /i/ in terms of its F2 value, which can be as low as 1600 Hz. Such a low F2 is often attributed to velarization, a secondary articulation in which the back of the tongue is raised (Padgett, 2001).

Figure 2

Variability of Russian /i/ and /ɨ/ (solid line), based on the F1-F2 values from Holden and Nearey (1986), and variability of Spanish /i/ (dashed line), based on the F1-F2 values from Chládková and Escudero (2012). The two studies use different methodologies, and the figure is intended only as a rough comparison between the Russian and Spanish vowels of interest.

Several studies have shown that acoustic similarity is a reliable predictor of cross-language speech perception (Alispahic, Mulak, & Escudero, 2017; Escudero, 2005; Gilichinskaya & Strange, 2010). Russian /i/ is acoustically similar to Spanish /i/ and, thus, according to PAM, is likely a good exemplar of Spanish /i/, predicting poor to moderate discrimination. Russian /ɨ/ is also similar to Spanish /i/, yet it occupies a region of the psychoacoustic space that is not taken by any Spanish vowel. In this sense, it might be labeled as an uncategorized sound (uncategorized-categorized assimilation). Another possibility is category-goodness assimilation, in which Russian /ɨ/ is perceived as a deviant exemplar of Spanish /i/. In this study, we predict that the assimilation pattern will depend on the compactness of Spanish /i/. In listeners with a more compact Spanish category /i/, uncategorized-categorized assimilation would take place, with good discrimination of the target contrast. In listeners with a less compact (large) Spanish /i/, category-goodness assimilation would take place, i.e., poor to moderate discrimination between Russian /i/ and /ɨ/ (Figure 3).

Figure 3

The compactness of the Spanish L1 vowel category /i/ (in blue) affects the perception of the novel Russian contrast /i – ɨ/ (in pink). Individual A is more likely to discriminate between Russian /i/ and /ɨ/ than individual B.

2. Methodology

2.1 Participants

All Spanish-speaking participants were recruited through CloudResearch, formerly known as TurkPrime, a participant-sourcing platform for online research. Of the 109 participants who took part, 91 completed all the tasks. A further 23 participants were excluded from the data analysis either because they did not fit the study’s criteria (i.e., were not born in Spain or had studied Russian before) or because of abnormal data: unrealistically short completion times (less than 30 minutes) and deviant scores (e.g., unrealistic reaction times). We also excluded participants who were highly proficient in other foreign languages, as this could influence their perception of the Russian contrast /i – ɨ/. Studies on the effects of L2 experience on L3 perception report that the general experience of learning a foreign language gives a global advantage in phonological perception (Chang, 2013).

The remaining 68 participants were European Spanish true (N = 11) and functional (N = 57) monolinguals without any prior knowledge of Russian. According to Best and Tyler (2007), functional monolinguals are those who were raised in monolingual homes without learning another language before attending school. They received only a basic knowledge of English at school (i.e., basic classroom instruction and grammar), have not resided in an English-speaking country for more than a month, and used only Spanish in their daily lives. All participants were screened for knowledge of languages other than Spanish by being presented with relevant questions. Proficiency in L2s was evaluated through self-assessment adapted from the CEFR self-assessment grid of reference levels (Council of Europe, 2001). Eleven participants reported being unable to hold a conversation in any foreign language. Fifty-seven participants reported basic knowledge of English that did not surpass the A1-A2 levels of proficiency.1 Participants also reported all foreign languages spoken by their family: e.g., many participants reported French and Italian as languages they heard from other family members without being able to speak or understand these languages (Table 1).

Table 1

Characteristics of the Spanish participants (N = 68).

Measure                               M    SD   Min   Max
Age                                   41   11   19    62
Number of L2s studied                 1    1    0     5
Number of L2s spoken by the family    1    1    0     3

The study also recruited 16 monolingual Russian speakers to provide baseline data for the rated dissimilarity task, which required distinguishing between Russian /i/ and /ɨ/ (Table 2). Russian participants were recruited and screened online with the help of several social networks, principally Facebook. All Russian speakers were born in Russia and were living there when the study took place. The data from the Russian participants were further analyzed and compared to the Spanish participants’ performance.

Table 2

Characteristics of the Russian participants (N = 16).

Measure                               M    SD   Min   Max
Age                                   24   6    18    47
Number of L2s studied                 1    1    0     1
Number of L2s spoken by the family    1    1    0     1

2.2 Instruments

As a part of the experiment, all Spanish participants were asked to fill out a demographics and language history questionnaire. The measures of phonological short-term memory and acoustic memory were employed as control variables. Previous research has shown that these types of memory contribute to enhanced non-native perception, and therefore, it was important to account for this source of individual variability (Aliaga-Garcia, Mora, & Cerviño-Povedano, 2011; MacKay, Meador, & Flege, 2001).

2.2.1 Serial nonword recognition task

Phonological short-term memory was assessed with a serial nonword recognition task developed by Cerviño-Povedano and Mora (2011). The only difference between their task and the present one is that we used Spanish nonwords instead of Catalan nonwords. Participants heard 24 pairs of nonword strings increasing in length and decided whether the order of the nonwords in the sequences was the same or different. The nonwords were developed using the Syllabarium corpus (Duñabeitia et al., 2010). Only high-frequency CVC syllables were selected, i.e., those with a frequency index equal to or greater than 1000. The selected 160 syllables were recorded at the Phonetics Laboratory at the University of Barcelona by a native female speaker of European Spanish as a part of a carrier sentence Yo digo _ / Yo digo _ una vez. (“I said _ / I said _ again”). The recorded CVC nonwords were extracted from the sentences and processed in Praat (Boersma & Weenink, 2013) to normalize for amplitude (70 dB) and remove alternating current. The nonwords were normalized for length, which was determined by taking the average length of all nonword stimuli (650 ms). The best tokens were selected based on auditory judgments and acoustic measurements. Every trial contained a variety of vowels and consonantal contexts.

(1) Example of the same sequence:
    a. bul tad som fes sil
    b. bul tad som fes sil

(2) Example of a different sequence (the 3rd and the 4th nonwords are switched):
    a. bul tad som fes sil
    b. bul tad fes som sil

In this task, a weighted score was obtained by assigning five, six, and seven points to the correct responses of five-, six- and seven-item sequences with a maximum score of 144 points.

2.2.2 Target sound recognition task

Acoustic memory was assessed with a target sound recognition task (Li, Cowan, & Saults, 2013; Safronova & Mora, 2012). Participants listened to two-, three-, and four-item sound sequences (ISI = 300 ms), followed by a target sound presented 3000 ms later that had either been presented previously in the sequence (same trial) or not (different trial) (Figure 4).

Figure 4

An example of a 3-item same trial (left) and a 3-item different trial (right).

The stimuli consisted of 101 Spanish CV syllables from the Syllabarium corpus recorded in the same manner as the nonwords in the serial nonword recognition task. The CV syllables were normalized for length and manipulated through frequency rotation, also called speech rotation (Scott et al., 2009). This technique preserves the acoustic complexity of the stimuli while making them impossible to encode phonologically. The stimuli were presented in three blocks, with each block containing eight trials in randomized order. The weighted score was computed by assigning scores of two, three, and four points to the correct responses of two-, three-, and four-item sequences, respectively, with a maximum score of 72 points.
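The weighted scoring used in both memory tasks can be illustrated with a minimal sketch. It assumes that each correct trial earns as many points as the sequence has items and that the trials are split evenly across sequence lengths (eight per length), which is consistent with the reported maxima of 72 and 144 points but is not stated explicitly in the text. The function and variable names are hypothetical.

```python
def weighted_score(trials):
    """Sum the sequence length of every correctly answered trial.

    `trials` is an iterable of (sequence_length, is_correct) pairs.
    """
    return sum(length for length, correct in trials if correct)


# Hypothetical perfect run on the target sound recognition task:
# 24 trials, assumed to be 8 each of two-, three-, and four-item sequences.
perfect_acoustic = [(length, True) for length in (2, 3, 4) for _ in range(8)]

# Hypothetical perfect run on the serial nonword recognition task:
# 24 trials, assumed to be 8 each of five-, six-, and seven-item sequences.
perfect_nonword = [(length, True) for length in (5, 6, 7) for _ in range(8)]

print(weighted_score(perfect_acoustic))  # 72
print(weighted_score(perfect_nonword))   # 144
```

Weighting by sequence length means that an error on a long sequence costs more points than an error on a short one.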

2.2.3 Goodness rating task

To measure the compactness of the native Spanish vowel /i/ in the perceptual space, we administered a goodness rating task. Using Klatt’s synthesizer (Klatt, 1980), we created 28 vowels that were distributed across a mel-scaled F1*F2 psychoacoustic space (Figure 5). The prototypical Spanish /i/ vowel was selected based on the values reported by Chládková and Escudero (2012) for a European Spanish male speaker and set to F1 = 286 Hz (386 mels) and F2 = 2367 Hz (1665 mels). The 28 variants formed four vectors around the prototypical /i/. Variants were obtained by modifying F1, F2, or both at the same time. The difference in F1 values between the variants was 30 mels and the difference in F2 values 50 mels (see the Appendix for the exact values of the variants). F1-F2 pairings for which the value of F1 was equal to or higher than F2 (i.e., where F1 would become the F2 and vice versa) were excluded, as were tokens outside the range of the possible human vowel space (i.e., those with very low F1 and F2 values). The frequencies of the third through sixth formants were set to the following values for all vowel tokens: F3 = 3010 Hz, F4 = 3300 Hz, F5 = 3850 Hz, and F6 = 4990 Hz. The bandwidths used were: B1 = 60 Hz, B2 = 90 Hz, B3 = 150 Hz, B4 = 200 Hz, B5 = 200 Hz, and B6 = 1000 Hz. The stimuli were 500 ms in duration. The fundamental frequency began at 112 Hz, rose to 132 Hz over the first 100 ms, and dropped to 92 Hz over the next 400 ms to produce a natural-like rise-fall contour.
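As an illustration of the mel-scaled stimulus grid, the sketch below converts the prototype formants to mels and steps away from the prototype in 30-mel (F1) and 50-mel (F2) increments. The text does not name the mel formula; the sketch assumes the common 2595 · log10(1 + f/700) conversion, which reproduces the reported values of 386 and 1665 mels. The function names and the step-offset interface are hypothetical, and the exact stimulus values remain those listed in the Appendix.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mels (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


PROTOTYPE_HZ = (286.0, 2367.0)          # F1, F2 of the prototypical /i/
F1_STEP_MEL, F2_STEP_MEL = 30.0, 50.0   # step sizes reported in the text


def variant_hz(f1_steps, f2_steps):
    """Return (F1, F2) in Hz for a variant offset from the prototype
    by a whole number of steps along each formant dimension."""
    f1_mel = hz_to_mel(PROTOTYPE_HZ[0]) + f1_steps * F1_STEP_MEL
    f2_mel = hz_to_mel(PROTOTYPE_HZ[1]) + f2_steps * F2_STEP_MEL
    return mel_to_hz(f1_mel), mel_to_hz(f2_mel)


print(round(hz_to_mel(286)), round(hz_to_mel(2367)))  # 386 1665
print(variant_hz(1, -1))  # one step up in F1, one step down in F2
```

Working in mels rather than Hz keeps the steps between neighboring variants roughly equal in perceptual terms across the two formant dimensions.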

Figure 5

The 28 synthesized vowels are distributed across a mel-scaled F1*F2 psychoacoustic space with a prototype /i/ in the center and the corresponding values.

Participants were presented with one variant at a time and had to decide how well the variant matched their representation of Spanish /i/. Specifically, the instructions stated: “In this task, you will hear various vowels produced by a male speaker. Your task is to decide whether each vowel you have heard sounds like /i/ as in the word sin” (without: “sin” /sin/). Participants were asked to rate each variant by marking the degree of mismatch between the variant and /i/ on an intuitive rating scale (Figure 6: Jilka, 2009): the left edge of the scale (“Similar”) was associated with a good exemplar of /i/ and the right edge of the scale (“Diferente”) with a poor exemplar. For ease of statistical analysis, the scale was divided into ten segments that provided scores from 1 to 10 for each variant: A score of 1 would mean “sounds very similar to Spanish /i/” and a score of 10 would mean “sounds very different from Spanish /i/.” To ensure task reliability, each participant rated each variant four times, for a total of 112 trials.

Figure 6

Intuitive rating scale (adapted from Jilka, 2009). Here the score for a variant is 10 (the black dot is placed on the right edge of the scale, next to “Diferente”) meaning “sounds very different from Spanish /i/.”

To calculate the compactness index of /i/ for each participant, we followed several steps. First, we counted the number of variants consistently selected as good exemplars of /i/, i.e., with a goodness rating greater than five, the midpoint of the scale (Figure 11 shows the average rating for each variant). The maximum number of variants that could be selected was 28, which included all possible variants of /i/. We only counted the variants that were selected as good exemplars consistently; i.e., out of the four times the variant was presented, it was rated higher than five at least three times. Otherwise, we assumed the rating was not reliable and did not count this variant. Thus, at this step, a compactness index of six would indicate a rather compact phonetic category, i.e., few variants are selected as good exemplars of this category (Figure 7). On the other hand, a compactness index of 15 would signify a large, less compact phonetic category, i.e., more variants are selected as good exemplars of /i/.

Figure 7

The first step in calculating compactness: Participant A has a less compact category (15 variants) than Participant B (6 variants).

The next step in calculating the compactness index was to assign different values to the variants based on their distance from the prototypical /i/. This step was necessary to account for situations in which few variants were selected but lay far from the center. For example, if a participant selected only six variants, the phonetic category could be defined as compact. However, if these variants are far from the prototypical /i/, such a category cannot be counted as compact. To take this into account, each variant was assigned a value from 1 to 5 based on how many steps away it was located from the prototype (Figure 5). The final weighted score that represented the compactness index was the sum of the values of the selected variants, with the maximum score equal to 69 (calculated as follows: 8*1+8*2+6*3+3*4+3*5 = 69). In our previous example (Figure 7), participant A would have a compactness index equal to 21 (calculated as follows: 8*1+5*2+1*3 = 21) and participant B a compactness index equal to 5 (calculated as follows: 5*1 = 5). Notice that the prototype in the middle has a value of zero and is not included in the formula.
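The two steps described above (consistency filtering, then distance weighting) can be sketched as follows. This is an illustrative reconstruction rather than the authors' script: the per-presentation judgments are represented as booleans (True when a presentation counted as a good exemplar), the variant labels are hypothetical, and the number of variants at each distance is taken from the maximum-score formula (8, 8, 6, 3, 3).

```python
def compactness_index(judgments, steps_from_prototype, min_consistent=3):
    """Weighted compactness index for one participant.

    judgments: dict mapping variant id -> list of booleans, one per
        presentation (True = judged a good exemplar of /i/).
    steps_from_prototype: dict mapping variant id -> steps (1-5) from
        the prototype; the prototype itself has a value of 0.
    A variant counts only if it was judged good on at least
    `min_consistent` of its presentations.
    """
    index = 0
    for variant, flags in judgments.items():
        if sum(flags) >= min_consistent:
            index += steps_from_prototype[variant]
    return index


# Hypothetical layout: number of variants at each distance from the prototype.
RING_SIZES = {1: 8, 2: 8, 3: 6, 4: 3, 5: 3}
steps = {"prototype": 0}
for step, count in RING_SIZES.items():
    for k in range(count):
        steps[f"step{step}_{k}"] = step


def judgments_for(chosen):
    """Chosen variants are judged good 3 of 4 times, all others 1 of 4."""
    return {
        v: [True, True, True, False] if v in chosen else [True, False, False, False]
        for v in steps
    }


chosen_a = ({"prototype"} | {f"step1_{k}" for k in range(8)}
            | {f"step2_{k}" for k in range(5)} | {"step3_0"})
chosen_b = {"prototype"} | {f"step1_{k}" for k in range(5)}

print(compactness_index(judgments_for(chosen_a), steps))    # 21
print(compactness_index(judgments_for(chosen_b), steps))    # 5
print(compactness_index(judgments_for(set(steps)), steps))  # 69
```

Because the prototype carries a weight of zero, it contributes to the count of selected variants but not to the weighted index, which matches the worked examples for participants A and B.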

Another concern in calculating the compactness index was the shape of a phonetic category: A category elongated along one dimension (e.g., along F1) could affect the perception of a non-native contrast that has particular acoustic characteristics (e.g., one whose members differ in F2 but not so much in F1). However, as we were processing the data, none of the participants demonstrated a category that would be elongated along one dimension, as is often observed in production. Participants tended to have roughly circular perceptual categories, more or less balanced across the F1 and F2 dimensions. We did not identify a prototypical exemplar for each individual, as the physical center of a category often fell in between the tokens (e.g., Participant B in Figure 7). For such a measure to be captured, a better resolution is needed, i.e., more tokens should be synthesized and evaluated.

2.2.4 Rated dissimilarity task

A rated dissimilarity task was designed to assess the degree of perceived dissimilarity between non-native Russian /i – ɨ/. Two female and two male native speakers of the Central Russian dialect from Moscow recorded the target vowel contrasts in a /bVt/ context as part of a carrier sentence Я сказал(а) _ / Я сказал(а) _ опять (“I said_” / “I said_ again”). The stimuli were digitally recorded (Praat and Edirol UA-25 USB Audio Capture device) in a soundproof booth at a sampling rate of 44.1 kHz with a 16-bit resolution on a mono channel in the Phonetics Laboratory at the University of Barcelona. The selected tokens were extracted from the sentences and processed using Praat to normalize for pitch and amplitude. The word-final /t/ release burst was removed, and the offset of the spliced portion was placed where the amplitude of the vowel waveform began to decrease, with the exact cut at a zero-crossing. The original duration values of the tokens were preserved to make the vowels sound as natural as possible and ranged from 310 to 370 ms (duration showed no statistically significant effect on perception). The best tokens per speaker were selected based on auditory judgments and acoustic measurements. Each sound category was represented by at least five tokens to encourage participants to respond in a general rather than in a token-specific manner.

The stimuli were organized into trials in which the two tokens of a pair (A and B) were spoken by the same or by different speakers and presented with an inter-stimulus interval of 700 ms, with the token order within a pair counterbalanced. Each of the four test blocks consisted of eight change (/i – ɨ/ or /ɨ – i/) and eight no-change (/i – i/ or /ɨ – ɨ/) trials in randomized order, together with eight randomly interspersed distractor trials containing other vowel contrasts (e.g., /a – o/), for a total of 96 trials.

Participants were asked to assess the difference between two vowels by marking the degree of mismatch on an intuitive rating scale (Figure 6: Jilka, 2009): “Similar” indicated that two vowels sound the same (small or no distance) and “Diferente” indicated that two vowels sound different (large distance). Again, for computational purposes, the intuitive scale was divided into ten segments, with 1 indicating “similar” and 10 indicating “different.” A score of 1 would mean perceiving less distance between the contrasting sounds (/i/ and /ɨ/ sound the same), and a score of 10 would represent perceiving more distance (/i/ and /ɨ/ sound different). Only /i – ɨ/ or /ɨ – i/ pairs were taken into account when calculating the perceived dissimilarity scores. We did not include /i – i/ pairs in the analysis, as participants responded to these pairs in an overwhelmingly similar manner, marking the distance as 1.
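A participant's perceived dissimilarity score can be sketched as below. The sketch assumes that the score is the mean rating over the change trials only, which is consistent with the non-integer minimum and maximum later reported for this task but is not spelled out in the text. The trial representation and names are hypothetical.

```python
from statistics import mean

# Only the change pairs enter the score; no-change and distractor
# trials are ignored.
CHANGE_PAIRS = {("i", "ɨ"), ("ɨ", "i")}


def perceived_dissimilarity(trials):
    """Mean rating (1-10) over change trials.

    `trials` is an iterable of (first_vowel, second_vowel, rating) tuples.
    """
    ratings = [rating for a, b, rating in trials if (a, b) in CHANGE_PAIRS]
    return mean(ratings)


# Hypothetical responses from one participant.
example_trials = [
    ("i", "ɨ", 8),   # change trial
    ("ɨ", "i", 10),  # change trial
    ("i", "i", 1),   # no-change trial, excluded
    ("a", "o", 9),   # distractor trial, excluded
]
print(perceived_dissimilarity(example_trials))  # 9
```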

2.3 Procedure

The experiment consisted of a single testing session that was conducted in Spanish and lasted one hour. Each participant received a URL to the experiment’s website on the PsyToolkit platform (Stoet, 2010, 2017). When entering the website, participants were taken to an information sheet and informed-consent screens. They also had to declare the type of technology they were using to perform the tasks so that less optimal settings could be excluded: e.g., participants were not allowed to use mobile phones or tablets to complete the experiment. Before the audio portion of the experiment began, sound checks were performed to ensure adequate sound quality and a comfortable volume level. At the end of the experiment, each participant received an automatically generated individual code that they used to receive compensation.

3. Results

All statistical analyses were performed using R 3.5.0 (R Core Team, 2018) with the help of the following packages: psych (Revelle, 2019), betareg (Cribari-Neto & Zeileis, 2010), Matrix (Bates & Maechler, 2021), ggplot2 (Wickham, 2016), and car (Fox & Weisberg, 2019).

3.1 Overview of the data

The descriptive statistics across all tasks are summarized in Table 3.

Table 3

Summary of performance across all tasks (N = 68).

Task                               Max score   Min     Max      Mean    Median   SD
Serial Nonword Recognition Task    144         22.00   137.00   86.65   84.50    24.07
Target Sound Recognition Task      72          16.00   72.00    54.24   54.00    10.54
Rated Dissimilarity Task           10          4.69    9.94     8.39    8.80     1.26
Goodness Rating Task               69          8.00    56.00    31.35   30.50    10.04

The score distributions for the tasks were normal, except for the rated dissimilarity task (Shapiro-Wilk test: W = 0.90, p < .001). The negative skew of –1.01 reflected the fact that most of the participants nearly excelled in discriminating between Russian /i/ and /ɨ/ (Figure 8).

Figure 8

The distribution of the rated dissimilarity task scores is negatively skewed, with a longer tail on the left (skew = –1.01; kurtosis = 0.22).

Even though the Spanish participants obtained unexpectedly high dissimilarity rating scores for the Russian /i – ɨ/ contrast, they were considerably outperformed by the Russian participants (Figure 9): The difference between the two groups was significant, as confirmed by a one-tailed Wilcoxon rank-sum test for independent samples: W = 923, p < .001. None of the Spanish participants had a perfect score of 10, which represented the maximum distance between /i/ and /ɨ/.
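For readers unfamiliar with the W statistic, the sketch below shows how it is conventionally computed for two independent samples: it counts the pairs in which a value from the first group exceeds a value from the second, with ties counted as one half. This is the definition used by R's wilcox.test; treating the Russian group as the first sample is an assumption, and the scores in the example are invented.

```python
def rank_sum_w(sample_a, sample_b):
    """W statistic for sample_a versus sample_b.

    Counts the (a, b) pairs with a > b; tied pairs count as 0.5.
    The maximum possible value is len(sample_a) * len(sample_b).
    """
    w = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


# Invented scores, for illustration only.
group_one = [9.5, 10.0, 9.8]
group_two = [8.0, 9.5, 7.0]
print(rank_sum_w(group_one, group_two))  # 8.5 out of a possible 9

# With 16 and 68 participants, the largest possible W is 16 * 68.
print(16 * 68)  # 1088
```

Under that assumption, a W of 923 means that in roughly 85% of all Russian-Spanish pairings the Russian participant had the higher score.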

Figure 9

The Russian participants obtained significantly higher perceived dissimilarity scores for /i – ɨ/ than the Spanish participants.

The acoustic distance between /i/ and /ɨ/ in each dissimilar pair contributed significantly to the Spanish participants’ degree of perceived dissimilarity: The greater the acoustic distance was, the more dissimilar the vowels were judged to be. Acoustic distance in F2 contributed more to perceived dissimilarity (B = 0.00006, SE = 0.004, p = 0.008) than acoustic distance in F1 (B = 0.003, SE = 0.0003, p = 0.001), which is consistent with the previous observation that /i/ and /ɨ/ primarily differ in terms of F2 (Padgett, 2001).2

3.2 Category compactness and L2 perception

We next assessed participants’ degree of category compactness (Figure 10). Figure 11 shows how distant or close each variant of the prototypical Spanish /i/ sounded to participants. A lower number signifies that the variant was perceived as closer to /i/, and a higher number signifies that it was perceived as further from /i/. Participants demonstrated remarkable perceptual sensitivity when judging each variant’s distance from the prototype: When checked with simple linear regression, a significant effect of distance was observed (R² = 0.58, p < .001). Participants were more sensitive to the differences in F1 (the average perceptual distance between the variants was 0.67 points, SD = 0.56) than to the differences in F2 (the average perceptual distance between the variants was 0.51 points, SD = 0.46). As expected, when both formants were manipulated, the distance between the variants was perceived as the greatest (the average perceptual distance between the variants along the diagonal vectors was 0.74 points, SD = 0.58).

Figure 10

The distribution of the compactness scores from 68 participants.

Figure 11

The size of the circles signifies the perceived psychoacoustic distance of a given variant from a prototypical /i/, with smaller circles indicating a smaller distance (perceived as a good exemplar of /i/). The maximum distance is 10.

Figure 12 shows a weak negative relationship between Compactness and Perceived Dissimilarity: As the size of a category increases, the ability to perceive dissimilarity between two contrasting non-native sounds decreases.

Figure 12

A negative linear relationship between Compactness and Perceived Dissimilarity: Participants with more compact perceptual categories seem to perceive the distance between two unfamiliar sounds better.

We fitted a beta regression model with Perceived Dissimilarity as the dependent variable and Compactness, Phonological Short-Term Memory, Acoustic Memory, and L2 Experience (the number of L2s studied in the past) as predictors (Table 4). This type of generalized linear regression does not assume a normal distribution and works well when the dependent variable is a rating scale, as it is in our case. To use a beta regression, the dependent variable must vary between 0 and 1, with no observation equal to zero or one. Thus, we first had to create a proportional variable of Perceived Dissimilarity using the following formula from Cribari-Neto and Zeileis (2010): (y * (n – 1) + 0.5)/n, where y is a perceived dissimilarity score and n is the length of the vector/variable. This model showed that Phonological Short-Term Memory and L2 Experience did not affect Perceived Dissimilarity at a significant level (p = .49 and p = .12, respectively), whereas Acoustic Memory did (p = .008). The effect of Compactness on Perceived Dissimilarity was marginally significant (p = .058), decreasing it by about 0.003 points. Removing Phonological Short-Term Memory and L2 Experience from the formula and adding an interaction term between Compactness and Acoustic Memory improved the fit of the model from pseudo-R² = 0.19 to pseudo-R² = 0.22, with Compactness reaching significance this time (p = .014). The interaction between Compactness and Acoustic Memory also reached significance (p = .022).
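The transformation can be illustrated with a short sketch. Because the formula only yields values strictly between 0 and 1 when y already lies in the [0, 1] interval, the sketch assumes that the 1-10 scores are first divided by the scale maximum; this rescaling step, like the function name, is an assumption and is not described in the text.

```python
def to_open_unit_interval(scores, scale_max=10.0):
    """Map rating-scale scores into the open interval (0, 1).

    Each score is first rescaled to a proportion (assumption: divide by
    the scale maximum), then compressed with (y * (n - 1) + 0.5) / n,
    where n is the number of observations.
    """
    n = len(scores)
    return [((s / scale_max) * (n - 1) + 0.5) / n for s in scores]


# Invented scores, including both scale extremes, for illustration only.
transformed = to_open_unit_interval([0.0, 5.0, 10.0])
print(transformed)  # approximately [0.167, 0.5, 0.833]
```

The compression pulls exact zeros and ones slightly inward, and its effect shrinks as the number of observations grows.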

Table 4

Results for the beta regression model #2 with proportional Perceived Dissimilarity as a dependent variable.

Predictor                        B              SE             p
Intercept                        –4.491         0.02301        0.0002
Compactness                      –0.0001588     0.00006482     0.0143*
Acoustic Memory                  –0.0006156     0.0004079      0.1313
Compactness × Acoustic Memory    0.000002691    0.000001175    0.0220*

The interaction between Compactness and Acoustic Memory showed that the effect of Compactness varied with acoustic memory capacity. Participants with lower acoustic memory capacity were more likely to rely on the size of their native category to differentiate between /i/ and /ɨ/, whereas participants with greater acoustic memory capacity were not affected by Compactness (Figure 13).

Figure 13

Participants with poorer acoustic memory tend to rely on compactness to differentiate between /i/ and /ɨ/, whereas participants with greater acoustic memory tend to rely on acoustic memory only.
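One way to read the interaction is to compute the conditional slope of Compactness at different levels of acoustic memory from the Table 4 coefficients. The sketch below is illustrative only. It assumes that the coefficients are on the link scale of the fitted model and that Acoustic Memory enters as a raw score; the scores 0, 30, and 90 are hypothetical, since the score range is not given in this section.

```python
# Coefficients copied from Table 4 (link scale is an assumption).
B_COMPACTNESS = -0.0001588
B_INTERACTION = 0.000002691


def compactness_slope(acoustic_memory):
    """Conditional slope of Compactness at a given Acoustic Memory score."""
    return B_COMPACTNESS + B_INTERACTION * acoustic_memory


# Acoustic Memory score at which the Compactness slope is zero.
crossover = -B_COMPACTNESS / B_INTERACTION

for score in (0.0, 30.0, 90.0):  # hypothetical Acoustic Memory scores
    print(score, compactness_slope(score))
print(round(crossover, 1))
```

The slope is negative at low scores and shrinks toward zero as acoustic memory increases, which matches the pattern described for Figure 13. Slopes beyond the crossover point are extrapolations of the linear model and should not be over-interpreted.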

4. Discussion

In line with a large body of previous research, our results showed individual differences in the perception of a non-native vowel contrast. Even though the majority of the Spanish participants in our study demonstrated high accuracy in distinguishing between Russian /i/ and /ɨ/, their performance varied at an individual level. Both Russian vowels occupy more or less the same perceptual space where the Spanish vowel inventory has a single category /i/, which constitutes a perceptual challenge for some listeners but not for others. Some participants perceived little or no acoustic distance between Russian /i/ and /ɨ/, whereas others perceived /i/ and /ɨ/ as two distinct sounds.

Our findings suggest that the degree of perceptual sensitivity in an unfamiliar language might be connected to the size of L1 phonetic categories. It is likely that at the onset of L2 speech acquisition, compact native categories support non-native perception, and on the contrary, large native categories may hinder it. The reason could be simply quantitative: One large native category might occupy the psychoacoustic space of two or more non-native categories making L2 perception difficult. Best (1995) refers to this phenomenon as either single category assimilation (poor perception) or category goodness assimilation (poor to moderate perception). It seems that in our study listeners with a large Spanish category /i/ followed the category-goodness assimilation pattern: Russian /i/ was perceived as a better exemplar of Spanish /i/ and Russian /ɨ/ as a poor or deviant exemplar of the same native category. These participants perceived a small acoustic distance between the two non-native sounds and struggled to differentiate between them. On the other hand, it seems that listeners with a compact Spanish category /i/ followed the uncategorized-categorized assimilation pattern: They perceived Russian /i/ as a good exemplar of Spanish /i/ and perceived Russian /ɨ/ as an uncategorized sound (definitely not Spanish /i/) that allowed for better discrimination of this non-native contrast. Thus, we argue that individuals might have different perceptual assimilation patterns in a novel language based on the size of their native phonetic categories. That being said, our results should be interpreted with caution as the relationship between non-native perception and L1 compactness was rather weak.

It has been demonstrated previously (Kuhl et al., 2006; Kuhl et al., 2008) that L1 experience sharpens L1 perception but interferes with L2 speech learning in adulthood. In terms of phonetic category size, it seems that L1 perception benefits from categories that are built through exposure to more exemplars since it allows a listener to cope with a greater degree of variability (McMurray, Aslin, & Toscano, 2009; Sumner, 2011). Yet, such large robust L1 categories could make it more difficult to initially perceive sounds that fall outside these categories and potentially form new categories. It seems that having more accurate non-native perception is of a fundamentally different nature from having more accurate non-native production: If being flexible in production results in producing a variety of non-native sounds (Christiner & Reiterer, 2015; Delvaux, Huet, Piccaluga, & Harmegnies, 2014), non-native perception benefits from precision as opposed to flexibility. Because of the categorical nature of perception, non-native sounds that fall within large categories get averaged out over other members of this category (the perceptual magnet effect, Kuhl, 1993; Kuhl et al., 2008). Thus, it is more beneficial for non-native perception to rely on a large number of compact categories (what a polyglot would probably have) as opposed to fewer larger categories.

One interesting question is whether L1 category compactness is an endogenous factor and related to individual differences or whether it is affected by linguistic factors (e.g., the number of languages studied/spoken). Previous research on L1 phonetic drift has shown that adult L2 learners show systematic phonetic changes in their L1; these changes are especially pronounced in novice learners (Chang, 2013, 2019; Kartushina et al., 2016). This means that both L1 and L2 are dynamic systems undergoing continuous change and that even a small amount of language exposure is enough to trigger L1 phonetic drift. In the present study, foreign language experience (studying foreign languages in the past) did not contribute significantly to non-native perception, yet there was a negative trend: The more languages an individual had learned in the past, the poorer their ability to perceive the acoustic distance between non-native vowels. In other words, true monolinguals seemed to be more successful (although not at a significant level) at distinguishing between Russian /i/ and /ɨ/ than functional monolinguals. Indeed, a large body of evidence suggests that previous L2 experience does not guarantee enhanced non-native perception (Kennedy, 2012; Kennedy & Trofimovich, 2010; Venkatagiri & Levis, 2007). The amount of language exposure necessary to develop non-native phonological awareness might be important in this regard (Shoemaker, 2014). A certain level of L2 proficiency might be necessary for previous language learning to have a beneficial effect on further (L3) perceptual learning. It could be that the previous L2 learning experience of the functional monolinguals in our study enlarged their native perceptual categories. Since the functional monolinguals did not continue learning their L2s and, thus, never reached a proficiency level higher than lower-intermediate, the process of category split (the division of a single native category that handles both instances of a non-native contrast into two new categories; Mayr & Escudero, 2010) never took place, leaving them with slightly larger native categories. In other words, the previous L2 experience of the functional monolinguals resulted in a ‘looser’ or less compact native category /i/, which included more ‘deviant’ variants of /i/ (more allophones of /i/), than the same category of the true monolinguals, who had no previous L2 experience and, thus, preserved more compact categories.

According to exemplar theory (Pierrehumbert, 2001), variability is essential for defining category boundaries and size, such as which tokens are not /i/. A learner must hear a variety of exemplars to define the psychoacoustic space occupied by a particular phonetic category. Limited input in a foreign language might be responsible for including too many noncontrastive (or perceived as noncontrastive) variants into a native category, which might impede the processing of other novel sounds. Lev-Ari’s (2017) computational simulations also offer some insights into the mechanisms underlying the effects of limited L2 proficiency on initial perceptual learning. She demonstrates that at the onset of L2 acquisition learners fail to attend to relevant acoustic dimensions, which hinders the acquisition of novel phonetic categories. At the early stage of learning, there is a tendency toward forming large categories, each comprised of several non-native categories, and only later, with an increased amount of input, large categories may split into two or several smaller categories. Thus, the amount and quality of input matter when novel phonetic categories are formed. In this sense, although somewhat counterintuitively, when compared to functional monolinguals, true monolinguals might enjoy a perceptual advantage for distinguishing a novel contrast since their native categories have not been ‘contaminated’ with irrelevant variants. Following this logic, we would expect a significant difference between true monolinguals and functional monolinguals in terms of category compactness: True monolinguals should have significantly more compact categories than functional monolinguals. Even though such a trend was detected in the present study, it did not reach significance. Since the effect of previous L2 experience on category compactness was not the focus of this study, we did not target specific populations of participants in this regard, such as a balanced group of functional and true monolinguals. Therefore, we only had 11 true monolinguals, which made the comparison to functional monolinguals (57 participants) statistically challenging.

In contrast to our study, Lengeris and Hazan (2010) did not find a relationship between L1-based individual differences and non-native perception. They investigated how individuals from the same L1 background (Greek) vary in their ability to learn to perceive an L2 contrast (English). They found no connection between individual L1 profiles and L2 English perception. It could be that L1-based differences only play a role at the initial stage of perceptual learning and later, as the learning progresses, other factors take over (e.g., phonological short-term memory, motivation). In Lengeris and Hazan’s study, all participants had 10 to 12 years of English instruction with the language proficiency level described as ‘moderately high.’ Thus, at a later stage of the language acquisition category compactness might not play an important role in non-native perception. Another explanation for Lengeris and Hazan’s results is that L1-based individual differences in perception, specifically, category compactness, affect individuals differently. Our results showed that individuals with greater acoustic memory were not affected by compactness when distinguishing between two novel sounds. On the other hand, individuals with poorer acoustic memory relied heavily on category compactness when completing the same task. These individual differences in task performance could be due to different processing strategies: top-down (using compact L1 phonetic categories) versus bottom-up (using acoustic memory to discern the subtle phonetic-acoustic difference between sounds). In either case, taken together, our findings suggest that individuals can successfully distinguish between novel non-native contrasts even if they were not exposed to these contrasts at an early age.

The present study has some limitations that should be taken into account when interpreting the findings. The L1 category compactness index was based on a single native category; ideally, at least three native phonetic categories would be measured. Measuring more phonetic categories is important not only for understanding how the categories vary within a single perceptual space but also for relating the categories and their size to the size of the perceptual space itself. It must also be acknowledged that individual perceptual measures fluctuate depending on many factors, such as time of day or hormone levels, and therefore it would be beneficial to repeat the same tests several times. Finally, in the present study, we have not explored the possibilities that a longitudinal design might offer. Following the participants as they acquire Russian over time would allow us to observe how category compactness changes with growing proficiency and what factors influence this process.

5. Conclusion

In this study, we investigated the association between the size (compactness) of an L1 vowel category and naïve perception. The results show that, overall, listeners with a more compact L1 category perceive a difficult non-native contrast better, that is, perceive a greater acoustic distance between the two non-native sounds. These findings confirm previous research on the role of L1 category compactness in production and its contribution to L2 perception. In our study, individuals with greater acoustic memory were not affected by the size of their native categories as much as individuals with poorer acoustic memory. In other words, individuals with greater acoustic memory did not benefit from compact phonetic categories as much.

This study offers insights into individual variability in L1 perception, which to date is still not well understood. Our findings are consistent with exemplar-based theories that suggest that speakers of the same language might differ in how their native phonetic categories are represented. It is for future studies to explore the contribution of this L1-based variability to subsequent L2 speech learning.

Additional File

The additional file for this article can be found as follows:


The F1/F2 values for the prototype Spanish /i/ (first row) and the 28 variants. DOI: https://doi.org/10.16995/labphon.6431.s1


  1. Other languages that the participants reported being familiar with included (in alphabetical order): Basque, Catalan, French, Galician, Italian, and Valencian.
  2. Some participants reported not studying or speaking any foreign languages (true monolinguals, N = 11) and some participants studied foreign languages before (functional monolinguals, N = 57). An independent two-tailed Wilcoxon test revealed significant differences between the groups (W = 193, p .04). Figure 9 shows that the experience of studying foreign languages in the past influenced Perceived Dissimilarity negatively.

Competing interests

The authors have no competing interests to declare.


Aliaga-Garcia, C., Mora, J. C., & Cerviño-Povedano, E. (2011). L2 speech learning in adulthood and phonological short-term memory. Poznań Studies in Contemporary Linguistics, 47(1), 1–14. DOI:  http://doi.org/10.2478/psicl-2011-0002

Alispahic, S., Mulak, K. E., & Escudero, P. (2017). Acoustic properties predict perception of unfamiliar Dutch vowels by adult Australian English and Peruvian Spanish listeners. Frontiers in psychology, 8, 52. DOI:  http://doi.org/10.3389/fpsyg.2017.00052

Anderson, J. L., Morgan, J. L., & White, K. S. (2003). A statistical basis for speech sound discrimination. Language and Speech, 46(2–3), 155–182. DOI:  http://doi.org/10.1177/00238309030460020601

Andryushina, E. A. (2014). Filologicheskie nauki. Voprosy teorii I praktiki. Chast’ 2. – Philological Sciences. Issues of Theory and Practice. Part 2.

Bates, D., & Maechler, M. (2021). Matrix: Sparse and dense matrix classes and methods. R package version 1.3-2. https://CRAN.R-project.org/package=Matrix

Best, C. T. (1993). Emergence of language-specific constraints in perception of non-native speech: A window on early phonological development. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. McNeilage & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 289–304). Springer, Dordrecht. DOI:  http://doi.org/10.1007/978-94-015-8234-6_24

Best, C. T. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. The Development of Speech Perception: The Transition from Speech Sounds to Spoken Words, 167–224.

Best, C. T. (1995). A direct realist view of cross-language speech perception. Speech Perception and Linguistic Experience, 171–206.

Best, C. T., & Tyler, M. D. (2007). Non-native and second-language speech perception: Commonalities and complementarities. In M. J. Munro & O.-S. Bohn (Eds.), Second language speech learning: The role of language experience in speech perception and production (pp. 13–34). Amsterdam: John Benjamins. DOI:  http://doi.org/10.1075/lllt.17.07bes

Boersma, P., & Weenink, D. (2013). Praat: Doing Phonetics by Computer [Computer Program]. Version 5.3.51. Online: http://www.praat.org

Bongaerts, T., van Summeren, C., Planken, B., & Schils, E. (1997). Age and ultimate attainment in the pronunciation of a foreign language. Studies in Second Language Acquisition, 447–465. DOI:  http://doi.org/10.1017/S0272263197004026

Cerviño-Povedano, E., & Mora, J. C. (2011). Investigating Catalan learners of English over-reliance on duration: Vowel cue weighting and phonological short-term memory. In K. Dziubalska-Kołaczyk, M. Wrembel & M. Kul, (Eds.), Achievements and perspectives in the acquisition of second language speech: New Sounds 2010 (pp. 53–64). Bern: Peter Lang.

Chang, C. B. (2013). A novelty effect in phonetic drift of the native language. Journal of Phonetics, 41(6), 520–533. DOI:  http://doi.org/10.1016/j.wocn.2013.09.006

Chang, C. B. (2019). Phonetic drift. In The Oxford handbook of language attrition. Oxford University Press. DOI:  http://doi.org/10.1093/oxfordhb/9780198793595.013.16

Chládková, K., & Escudero, P. (2012). Comparing vowel perception and production in Spanish and Portuguese: European versus Latin American dialects. The Journal of the Acoustical Society of America, 131(2), EL119–EL125. DOI:  http://doi.org/10.1121/1.3674991

Christiner, M., & Reiterer, S. M. (2015). A Mozart is not a Pavarotti: Singers outperform instrumentalists on foreign accent imitation. Frontiers in Human Neuroscience, 9, 482. DOI:  http://doi.org/10.3389/fnhum.2015.00482

Coleman, J. (2003). Discovering the acoustic correlates of phonological contrasts. Journal of Phonetics, 31(3–4), 351–372. DOI:  http://doi.org/10.1016/j.wocn.2003.10.001

Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, U.K: Press Syndicate of the University of Cambridge.

Cribari-Neto, F., & Zeileis, A. (2010). Beta regression in R. Journal of Statistical Software, 34(2). DOI:  http://doi.org/10.18637/jss.v034.i02

Delvaux, V., Huet, K., Piccaluga, M., & Harmegnies, B. (2014). Phonetic compliance: A proof-of-concept study. Frontiers in Psychology, 5, 1375. DOI:  http://doi.org/10.3389/fpsyg.2014.01375

Duñabeitia, J. A., Cholin, J., Corral, J., Perea, M., & Carreiras, M. (2010). SYLLABARIUM: An online application for deriving complete statistics for Basque and Spanish orthographic syllables. Behavior Research Methods, 42(1), 118–125. DOI:  http://doi.org/10.3758/BRM.42.1.118

Escudero, P. (2005). Linguistic perception and second language acquisition: Explaining the attainment of optimal phonological categorization. LOT Dissertation Series 113, Utrecht University.

Ettlinger, M., & Johnson, K. (2009). Vowel discrimination by English, French and Turkish speakers: Evidence for an exemplar-based approach to speech perception. Phonetica, 66(4), 222–242. DOI:  http://doi.org/10.1159/000298584

Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). Baltimore, MD: York Press.

Flege, J. E. (2016). The role of phonetic category formation in second language speech acquisition. Paper presented at the 8th International Conference on Second Language Speech, Aarhus University, Denmark.

Flege, J. E., & Bohn, O.-S. (2021). The revised Speech Learning Model (SLM-r). In R. Wayland (Ed.), Second language speech learning: Theoretical and empirical progress (pp. 3–83). Cambridge University Press. DOI:  http://doi.org/10.1017/9781108886901.002

Fox, J., & Weisberg, S. (2019). An R Companion to Applied Regression, Third edition. Sage, Thousand Oaks CA.

Franken, M. K., Acheson, D. J., McQueen, J. M., Eisner, F., & Hagoort, P. (2017). Individual variability as a window on production-perception interactions in speech motor control. The Journal of the Acoustical Society of America, 142(4), 2007–2018. DOI:  http://doi.org/10.1121/1.5006899

Gilichinskaya, Y. D., & Strange, W. (2010). Perceptual assimilation of American English vowels by inexperienced Russian listeners. The Journal of the Acoustical Society of America, 128(2), EL80–EL85. DOI:  http://doi.org/10.1121/1.3462988

Golestani, N., & Zatorre, R. J. (2009). Individual differences in the acquisition of second language phonology. Brain and Language, 109(2–3), 55–67. DOI:  http://doi.org/10.1016/j.bandl.2008.01.005

Holden, K. T., & Nearey, T. M. (1986). A preliminary report on three Russian dialects: Vowel perception and production. Russian Language Journal/Русский язык, 40(136/137), 3–21.

Jilka, M. (2009). Assessment of phonetic ability. In G. Dogil & S. Reiterer. (Eds.), Language talent and brain activity (pp. 17–66). Berlin: Mouton De Gruyter. DOI:  http://doi.org/10.1515/9783110215496.17

Kartushina, N., & Frauenfelder, U. H. (2013). On the role of L1 speech production in L2 perception: Evidence from Spanish learners of French. In Proceedings of the 14th Interspeech Conference, 2118–2122. DOI:  http://doi.org/10.21437/Interspeech.2013-502

Kartushina, N., & Frauenfelder, U. H. (2014). On the effects of L2 perception and of individual differences in L1 production on L2 pronunciation. Frontiers in Psychology, 5, 1246. DOI:  http://doi.org/10.3389/fpsyg.2014.01246

Kartushina, N., Hervais-Adelman, A., Frauenfelder, U. H., & Golestani, N. (2016). Mutual influences between native and non-native vowels in production: Evidence from short-term visual articulatory feedback training. Journal of Phonetics, 57, 21–39. DOI:  http://doi.org/10.1016/j.wocn.2016.05.001

Kennedy, S. (2012). Exploring the relationship between language awareness and second language use. TESOL Quarterly, 46(2), 398–408. DOI:  http://doi.org/10.1002/tesq.24

Kennedy, S., & Trofimovich, P. (2010). Language awareness and second language pronunciation: A classroom study. Language Awareness, 19(3), 171–185. DOI:  http://doi.org/10.1080/09658416.2010.486439

Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. The Journal of the Acoustical Society of America, 67(3), 971–995. DOI:  http://doi.org/10.1121/1.383940

Klimova, Y. A., Yurchenko, N. B., Cherkashina, O. M., & Kulik, S. S. (2017). Sopostavitel’nyj analiz fonologicheskih sistem russkogo i ispanskogo yazykov (v celyah obucheniya ispanogovoryaschih studentov russkomu proiznosheniyu). Kazan Pedagogical Journal, 6(125).

Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The Native Language Magnet Theory. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. McNeilage & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 259–274). Springer, Dordrecht. DOI:  http://doi.org/10.1007/978-94-015-8234-6_22

Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences, 97(22), 11850–11857. DOI:  http://doi.org/10.1073/pnas.97.22.11850

Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 979–1000. DOI:  http://doi.org/10.1098/rstb.2007.2154

Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science, 9, 13–21. DOI:  http://doi.org/10.1111/j.1467-7687.2006.00468.x

Ladefoged, P., & Johnson, K. (2010). A course in phonetics. Cengage Learning.

Lecumberri, M. L. G., Cooke, M., & Cutler, A. (2010). Non-native speech perception in adverse conditions: A review. Speech communication, 52(11–12), 864–886. DOI:  http://doi.org/10.1016/j.specom.2010.08.014

Lengeris, A., & Hazan, V. (2010). The effect of native vowel processing ability and frequency discrimination acuity on the phonetic training of English vowels for native speakers of Greek. The Journal of the Acoustical Society of America, 128(6), 3757–3768. DOI:  http://doi.org/10.1121/1.3506351

Lev-Ari, S. (2017). Talking to fewer people leads to having more malleable linguistic representations. PloS one, 12(8), e0183593. DOI:  http://doi.org/10.1371/journal.pone.0183593

Li, D., Cowan, N., & Saults, J. S. (2013). Estimating working memory capacity for lists of nonverbal sounds. Attention, Perception, & Psychophysics, 75(1), 145–160. DOI:  http://doi.org/10.3758/s13414-012-0383-z

MacKay, I. R. A., Meador, D., & Flege, J. E. (2001). The identification of English consonants by native speakers of Italian. Phonetica, 58, 103–125. DOI:  http://doi.org/10.1159/000028490

Maryutina, E. (2021). The Production of Russian Vowels /i/ and /ɨ/ by Russian-English Bilingual Children. (Unpublished doctoral dissertation). City University of New York (CUNY), New York.

Mayr, R., & Escudero, P. (2010). Explaining individual variation in L2 perception: Rounded vowels in English learners of German. Bilingualism: Language and Cognition, 13, 279–297. DOI:  http://doi.org/10.1017/S1366728909990022

McMurray, B., Aslin, R. N., & Toscano, J. C. (2009). Statistical learning of phonetic categories: Insights from a computational approach. Developmental Science, 12(3), 369–378. DOI:  http://doi.org/10.1111/j.1467-7687.2009.00822.x

Moran, S., & McCloy, D. (2019). PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://phoible.org, Accessed on 2022-02-22.)

Padgett, J. (2001). Contrast dispersion and Russian palatalization. The role of speech perception in phonology, 187–218.

Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M., & Zandipour, M. (2004). The distinctness of speakers’ productions of vowel contrasts is related to their discrimination of the contrasts. The Journal of the Acoustical Society of America, 116(4), 2338–2344. DOI:  http://doi.org/10.1121/1.1787524

Perrachione, T. K., Lee, J., Ha, L. Y., & Wong, P. C. (2011). Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. The Journal of the Acoustical Society of America, 130(1), 461–472. DOI:  http://doi.org/10.1121/1.3593366

Pierrehumbert, J. B. (2001). Exemplar dynamics: Word frequency, lenition and contrast. Typological Studies in Language, 45, 137–158. DOI:  http://doi.org/10.1075/tsl.45.08pie

R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org

Revelle, W. (2019). psych: Procedures for personality and psychological research. Northwestern University, Evanston, Illinois, USA. R package version 1.9.12. https://CRAN.R-project.org/package=psych

Safronova, E., & Mora, J. C. (2012). Acoustic and phonological memory in L2 vowel perception. In S. Martín Alegre, M. Moyer, E. Pladevall & S. Tubau (Eds.). At a time of crisis: English and American studies in Spain. Departament de Filologia Anglesa i de Germanística, Universitat Autònoma de Barcelona/AEDEAN.

Scott, S. K., Rosen, S., Beaman, C. P., Davis, J. P., & Wise, R. J. S. (2009). The neural processing of masked speech: Evidence for different mechanisms in the left and right temporal lobes. The Journal of the Acoustical Society of America, 125(3), 1737–1743. DOI:  http://doi.org/10.1121/1.3050255

Shoemaker, E. (2014). The exploitation of subphonemic acoustic detail in L2 speech segmentation. Studies in Second Language Acquisition, 36(4), 709–731. DOI:  http://doi.org/10.1017/S027226311400014X

Shutova, M. N., & Orekhova, I. A. (2018). Phonetics in teaching Russian as a foreign language. Russian Language Studies, 16(3), 261–278. DOI:  http://doi.org/10.22363/2618-8163-2018-16-3-261-278

Stoet, G. (2010). PsyToolkit – A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104. DOI:  http://doi.org/10.3758/BRM.42.4.1096

Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31. DOI:  http://doi.org/10.1177/0098628316677643

Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119(1), 131–136. DOI:  http://doi.org/10.1016/j.cognition.2010.10.018

Tourville, J. A., & Guenther, F. H. (2011). The DIVA model: A neural theory of speech acquisition and production. Language and cognitive processes, 26(7), 952–981. DOI:  http://doi.org/10.1080/01690960903498424

Venkatagiri, H. S., & Levis, J. M. (2007). Phonological awareness and speech comprehensibility: An exploratory study. Language Awareness, 16(4), 263–277. DOI:  http://doi.org/10.2167/la417.0

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. DOI:  http://doi.org/10.1007/978-3-319-24277-4

Yanushevskaya, I., & Bunčić, D. (2015). Russian. Journal of the International Phonetic Association, 45(2), 221–228. DOI:  http://doi.org/10.1017/S0025100314000395

Zharkova, N. (2004). Strategies in the acquisition of segments and syllables in Russian-speaking children. Developmental paths in phonological acquisition. Special issue of Leiden Papers in Linguistics.