The Interlanguage Speech Intelligibility Benefit (ISIB) hypothesis for Talkers suggests that there is a potential benefit when listening to one’s second language when it is produced in the accent of one’s first language (matched-accent processing). This study explores ISIB, considering listener proficiency. According to second language learning theories, the listener’s second language proficiency determines the extent to which they rely on their first language phonetics, hence the magnitude of ISIB may be affected by listener proficiency. The accuracy and reaction times of Bulgarian-English bilinguals living in the UK were recorded in a lexical decision task. The English stimuli were produced by native English speakers and Bulgarian-English bilinguals. Listeners responded more slowly and less accurately to the matched-accent stimuli than the native English stimuli. In addition, they adapted their reaction times faster to new speakers with a native English accent compared to a Bulgarian accent. However, the listeners with the lowest English proficiency had no advantage in reaction times and accuracy for either accent. The results offer mixed support for ISIB for Talkers, and are consistent with second language learning theories according to which listeners rely less on their native language phonology when their proficiency in the second language has increased.
Socially prestigious monolingual English that does not incorporate the phonetic influence of other languages is still considered the ‘gold standard’ against which learners’ varieties are explicitly or implicitly compared, despite long-standing criticisms of such views (
Most research has focused on bilinguals’ long-term listening adaptation to L2 speech. The term ‘long-term listening adaptation’ refers to a listener’s relative speed and accuracy of processing an accent at a single time point, and as a result of all the listener’s past experiences (or lack thereof) with that accent. This has been investigated in the context of the Interlanguage Speech Intelligibility Benefit hypothesis (
There is consensus across L2 phonological acquisition models, for example the Perceptual Assimilation Model-L2, the Speech Learning Model, or the Automatic Selective Perception model, that adult L2 listeners draw parallels between the phonology of their L2 and their native L1 language, but that the nature of these parallels may change depending on the L2 listeners’ usage of their respective languages (
The following two sections summarize prior (somewhat contradictory) findings about L2 listeners who share an L1 with the speaker. This type of listening situation will be referred to as ‘matched-accent’ processing, following from Bent and Bradlow’s (
The Interlanguage Speech Intelligibility Benefit hypothesis (ISIB) predicts that non-native listeners would be equally or more accurate than native listeners when hearing their second language spoken by other non-native speakers (
Studies that have investigated the ISIB hypothesis often operationalize intelligibility as the ability to correctly understand speech input. This ability is measured as a function of lexical accuracy. However, we want to consider a more refined view of intelligibility by also measuring speed of processing. A putative matched-accent benefit has been observed through either an accuracy or a speed benefit, but only for small samples of L2 listeners and usually but not exclusively in those with low L2 linguistic proficiency (e.g.,
One of the few studies that investigated if listeners’ L2 proficiency would affect their speed of matched, native, and non-matched-accent processing is by Ludwig and Mora (
In addition to the results of Ludwig and Mora (
Contrary to the prediction of ISIB for Talkers, Imai, Walley, and Flege (
With the exception of Ludwig and Mora (
It appears that listeners paying attention to a non-native language might process other matched-accent non-native speakers faster and more accurately than they can perceive native speakers, but primarily when the listeners have low proficiency in their L2. In this study, the ISIB for Talkers hypothesis will be tested by measuring the reaction times and accuracy of Bulgarian L1 – English L2 listeners responding to Bulgarian-accented or native English speech, by focusing on the effect of listener L2 English proficiency. On the basis of the research discussed so far, it may be expected that Bulgarian-accented English may actually be a challenge for some Bulgarian-English bilinguals, especially those who have high English proficiency. The specifics of Bulgarian phonology may also contribute to that. The Bulgarian phonological inventory is smaller than the Standard British English one (
As noted, at least some Bulgarian listeners could be expected to benefit from hearing Bulgarian-accented English, probably the less L2 proficient ones. Moreover, if there are communities learning L2 English, then non-native accent features could be reinforced. Bulgarian L1 – English L2 listeners are likely to have some formal training in English in Bulgaria, as English is currently the most commonly chosen foreign language option in Bulgarian schools (
Overall, the evidence presented in this section suggests that proficiency may play an important role for non-native listeners of a language who are exposed to matched-accented L2 speech.
Listeners’ accuracy and reaction times when processing different accents are not static. The research of Eger and Reinisch (
Unsurprisingly, most of the research on short-term speaker adaptation has focused on the experiences of people listening to their native language (cf.
Listeners with extensive German-accent experience in Dutch had priming effects across all accent strengths, while inexperienced listeners had priming effects only when the accent was weak or medium. Exposing inexperienced listeners to strongly accented speech led to priming effects in the first half of the task, compared to a group of listeners with no training who only showed priming effects in the second half. This study suggests that both long-term experience with a strong accent and recent exposure to it can result in a priming effect in the first half of the task. However, even inexperienced listeners with no recent exposure to strongly German-accented Dutch can adapt within the duration of the experiment, achieving a priming effect in the second half of the task. This suggests that native listeners hearing an unfamiliar foreign accent can adapt to it through the exposure they receive within one experiment (cf.,
These results are challenged by Floccia et al. (
The only study known to us that investigates adaptation within matched-accent L2 processing is from Reinisch, Weber, and Mitterer (
One factor that can facilitate listeners’ adaptation to an accent is exposure to a large variety of speakers from that accent. This is particularly relevant for the topic of matched-accent processing, as some L2 listeners might have different amounts of exposure to other L2 users who have the same non-native accent. When living in an L2 majority environment, for example, non-native listeners would likely have exposure to a greater diversity of native speakers of the L2 than matched-accent L2 speakers, or even L1 speakers. This suggests that expats in an anglophone country would be expected to have stronger long-term adaptations to native English speakers than to matched-accent L2 English speakers and, by extension, would adapt faster to new native English speakers than to matched-accented speakers.
The specific experience of emigrants with phonetic adaptation was investigated by Bruggeman and Cutler (
The need for exposure to a diversity of voices to achieve phonetic adaptation to novel speakers has been supported by several other studies. Some studies such as Bradlow and Bent (
Another factor that can facilitate speaker adaptation is feedback about the accuracy of their perception. The participants in Kriengwatana, Terry, Chládková, and Escudero’s (
In a real communicative situation, feedback on correct interpretation can be derived from contextual cues. It can be speculated that L2 listeners with higher proficiency would have access to more linguistic feedback (e.g., by being familiar with collocations and having larger vocabularies) than listeners with low L2 proficiency, who might need to rely more on metalinguistic feedback of correctness (e.g., facial expressions or body language). In addition, Eger and Reinisch (
To summarize, short-term adaptation to the phonetic nuance in an L2 accent is possible for L2 listeners. Adaptation can benefit from experience with multiple novel speakers and from top-down feedback on speech processing correctness. However, existing research is overwhelmingly based on native listeners of a language, and more information is needed on the time-course of L2 listeners’ adaptation to matched-accent L2 speech. According to Floccia et al. (
There are three key elements that require further investigation within the topic of matched-accent processing. Firstly, the literature review above highlighted L2 proficiency as a potentially important factor for matched-accent benefit or lack thereof. This study will focus on the listeners’ proficiency in more detail by operationalizing it as a continuous variable. Secondly, ISIB for Talkers has not been extensively investigated from the perspective of the speed of processing a matched accent. This study will investigate matched-accent processing by comparing both the response accuracy and reaction times of bilinguals in a lexical decision task. Our task is similar to the procedure used by Lagrou et al. (
Lastly, there is no research investigating real-time reaction time adaptation to novel speakers with a matched accent compared to speakers with native accent in L2, so this will be investigated by comparing the token-to-token changes in reaction times in response to speakers with the two accents. A decreasing trend of reaction times relative to the reaction times in the initial trial within a block will be interpreted as evidence of adaptation to the speaker. These research aims will be pursued by studying the matched-accent processing of Bulgarian L1 – English L2 bilinguals who reside in the UK.
It must be recognized that the strength of the Bulgarian accent of the stimuli is likely to play a role in the results, as studies have shown that the strength of foreign accent affects how quickly native listeners adapt their reaction times (
First, according to ISIB for Talkers (
In addition, based on the predictions listed above, listeners with the lowest L2 proficiency are expected to speed up their reaction times earlier in response to a new speaker with a matched accent than with a native English accent. Listeners with the highest L2 proficiency are expected to speed up their reaction times earlier in response to a new speaker with a native English accent than with a matched accent. In practice this would mean that their reaction times will speed up at different rates in the initial trials of a new speaker block within the lexical decision task when responding to either Bulgarian-accented or native English speakers (cf.,
This study examines the effects of second language proficiency on bilingual listeners’ speed and accuracy when perceptually processing a ‘matched accent’ of a second language (Bulgarian-accented English in this case) versus natively-spoken second language (native British English). The experimental tool was a timed auditory lexical decision task. Participants were categorized using an English proficiency test and further information about them was gathered via a questionnaire. The experiment was carried out online and participants were recruited via social media.
The experiment was carried out in accordance with the ethics guidelines of Queen Margaret University and the ethics application was approved by Professor Janet M. Beck, head of the Division of Speech and Hearing Sciences, on behalf of the Ethics Committee on 29 July 2018. The data and R code used for hypothesis testing are available via the Open Science Framework (OSF) repository (
The call for participants advertised for people who considered themselves residents of the UK and who had been born and raised as Bulgarian speakers with at least a primary level of schooling in Bulgaria. We did not specifically exclude speakers who had spoken an additional language to Bulgarian in their home environment during childhood. People were invited to take part if they were comfortable reading the information about the experiment and the consent form in English, hence the recruitment process acted informally to screen for a minimal requirement of functional ability in English.
The data of 94 participants were used for analysis, out of a total of 129 participants who started the experiment. Sixteen participants never entered a response for the auditory task and nine did not complete the proficiency test. A further six participants were excluded from the analysis due to an error in the recording of their data. Two participants listed their age of arrival in the UK as less than 10 years, so they were also excluded. One participant was excluded because they had entered only one correct response. One participant had no correct responses within 2.5 seconds in the native English condition, and so they were also excluded. The participants’ mean age was 30.3 (
The following paragraphs explain how the participants’ English accent exposure score and Bulgarian accent exposure score were derived, based on a method used in Porretta et al. (
The answers to these questions were each divided by 100, then multiplied together, and then multiplied by 100. On average the listeners reported spending 62% of their time talking with native English speakers, and 58.4% of that time talking to speakers from England, meaning that on average they spent 39% (
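The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name and the example percentages are hypothetical, and only the arithmetic (divide each answer by 100, multiply, then multiply by 100) comes from the text.

```python
def exposure_score(pct_time_native, pct_of_that_time_accent):
    """Combine two questionnaire percentages (0-100) into one exposure score (0-100)."""
    for value in (pct_time_native, pct_of_that_time_accent):
        if not 0 <= value <= 100:
            raise ValueError("percentages must lie between 0 and 100")
    # Divide each answer by 100, multiply them together, then multiply by 100.
    return (pct_time_native / 100) * (pct_of_that_time_accent / 100) * 100


# Hypothetical participant: 50% of their time with native English speakers,
# and 40% of that time with speakers of the accent in question.
score = exposure_score(50, 40)
print(round(score, 1))
```

Because the score is computed per participant before averaging, the group mean of the scores need not equal the product of the two group means.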
Two Pearson correlations were calculated: one between the English accent exposure score and Age and one between the Bulgarian accent exposure score and Age. There was a positive correlation between Age and the English accent exposure score (
A limitation of this study is that the participants were not asked whether they had spoken additional languages to Bulgarian at home when growing up. However, the call for participants emphasized the criteria of early home use and schooling in Bulgarian language, which should have selected participants for whom Bulgarian was overall the dominant language in childhood, suggesting early exposure to Bulgarian phonetics. Another potential limitation is that the participants were not asked to report their current usage of Bulgarian. As demonstrated by Flege and MacKay (
The final stimuli for this experiment were 64 real monosyllabic English words and 64 monosyllabic non-words (which were phonotactically plausible for English and comparable to the real word list, see below). A full list of the real words can be found in the Appendix. Subsections 3.3 to 3.6 report how an initial list of 100 words was selected, recorded by Bulgarian-accented and native English speakers, rated by native English listeners for strength of foreign accent, then finally narrowed down to a final experimental set of materials, which included 64 real words. It was planned from the start to only include a subset of the initial 100-word list. The initial 100 monosyllabic words were chosen from the webCELEX database (
The 100 words were recorded by Bulgarian-accented and native English speakers. In order to focus on the items most representative of this accent difference, a number of steps were followed. The most important one was to gather Foreign Accentedness ratings from native English listeners in an online task (see below). First, however, it should be noted that, due to the high phonological neighbourhood size, it was possible that some words, when pronounced with a Bulgarian accent, might sound like an unintended lexeme. Such risk was identified for words containing stressed /a/, like “had” or “land,” which could be substituted with [e] and for “third” and “through,” where substituting the initial /θ/ with [t] might lead to other high-frequency real words. Upon auditory inspection of the actual stimuli recorded by the Bulgarian-accented speakers, we judged that there was low risk of this type of misinterpretation, as the speakers produced the difficult phonemes unambiguously. If the Bulgarian-English bilingual listeners’ mental representations of these phones were not sufficiently distinct from their Bulgarian equivalents, then even native-like productions could have led to lexical access to more than one real word. As these words would act as homophones in English, this could potentially have an inhibitory or facilitatory effect on their reaction times in a lexical decision task in L2 regardless of their proficiency (
We tested the results of this decision statistically by removing the potentially problematic words from the reaction time and adaptation models and can confirm that there were no changes in the significant predictors. Removing the words from the accuracy model led to convergence problems, which were only resolved by removing the random slope per word (see Section 3.9 for an explanation of random slopes). When the model converged there were also no differences in the significances of the predictors. Therefore, the results reported below are based on responses to all 64 words selected on the basis of accentedness, as described below.
The non-words were drawn from an initial list of 100 monosyllabic tokens matched for phoneme number, as far as possible, to the real words. They were chosen from the ARC non-word database (
Four female speakers had been asked to produce the 100 stimulus words and 100 non-words for the experiment. They were two monolingual native speakers of Standard British English and two native Bulgarian speakers. Both Bulgarian-English bilinguals were raised as monolingual Bulgarian speakers. During their teenage years, they had learned the same Standard British English target variety of English as the monolingual speakers and used it regularly in their professional lives (scoring 87.5 and 90 in LexTALE). All four had completed university degrees and were working in universities at the time of recording.
All recordings took place in sound-attenuated recording studios (at Queen Margaret University, Edinburgh, or in Varna, Bulgaria). The same equipment was used for all recordings, made at a sampling rate of 44.1 kHz. A TASCAM DR-100 recorder was placed on a desk, 20 cm away from the speaker’s mouth. The speakers read the words twice from a list with randomized word orders, followed by randomized non-words, while seated. They were instructed to pronounce the words in a natural, everyday manner, without over-enunciation.
To help select the best individual stimuli, and also to ensure a consistent group difference between the Bulgarian-accented stimuli and the native English stimuli, the initial 100 real words were rated by native English listeners from the UK. The goal was for the final experiment to include stimuli consistently judged to have a detectable foreign accent when pronounced by the Bulgarian-accented speakers and with the least strong perceived foreign accent when pronounced by the native English speakers.
Forty-three native speakers of English (27 female, mean age = 38.02,
Each rater had one of the wordlists randomly assigned to them. The words were presented in a random order over their own computer’s audio system, via a web browser. The orthographic form of the target word was also presented on the screen. Participants were asked to listen to each word no more than twice. They were asked to rate the strength of the (undefined) foreign accent they perceived in each word on a scale of 0–8 (none to very strong), using the web page interface. After the rating was completed, the raters filled in a debriefing questionnaire, asking for general demographic information, what they thought the identity of the accents was, their background in Bulgarian, their own variety of English and their frequency of interaction with non-native speakers of English and Bulgarians in particular.
The questionnaire showed that all raters had grown up speaking English in the UK and none of them had studied Bulgarian or had a Bulgarian background. On average the raters spent 16.6% of their time interacting with non-native speakers of English (
A linear mixed effects model was constructed to test whether the foreign accent ratings differed between the speakers with different accents. The outcome variable was Foreign Accentedness score. The model had one predictor, Speaker, in which one of the native English speakers (En1) was picked as a baseline level and the scores of the rest were compared to hers. The model included random intercepts for Rater and by-Speaker random slopes for Rater. This accounts for the fact that each rater may have had a different pattern of rating and that this pattern may have differed for each speaker. The model also had random intercepts for Word and by-Speaker random slopes for Word. This accounts for the fact that each word may have contributed to a slightly different accentedness rating and that this may have differed depending on which speaker produced it. The results of this model are summarized in
Summary of the model on Rating scores per Speaker.
Predictor | Estimate | t | p |
Intercept | 0.56 | 4.64 | <0.001 |
Speaker (EN 2) | –0.28 | –2.45 | 0.02 |
Speaker (BG 1) | 4.37 | 18.08 | <0.001 |
Speaker (BG 2) | 4.21 | 16.79 | <0.001 |
The two native English speakers differed, such that the second speaker was rated as slightly less foreign-accented, although the magnitude of the difference was small (
The modelled estimate and standard error of the foreign accent rating of the two native English speakers (En) and the two Bulgarian L1 – English L2 speakers (Bg).
These native English listeners were not asked to transcribe the stimuli and we did not attempt to measure their intelligibility. As pointed out in the review of ISIB research above, native English listeners might find a non-native accent less intelligible than listeners who have a matched accent. Since the aim here was to focus on non-native listeners, we decided that the intelligibility judgements of native English listeners were beyond the scope of this study and could be left to future research.
With the aim of increasing the overall difference between the two accent groups, we included only a subset of items from the initial selection of 100 stimuli from webCELEX. Words for each accent category (native English versus Bulgarian accent) were picked based on the Foreign accentedness scores. For each of the 100 words, two average scores were calculated: Mean Foreign accentedness scores of the two native English speakers and mean Foreign accentedness scores of the two Bulgarian-English bilinguals. The difference between the two Foreign accentedness scores was then calculated for each of the 100 words. As the average difference between the En1 speaker and the two Bulgarian speakers rounds to four (see
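The per-word difference score described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the speaker labels, the data layout, and the example ratings are hypothetical, and each accent group's mean is taken here as the mean of its two speakers' mean ratings. No selection cut-off is applied, because the criterion is not fully stated in this passage.

```python
from statistics import mean


def accent_difference(ratings):
    """Return {word: mean Bulgarian-accent rating minus mean native-English rating}.

    `ratings` maps word -> speaker label -> list of 0-8 accentedness ratings.
    The labels En1, En2, Bg1, Bg2 are assumed for illustration.
    """
    differences = {}
    for word, by_speaker in ratings.items():
        native = mean(mean(by_speaker[s]) for s in ("En1", "En2"))
        bulgarian = mean(mean(by_speaker[s]) for s in ("Bg1", "Bg2"))
        differences[word] = bulgarian - native
    return differences


# Invented ratings for two placeholder words (not data from the study).
ratings = {
    "word_a": {"En1": [0, 1], "En2": [0, 0], "Bg1": [5, 6], "Bg2": [4, 5]},
    "word_b": {"En1": [1, 1], "En2": [0, 2], "Bg1": [2, 3], "Bg2": [3, 2]},
}
diffs = accent_difference(ratings)
# Rank words from the largest to the smallest accent difference.
ranked = sorted(diffs, key=diffs.get, reverse=True)
print(ranked)
```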
The main experiment was a lexical decision task with a within-subject design. As noted above, the stimuli were recorded by four speakers, two with Bulgarian-accented English and two who were native English speakers. All participants heard words and non-words produced by all four speakers. To restrict the length of the experiment and to avoid exposing the listeners to the test words more than once (which could affect their reaction times), each individual participant heard only 16 words and 16 non-words per speaker, adding up to the total of 64 words and 64 non-words and resulting in four versions of the experiment in which each quarter of the stimuli was produced by a different speaker. Each listener heard each of the words and non-words only once.
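One way to build four such versions is a Latin-square rotation, sketched below. The rotation scheme, the speaker labels, and the placeholder item names are assumptions made for illustration; the text states only that each quarter of the stimuli was produced by a different speaker in each version.

```python
SPEAKERS = ["En1", "En2", "Bg1", "Bg2"]  # labels assumed for illustration


def build_versions(items, speakers=SPEAKERS):
    """Rotate item quarters across speakers (a Latin-square scheme, assumed here)."""
    n = len(speakers)
    if len(items) % n != 0:
        raise ValueError("items must divide evenly among speakers")
    size = len(items) // n
    quarters = [items[i * size:(i + 1) * size] for i in range(n)]
    versions = []
    for v in range(n):
        # In version v, quarter q is assigned to speaker (q + v) mod n.
        assignment = {}
        for q, quarter in enumerate(quarters):
            speaker = speakers[(q + v) % n]
            for item in quarter:
                assignment[item] = speaker
        versions.append(assignment)
    return versions


words = [f"word{i:02d}" for i in range(64)]  # placeholder labels, not the real stimuli
versions = build_versions(words)
print(len(versions), versions[0]["word00"], versions[1]["word00"])
```

Under this scheme every participant hears 16 words per speaker, and across the four versions each word is heard from each of the four speakers exactly once. The same function could be applied to the non-word list.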
The stimuli were presented in four blocks. They were blocked by speaker within accent, to allow the listeners to adjust to each voice and thus to avoid affecting the reaction times due to random changes in the speaker’s identity. To prevent order effects, the accent blocks and the speaker blocks within them were counterbalanced across participants. Within each block the stimuli were presented in a different random order to each listener. Before the first block the listeners heard ten training trials with three non-words and seven real words. A summary of the structure of the whole experiment is available in
Structure of the main experiment.
The speaker for the training stimuli was the first author, a native Bulgarian speaker with 5 years of experience living in the UK at the time of the recording. Half of the training stimuli were produced with a Bulgarian accent and half with Received Pronunciation. The training stimuli were a subset from a previous pilot experiment, in which the Bulgarian-accented stimuli were rated as significantly more foreign accented than the Received Pronunciation ones by ten native English listeners.
This subsection describes the procedure of the online experiment involving Bulgarian L1 – English L2 listeners. Participants (all living in the UK) were reached online via social media, such as Twitter and Facebook, as well as Queen Margaret University’s internal email recruitment system. This method for data collection was chosen to reach as many participants as possible and increase the variability in the participants’ proficiency scores. Prior pilot studies had shown that recruiting Bulgarian participants for in-person laboratory-based experiments in Edinburgh and Musselburgh was problematic. The whole experiment was carried out using the online platform PsyToolkit (
After providing informed consent the participants were given written instructions for the auditory lexical decision task. The instructions included a photo of a standard keyboard, which highlighted the keys that the participants needed to press if they wanted to select a ‘word’ or a ‘non-word’ answer. They proceeded at their own pace. After a countdown, the training trials for the lexical decision task started automatically.
The following procedure applied to the whole auditory lexical decision task. When making their lexical decisions, the participants had to respond by pressing either the ‘4’ or the ‘6’ key on the keyboard with their index finger. When waiting to hear a word and make a decision, the participants were instructed to rest their finger over the ‘5’ key. These keys were picked because it was anticipated that there might be differences in the layout across the participants’ keyboards. Keys ‘4’ and ‘6’ are consistently close to each other across the most common Bulgarian layouts as well as the English (UK) and English (United States) layouts. The correspondence of the two keys to words and non-words was randomized across participants. After hearing each auditory stimulus the participants had 2500 ms to enter their response, after which the following test item was automatically loaded. The reaction times were measured from the end of the sound file that contained the auditory stimulus. The sound files were trimmed to have no extraneous silence after the last phoneme’s acoustic energy dropped away. As soon as a participant entered a response, or just before the new item was loaded if they entered no response, the word “LOGGED” appeared on the screen, to signify that their response (or lack of response) was recorded, and a new item was about to be played.
After the training task, in which the participants responded to ten trials, they proceeded with the main experiment, which started after a countdown. The participants heard the 128 trials of words and non-words without a break, albeit in four blocks (as described above). With a maximum delay for each answer set at 2.5 seconds, the whole task was expected to take up to five minutes. The auditory lexical decision task was followed by the proficiency test LexTALE (
The first question addressed in this experiment was whether the Bulgarian accent of the stimuli would facilitate the speed of recognition of real English words for Bulgarian L1 – English L2 bilinguals, particularly for participants with low English proficiency. Only correct real-word responses between 200 ms and 2000 ms were included, based on the similar study design, listeners, and analysis of Weber et al. (
A linear mixed effects analysis was performed, to find the effect of the listeners’ English proficiency and the stimuli’s accent on the listeners’ overall reaction times. The linear mixed effects regression analysis had three predictors: Proficiency (the LexTALE score centred around the mean 80), Accent (native English as a baseline, and Bulgarian accent) and their interaction. The outcome variable was Reaction times in ms centred around the mean 1225.3 ms. Centring was performed because it allowed for an easier interpretation of the coefficients of the model, since 0 to 200 ms were not meaningful outcomes in the dataset. The model had random intercepts of Participant and by-Accent random slopes for Participants. This means that the model accounts for the fact that each participant is likely to have a slightly different pattern of reaction times, and that this pattern could differ between different accents within a participant. The model also includes random intercepts of Word and random slopes of Word by Speaker. This means that the model accounts for the fact that each word may have elicited a different pattern of reaction times and that this pattern may have differed depending on which speaker pronounced it.
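The trimming and centring steps that precede the model can be sketched as follows. This is a minimal illustration, not the authors' R code: the field names and example trials are hypothetical, and the 200 ms and 2000 ms bounds are treated as inclusive, which the text does not specify.

```python
from statistics import mean


def prepare_rts(trials, low_ms=200, high_ms=2000):
    """Keep correct real-word responses within the RT window and centre their RTs."""
    kept = [
        t for t in trials
        if t["is_word"] and t["correct"] and low_ms <= t["rt_ms"] <= high_ms
    ]
    if not kept:
        return []
    rt_mean = mean(t["rt_ms"] for t in kept)
    # Centring: subtract the mean of the retained RTs from each retained RT.
    return [{**t, "rt_centred": t["rt_ms"] - rt_mean} for t in kept]


# Invented trials (not data from the study).
trials = [
    {"rt_ms": 150, "correct": True, "is_word": True},    # too fast, dropped
    {"rt_ms": 900, "correct": True, "is_word": True},
    {"rt_ms": 1500, "correct": True, "is_word": True},
    {"rt_ms": 2300, "correct": True, "is_word": True},   # too slow, dropped
    {"rt_ms": 1000, "correct": False, "is_word": True},  # incorrect, dropped
    {"rt_ms": 1100, "correct": True, "is_word": False},  # non-word, dropped
]
prepared = prepare_rts(trials)
print([t["rt_centred"] for t in prepared])
```

After centring, a coefficient of zero corresponds to the mean retained reaction time rather than to a reaction time of 0 ms.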
To investigate the second research question, about the effects of Proficiency and Accent on the Accuracy of word recognition, a binomial logistic mixed effect model was tested. The outcome variable included correct and incorrect answers to real word stimuli that received responses between 200 ms and 2000 ms. The model included the interaction between Proficiency (the LexTALE scores centred around the mean 80) and Accent (native English as baseline, compared to Bulgarian accent) as well as each of these predictors separately. In addition, the model had random intercepts of Participant and Speaker. This was to account for the fact that each participant could have had a slightly different pattern of accuracy. The model included by-Accent slope for Participant and a by-Speaker slope for Word. Each participant and word could have elicited a different pattern of accuracy, which could have also varied depending on the accent or speaker, respectively. The outcome variable was coded with zero (incorrect) and one (correct), hence estimates in the positive direction would suggest an increased number of correct answers.
Lastly, this study addressed the question of whether the listeners would adapt their reaction times to a matched-accented speaker faster or slower than a native English speaker and whether their adaptation would be affected by the listeners’ proficiency in English. Adaptation here is used to mean the change in reaction times over a period of exposure to stimuli. Adaptation is tested by observing the change in the listeners’ reaction times with each subsequent stimulus they hear, depending on the accent of the stimuli.
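As a purely descriptive illustration of this notion of adaptation (not the statistical model used in the study), the reaction times within one speaker block can be expressed relative to the first trial of that block. The function name and the example values are hypothetical.

```python
def change_from_first_trial(block_rts):
    """Return each RT minus the block's first RT; negative values mean faster responses."""
    if not block_rts:
        return []
    first = block_rts[0]
    return [rt - first for rt in block_rts]


# Invented reaction times (ms) in presentation order within one speaker block.
block = [1300, 1250, 1180, 1190]
print(change_from_first_trial(block))
```

A run of increasingly negative values over the first few trials would be consistent with the listener speeding up as they become accustomed to the new speaker.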
As noted above, the stimuli were presented to the listeners blocked by accent and then by speaker. The change in reaction times over the subsequent stimuli was represented using curves formed of points with two coordinates: stimulus number within the speaker block on the x-axis and the reaction time on the y-axis. The curves were compared using a generalized additive mixed model (GAMM). A GAMM analysis allows for the investigation of both linear and non-linear relationships between the predictors and the outcome variable through their inclusion as parametric (linear) and smooth (non-linear) terms in the model. The linear terms test similar hypotheses to those presented earlier, while the smooth terms test if the outcome variable is affected non-linearly by one or more continuous variables. A significant smooth term (also called a smooth) suggests that the outcome variable changes in a non-linear fashion along a continuous predictor. Often the main continuous predictor is Time or a proxy for Time, as it is in this case with the use of Within-block trial number. Hence conceptually, a smooth term resembles an interaction between the predictor of interest and a continuous variable (here, Within-block trial number). In addition, like the mixed effects models described so far, this type of analysis also allows the use of random structures (here, random smooths) to account for the fact that multiple reaction time data-points came from the same participants and that multiple participants were presented with the same words. A random smooth therefore accounts for non-linear but systematic variation attributable to these grouping factors. This model focuses on the non-linear relationship between the continuous predictors Proficiency and Within-block trial number and their interaction with the two Accents (native English and Bulgarian Accent).
Only correct responses between 200 ms and 2000 ms to real words were included in the analysis. The reaction times to the words were centred around their mean (1225.3 ms). The model included a parametric term for Accent (native English versus Bulgarian Accent), a smooth term for the token number Within-block (1 to 32 in a speaker block), a smooth term for Proficiency, an interaction smooth for Within-block number by Accent with
Increasing the number of base functions (
In addition, the model included random smooths for Within-block number per Trajectory, where Trajectory was the adaptation trajectory for one participant for one speaker block, allowing individual variation at each trial number within a speaker block. There were also random smooths for Within-block number per Word, accounting for the fact that each word may have led to a different pattern of reaction times, depending on its order within the block. There was also a random smooth for Within-block by Participant, allowing individual non-linear variation per participant at each trial number within a block.
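The data preparation described above (keeping only correct responses to real words within the 200 to 2000 ms window, centring the reaction times, and grouping them into one Trajectory per participant and speaker block) can be sketched as follows. The records and field names are invented for illustration, and treating the window bounds as inclusive is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records; field names and values are illustrative only.
trials = [
    {"participant": "p01", "speaker": "s1", "trial": 1, "is_word": True, "correct": True, "rt": 1400.0},
    {"participant": "p01", "speaker": "s1", "trial": 2, "is_word": True, "correct": True, "rt": 1200.0},
    {"participant": "p01", "speaker": "s1", "trial": 3, "is_word": True, "correct": True, "rt": 2300.0},
    {"participant": "p01", "speaker": "s2", "trial": 1, "is_word": True, "correct": True, "rt": 1300.0},
    {"participant": "p02", "speaker": "s1", "trial": 1, "is_word": True, "correct": False, "rt": 1100.0},
    {"participant": "p02", "speaker": "s1", "trial": 2, "is_word": True, "correct": True, "rt": 1100.0},
    {"participant": "p02", "speaker": "s1", "trial": 3, "is_word": False, "correct": True, "rt": 900.0},
]


def usable(t, lo=200.0, hi=2000.0):
    """Correct response to a real word, within the RT window (bounds assumed inclusive)."""
    return t["is_word"] and t["correct"] and lo <= t["rt"] <= hi


kept = [t for t in trials if usable(t)]

# Centre the reaction times around the mean of the retained observations.
grand_mean = mean(t["rt"] for t in kept)

# One trajectory per participant and speaker block, ordered by within-block trial.
trajectories = defaultdict(list)
for t in sorted(kept, key=lambda t: t["trial"]):
    key = (t["participant"], t["speaker"])
    trajectories[key].append((t["trial"], t["rt"] - grand_mean))
```

Each key of `trajectories` then corresponds to one level of the Trajectory grouping factor over which a random smooth would be estimated.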
This section addresses the questions of whether the listeners’ proficiency in English, and whether the speakers’ accent (Bulgarian accent or native English), together or separately have an effect on the listeners’ reaction times. The analysis includes 5291 observations, 64 words, and 94 participants. The detailed results are presented in
Summary statistics for the linear analysis of the effects of Proficiency and Accent on the reaction times.
Intercept | 1.44 | 0.10 | 0.92 |
Proficiency | 0.15 | 0.19 | 0.85 |
Accent (Bulgarian) | 45.38 | 5.22 | <0.001 |
Proficiency: Accent (Bulgarian) | 1.13 | 3.18 | 0.002 |
The significant interaction between Proficiency and Accent is more relevant for the research question.
To better understand the relative effect of Proficiency on the listeners’ response times for each accent, two follow-up analyses are performed. The data are separated by Accent, and the effect of Proficiency is tested on each accent subgroup separately. The first follow-up model focuses on the reaction times to Bulgarian-accented stimuli only, centred around their mean (1255.7 ms). The only predictor is the listeners’ Proficiency, each score centred around the mean. The model includes random intercepts by Participant and Word and by-Speaker random slopes for Word. This means that the model accounts for the fact that different participants might have slightly different patterns of reaction times, that different words might elicit slightly different patterns of reaction times and that words produced by different speakers can also elicit different patterns of reaction times. The model is based on 2588 observations, 64 words, and 94 participants, and the results are summarized in
Summary statistics for the linear analysis of the effect of Proficiency on the Reaction times to Bulgarian-accented stimuli.
Intercept | 41.40 | 2.53 | 0.01 |
Proficiency | 0.94 | 1.05 | 0.30 |
The effect of Proficiency on the reaction times to Bulgarian-accented words is not significant. The second follow-up model focuses on the reaction times to native English stimuli only, centred around their mean (1196.2 ms). The predictor is the listeners’ Proficiency, centred around the mean. The model includes random intercepts by Participant and Word, and by-Speaker random slopes for Word. The model is based on 2703 observations and 94 participants. The results are summarized in
Summary statistics for the linear analysis of the effect of Proficiency on the Reaction times to native English stimuli.
Intercept | –27.74 | –1.78 | 0.08 |
Proficiency | –0.59 | –0.74 | 0.46 |
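As a quick consistency check on the descriptive figures reported for the two subsets, the observation counts and subset means can be recombined into the pooled values. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Observation counts and mean reaction times (ms) reported for each accent subset.
subsets = {
    "Bulgarian": {"n": 2588, "mean_rt": 1255.7},
    "native English": {"n": 2703, "mean_rt": 1196.2},
}

total_n = sum(s["n"] for s in subsets.values())

# The pooled mean is the observation-weighted average of the subset means.
pooled_mean = sum(s["n"] * s["mean_rt"] for s in subsets.values()) / total_n

# Raw difference between the two accent means.
accent_gap = subsets["Bulgarian"]["mean_rt"] - subsets["native English"]["mean_rt"]

print(total_n, round(pooled_mean, 1), round(accent_gap, 1))
```

The recombined values match the 5291 observations and the 1225.3 ms mean used for centring in the pooled analysis, and the raw difference between the accent means is 59.5 ms.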
The lack of Proficiency effect in each subset is surprising, considering the significant interaction between Accent and Proficiency in the pooled data.
To summarize, no matched-accent benefit in reaction times is found, even for listeners towards the lower end of the proficiency continuum, providing no support for ISIB. Some of the expected proficiency effects are observed, because there is a significant interaction between proficiency and accent. As the listeners’ proficiency increases, their reaction times to Bulgarian-accented English increase relative to native English. This can be interpreted as evidence for a matched-accent disadvantage. However, the two follow-up analyses reveal that when the dataset is split by accent, there is no effect of proficiency on the listeners’ reaction times for either accent. This discrepancy could be the result of the high variability in reaction times combined with a small effect of proficiency, which only manifests when the two accents are compared to each other.
This section addresses the question of whether Bulgarian-accented English facilitates the accuracy of word recognition compared to native English stimuli for Bulgarian L1 – English L2 bilinguals with different English proficiencies. There is an overall lower accuracy rate with Bulgarian-accented words (91%) than native English words (94%), although the accuracy of the participants is generally high. There are 5703 observations, 94 participants, and 64 words considered in the binomial logistic mixed effects model. The results are summarized in
Summary of the overall binomial logistic mixed effects model on the listeners’ Accuracy.
Intercept | –3.46 | –18.78 | <0.001 |
Proficiency | –0.004 | –1.73 | 0.58 |
Accent (Bulgarian) | 0.60 | 3.14 | 0.002 |
Proficiency : Accent (Bulgarian) | 0.02 | 2.54 | 0.01 |
There is no significant effect of Proficiency for a baseline of native English stimuli. However, there is a significant effect of Accent, such that Bulgarian-accented words are recognized incorrectly more often than native English-accented words for a baseline of listeners with average proficiency. Importantly, there is also a significant interaction between Proficiency and Accent, such that with increased proficiency there is decreased accuracy for Bulgarian-accented words relative to native English words.
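To make the size of the Accent effect easier to read, the log-odds coefficients can be converted to probabilities. The sketch below rests on assumptions that the text does not state outright: that the first numeric column of the summary holds log-odds estimates, that the modelled outcome is an incorrect response (inferred from the sign of the coefficients and the wording above), and that the values apply to a typical participant and word at average proficiency. The resulting figures are therefore not directly comparable to the raw accuracy rates.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Reported coefficients, assumed to be log-odds of an incorrect response.
intercept = -3.46        # native English baseline at average proficiency
accent_bulgarian = 0.60  # shift for Bulgarian-accented stimuli

p_error_native = inv_logit(intercept)
p_error_bulgarian = inv_logit(intercept + accent_bulgarian)

# Multiplicative change in the odds of an error for Bulgarian-accented words.
odds_ratio = math.exp(accent_bulgarian)

print(f"native English error probability: {p_error_native:.3f}")
print(f"Bulgarian-accented error probability: {p_error_bulgarian:.3f}")
print(f"odds ratio: {odds_ratio:.2f}")
```

Under these assumptions the odds of an incorrect response are a little under twice as high for Bulgarian-accented words, while both error probabilities remain low.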
Modelled interaction between Proficiency and Accent on the outcome variable Accuracy. The x-axis shows the Proficiency score, lowest to highest. The shaded area reflects an estimation of the 95% CI for the main effects.
Two follow-up analyses are performed to fully interpret the results. The dataset is separated by Accent and the effect of Proficiency is investigated in each separate dataset, also including random intercepts of Participant and random slopes and intercepts of Word by Speaker. The model, focusing on Bulgarian-accented stimuli, is based on 2828 observations, 64 words, and 94 participants. The effect of Proficiency on the Accuracy of recognizing Bulgarian-accented words is not significant. The results are summarized in
Summary of the generalized linear mixed model on the listeners’ Accuracy for the subset of Bulgarian-accented stimuli.
Intercept | 3.15 | 17.62 | <0.001 |
Proficiency | –0.007 | –1.00 | 0.32 |
The second model, focusing on the native English stimuli is based on 2875 observations, 64 words, and 94 participants. There is no significant effect of Proficiency on the Accuracy of recognizing native English words. The results are summarized in
Summary of the generalized linear mixed model on the listeners’ Accuracy for the subset of native English stimuli.
Intercept | 3.83 | 16.47 | <0.001 |
Proficiency | 0.01 | 1.65 | 0.10 |
Overall, there is no evidence of a matched-accent benefit for accuracy, even for the listeners towards the lower end of the proficiency continuum, again showing no support for ISIB. There is an expected matched-accent accuracy disadvantage, relative to native accent accuracy, when the listeners’ proficiency increases. Similar to the results for the reaction times, the proficiency effect is not observed when the data are split by accent. Again, this could be the result of the large variability in accuracy across listeners with different proficiency levels.
This section investigates the effect of English Proficiency and stimulus Accent on the participants’ short-term reaction time adaptation to new speakers with either of the two different accents. It was predicted that listeners with high English proficiency would adapt to a new speaker with a native English accent faster than to a new speaker with a Bulgarian accent when they first heard the accents within the experiment. It was also predicted that low English proficiency listeners would adapt faster to a new speaker with a Bulgarian accent than to a new speaker with a native English accent. The results of the smooth terms, estimating the non-linear relationship between the predictors and the outcome variable, are summarized in
Summary statistics for the smooth and random terms of the full GAM model. Edf = estimated degrees of freedom.
Within-block | 1.00 | 0.42 | 0.51 |
Proficiency | 2.86 | 2.02 | 0.10 |
Within-block by Accent (Bulgarian) | 3.55 | 3.32 | 0.01 |
Within-block by Proficiency | 2.95 | 1.31 | 0.22 |
Within-block by Accent (Bulgarian) and Proficiency | 1.00 | 0.29 | 0.59 |
Random smooth terms | Edf | ||
Within-block per trajectory | 241.78 | 0.32 | <0.001 |
Within-block per participant | 107.40 | 2.63 | <0.001 |
Within-block per word | 62.37 | 4.41 | <0.001 |
There is no significant effect of the Within-block smooth. This means that the within-block trial number led to no systematic non-linear change in reaction times for listeners with average proficiency adapting to native English speakers. There is no significant effect of the Proficiency smooth. This means that there are no systematic non-linear differences in reaction times between listeners with different English proficiencies when responding to native English stimuli.
There is, however, a significant non-linear interaction between Within-block trial and Accent. This means that listeners change their reaction times differently for the two accents as the block progresses. A significant GAMM smooth effect suggests that there is a non-linear effect of the predictors on the outcome variable, but it does not specify the direction of change. This information is obtained by observing a plot.
Left: y-axis shows GAM model predictions of reaction time adaptations (centred around the mean 1225.3 ms) to either of the new speakers with Native English or Bulgarian accent. Right: y-axis shows the difference between the two accent curves, with the area of significant difference highlighted in blue. Both x-axes show the trial numbers within a new speaker and a new accent block.
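One common way to mark where two predicted curves differ is to flag the trials at which a pointwise confidence interval around their difference excludes zero. The sketch below illustrates that idea only. The difference values and standard errors are invented and do not reproduce the figure, the function name `significant_windows` is hypothetical, and whether the highlighted area in the plot was derived in exactly this way is an assumption.

```python
def significant_windows(trials, diff, se, z=1.96):
    """Return (start, end) trial ranges where the CI of the difference excludes zero."""
    windows = []
    start = None
    prev = None
    for t, d, s in zip(trials, diff, se):
        excludes_zero = (d - z * s > 0) or (d + z * s < 0)
        if excludes_zero and start is None:
            start = t                      # a new window opens at this trial
        elif not excludes_zero and start is not None:
            windows.append((start, prev))  # the window closed at the previous trial
            start = None
        prev = t
    if start is not None:
        windows.append((start, prev))      # window still open at the last trial
    return windows


# Invented example: difference (ms) between two accent curves over 8 trials.
example_trials = [1, 2, 3, 4, 5, 6, 7, 8]
example_diff = [90.0, 80.0, 70.0, 60.0, 40.0, 30.0, 20.0, 10.0]
example_se = [20.0] * 8

windows = significant_windows(example_trials, example_diff, example_se)
```

In this invented example the interval excludes zero only while the difference exceeds 1.96 standard errors, which is the case for trials 1 to 5.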
There is no significant interaction between Proficiency and Within-block trial, which means that people with different proficiencies have no systematic changes in their reaction times when responding to native English voices as the block progresses. Lastly, the interaction most relevant to the research questions, the three-way interaction between Proficiency, Accent, and Within-block number, is not significant. This means that as the block progresses, listeners with different levels of English proficiency have no systematic differences in how they change their reaction times in response to the two accents.
To summarize, Bulgarian-accented words are processed more slowly than native English words throughout the whole block, although the difference between the two accents gradually decreases as the block progresses. Contrary to the initial expectations, there are no significant non-linear interactions between proficiency, accent, and block trial number, suggesting that the listeners’ proficiency in English does not systematically affect how they adapt to each of the accents. Similar to what was reported in the accuracy and reaction time analyses, there is no support for ISIB, and Bulgarian-accented words are slower to process than the native English words.
The results of this study show mixed support for the Interlanguage Speech Intelligibility Benefit hypothesis for Talkers (
First, it was expected that the Bulgarian L1 – English L2 listeners towards the lowest end of the English proficiency continuum would process matched-accent speech faster and more accurately than native English speech. This was
Second, it was expected that greater English L2 proficiency would lead to a perceptual advantage for the native-accented L2 stimuli over the matched (L1 Bulgarian-influenced) accent. This prediction was supported, as there was an overall effect of accent, suggesting that the majority of listeners processed native English stimuli faster and more accurately than Bulgarian-accented stimuli. There was also a small interaction with proficiency: The higher the English proficiency of the listeners, the slower and less accurate their responses to Bulgarian-accented stimuli were, compared to their responses to native English stimuli. However, this effect was present only when comparing the two accents directly and it was not present within each accent separately.
This latter point was a surprising finding, and may have been caused by a reduction of statistical power when the dataset was split by accent (
Third, it was expected that there would be some listeners intermediate on the L2 English proficiency continuum who would have no systematic accent-based difference in their accuracy and reaction times.
The rest of the research questions focused on token-to-token reaction time changes that might be expected when listeners encounter a new speaker, irrespective of whether the speaker has a Bulgarian or a native English accent. It was expected that there would be an interaction between listener L2 proficiency and speaker accent, such that the greater the listeners’ proficiency, the greater their native English accent advantage would be. Specifically, we expected slower adaptation by proficient English L2 listeners to new speakers with a matched Bulgarian accent in English than to new speakers with a native English accent. Conversely, it was expected that the lower proficiency English L2 listeners would have a greater matched-accent benefit in reaction times and speed of adaptation compared to native accents. These predictions were not supported as there was no significant interaction between accent and proficiency. There
Thanks to the use of proficiency as a continuous variable, the results of this study provide specific information about the mechanism by which matched-accent processing works, at least in a situation where the segmental phonetic material is its main driver. Matched-accent processing is less efficient and accurate than native accent processing in L2 for listeners with intermediate to high L2 proficiency (scoring over 45 on LexTALE), although the effects are small (on average 60 ms slower and 3% less accurate for Bulgarian-accented English). Two concrete predictions can therefore be made about where ISIB can be found, particularly in a population of emigrants in an L2-dominant country. First, ISIB would be more likely for adult learners with lower L2 proficiency (e.g., less than 45 on LexTALE). Listeners with weak L2 may be more likely to rely on similarities to their L1 phonology in processing weak L2 speech from speakers with whom they share (aspects of) their L1. Second, ISIB might be present for listeners with somewhat higher proficiency in L2 (e.g., between 45 and 80 on LexTALE), when the test materials are embedded in sentences or longer stretches of speech and are likely to contain helpful supra-segmental cues.
The current results do not support ISIB for Talkers, as it is outlined by Hayes-Harb et al. (
Native listeners of a language typically have a disadvantage when processing foreign accents over native accents (e.g.,
We have, however, provided a novel approach to ISIB for Talkers by looking at the effect of the listeners’ L2 proficiency on their reaction times in addition to their accuracy when processing a matched accent. The only other study with a similar design has been reported by Ludwig and Mora (
As noted above, the observed trade-off between matched-accent and native accent reaction times and accuracy is consistent with the Perceptual Assimilation Model-L2 (
As the listeners received no feedback of correctness and were only exposed to monosyllables, this suggests that they updated their representations on a sub-syllabic level during the experiment. In addition, since we used monosyllabic single words as stimuli to reduce the potential complicating effect of prosody on the listeners’ responses, it may be concluded that it was specifically segmental phonetic properties that were attended to. It seems the native English stimuli were closer to the internal representations of the Bulgarian listeners.
Although not supportive of ISIB, these results are consistent with Best and Tyler (
Our results show that the biggest difference in reaction times between the two accents was observed in the first five trials of the block. This suggests the amount of information necessary to adapt to most of the idiosyncrasies of a new speaker’s voice is correspondingly small. The fact that the reaction time difference between the two accents was not completely neutralized until the end of a speaker block, however, is reminiscent of the results of Floccia et al. (
It is difficult to apply the same interpretation to the present results because most participants reported having more exposure to native English speakers than to matched-accented speakers. Increased exposure to a native accent could reasonably be expected to lead to a native accent advantage (e.g.,
Reduced exposure to native speakers might partly explain why a matched-accent benefit tends to be reported for low L2 proficiency listeners. Due to their lower experience with native accents, the L2 learners might not be able to benefit from the native accents’ greater phonetic predictability and phonetic nuance to the same extent as high proficiency listeners (
Outside of single-word lexical decision tasks in experimental set-ups, L2 listeners are more likely to encounter matched accents in longer stretches, whether as spontaneous or read speech (e.g., conversations, media, audiobooks). Understanding ISIB for Talkers in these contexts would also require a study of matched-accent prosody and cross-linguistic influences (cf.,
This study suggested that there was no explicit advantage for Bulgarian–English bilinguals (specifically L1 Bulgarians living in the UK) in listening to Bulgarian-accented English (i.e., a matched-accented L2) compared to listening to native English speech. Indeed, there was a native English accent advantage. Native English speech was perceived better in terms of higher overall accuracy, lower reaction times, and more rapid short-term reaction time adaptation. The listeners’ overall speed and accuracy with the two accents were affected by their English proficiency only when the two accents were compared against each other but not when the dataset was split by accent.
ISIB for Talkers, on the other hand, predicts that L2 listeners would be more effective at processing matched-accented L2 than native L2 productions (
Overall, this in-depth study of the speed and accuracy of bilinguals processing a non-native language in a lexical-decision task shows how greater proficiency in their L2 is associated with a matched-accent disadvantage, but only when such processing by the listeners is directly compared to their processing of native accent stimuli in that L2.
CI – confidence intervals
edf – estimated degrees of freedom
GAMM – generalized additive mixed model
ISIB – interlanguage speech intelligibility benefit
k – knots, convergence points
L1 – first language
L2 – second language
ms – milliseconds
SD – standard deviation
The additional file for this article can be found as follows:
A full list of the real words. DOI:
We would like to thank Steve Cowen for his overall moral support and for his expert technical help during the voice recordings. We are grateful to Professor Gijsbert Stoet for his advice and help setting up the experiment on PsyToolkit. This experiment would not have been possible without the kind help of the four anonymous speakers who lent their voices for the stimuli and the participants. Thank you to the doctoral examiners Dr. Rachel Smith and Dr. Sonja Schaeffler, Associate Editor Dr. Eva Reinisch, General Editor Dr. Lisa Davidson, and the anonymous reviewers for their patient and thoughtful support to improve this work. Thank you to Dr. Kip Wilson for managing the process and for copyediting!
This study was funded by the full-time doctoral bursary of Queen Margaret University.
The authors have no competing interests to declare.