A review of data collection practices using electromagnetic articulography

Teja Rebernik; Jidde Jacobi; Roel Jonkers; Aude Noiray; Martijn Wieling

doi:10.5334/labphon.237

1. Introduction

Electromagnetic articulography (EMA) is a popular technique for the study of speech production that supports the tracking of articulatory kinematics using sensors attached primarily to the tongue, lips, and jaw. This paper provides a comprehensive overview of studies that have used EMA as a method for the investigation of speech-related topics, with the ultimate goal of characterizing various data collection procedures and comparing them to our own practices. In Section 2, we introduce electromagnetic articulography and address some methodological considerations, such as device safety and accuracy, usage, and general sensor placement guidelines. Section 3 continues with a discussion of data collection practices drawn from a systematic literature review of 905 publications from conferences and journals published since 1987. In this contribution, we focus on 412 journal publications. Sections 4 and 5 of this paper are practical, as we describe our own data collection procedure in detail, and we evaluate the adhesion duration of three different types of sensors through a sensor adhesion experiment. We hope this paper will be of help to those starting out with EMA data collection.

2. An Introduction to Electromagnetic Articulography: Methodological Considerations

We first focus on introducing electromagnetic articulography (EMA) as a method. This section addresses some methodological considerations, including the method’s advantages and limitations, device accuracy and safety, various uses, compatibility with other experimental methods, and participants who are suitable for EMA studies.

2.1. Advantages and limitations of EMA

Electromagnetic articulography (EMA)¹ is a point tracking method, whereby sensors placed on target articulators (including tongue, lips, and jaw) are used to track movement in real time in 3D. As with any method, there are both advantages and disadvantages to EMA (Kochetov, 2020; Earnest & Max, 2003; Maeda et al., 2006; Mennen, Scobbie, de Leeuw, Schaeffler, & Schaeffler, 2010; Stone, 2010; Whalen et al., 2005). We first discuss some advantages of EMA. The data collected within the oral cavity has high spatial accuracy and temporal resolution (see Section 2.4 below), yielding relatively precise information on articulatory gestures. Unlike some other methods (such as ultrasound tongue imaging), EMA can measure multiple articulators simultaneously and therefore allows the investigation of inter-articulatory interactions. It is one of the few methods that allows researchers to study movements of articulators directly, as opposed to more indirect acoustic methods. EMA is biologically safe (in contrast to some methods used in the past, such as x-ray cineradiography or microbeam) and minimally invasive. Furthermore, the sensors are mostly well-tolerated by adult participants and only moderately interfere with speech production (speakers adapt within 10 minutes; Dromey, Hunter, & Nissen, 2018). Compared to other methods used to track speech articulators, articulographs restrict the participants’ movement less, they do not require a line of sight to the sensors (unlike, e.g., VICON or OptoTrak), and they are not restricted to in-plane visualization (unlike, e.g., real-time magnetic resonance imaging or ultrasound tongue imaging).

However, several limitations should be considered when employing EMA for speech-related investigations. For example, the positioning of sensors is limited to the anterior oral tract. It is more problematic to place sensors on the more posterior part of the tongue (e.g., tongue dorsum) than its anterior part, and it is not possible to track velum movements without discomfort to the participants (see exceptions below). Furthermore, depending on the size and location of the articulator of interest, it is not possible to place many sensors on an articulator at the same time due to mutual electrical interference and increased perturbation of articulation. Additionally, sensors still cannot be placed too close to each other without disturbing their measurement accuracy (the Carstens AG500 manual, for example, states that the minimum distance between sensors should be 8 mm), which again limits the number of points that can be tracked on the articulators. Furthermore, because EMA is a fixed point-tracking technique, it does not capture the global movements of articulators, for instance the full midsagittal tongue shape (as obtained using rtMRI).
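The minimum-distance constraint mentioned above lends itself to a simple pre-recording check. The following is a minimal sketch, assuming sensor positions are available as 3D coordinates in millimetres; the function name, the sensor labels, and the example coordinates are hypothetical, and only the 8 mm threshold comes from the text above.

```python
from itertools import combinations
from math import dist

# Minimum inter-sensor distance quoted above from the Carstens AG500 manual.
MIN_SENSOR_DISTANCE_MM = 8.0


def sensors_too_close(positions_mm, min_distance_mm=MIN_SENSOR_DISTANCE_MM):
    """Return (name_a, name_b, distance) for every sensor pair below the minimum."""
    flagged = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(positions_mm.items(), 2):
        distance = dist(pos_a, pos_b)
        if distance < min_distance_mm:
            flagged.append((name_a, name_b, distance))
    return flagged


# Hypothetical resting positions (x, y, z) in mm, for illustration only.
layout = {
    "tongue_tip": (0.0, 0.0, 0.0),
    "tongue_mid": (-6.0, 0.0, 2.0),
    "tongue_back": (-25.0, 0.0, 5.0),
}

for name_a, name_b, distance in sensors_too_close(layout):
    print(f"{name_a} and {name_b} are only {distance:.1f} mm apart")
```

A check of this kind only covers static distances; sensors on different articulators may also approach each other during speech, which the sketch does not model.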

Additionally, the equipment is expensive and requires a relatively high level of technical knowledge, prior training, and practice to use successfully. Finally, as sensors are firmly affixed to orofacial structures, they constitute a form of articulatory perturbation. While articulation does return to nearly normal after a while (see below), the acoustics are changed when sensors are attached (Meenakshi, Yarra, Yamini, & Ghosh, 2014). Nevertheless, some problems that affected earlier articulographs (such as restricted head movement, the need for extensive calibration, and data being restricted to the midsagittal plane only) have largely been eliminated in newer devices (see more details below).

2.2. EMA devices

EMA systems have been used for speech-related research since the 1980s (see Figure 1 for an overview of EMA market releases). In the past, the MIT system articulograph (Perkell et al., 1992), the Movetrack system (Branderud, 1985), and the Aurora system (NDI; Kröger et al., 2000) were used as some of the first available commercial articulographs.² For the past two decades and up until recently, there were two main manufacturers with a continuing production of EMA devices, namely Carstens Medizinelektronik (Bovenden, Germany) and Northern Digital Inc. (Waterloo, Canada). Carstens Medizinelektronik has manufactured several articulography devices over time spanning from the late 1980s until now, including models AG100, AG200, AG500, and the most recent AG501. Northern Digital Inc. (NDI) has manufactured the Wave articulograph, which came to the market in 2009 and was discontinued with the arrival of their latest articulograph, the NDI Vox in early 2020. The NDI Vox has since then likewise been discontinued, as NDI decided to reduce their product portfolio (Northern Digital Inc., 2020). Consequently, at present only Carstens offers a commercial articulograph that has not been discontinued.

Figure 1

Timeline of articulographs. Note that the AG200 is not included as it was a combination of the AG500 with the helmet from the AG100. The Aurora system is not included because it was a point-tracking tool but not one meant exclusively for the study of speech production.

As articulographs are costly, it is not uncommon for a lab to use an older system despite a new version being available on the market. Regardless, considerable advancements have been made since the first commercial articulograph. Technological advances have made it possible to collect more comprehensive data, going from 2D EMMA (midsagittal) systems to 3D (or rather 5D) systems collecting three Cartesian coordinates and two angular coordinates (Hoole & Zierdt, 2010). Thus, although early articulographs only measured in one plane (i.e., the midsagittal plane), modern devices track data in three isotropic spatial and two angular dimensions, and sensor orientation is tracked in addition to position. Furthermore, early articulographs required extensive calibration before testing and restricted the participants’ head movement, while modern systems permit free head movement.

2.3. Uses of EMA

Starting in the 1980s, EMA was designed as a way to track points both inside and outside the vocal tract (Schönle et al., 1987). Early studies evaluated the suitability of EMA for tracking speech movements (e.g., Höhne et al., 1987; Hoole & Gfoerer, 1990; Maurer, Gröne, Landis, Hoch, & Schönle, 1993) as well as for clinical use (e.g., Schönle, Müller, & Wenig, 1989; Engelke, Schönle, Kring, & Richter, 1989; Engelke, W., Engelke, D., & Schwetska, 1990). Nowadays, EMA is predominantly employed for the study of speech motor control—in individuals with and without speech disorders—but its uses remain broad. For example, it can be used for the study of orofacial processes in which articulators are actively involved, such as mastication (e.g., Peyron, Mioche, Renon, & Abouelkaram, 1996; Fuentes et al., 2018; Hoke et al., 2019) or swallowing (e.g., Horn, Kühnast, Axmann-Krcmar, & Göz, 2004; Steele & van Lieshout, 2009; Alvarez, Dias, Lezcano, Arias, & Fuentes, 2019; see also Steele, 2015, for a short overview of EMA and other instrumental techniques for the study of swallowing).

The uses of EMA in the study of speech production are likewise varied. Beyond collecting parallel acoustic data, there has been a continued interest in supplementing articulographic data with other speech data, either by collecting data with two devices simultaneously in the same session (if technically possible) or by collecting data from the same participants in separate sessions and coupling the data afterwards. Some of the methods that have been used to collect data in the same session as EMA include: ultrasound tongue imaging (UTI) (e.g., Aron et al., 2016; Benuš & Gafos, 2007), electropalatography (EPG; West, 1999; Simonsen, Moen, & Cowen, 2008; Harper, Lee, Goldstein, & Byrd, 2018), electromyography (EMG; e.g., Rong, Loucks, Kim, & Hasegawa-Johnson, 2012), and motion capture (e.g., Kroos, Bundgaard-Nielsen, & Best, 2012; Krivokapić, Tiede, & Tyrone, 2017). EMA and UTI, especially, are frequently used together, as EMA sensors can be used to provide a fixed reference for ultrasound recordings (e.g., Tiede, Chen, & Whalen, 2019). Methods whose data can be coupled with EMA data after recording additionally include real-time magnetic resonance imaging (rtMRI; e.g., Kim, Lammert, Ghosh, & Narayanan, 2014). Successful attempts have also been made to collect data from two speakers simultaneously using a dual EMA setup (e.g., Geng et al., 2013; Tiede et al., 2010).

Some researchers have made their EMA databases publicly available, sometimes concurrently with other kinematic data collection methods (e.g., rtMRI and UTI data collected from the same participants). Notable articulatory corpora include the USC-TIMIT multimodal speech production database (Narayanan et al., 2014), the MOCHA-TIMIT multi-channel articulatory database (Wrench, 2000), the TORGO database of acoustic and articulatory speech from dysarthric speakers (Rudzicz et al., 2012), the EMA-MAE corpus of Mandarin-Accented English (Ji, Berry, & Johnson, 2014), the mngu0 articulatory corpus (Richmond, Hoole, & King, 2011), the Haskins rate contrast database (Tiede et al., 2017), the MSPKA articulatory corpus of Italian (Canevari, Badino, & Fadiga, 2015), the DKU-JNU-EMA database on Mandarin and Chinese dialects (Cai et al., 2018), the Mandarin-Tibetan speech corpus (Lobsang et al., 2016), and the database of Norwegian speech sounds (Moen, Gram Simonsen, & Lindstad, 2004).

EMA has been used to provide accurate information on movements inside the vocal tract for animating talking heads (e.g., Badin, Tarabalka, Elisei, & Bailly, 2010; Gilbert, Olsen, Leung, & Stevens, 2015), synthesizing speech (e.g., Bocquelet, Hueber, Girin, Savariaux, & Yvert, 2016) or acoustic-to-articulatory inversion (e.g., Girin, Hueber, & Alameda-Pineda, 2017; Sivaraman, Espy-Wilson, & Wieling, 2017), and improving automatic speech recognition (ASR) software (e.g., Demange & Ouni, 2011; Wang, Samal, Green, & Rudzicz, 2012; Mitra et al., 2017). It can additionally be used to provide real time video feedback of articulatory movements and thus has advantages in second language acquisition to help with target pronunciation (Suemitsu, Dang, Ito, & Tiede, 2015) as well as in speech therapy as a biofeedback device (Murdoch, 2011; van Lieshout, 2007). Katz, Carter, and Levitt (2007), for example, used EMA for treatment of buccofacial apraxia, McNeil et al. (2010) used it to study acquired apraxia of speech, and Yunusova et al. (2017) used it to provide feedback to patients with Parkinson’s disease.

2.4. Accuracy and safety of EMA devices

Since the advent of EMA devices on the market, their sampling rate and number of channels have increased, and the accuracy has improved. Regarding the recording capabilities of the most recent articulographs, the NDI Wave and NDI Vox have a maximum sampling rate of up to 400 samples/s and can track 16 channels simultaneously (i.e., up to 16 sensors can be used). The AG500 can record 200 samples/s in 12 channels, while the AG501 can record 1250 samples/s of up to 24 channels (Sigona et al., 2018; Savariaux, Badin, Samson, & Gerber, 2017). The sampling rate of current devices is more than enough to capture speech movements from the articulators. For example, Tasko and McClean (2004) indicated that the maximum speed of the tongue body during connected speech was 200 mm/s, and controlled (non-ballistic) movements are much slower. A sampling rate of 400 Hz thus has sufficient temporal resolution to track the fastest known articulatory movements.
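As a rough illustration of this argument, the sketch below converts the peak tongue-body speed cited above into the distance travelled between two consecutive samples at the sampling rates quoted in this section. The function name is hypothetical, and the calculation is simple arithmetic rather than a formal resolution criterion.

```python
def displacement_per_sample_mm(speed_mm_per_s: float, sampling_rate_hz: float) -> float:
    """Distance an articulator travels between two consecutive samples."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return speed_mm_per_s / sampling_rate_hz


# Peak tongue-body speed cited above (200 mm/s) at the sampling rates
# quoted in this section for the AG500, NDI Wave/Vox, and AG501.
PEAK_SPEED_MM_PER_S = 200.0

for rate_hz in (200, 400, 1250):
    step_mm = displacement_per_sample_mm(PEAK_SPEED_MM_PER_S, rate_hz)
    print(f"{rate_hz} samples/s: {step_mm:.2f} mm between samples")
```

At 400 samples/s, even the fastest reported tongue-body movement advances by only 0.5 mm between samples, which is of the same order as the spatial errors discussed in this section.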

Several studies have investigated the spatial accuracy of articulographs. Berry (2011) reported that the Wave system showed < 0.5 mm errors for 95% of position samples recorded during human jaw movement for nine out of ten participants. A study on the Carstens AG500 has reported a median error of < 0.5 mm across different types of recordings, including manual movements and various speech tasks, with the error magnitude being dependent on calibration and on the location of the sensors in the electromagnetic field as well as on the proximity between the sensors (Yunusova, Green, & Mefferd, 2009). In addition, the AG500 was found to display some numerical instabilities and anomalies (Stella, M., Stella, A., Grimaldi, & Fivela, 2012) which were not predictable (Kroos, 2012). Finally, a comparison between the Wave and several Carstens systems (namely the AG200, AG500, and AG501) revealed that all four devices showed a local precision of around 1 mm, but a large range of global precision, spanning from 3 mm to 21.8 mm (Savariaux et al., 2017), with the AG501 as the most accurate device with precision of 0.3 mm (RMS; Electromagnetic Articulograph, 2019). Comparisons of the AG500 and AG501 additionally revealed that the AG501 was more accurate, stable, and user friendly than the AG500 (Stella et al., 2013; Sigona et al., 2018). A recent study on the newest NDI articulograph (the NDI Vox, which has recently been discontinued) has shown it to be significantly more accurate than the NDI Wave, with an average sensor pair tracking error of 0.1 mm, although a direct side-by-side device comparison would be necessary to establish how the Vox compares with the AG501 (Rebernik, Jacobi, Tiede, & Wieling, in revision).

In general, electromagnetic articulographs are safe to use (Hasegawa-Johnson, 1998). The AG500, AG501, NDI Wave, and NDI Vox articulographs fulfil the safety requirements for electrical equipment as set by the International Electrotechnical Commission and the American Federal Communications Commission (Carstens AG500 Manual, 2006; Carstens AG501 Manual, 2014; Wave User Guide, Northern Digital Inc., 2009, rev. 2016; Vox User Guide, Northern Digital Inc., 2019). Note, however, that little research has been targeted specifically at the electromagnetic frequency ranges of EMA systems (Hoole & Nguyen, 1999; Earnest & Max, 2003). Furthermore, due to the moderate strength magnetic field³ a few exclusion criteria must be considered that impact participant recruitment, predominantly the use of implanted devices that might be prone to electromagnetic interference. These include (as discussed in the Wave User Guide, Northern Digital Inc., 2009, rev. 2016, and Carstens AG500 manual, 2006):

– the use of a pacemaker (the magnetic field of the EMA may interfere with pacemaker operation; see Smith & Assen, 1992, for a description of how electromagnetic fields affect cardiac pacemakers);
– large metal objects in or around the head (such as a hearing aid or cochlear implant; see Crose, Kuk, & Bindeballe, 2011, and Tognola, Parazzini, Sibella, Paglialonga, & Ravazzani, 2007, for electromagnetic interference in hearing aids and cochlear implants, respectively);
– the use of insulin pumps (see Zhang, Jones, & Jetley, 2010, for a hazard analysis of insulin pumps).

Some studies have tested the potential adverse effects of the EMA magnetic fields on metal objects in the field and, vice versa, the effect of metal objects on the integrity of the collected EMA data. Katz et al. (2003) tested compatibility of the Clarion 1.2 S-Series cochlear implant with the Carstens AG100 articulograph in order to determine whether EMA affects the functioning of the implant and the participants’ speech perception on the one hand, and whether the implant could potentially affect the accuracy of EMA data on the other hand. They determined that the tested cochlear implant was compatible with the AG100, as no adverse effects could be observed.

Joglar, Nguyen, Garst, and Katz (2009) tested potential interference between pacemakers/implantable cardioverter-defibrillators and the Carstens AG100. They determined that devices from Medtronic (type D154VRC), St. Jude (types 5172 and V-193), and Guidant (types 1860, T180, 1852 and 1853) were compatible with the Carstens AG100. Finally, Mücke et al. (2018; see also Hermes, Mücke, Thies, & Barbe, 2019) tested Essential Tremor patients who had undergone thalamic deep brain stimulation (DBS) surgery. Participants were tested using the Carstens AG501 while the implant was active and inactive, with no reported adverse effects. However, as new articulographs and medical devices are introduced, it is necessary to verify their field strength and electromagnetic frequency before doing any testing on participants. Additionally, some researchers advise against including pregnant women in empirical studies using EMA (Hoole & Nguyen, 1999; Stone, 2010) as the effect of the magnetic field is not entirely clear and it is better to err on the side of caution.

2.5. Participants

Due to the high time demands of the method—including long participant preparation times as well as data processing and analysis steps—EMA studies frequently limit their number of participants. Our literature review (see description below) showed that around 75% of studies published in journals included ten participants or fewer; around 46% included five participants or fewer. This is also in line with Kochetov (2020), who reported the median number of participants in an EMA study to be five. Early studies (e.g., earlier than 2003) have often only included one or two participants, and it was not uncommon for one of the authors to be a participant. With EMA’s increasing popularity, however, there has also been an increase in the number of studies with more participants, with the largest participant samples including around 50 participants (e.g., Schötz, Frid, & Löfqvist, 2013, N = 50; Cheng, Murdoch, Goozée, & Scott, 2007, N = 48; Wieling et al., 2016, N = 48).

In general, most participants tested with EMA are healthy adults (around 80% of the studies). Nevertheless, several studies have tested children from five years of age onwards (e.g., Katz & Bharadwaj, 2001; Cheng et al., 2007; Schötz et al., 2013), giving important insights into the development of individual articulators during the process of early speech acquisition. Articulographs have also frequently been used to study disordered speech in individuals suffering from various conditions that can impact speech production and/or speech motor control, ranging from speech disorders such as stuttering and cluttering (Didirkova & Hirsch, 2019; McClean, Tasko, & Runyan, 2004; Hartinger & Mooshammer, 2008), apraxia of speech (e.g., Bartle-Meyer, Goozée, & Murdoch, 2009; Nijland, Maassen, Hulstijn, & Peters, 2004), hypokinetic dysarthria (e.g., Kearney et al., 2018; Mefferd & Dietrich, 2019), or Amyotrophic Lateral Sclerosis (e.g., Lee & Bell, 2018; Shellikeri et al., 2016) to congenital conditions such as cleft lip (e.g., van Lieshout, Rutjes, & Spauwen, 2002) or congenital blindness (e.g., Trudeau-Fisette, Tiede, & Ménard, 2017). Using EMA to study disordered speech (more studies can be found in the Appendix) is important to provide insight into the underlying issues of speech motor control that cannot be detected through acoustics only. However, as a method, EMA can also be more fatiguing, and researchers should thus distinguish between what they can and should ask of their participants (Gibbon, 2008; van Lieshout, 2007; see below).

3. Literature review

Section 3 of the paper is intended as a review and discussion of the prevalent trends in EMA data collection of the past three decades. To identify these practices and trends, we performed a systematic literature review.⁴ Using Google Scholar, we collected journal publications, conference proceedings papers, and other academic writings by employing the search terms ‘articulography,’ ‘articulograph,’ ‘articulometry,’ and ‘articulometer,’ between the years of 1987 and 2019. We excluded publications that were less than four pages long, publications that did not describe participant studies (e.g., because the authors used an existing database, focused on a new analysis procedure or assessed the more technical aspects of the EMA such as device accuracy), and publications that were written in languages other than English.⁵ These search criteria led to 905 identified publications, which likely encompasses the large majority of published works utilizing articulographs. It should thus provide a representative overview of EMA data collection procedures. The present review considers 412 journal publications, 413 conference papers, and 80 other writings (most frequently doctoral dissertations).

During the reviewing process, we identified the following parameters: type of EMA device used, number of participants, population, total number of sensors, number of tongue sensors, sensor placement, sensor preparation, and adhesive used for sensor placement. Not all publications reported all information. For example, while most publications mention the device type (especially after several manufacturers started producing articulographs) and number of sensors, few of them mention the adhesive in use.
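The parameters listed above can be thought of as one record per publication, with missing values where a publication did not report a parameter. The following is a minimal sketch of such a record, assuming one field per parameter; the class name, field names, and the three placeholder records are hypothetical and are not entries from the review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedStudy:
    """One reviewed publication; None marks a parameter that was not reported."""
    device: Optional[str] = None
    n_participants: Optional[int] = None
    population: Optional[str] = None
    n_sensors: Optional[int] = None
    n_tongue_sensors: Optional[int] = None
    sensor_placement: Optional[str] = None
    sensor_preparation: Optional[str] = None
    adhesive: Optional[str] = None


def share_reporting(studies, field_name):
    """Fraction of studies in which the given parameter was reported."""
    if not studies:
        raise ValueError("no studies to summarize")
    reported = sum(getattr(study, field_name) is not None for study in studies)
    return reported / len(studies)


# Placeholder records for illustration only (not data from the review).
records = [
    CodedStudy(device="device A", n_participants=5, n_sensors=8, adhesive="adhesive X"),
    CodedStudy(device="device B", n_participants=12, n_sensors=6),
    CodedStudy(device="device A", n_participants=2),
]

for name in ("device", "n_sensors", "adhesive"):
    print(f"{name}: reported in {share_reporting(records, name):.0%} of records")
```

Keeping unreported parameters as explicit missing values, rather than dropping the publication, makes it possible to summarize each parameter over the subset of publications that report it.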

In the Appendix, we have provided a table with all identified studies. Please note that for this paper, we have analyzed the trends and practices based on journal publications only (N = 412). This prevents us from counting the same study multiple times, because studies described in journal publications have often already been presented at one or more conferences but are rarely published in more than one journal.

3.1. Data collection practices

To draw valid conclusions about speech kinematics and speech motor control based on EMA data, it is necessary to ensure between-subjects and between-studies comparability. On the one hand, it is important to correctly place EMA sensors on the speech articulators depending on the specific goals of the study and to optimize sensor adhesion time to ensure cross-trial comparability (after re-attachment, a sensor might not be in the exact same position as before). On the other hand, it is necessary to make the experimental procedure as comfortable as possible for participants while not impeding scientific accuracy.

In the sections below, we lean on our literature review to report some general information on sensor placement, followed by information on certain anatomical considerations that might result in a different sensor attachment strategy, and finally information on the placement and preparation of specific sensor categories (including reference sensors, jaw movement sensors, tongue sensors, and lip sensors).

At this point, we would like to emphasize that most authors follow a certain template when reporting on their EMA study. Such a template is usually of the form:

Articulatory data was collected using [device name, device manufacturer] at a sampling rate of [sampling rate, often 100, 200 or 400 Hz]. Acoustic data was simultaneously collected using [microphone device] at [sampling frequency, often 16 kHz]. [Number] sensors were attached to the tongue, lips and jaw using the non-toxic adhesive [name adhesive]. Specifically, [number] sensors were affixed to the tongue: one on the tongue tip, [location, often “about 1 cm from the anatomical tip”], one on the back of the tongue [location, often “as far back as comfortable”], and one [location, with three sensors often “midway between the tongue tip and tongue back sensor”]. One sensor affixed to [location, often the lower incisor] tracked jaw movements and two sensors were placed on the vermillion border of the upper and lower lips. [Number] reference sensors were additionally placed on [location, often the left and right mastoid, nasion and/or upper incisor] to correct for head movement. A recording of the bite plane was made using [description of the process] and a palate trace was made [description of the process].

In the following sections, we discuss the variables that are indicated in this template in bold. Some of the other parts (such as devices and sampling rates) have already been discussed above. Finally, the following sections do not provide information on the EMA data analysis process: The reader is directed to consult Gafos, Kirov, and Shaw (2010) who provided guidelines for using mview, the frequently-used EMA data analysis programme developed by Mark Tiede at Haskins Laboratories (Tiede, 2005); Hoole (2012) who provides a tutorial on his software for processing AG500/AG501 data; and Kolb (2015) who details some other existing software tools and analysis methods. A tutorial on how to analyze EMA data using non-linear regression techniques is provided by Wieling (2018).

3.2. General sensor placement information

Articulographs can be used to study the behaviour of both extraoral (i.e., the lips and the jaw) and intraoral (i.e., the tongue) articulators. The exact choice of sensors depends on several factors, including the studied population (clinical versus healthy, see below; impacts the number of intraoral sensors) and the sounds that are to be investigated (e.g., apical versus lateral; impacts sensor placement). Researcher preference also plays a role: Some prefer to adhere the minimum number of sensors (to decrease the time necessary for participant preparation), while others prefer to adhere more sensors (to collect additional data, using it to answer more research questions). With few exceptions, sensors are placed midsagittally.

The number of intraoral sensors is an important consideration in EMA studies. On the one hand, having more sensors on the tongue allows the tracking of more points and thus yields a better picture of the movement of the tongue. On the other hand, when also including the intraoral jaw movement sensor and reference sensor on the lower and upper incisors, respectively, speakers frequently have five or more wires in their mouth. This may lead to discomfort and affect participants’ speech. More tongue sensors are especially problematic where sensitive populations are concerned. These individuals may be more prone to fatigue (e.g., Friedman et al., 2007, on fatigue in PD patients), more likely to drool (Reddihough & Johnson, 1999), and may find it more difficult to stick out their tongue or open their mouth. Furthermore, their speech is more likely to be impeded by a foreign object in their oral cavity. Children have smaller tongues, salivate more, and need more frequent toilet visits, which necessitates shorter experimental procedures, including shorter preparation times. When testing children and patients, researchers therefore often opt for only two tongue sensors (tongue tip and tongue back) in addition to the intraoral jaw movement sensor and the intraoral reference sensor.

While the exact sensor placement depends on the study, there are some typical sensor placements. These are depicted in Figure 2, which shows movement sensors used to track the movement of articulators (red dots; including the lips, jaw, and tongue) and reference sensors, placed on orofacial structures that do not move during speech production (green dots; including both mastoids, the nasion, and upper incisor). More details on individual sensor categories are provided below.

Figure 2

Common placement of EMA sensors: Red dots mark movement sensors, green dots reference sensors. Original image by Tavin, distributed under the CC Attribution 3.0 Unported license (sensor points were added by the authors).

After all sensors have been placed, a biteplate⁶ recording can be made with a biteplate object that has several sensors attached to it (see Figure 9 in Section 4.2 for a picture of our lab’s biteplate with three sensors). The object is placed between the participant’s teeth and a recording is made to obtain the relative orientation of the sensors on the biteplate compared to the reference sensors. This information is then used to rotate the acquired sensor movement data (of the sensors attached to the articulators) to a comparable occlusal plane per participant (Westbury, 1994). Finally, palate trace recordings are made, where a sensor is used to trace the palate across the occlusal plane, providing an estimate of the shape of participants’ oral cavity (see Neufeld & van Lieshout, 2014, for a description on how EMA sensors can be used to construct a 3D model of the hard palate).
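The rotation to the occlusal plane described above can be sketched as a change of coordinate frame. The example below is a minimal illustration, assuming a biteplate with one anterior midline sensor and two posterior sensors (left and right) and assuming that all positions have already been corrected for head movement; this layout, the function names, and the example coordinates are hypothetical and do not describe any specific lab's procedure or software.

```python
from math import sqrt


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    norm = sqrt(dot(a, a))
    if norm == 0:
        raise ValueError("degenerate biteplate geometry")
    return tuple(x / norm for x in a)


def occlusal_frame(front, left, right):
    """Origin and orthonormal axes of the plane through three biteplate sensors."""
    back_mid = tuple((l + r) / 2 for l, r in zip(left, right))
    x_axis = unit(sub(front, back_mid))              # posterior to anterior
    normal = unit(cross(x_axis, sub(left, right)))   # perpendicular to the plane
    y_axis = cross(normal, x_axis)                   # completes a right-handed frame
    return front, (x_axis, y_axis, normal)


def to_occlusal(point, frame):
    """Express a sensor position in the occlusal coordinate frame."""
    origin, axes = frame
    relative = sub(point, origin)
    return tuple(dot(relative, axis) for axis in axes)


# Hypothetical biteplate recording (mm) in which the plate is tilted upright.
frame = occlusal_frame(front=(0.0, 0.0, 10.0),
                       left=(0.0, 5.0, 0.0),
                       right=(0.0, -5.0, 0.0))
print(to_occlusal((-3.0, 0.0, 10.0), frame))
```

After this transformation, the third coordinate of each sensor gives its height relative to the occlusal plane, so positions can be compared across participants whose heads were oriented differently in the measurement field.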

The time it takes for all sensors to be placed varies. Earnest and Max (2003), for example, state that it can take anywhere between 30 and 60 minutes. This time can be reduced depending on the device, the number of sensors, and their placement. Before starting the experiment, researchers additionally allow some time for the participants to adjust to the sensors. A study by Dromey et al. (2018), who tested sensor habituation, found that after ten minutes, participants reached a level of habituation to the sensors that did not improve even if the habituation stage lasted longer. In general, if researchers include a sensor habituation stage, it is most often 5–10 minutes of informal conversation (e.g., Katz, Mehta, & Wood, 2018; Goozée et al., 2007).

Several brands of adhesive can be used to adhere the sensors. The Carstens website recommends Epiglu (Meyer Haake GmbH), whereas NDI does not give any adhesive recommendations on their website. Other popular adhesives include PeriAcryl®90HV (Glustitch), Isodent cyanoacrylate adhesive (Ellman International), Cyano Veneer Fast (Scheu Dental Technology), Cyanodent (Ellman International), Histoacryl (B. Braun), and Aron Alpha (Toagosei). Note that the Isodent and Cyanodent adhesives appear to be discontinued⁷, and Cyano Veneer Fast has not renewed its medical certification, while the intraoral use of Histoacryl may be problematic due to potential cytotoxic effects (Schneider & Otto, 2012). PeriAcryl®90HV has been used most often in recent years.

What these adhesives (except for Histoacryl; Schneider & Otto, 2012) have in common is that they are intended for oral tissue (e.g., for use in dental or oral surgery), are biologically safe, and relatively viscous. Dental cements, including Ketac™, Durelon, and Fuji, have also been used by several labs to attach tongue sensors (e.g., Mooshammer, Hoole, & Geumann, 2006; Tabain, 2003; Steele & van Lieshout, 2004), but are more invasive, as they involve covering the tongue dorsum with a hard substance. Dental cement also causes faster deterioration of sensors and leads to participant discomfort. However, it does have the benefit of making sensors adhere to the tongue for a longer period of time (e.g., Ball, Gracco, & Stone, 2001, state that the sensors remain firmly attached to the tongue surface for over 90 minutes).

Before discussing frequent sensor placements, it is also necessary to mention some more unusual sensor placements. In the past, sensors have been adhered to the velum using different means, from glue to atraumatic sutures (e.g., Engelke, Hoch, Bruns, & Striebeck, 1996, number of participants N = 1; Okadome & Honda, 2001, N = 3; Jaeger & Hoole, 2011, N = 4). Other orofacial structures to which sensors have been adhered include the uvula (e.g., Hoenig & Schoener, 1992, N = 30), thyroid cartilage/skin above the larynx (e.g., Alvarez et al., 2019, N = 14; Shosted, Carignan, & Rong, 2011, N = 4; Bückins, Greisbach, & Hermes, 2018, N = 4), and sublaminally on the underside of the tongue (e.g., Rochon & Pompino-Marschall, 1999, N = 4).

3.3. Anatomical considerations

3.3.1. Tongue anatomy

The tongue is a highly mobile and muscled articulator, responsible for speech, mastication, and deglutition. For the purposes of speech production, there are two potential ways of defining parts of the tongue: the anatomical perspective (see note⁸ for details) and the functional perspective, which defines the tongue in terms of functions that different parts serve in the process of speech motor control, and is thus directly relevant to EMA data collection. Following Ladefoged and Maddieson (1996, Ch. 2), the tongue consists of the tongue tip (Figure 3–1), tongue blade (just behind the tip), tongue body (Figure 3–2), and tongue root (Figure 3–3). The tip of the tongue starts parallel to the surfaces of incisors and extends to cover a small area about 2 mm wide on the upper surface of the tongue at rest. The blade of the tongue is the part that starts behind the tongue tip and extends to 2 mm behind the point of the tongue that is located below the center of the alveolar ridge (i.e., the point of the maximum slope). Sounds made with the tongue tip are said to be apical while those made with the tongue blade are said to be laminal. When discussing sensor placement, we refer to the sensor adhered to this most anterior part of the tongue (encompassing both the tip and the blade) as the ‘tongue tip’ sensor (Figure 3–1).

Figure 3

Tongue anatomy: tongue tip (1), tongue body (2), and tongue root (3). Original image by Jonas Töle, distributed under the CC0 1.0 Universal Public Domain Dedication.

The tongue body (Figure 3–2) is the mass of tongue behind the blade and can roughly be divided into tongue body front (below the hard palate) and tongue body back (below the velum). Sounds that are produced with this part of the tongue are dorsal. When discussing sensor placement, we refer to sensors placed on the tongue body as either ‘tongue mid’ or ‘tongue back,’ depending on how close to the tongue root the sensor is. Unless specified differently, all sensors are placed along the midline of the tongue, i.e., the median sulcus, which divides the tongue into the left and right parts.

Finally—regarding the tongue parts that are not easily accessible for sensor placement and EMA measurements—the tongue root is found behind the tongue body (Figure 3–3), in the oropharynx, together with the epiglottis. It is not easily possible to track tongue root movements with an EMA sensor due to the gag reflex.

Depending on the target sounds and/or phenomena being studied, different sensors are used (see Table 1 for some common sounds and corresponding sensors). In all cases, it is presumed that reference sensors (most frequently on the nasion, upper incisor, and both mastoids) are additionally being used. Note that the table only shows a limited subset of sounds that have been studied with EMA. Importantly, Yunusova, Rosenthal, Rudy, Baljko, and Sakalogiannakis (2012) describe which lingual sounds can be distinguished using articulography, and state that consonants cannot be distinguished on the basis of only one characteristic, such as the tongue position measured with a single sensor, as more dimensions are needed (e.g., also lip sensors).

Table 1

Sounds studied with EMA sensors. Other sensors are needed in order to determine how sensor location relates to other orofacial structures and articulators. Example studies are included.

Target sound	Articulator sensor placement	Example study

bilabial stops (/p, b/)	vermillion border of upper and lower lips	Tong and Ng, 2011
velar stops (/k, g/)	tongue back sensor (close to place of constriction)	Brunner, Fuchs, and Perrier, 2011a
alveolar stops (/t, d/)	tongue tip	Kühnert and Hoole, 2004
liquids (/l, r/)	tongue sensors placed laterally and midsagittally	Howson and Kochetov, 2015
sibilants (/s, z, ʃ, ʒ/)	tongue tip	Bukmaier and Harrington, 2016
(labio)dental fricatives (/f, v, θ, ð/)	three tongue sensors	Wieling, Veenstra, Adank, and Tiede, 2017
trills	tongue sensors placed laterally and midsagittally	Howson, Kochetov, and van Lieshout, 2015
vowels	three or more tongue sensors	Hoole, Mooshammer, and Tillmann, 1994
nasal vowels	three tongue sensors	Carignan, Shosted, Shih, and Rong, 2011

Tongue shapes vary vastly from one individual to the next (King & Parent, 2001; Kullaa-Mikkonen, Mikkonen, & Kotilainen, 1982). For example, some individuals may have a more fissured tongue with more grooving than others, which makes sensor adhesion directly to the median sulcus more difficult. Regarding tongue anatomy, several factors should be considered, including age (namely, adults have a longer tongue than children; Vorperian et al., 2005), body weight (namely, tongue muscle volume positively correlates with body weight; Stone et al., 2018), and gender. The effects of the latter are less clear, as some studies have shown that men have significantly larger tongue breadth and volume (Oliver & Evans, 1986; Mahne et al., 2007), while others failed to find such an effect, even though men do usually have a larger bony structure (Hopkin, 1967). Additionally, tongue rhythm and velocity correlate with age (movements are slower and more irregular in the elderly; Hirai et al., 1989). Finally, different types of tongue movements exist, from hollowing and grooving to pulling back, tipping, heaping, and bunching (Hiiemae & Palmer, 2003), which impacts the production of different sounds.

3.3.2. Hard palate, salivary flow rates, and gingival tissue

Aside from considerations related to the tongue itself, restrictions posed by the rest of the oral cavity have to be taken into account when placing intraoral sensors. Particularly relevant in this regard are the hard palate, gingival tissue, and salivary flow rates. Differences between speakers occur in the height, length, slope, width, and curvature of the hard palate (e.g., Brunner, Fuchs, & Perrier, 2009; Rudy & Yunusova, 2013; Lammert, Proctor, & Narayanan, 2018). These differences in palate shape are also responsible for variability in speech production. When comparing the speech produced by individuals with flat, domed, or regular palates, it has been hypothesized that speakers with flat palates have more precise articulations because that is the only way to maintain acoustic consistency (Bakst & Johnson, 2018; Brunner et al., 2009). Furthermore, palatal morphology can also account for some variability in tongue positioning (Rudy & Yunusova, 2013).

Other anatomical considerations include the production of saliva and gingival tissue. Salivary flow rates (i.e., the quantity of saliva) differ greatly across healthy individuals (Whelton, 2012). This may substantially influence how well intraoral sensors adhere to the tongue and incisors, as the usual cyanoacrylate adhesives (see description of adhesives above) polymerize after coming into contact with saliva. Moreover, the production of saliva is heavily influenced by external factors, such as degree of hydration or circadian rhythm, but also by minor factors including gender, age, and body weight (Whelton, 2012). Specifically, men salivate more than women (Inoue et al., 2006), elderly adults salivate less than middle-aged adults (Navazesh, Mulligan, Kipnis, Denny, P. A., & Denny, P. C., 1992), and individuals with a higher body mass index have a lower salivary flow rate (Flink, Bergdahl, Tegelberg, Rosenblad, & Lagerlöf, 2008).

Finally, especially relevant for the attachment of the intraoral jaw-movement and reference sensors, which are usually positioned on or close to the lower and upper incisors, is the amount of gingival tissue above and below the incisors. These two (lower and upper incisor) sensors can be more easily placed when the speaker has a larger gingival surface above and below the incisors. For speakers with a small gingival surface, or for speakers who have a prominent labial frenulum, an alternative sensor placement plan may be considered (e.g., on the chin—which is non-ideal due to skin movement—or directly on the incisors as opposed to the gingival tissue).

3.4. Reference sensors

3.4.1. Use and positioning

During the post-processing stage of EMA data, positional data from the reference sensors are used to correct for deviations in head position relative to a consistent reference position, which is usually the occlusal plane. The reference sensors are usually placed as far apart as possible (to minimize the effect of noise on the position estimation of individual sensors) on bony structures with the least skin movement, including the nasion (N), mastoid processes (i.e., on the bone behind both ears; ML and MR), and the gingival tissue of upper central or lateral incisors (UI). Our literature review shows that older studies predominantly included two reference sensors placed in the midsagittal plane (i.e., on the nasion and upper incisor), while newer studies often include more.
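The head-movement correction described above can be illustrated with a minimal sketch. It assumes the simplified two-sensor midsagittal setup (nasion and upper incisor) and 2D coordinates, and it maps each sample back to a chosen reference head pose rather than to the occlusal plane. The function name `head_correct_2d` and all coordinates are hypothetical; setups with three or more reference sensors would instead require a full 3D rigid-body fit, which is not shown here.

```python
import math

def head_correct_2d(point, nasion, upper_incisor, ref_nasion, ref_upper_incisor):
    """Map a midsagittal (x, y) point from the current head pose to a reference pose.

    The head pose is estimated from two reference sensors: the upper incisor (UI)
    serves as the origin, and the UI->nasion vector gives the head orientation.
    """
    # Head orientation in the current sample and in the reference pose.
    angle_now = math.atan2(nasion[1] - upper_incisor[1], nasion[0] - upper_incisor[0])
    angle_ref = math.atan2(ref_nasion[1] - ref_upper_incisor[1],
                           ref_nasion[0] - ref_upper_incisor[0])
    rot = angle_ref - angle_now  # rotation that undoes the head movement

    # Express the point relative to UI, rotate it, and re-attach it to the reference UI.
    dx, dy = point[0] - upper_incisor[0], point[1] - upper_incisor[1]
    x = dx * math.cos(rot) - dy * math.sin(rot) + ref_upper_incisor[0]
    y = dx * math.sin(rot) + dy * math.cos(rot) + ref_upper_incisor[1]
    return (x, y)


# Hypothetical coordinates in mm: a reference pose, then the same head rotated by
# 90 degrees and shifted. The tongue sensor has not moved relative to the head.
ref_n, ref_ui = (0.0, 100.0), (0.0, 0.0)
now_n, now_ui = (-95.0, 20.0), (5.0, 20.0)
tongue_tip_now = (15.0, 10.0)

corrected = head_correct_2d(tongue_tip_now, now_n, now_ui, ref_n, ref_ui)
```

Because only two reference points are used, any noise in either sensor directly affects the estimated rotation, which is in line with the recommendation above to place reference sensors as far apart as possible.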

While reference sensors are usually similar in architecture to movement sensors (i.e., capturing five degrees of freedom, hereinafter 5DOF), NDI has additionally developed a (two-channel) 6DOF sensor in which two 5DOF sensors are integrated to have a specific distance and relative orientation. If a 6DOF sensor is used, it is usually attached to the forehead, and automatically corrects the data of the other sensors for head movements (measured via the 6DOF sensor). While it is convenient to use only one reference sensor, the potential for noise (induced by skin movement) is greater in comparison to the more commonly used three-sensor setup as discussed above.

3.4.2. Preparation and adhesion

Reference sensors are prepared differently depending on where they are being placed. Those placed on extraoral structures (i.e., the nasion and mastoid sensors) are generally taped using medical tape. They need to be taped firmly to prevent movement; a small drop of adhesive can additionally be added to achieve this. They can also be coated in latex to make disinfection after the experimental session easier and to prolong sensor longevity. The intraoral reference sensor is usually placed on the gingiva above the upper central or lateral incisors. Section 3.6.2 provides more information on preparing the intraoral incisor reference sensor.

The reference sensors can alternatively be prepared and placed on a pair of goggles, on the frame of a pair of plastic glasses, or on a headband (e.g., Ji, Berry, & Johnson, 2013; Mefferd, 2019; Thompson & Kim, 2019; Kearney et al., 2018). The Appendix shows additional information regarding individual researchers’ strategies to place reference sensors.

3.5. Tongue sensors

3.5.1. Use and positioning

Tongue sensors are used to track tongue movements and investigate the production of a wide range of sounds, from alveolar stops (with a tongue tip sensor) to velars (with a tongue back sensor). Sensors are placed midsagittally unless the researcher wishes to specifically study lateral sounds, in which case one or two sensors may be added on the lateral parts of the tongue.

Concerning tongue sensors, 375 journal studies (out of 412 in total) explicitly mention the number and/or positioning of tongue sensors (as opposed to, e.g., only generally mentioning that they used tongue sensors). A total of 41 out of 375 studies (11%) use one tongue sensor, 90 studies use two tongue sensors (24%), 165 studies use three tongue sensors (44%), 70 studies use four tongue sensors (19%), and nine studies use five tongue sensors or more (2%). Either two or three sensors on the tongue are thus the most frequent choice, bringing the total number of intraoral sensors to four or five (including the reference sensor on the upper incisors and a jaw-movement sensor on the lower incisors).
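As a quick arithmetic check, the shares reported above can be recomputed from the counts given in the text. This is only an illustrative tally; the variable names are ours and not part of the review's materials.

```python
# Counts of journal studies by number of tongue sensors, as reported in the text.
counts = {"one": 41, "two": 90, "three": 165, "four": 70, "five or more": 9}

# The total should match the 375 studies that specify their tongue sensors.
total = sum(counts.values())

# Percentage of studies per category, rounded to whole percent.
shares = {label: round(100 * n / total) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} studies ({shares[label]}%)")
```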

If three sensors are used, they are usually placed on the tongue tip (TT), tongue middle (TM), and tongue back (TB) along the tongue’s median sulcus. In this case, there are two main approaches to dividing the tongue dorsum: either by placing TT and TB according to a predetermined measurement strategy or by spacing the sensors equidistantly (see below and also Table 2).

Table 2

Tongue sensor placement strategies. Percentages are calculated based on the number of studies that use the sensor in question (as defined under the sensor type). The dominant strategy is in bold.

Sensor	Methods of placement	Studies (%)

Tongue Tip (TT): 263 studies (96%) out of 273 use a TT sensor	from anatomical tongue tip:	
	≤1 cm (often 0.5 cm)	30 (11%)
	1 cm	164 (62%)
	1.1–2 cm	16 (6%)
	just behind the tongue tip	18 (7%)
	other (including not defined)	35 (13%)

Tongue Back (TB): 216 studies (79%) out of 273 use a TB sensor	as far back (as feasible; as comfortable)	50 (23%)
	behind anatomical tongue tip:	
	<3.5 cm	9 (4%)
	4–4.5 cm	13 (6%)
	5–5.5 cm	10 (5%)
	>6 cm	2 (1%)
	behind TT sensor:	
	<3 cm	5 (2%)
	4–5 cm	5 (2%)
	behind TM1 or TM2 sensor:	
	1–2 cm	32 (15%)
	other	17 (8%)
	not defined	42 (19%)

Tongue Mid (TM): 207 studies (76%) out of 273 use one or two TM sensors	With 2 or 3 sensors (TT, TM, TB):	
	midpoint between TT and TB	40 (19%)
	1–2 cm behind TT sensor	29 (14%)
	3–3.5 cm behind TT sensor	20 (10%)
	1–2 cm behind anatomical tip	18 (9%)
	3–3.5 cm behind anatomical tip	17 (8%)
	4–5 cm behind anatomical tip	15 (7%)
	With 4 or more sensors (TT, TM1, TM2, TB):	
	midpoint between TT and TB, equal-spaced	13 (6%)
	other (including not defined)	43 (21%)

In their placement of the TT sensor, most researchers provide a measurement, with ‘approximately 1 cm’ from anatomical tongue tip as the most popular choice (note that the sensor cannot be placed directly on the tip because it would interfere significantly with speech production and fall off quickly). Keeping in mind the functional perspective on tongue anatomy, this means that the ‘tongue tip’ sensor is in fact placed on the tongue blade as opposed to the tongue tip. The exact method of measurement (i.e., by ruler, calliper, or simply ‘eyeballing’) is mostly left unspecified. Furthermore, with a few exceptions, it is not indicated whether the measurements were performed with the tongue comfortably extended, stretched out, or at rest inside the mouth.

Regarding the placement of the TB and TM sensors, strategies vary to a greater extent than the strategies for the TT sensor. Some researchers decide on a specific measurement, e.g., by placing TB and TM sensors with 2 cm of space in between each sensor or by placing the TB sensor 4–5 cm from the TT sensor, with the TM sensor in between the two. Others decide to place the TB sensor ‘as far back as possible’ and the TM sensor in between. If two TM sensors are used, they are most often defined as being placed equidistantly between the TT and TB sensors.
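The equidistant strategy described above amounts to simple interpolation between the TT and TB positions. The sketch below is only an illustration: the function name `equidistant_positions` and the example distances are hypothetical, and it assumes that all positions are measured along the midline from the anatomical tongue tip under the same tongue posture.

```python
def equidistant_positions(tt_cm, tb_cm, n_mid):
    """Return distances (cm from the anatomical tongue tip) for n_mid sensors
    spaced evenly between the TT and TB sensors."""
    if n_mid < 1:
        raise ValueError("n_mid must be at least 1")
    if tb_cm <= tt_cm:
        raise ValueError("TB must lie behind TT")
    # n_mid sensors split the TT-TB span into n_mid + 1 equal intervals.
    step = (tb_cm - tt_cm) / (n_mid + 1)
    return [round(tt_cm + step * i, 2) for i in range(1, n_mid + 1)]


# Hypothetical example: TT at 1 cm and TB at 5 cm behind the anatomical tongue tip.
one_mid = equidistant_positions(1.0, 5.0, 1)  # single TM sensor at the midpoint
two_mid = equidistant_positions(1.0, 5.0, 2)  # TM1 and TM2
```

With these example values, a single TM sensor falls 3 cm behind the anatomical tip, and two TM sensors fall at roughly 2.33 cm and 3.67 cm.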

Few studies use lateral sensors (exceptions include Howson et al., 2015; Katz, Mehta, & Wood, 2017; Thibeault, Ménard, Baum, Richard, & McFarland, 2011; see the Appendix for a full list of studies using tongue lateral sensors). If lateral sensors are used, they are most often placed to the side of the TM sensor, about 1 cm from the tongue edge.

Table 2 provides an overview of the most common strategies for tongue sensor placement as well as their usage frequency in our literature review. The main strategy for each sensor type is highlighted in bold. In total, 273 out of 375 studies explicitly defined the position of at least one tongue sensor. For more details on which researchers use which strategy, the reader is invited to consult the ‘tongue sensors’ tab in the Appendix.

While not strictly in the purview of this literature review, we would like to mention two recent publications, which proposed more data-driven approaches to sensor placement. First, Patem, Illa, Afshan, and Ghosh (2018) used dynamic programming in order to determine optimal sensor placement for the sounds of American English based on rtMRI video frames of the vocal tract. Based on data of four participants (two male, two female), they determined that the optimal placement for three tongue sensors is to place the tongue tip sensor at 19.93 ± 11.45 mm from tongue base,⁹ the tongue middle sensor at 38.2 ± 11.52 mm from the tongue tip sensor, and the tongue back sensor at 80.51 ± 13.51 mm from the tongue tip sensor.

These measurements are informative for the four participants examined; however, in practice it would be difficult to measure a participant’s tongue in such detail and difficult to find participants for whom such measurements would be suitable (e.g., placing a tongue back sensor at 8 cm from the tongue tip sensor is often not practically possible due to limited tongue length; Patem and colleagues themselves state that they did not consider the level of discomfort in determining optimal sensor locations). Furthermore, it is not possible to accurately determine the tongue base without access to MRI, and the confidence intervals of the presented optimal placements are rather large.

Second, Wang, Samal, Rong, and Green (2016) used machine learning to determine an optimal set of points needed for classifying speech movements. They determined that for classifying most sounds (including both vowels and consonants), a set of four sensors (tongue tip, tongue back, upper lip, and lower lip) suffices. This is especially informative when studying the speech of clinical populations, since in those circumstances it is often desirable to use the minimal number of sensors to limit the burden on the participants.

3.5.2. Preparation and adhesion

Few studies mention the preparation of tongue sensors prior to placement. However, no conclusions can be drawn from this, as some researchers might simply not mention the specifics of sensor preparation due to manuscript length limitations or a perceived lack of interest from the readers. We could nonetheless identify some tongue sensor preparation options. Note that the tongue itself is also often ‘prepared,’ as it is dried to improve sensor adhesion (see also Section 4 for our drying procedure). First, some researchers adhere the sensors to the tongue without any preparation (i.e., using bare or out-of-the-box sensors).

Another option is to coat the sensors in latex before adhesion, a frequently-used approach (Earnest & Max, 2003). This method is suggested on the website of the Carstens articulograph (Electromagnetic Articulograph, 2019), where it is indicated that Plasty late latex milk (Glorex GmbH) is a suitable product for coating the sensors. The latex coating, they report, keeps the sensors clean and without glue residue. In their Carstens AG500 Manual (2006) they additionally state, under the ‘Cleaning and disinfection of sensors’ section, that coating the sensors in latex is recommended, as the latex can simply be peeled off after testing. Sensors can (and, if possible, should) according to Carstens be coated in latex for use on other facial surfaces, not just lingual, as this increases sterility and sensor longevity. Latex coating should also increase the longevity of (reusable) NDI Vox sensors (NDI, personal communication).

The third approach for preparing tongue sensors consists of increasing the sensor size to increase the adhesion surface and thereby potentially increasing the sensor adhesion duration. This can be done, for example, by placing small pieces of silk between the sensor and lingual surfaces (e.g., Ji et al., 2013; Goozée, Murdoch, Theodoros, & Stokes, 2000; Fuchs, 2005), gluing a small transparent layer of plastic to the bottom of the sensors (e.g., Wieling, Veenstra, Adank, Weber, & Tiede, 2015), or covering the head of the sensors with a small, thin flap of latex (our approach; see Section 4).

We carried out a sensor-adhesion experiment to compare these three approaches for tongue sensor adhesion. This experiment is reported on in Section 5.

3.6. Jaw-movement sensors

3.6.1. Use and positioning

Jaw movements can be tracked with either an intraoral sensor that is adhered on the lower incisors or an extraoral sensor adhered to the chin. The former is preferred, as the position of the chin sensor may also be affected by skin movement during speaking. Of the 286 studies that use a sensor to track jaw movement, 214 (75%) use a sensor on (or near) the lower incisors, compared to 72 (25%) that use a sensor on the chin. However, note that there are also differences in the placement of incisor sensors: while most researchers refer to placement on ‘incisors,’ only a few place the sensor on the incisors themselves (i.e., on the teeth). Most place the sensor on the gingival tissue below the incisors.

Most studies use only one jaw movement sensor. However, some have also used several (e.g., Wang et al., 2016, who placed three sensors on the jaw; Mooshammer, Tiede, Shattuck-Hufnagel, & Goldstein, 2019, who placed two sensors on the lower gumline, one below the front incisors and one below the left premolar; Mefferd, 2017, who placed three sensors on the lower gumline; or Mooshammer, Hoole, & Geumann, 2007, who placed two sensors on the outer and inner surface of the lower gumline and one sensor on the chin). Note that even with a single sensor, jaw movements can easily be tracked but are often hard to decouple from tongue and lower lip movement (e.g., Henriques & van Lieshout, 2013), as components of jaw movements are also present in tongue and lip movements. Furthermore, as the jaw is a rigid body, at least two 5DOF sensors are necessary to correctly track its orientation relative to the head.

3.6.2. Preparation and adhesion

If the jaw-movement sensor is placed extraorally, most frequently on the chin, no special preparation is mentioned in the reviewed studies (although the sensors can be coated in latex to increase sterility and longevity). In contrast, our literature review revealed several methods of preparing an intraoral jaw sensor (and the intraoral reference sensor). These methods include using the same dental adhesive as on the tongue, creating a custom dental mould of the incisor to which the sensor is adhered (e.g., Steele & van Lieshout, 2004; Steele, van Lieshout, & Pelletier, 2012), or adhering the sensor to a piece of Stomahesive wafer (e.g., Mefferd, 2017; Berry, Kolb, Schroeder, & Johnson, 2017; Dromey et al., 2018). The latter approach—using Stomahesive—increases the surface of the sensor as well as its adhesion to the participant’s gingival tissue due to the nature of the material. As this is the method used in our lab, there are further details on the preparation of Stomahesive-covered sensors in Section 4.

3.7. Lip sensors

3.7.1. Use and positioning

Lip sensors are generally placed on the vermillion border of the upper and lower lips. Data obtained through these sensor positions make it possible to estimate variations in lip aperture or lip protrusion that are phonetically relevant (e.g., production of bilabial stops as compared to fricatives, or between rounded and unrounded vowels). In some cases, such as when a study focuses on lip movements specifically, more lip sensors are attached, namely at the right and/or left lip corners (e.g., Meenakshi & Ghosh, 2018; Rong et al., 2012; Cler, Lee, Mittelman, Stepp, & Bohland, 2017).
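One simple way to derive these measures from the upper lip (UL) and lower lip (LL) sensor positions is sketched below. The definitions used here (aperture as the Euclidean UL–LL distance, protrusion as the mean anterior-posterior position of both sensors) are common simplifications rather than definitions prescribed by the reviewed studies. The axis convention (index 0 = anterior-posterior) and the coordinates are assumptions, and the data are assumed to be head-corrected already.

```python
import math

def lip_aperture(upper_lip, lower_lip):
    """Euclidean distance between the UL and LL sensor positions (x, y, z), in mm."""
    return math.dist(upper_lip, lower_lip)

def lip_protrusion(upper_lip, lower_lip, axis=0):
    """Mean position of the two lip sensors along the anterior-posterior axis.

    The axis index is an assumption; it depends on the coordinate system
    used after head correction.
    """
    return (upper_lip[axis] + lower_lip[axis]) / 2


# Hypothetical head-corrected positions (mm) during a bilabial closure and an open vowel.
ul = (60.0, 0.0, 8.0)
ll_closed = (57.0, 0.0, 4.0)
ll_open = (54.0, 0.0, 0.0)

aperture_closed = lip_aperture(ul, ll_closed)
aperture_open = lip_aperture(ul, ll_open)
protrusion_closed = lip_protrusion(ul, ll_closed)
```

Note that the aperture never reaches zero in such data, since the sensors sit on the vermillion border rather than at the point of lip contact.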

3.7.2. Preparation and adhesion

Lip sensors can be bare or coated with latex (to increase hygiene and longevity, as these sensors come in contact with saliva). If more than two lip sensors are used, latex-coated sensors are likely to affect articulation due to their larger size. Most often, lip sensors are adhered with a piece of tape. To increase adhesiveness, a small drop of adhesive can additionally be added, which ensures that the sensors are firmly adhered for the duration of the experiment. This is especially important if the medical tape does not stick adequately (e.g., due to the participant’s sweat or repeated large labial movements in stimuli targeting plosives).

4. EMA data collection in practice: A suggested procedure

In this section, we provide a practical description of the data collection procedure employed in our lab at the University of Groningen. Our approach is only one of the many possible strategies available to researchers who collect speech production data with EMA, as was also illustrated in the previous section. The description includes details that are important but often omitted from publications.

4.1. Preparation of the sensors using latex

In the procedure used in our lab, all sensors are prepared at least half a day before the experiment. In this preparation stage, we distinguish between three types of sensors: (1) the extraoral sensors (identified with MR, ML, N, UL, and LL, below) plus the sensors attached to the tongue (TM and TT), except for the most posterior tongue sensor, (2) the most posterior tongue sensor (TB), and (3) the sensors attached close to the incisors on the upper and lower gums (UI and LI). We check the sensors for any visible defects (e.g., broken wire) before using them.

The first group of sensors is prepared by dipping each of them in mask-making latex (RD 407 Mask Making Latex, Monster Makers). The TB sensor is prepared similarly but with an additional latex flap cover (see Section 5.1), which increases the surface of the sensor and may be beneficial for the adhesion duration (see Section 5.5). Finally, the UI and LI sensors are prepared using a Stomahesive wafer (ConvaTec PLC). A small rectangular piece of Stomahesive is cut measuring about 10 mm × 6 mm. The sensor is placed on top of this piece and a drop of latex is applied to it in order to make it adhere (Figure 4, left and right). The early preparation phase is necessary, as the latex takes several hours to completely dry. However, the sensors should not be prepared too early (e.g., a week in advance), as the latex becomes less flexible with time and more difficult to remove. In case of re-use, we disinfect sensors first using SPORECLEAR Medical Device Disinfectant (Hu-Friedy Mfg. Co., LLC) and then wipe them with an alcohol wipe before storing them.

Figure 4

Preparation of the incisor sensors with Stomahesive.

4.2. Preparation and attachment of reference sensors

After checking that participants are not pregnant, do not have a pacemaker, and do not have a latex allergy, our data collection procedure is as follows. All sensors are screwed into the miniature terminal blocks of the NDI Wave (or, in the case of the NDI Vox, plugged into the sensor harness assembly), wiped with an alcohol wipe, and placed on a sterilized tray a short time before the participant’s arrival. We perform a sensor validation check by verifying that each sensor that is screwed in also functions as it should. Once participants arrive, we first ask them to take a disposable toothbrush and scrub their tongue (especially along the midline). They do this in front of a mirror, so that they are aware of how far back they are reaching and do not trigger their gag reflex. By scrubbing their tongue, they remove the coating that covers the tongue (the amount of coating differs per participant¹⁰). We subsequently ask the participant to remove jewellery, glasses, and hearing aids, when applicable, as they make sensor placement more difficult and potentially could interfere with the signal (as the presence of metal inside the magnetic field has a negative effect on the precision of the recovered sensor positions). The glasses and hearing aids are returned to the participant once the sensor placement is complete if their use is necessary for successful participation in the experiment.

We additionally ask participants whether they are wearing dentures, as these may move slightly during speaking, which could result in some wire pull for sensors placed on the gingival tissue. Since dentures cannot be removed without impeding articulation, we note their presence but otherwise do not ask the participant to remove them. Additionally, if possible, participants should shave before the experiment and avoid wearing makeup as this makes sensor placement more difficult.

Subsequently, the participant is asked to sit down next to the EMA field generator (we were using the NDI Wave system, but have very recently moved to using the NDI Vox system). We first place four prepared reference sensors:¹¹

– mastoid right (MR)
– mastoid left (ML)
– nasion (N)
– (close to the) upper incisor (UI)

All sensors (reference and others) are first held in reverse action tweezers (Hobbycraft), as they make the application of sensors to the participant easier. The first three reference sensors are applied after the researcher has sterilized their hands using Sterilium® (Medline). Before placing any intraoral sensors, the researcher puts on (latex) dental gloves and a dental mask.¹²

The mastoid sensors (ML and MR) are placed behind the participant’s ears on the skin covering the mastoid part of the temporal bone, where there is minimal skin movement (Figure 5).

Figure 5

Mastoid sensor (placed below the glasses).

The nasion sensor (N; Figure 6) is placed on the part where there is least skin creasing. If the participant is wearing glasses, the sensor is placed right above or below their glasses, depending on how big the frame is. The first three sensors are secured with a drop of glue. We use PeriAcryl®90 HV adhesive (GluStitch Inc), which is kept in the fridge (at ~2°C) until the participant’s arrival. At that moment, two to three drops of adhesive are added to a small plastic mixing well (Maxill Inc.) after which the adhesive is returned to the fridge. A small disposable plastic pipette is used to transfer the adhesive from the mixing well to the sensor.

Figure 6

Nasion sensor.

The sensor wires are adhered to the participant using Leukopor or Leukosilk tape (BSN medical GmbH). A piece of tape is additionally placed over the ML and MR sensors to secure them (see tape in Figure 5). We add a piece of tape to the N sensor but place it slightly higher on the forehead (see tape in Figure 6), as it otherwise disturbs the participant’s visual field.

The final reference sensor (UI), on top of the piece of Stomahesive, is attached to the gingiva above the left upper incisor. No glue is added to the Stomahesive, as it adheres to tissue by itself. We avoid placing any incisor sensors on the midsagittal line, directly above the central incisors, due to the labial frenulum, which connects the upper lip to the gingival tissue and is quite sensitive. The UI sensor placement relative to the labial frenulum can be seen in Figure 7.

Figure 7

Upper incisor sensor placement.

After the reference sensors have been placed, the palate trace and biteplate recordings follow. These are crucial (particularly the biteplate recording) to ensure the subsequent quality of the collected data. For the palate trace, we adhere one spare sensor to the end of the participant’s dominant thumb using Leukopor tape (so that the sensor wires are leading down the thumb and pointing towards the wrist) and instruct them to trace the thumb from the back of the hard palate to their front teeth. The purpose of this procedure as well as the tracing method are explained by means of a mouth puppet (Super Duper® Publications; Figure 8), which, due to its cartoonish look, is also useful in decreasing participants’ potential anxiety. The palate trace is performed twice.

Figure 8

Mouth puppet with attached sensors is very useful in explaining EMA.

For the biteplate recording, we created a (reusable) fixed triangular protractor with three sensors glued to it (Figures 9 and 10). The same protractor is used for all participants; it is wiped with an alcohol wipe before every use and disinfected with SPORECLEAR Medical Device Disinfectant (Hu-Friedy Mfg. Co., LLC) after every use. The protractor is pushed as far back as comfortable into the corners of the participant’s mouth. The participant is then asked to hold the protractor firmly between their teeth and sit still for a few seconds while the biteplate recording is made. The protractor must be in contact with the molars in order to obtain a true occlusal reference. We check the biteplate recording directly by comparing the Euclidean distances between all the reference sensors and the three sensors on the biteplate, using MATLAB (MathWorks Inc.). If these distances remain relatively constant over time, this indicates that the position of the reference sensors and the biteplate sensors is correctly tracked.

Figure 9

Biteplate protractor with three attached sensors.

Figure 10

Biteplate protractor in use.
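The distance check on the biteplate recording can be sketched in a few lines. The authors run this check in MATLAB; the Python version below is only an illustration of the idea, not their script. The function names, the data layout (one list of x, y, z positions in mm per sensor), the biteplate sensor labels BP1 and BP2, and the 0.5 mm tolerance are all assumptions made for the example.

```python
import math
from itertools import product
from statistics import mean, pstdev


def distance_stability(reference, biteplate):
    """Return {(ref_name, bite_name): (mean_distance, sd_distance)} over all frames.

    Both arguments map a sensor name to a list of (x, y, z) positions in mm,
    one tuple per frame; all lists are assumed to have the same length.
    """
    stats = {}
    for (r_name, r_xyz), (b_name, b_xyz) in product(reference.items(), biteplate.items()):
        dists = [math.dist(r, b) for r, b in zip(r_xyz, b_xyz)]
        stats[(r_name, b_name)] = (mean(dists), pstdev(dists))
    return stats


def recording_is_stable(stats, max_sd_mm=0.5):
    """True if every pairwise distance varies by less than max_sd_mm (hypothetical tolerance)."""
    return all(sd < max_sd_mm for _, sd in stats.values())


def shifted(point, offsets):
    """Move one point by each per-frame offset, imitating rigid head movement."""
    return [tuple(c + o for c, o in zip(point, off)) for off in offsets]


# Toy data: three frames in which head and biteplate move together as one rigid body.
offsets = [(0.0, 0.0, 0.0), (1.0, -0.5, 0.2), (2.0, 0.3, -0.4)]
reference = {"N": shifted((0.0, 80.0, 40.0), offsets), "UI": shifted((0.0, 60.0, 0.0), offsets)}
biteplate = {"BP1": shifted((0.0, 50.0, -5.0), offsets), "BP2": shifted((-20.0, 20.0, -5.0), offsets)}

stats = distance_stability(reference, biteplate)
print(recording_is_stable(stats))
```

If a sensor is mistracked or the protractor slips, at least one reference-to-biteplate distance drifts across frames and its standard deviation rises, which is what the check is meant to catch.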

4.3. Attachment of movement sensors

After the palate trace and biteplate recordings, we proceed with attaching sensors to the articulators that we wish to capture. Most frequently, these sensors are the following (listed in the order of placement):

– tongue back (TB)
– tongue mid (TM)
– tongue tip (TT)
– lower incisor (LI)
– upper lip (UL)
– lower lip (LL)

To determine where to place the tongue back sensor, we use a colour transfer applicator stick (Dr. Thompson’s, GUNZdental). We ask the participant to drag the stick along the midline of their hard palate (as they had done before with the palate trace sensor) and then pronounce the velar /k/, immediately followed by sticking out their tongue.¹³ They are asked not to swallow while their tongue is being marked. The colour from the applicator is transferred from the palate to the part of the tongue where the back-most (velar) sound is made. We use the same stick to draw a coronal line through this spot. Additionally, we use measuring tape to measure 1 cm from the tongue tip (when the tongue is stretched) and draw a coronal line through that point as well. The coronal line enables us to always re-adhere the sensor to approximately the same position if it starts getting loose, as the point might become smudged through speaking and swallowing, but the line will remain clearly visible. Figure 11 below shows the coronal lines on the tongue left by the colour transfer applicator stick, with the median sulcus still clearly visible.

Figure 11

Indicatory markings for sensor placement.

The participant can now swallow as the coronal lines will remain clear, even when they come in contact with saliva. The participant is asked to stick out their tongue as far as comfortable. We place barber tape (Comair GmbH, folded three times to contain at least eight layers) on the back line marking on the participant’s tongue, dab the tape on the tongue for about 5–10 seconds, and finally drag the tape across the tongue. This procedure dries the tongue dorsum and is crucial in ensuring that sensors do not fall off easily. We hold each sensor with tweezers and add a drop of adhesive using a small plastic disposable pipette before placing the sensor on the tongue.

The TB sensor is placed on the crossing between the marked posterior line and the median sulcus, so that the wire of the sensor is pointing downward and towards the lip corner. A disposable wooden tongue depressor (Tegler) is used to press the sensor to the tongue for 10–20 seconds. The wire is then secured to the cheek using Leukopor tape. It is essential that the wires have enough slack, as large speech gestures may otherwise lead to wire tension, which is uncomfortable for the participant and may cause the sensor to come loose. The process is repeated for the TT sensor, which is placed on the crossing between the marked anterior line and the median sulcus. Note that the TT sensor is positioned in such a way that the wire is pointed towards the side of the tongue, as a wire running over the tongue tip feels uncomfortable for the participant and leads to lisping (Hoole & Nguyen, 1999).

The tongue mid sensor is placed halfway between the marked lines for the TT and TB sensors on the median sulcus by eyeballing. In line with previous methodological considerations (see Section 3.5.1), we generally do not use the TM sensor when testing clinical populations or children. If we are using lateral sensors, we place these to the right and left side of the TM sensor, 0.5–1 cm from the edge of the tongue (depending on how wide or narrow the participant’s tongue is). We only place more than three sensors if that is required for the purposes of the study. The final intraoral sensor (LI) tracks the jaw movement. This sensor, prepared with Stomahesive, is attached to the gingiva below the right lower incisor. No additional glue is needed, as Stomahesive adheres to tissue by itself.

Finally, two lip sensors (UL and LL) are attached at the vermilion border of the upper and lower lip using a drop of dental adhesive. Depending on the amount of facial hair surrounding the upper and lower lip, the removal of lip sensors can lead to mild discomfort.

5. Sensor adhesion experiment

5.1. Aim

The present experiment tested how different preparation methods for EMA sensors affect adherence to the tongue. As discussed in Section 3.5.2, several methods for sensor preparation exist. We specifically focus on the tongue sensors, as these usually are most likely to come off relatively quickly. The aim of this experiment was therefore to determine which type of sensor preparation (see below) is most beneficial for adhesion, also depending on the position on the tongue. In addition, we evaluated (qualitatively) whether the participant’s tongue anatomy influences adhesiveness.

We tested three types of sensor preparations: out-of-the-box (‘bare’) sensors, latex-coated sensors, and sensors with a latex flap. Out-of-the-box sensors (Figure 12, left) are the sensors as provided by NDI for the Wave device (approximate surface: 30 mm²), latex-coated sensors (Figure 12, center) are dipped in latex (with only a slightly larger surface than the out-of-the-box sensors, but with rounder edges), and sensors with a latex flap (Figure 12, right) are covered in the same latex, but now a brush is used to apply the latex while the sensor head is lying on a flat surface (approximate surface: 70 mm²).

Figure 12

Sensor preparation types (from left to right: out of the box, latex-coated, latex flap).

5.2. Participants and experimental procedure

To test these three types of sensor preparations, we tested 10 female adult participants in three separate sessions. All 10 participants were between 20 and 30 years of age. The study was approved by the Faculty of Arts Research Ethics Review Committee of the University of Groningen (approval number 71276154).

For each of the three sessions, we used one type of sensor and followed the same application procedure for each type (as described in Section 4 above). The sessions took place on three different days, thus avoiding the risk of glue residue and tongue fatigue, both of which would have influenced the resulting adhesion times. During the first session, we adhered out-of-the-box sensors, during the second session the latex-coated sensors, and during the final session the sensors with the latex flap.

During every session, we placed five sensors on the tongue, as this is the maximum number of tongue sensors used by researchers (see Section 3.5.1 and the Appendix). The sensors in question were placed on the tongue tip (TT; 1 cm from the tip), tongue back (TB; place of /k/ constriction), tongue middle (TM; between TT and TB), and tongue lateral right and left (TLR and TLL, placed to the right and left of the TM sensor, respectively). While few studies investigate lateral sounds (see above), we wished to assess whether different types of sensor preparations are also suitable for studying lateral movement of the tongue, as those parts move differently, and the sensors are more prone to interference from the participants’ molars. Figure 13 below displays sensor placement examples for latex-coated sensors.

Figure 13

Sensor placement during adhesion experiment.

The sensor placement process took approximately ten minutes. After we placed the five sensors on the tongue, we started displaying the stimuli to the participants using Microsoft PowerPoint on a computer monitor in front of them. The articulograph was not turned on for this experiment, as we were not collecting kinematic data and merely wished to determine how long it took for each sensor to fall off.

5.3. Stimuli

The experimental procedure consisted of the following tasks and stimuli. First, the participants read the short text Please call Stella from the Speech Accent Archive (Weinberger, 2015). This allowed them to get used to speaking with sensors in their mouth (i.e., the sensor habituation stage) and took approximately one minute. We did not include a longer sensor habituation stage as our goal was not to record the participants’ natural speech. Following the text was a wordlist. It contained 300 words of varying lengths and from various thematic fields (e.g., vegetables, fruit, school, vocations). Each word appeared on the screen for four seconds, during which the participant read it out loud. This procedure lasted for 20 minutes. Finally, at the end of the first wordlist, the participants performed a rapid syllable repetition task, namely the diadochokinesis (DDK) task, at a comfortable but fast speaking pace as defined by the participants themselves. The DDK task involved the repetition of syllables /pa/, /ta/, /ka/, and /pataka/, and was included because fast repetitive movements may potentially cause the sensors to fall off faster. The experiment was considered complete once all the sensors had been detached or the three tasks were repeated twice, which took about 45 minutes. When a sensor fell off (the participants were instructed to inform us when they felt a sensor get loose), we removed it and noted the time it fell off. We did not re-attach any sensors.

The experimental procedure, including participant preparation (five minutes) and sensor placement (ten minutes), took 60 minutes at most. At that point, we stopped the experiment and removed the remaining sensors. The maximum time a sensor was adhered to a person was therefore 45 minutes. The experimental procedure is schematically presented in Figure 14.

Figure 14

Experimental procedure.
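The timing figures reported for the procedure can be checked with simple arithmetic. The short sketch below uses only values stated in the text; the variable names are ours, and the duration of a single DDK block is not reported, so it is left out rather than guessed.

```python
# Values taken from the text of Sections 5.2 and 5.3; names are illustrative.
N_WORDS = 300
SECONDS_PER_WORD = 4
wordlist_min = N_WORDS * SECONDS_PER_WORD / 60  # duration of one pass through the wordlist

PREPARATION_MIN = 5     # participant preparation
PLACEMENT_MIN = 10      # sensor placement
MAX_SPEAKING_MIN = 45   # upper limit for the speaking tasks
session_max_min = PREPARATION_MIN + PLACEMENT_MIN + MAX_SPEAKING_MIN

print(wordlist_min, session_max_min)
```

One pass through the wordlist accounts for the reported 20 minutes, and the three components add up to the stated maximum session length of 60 minutes.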

5.4. Anatomical measurements of the tongue

For all participants, we measured the relative tongue length, tongue width, and maximal mouth opening. All three measurements were taken with the participant’s tongue comfortably extended using a ruler. First, we measured the relative tongue length, defining it as the distance between the anatomical tongue tip and the place we had marked as the place of /k/ constriction. Second, we measured the tongue width, defined as the widest part of the tongue, parallel to the molars. Finally, we asked the participants to open their mouth as wide as they comfortably could and measured the vertical distance between the surface of the tongue and the edge of their upper central incisors. We defined this as ‘mouth opening,’ which in effect represents the maximum intraoral space that the researcher can work with during the sensor placement procedure.

Due to the lack of suitable equipment, we were not able to measure the participants’ salivary flow rate or take any other anatomical measurements.

5.5. Statistical analysis and results

To assess the potential effect of sensor preparation method and sensor position on sensor adhesiveness, we used linear mixed effects regression modelling with participant as a random-effect factor and the optimal random-effects structure (i.e., assessing the inclusion of random intercepts and slopes) determined via model comparison. Specifically, we evaluated whether sensor preparation type (OUT-OF-THE-BOX, LATEX-COATED, FLAP) and sensor position (TT, TM, TB, TLL, TLR) affected sensor adhesiveness. As our initial analysis appeared to show a clear distinction between the TB sensor (adhering for a much shorter duration than the other sensors, whose adhesion did not differ significantly from each other; see Figure 15), we created a new fixed effect predictor distinguishing the TB sensor from the other sensors.

Figure 15

Effect of sensor position on adhesiveness.

The best model for our data, determined via model comparison, only warranted the inclusion of the distinction between the TB sensor and the other sensors, in addition to the by-subject random intercept and a by-subject random slope for the contrast between the TB and the other sensors. Specifically, this model showed that the TB sensor adhered approximately 14 minutes less than the other sensors (β = –14.0, t = –5.0, p < 0.001). Sensor preparation type (see Figure 16) did not reach significance in the best model, nor did any of the other anatomical predictors. Of course, this may be partly due to our limited sample size (N = 10).
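One way to write down the model structure described above is given below. The notation is ours and is an assumption rather than the authors' specification: y is the adhesion time in minutes for observation i of participant j, TB is an indicator that equals 1 for the tongue back sensor and 0 otherwise, and the random effects and residuals are taken to be Gaussian, as is conventional for linear mixed-effects models.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{TB}_{ij} + u_{0j} + u_{1j}\,\mathrm{TB}_{ij} + \varepsilon_{ij},
\qquad
(u_{0j}, u_{1j}) \sim \mathcal{N}(\mathbf{0}, \Sigma), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this notation, the reported estimate of approximately –14.0 minutes corresponds to \beta_1, while u_{0j} and u_{1j} are the by-subject random intercept and the by-subject random slope for the TB contrast.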

Figure 16

Effect of sensor preparation type on adhesiveness.

When explicitly focusing on the interaction between sensor preparation type and the sensor (TB versus the other sensors), the flap sensor appeared to be detrimental for the adhesion time for non-TB sensors, reducing the estimated adhesion time by about five minutes compared to the bare sensor and about three minutes compared to the sensor coated in latex. However, for the TB sensors the opposite pattern was found. The adhesion time of the sensor with the flap was estimated to be about five minutes higher compared to the sensor coated in latex and more than nine minutes higher compared to the bare sensor. Figure 17 shows this interaction; Table 3 shows speaker-specific differences in sensor adhesion times.

Figure 17

Visualization of interaction between sensor position (TB is tongue back and Non-TB are all other sensors) and sensor preparation type (out-of-the-box, latex, and flap).

Table 3

Speaker-specific differences in sensor adhesiveness (distinction between non-TB and TB sensors; adhesion time is reported in minutes). For the Non-TB sensors the values are averaged and the SD is shown between parentheses.

Participant	Bare Non-TB	Bare TB	Latex Non-TB	Latex TB	Latex flap Non-TB	Latex flap TB
P01	24 (13)	18	29 (11)	23	39 (12)	3
P02	14 (21)	1	13 (11)	16	24 (8)	45
P03	45 (0)	3	45 (0)	6	45 (0)	37
P04	42 (3)	33	30 (16)	25	19 (3)	3
P05	40 (11)	40	44 (3)	19	43 (5)	4
P06	29 (11)	1	12 (14)	1	10 (11)	1
P07	27 (19)	0	19 (15)	1	12 (11)	2
P08	45 (0)	32	45 (0)	11	28 (20)	45
P09	42 (6)	4	45 (0)	41	41 (9)	42
P10	38 (10)	1	45 (0)	45	38 (16)	45
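The values in Table 3 can also be summarized descriptively. The sketch below is not the mixed-effects analysis reported above; it simply averages the table cells per preparation type and sensor group, using names of our own choosing. Note that the Non-TB entries are themselves averages over four sensors, so the result is a mean of means, and the SDs are ignored.

```python
from statistics import mean

# Adhesion times in minutes copied from Table 3: (Non-TB mean, TB) per preparation type.
TABLE_3 = {
    "P01": {"bare": (24, 18), "latex": (29, 23), "flap": (39, 3)},
    "P02": {"bare": (14, 1), "latex": (13, 16), "flap": (24, 45)},
    "P03": {"bare": (45, 3), "latex": (45, 6), "flap": (45, 37)},
    "P04": {"bare": (42, 33), "latex": (30, 25), "flap": (19, 3)},
    "P05": {"bare": (40, 40), "latex": (44, 19), "flap": (43, 4)},
    "P06": {"bare": (29, 1), "latex": (12, 1), "flap": (10, 1)},
    "P07": {"bare": (27, 0), "latex": (19, 1), "flap": (12, 2)},
    "P08": {"bare": (45, 32), "latex": (45, 11), "flap": (28, 45)},
    "P09": {"bare": (42, 4), "latex": (45, 41), "flap": (41, 42)},
    "P10": {"bare": (38, 1), "latex": (45, 45), "flap": (38, 45)},
}
PREPARATIONS = ("bare", "latex", "flap")


def cell_means(table):
    """Average adhesion time across participants for each (preparation, group) cell."""
    out = {}
    for prep in PREPARATIONS:
        out[(prep, "non_tb")] = mean(row[prep][0] for row in table.values())
        out[(prep, "tb")] = mean(row[prep][1] for row in table.values())
    return out


means = cell_means(TABLE_3)
# Unweighted TB minus Non-TB gap, averaged over the three preparation types.
overall_gap = (mean(means[(p, "tb")] for p in PREPARATIONS)
               - mean(means[(p, "non_tb")] for p in PREPARATIONS))

for key, value in means.items():
    print(key, round(value, 1))
print("TB minus Non-TB:", round(overall_gap, 1))
```

These raw averages show the same pattern as the model-based estimates: the TB sensor adheres roughly 14 minutes less than the other sensors overall, and the latex flap raises the TB average while lowering the Non-TB average.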

5.6. Discussion of experimental investigation

In general, our sensor adhesion experiment demonstrated no clear general advantage of any particular sensor preparation type. With five sensors on the participant’s tongue, it was difficult to make all of them adhere for the full 45 minutes (this succeeded for only two participants). The adhesiveness of the TB sensor, which was significantly lower than that of other sensors, did improve when the sensor was prepared with a latex flap. When attaching intraoral sensors, it is crucial to preserve a sterile environment. As sensors coated in latex (both with and without a latex flap) are more hygienic, easier to clean, and likely deteriorate more slowly, we recommend coating the sensor in latex when possible. Based on our results, we further recommend adding a latex flap for the tongue back sensor.

Additionally, we would like to mention some qualitative observations. Placing a total of five sensors on the tongue is not ideal, particularly when keeping in mind that in a regular experiment, two additional intraoral sensors would need to be included as well. This difficulty was especially pronounced with latex flap sensors, as the required tongue surface to attach the sensors to was largest. In the case of using sensors with a latex flap, participants appeared to take longer to get habituated. However, their articulation seemed to return to normal within the first ten minutes of the experiment (although note that we did not quantify this) and should therefore not be problematic for a regular experimental setup. A practical advantage of the sensors with a latex flap was that once part of the flap detaches from the tongue, this is quickly noticed by the participant and can be easily resolved by adding some glue underneath the flap.

There were several limitations to this study. First, we adhered five sensors to the tongue, which is a larger number than usual. While this was done on purpose, as we wished to assess not only adhesion of sensors placed midsagittally but also those placed laterally, it also might not reflect adequately how long the sensors would adhere in a normal experimental scenario with only two or three tongue sensors. Second, we did not readhere the sensors once they fell off. In a real experimental scenario, one would reglue the sensor to the same position where it fell off. In our experience, it is easiest to reglue a sensor with a flap as the adhesive surface is largest.

Another experiment would need to be conducted to assess how the different sensor types compare when focusing on ease of reattachment. Finally, sensor placement and its effectiveness are strongly impacted by individual factors. While we included certain tongue anatomical measures (none of which turned out to be significant predictors in our best model), others that were not measured and differ between participants—such as salivary flow rate (Whelton, 2012) and tongue surface (Kullaa-Mikkonen et al., 1982)—likely play an important role as well.¹⁴

6. Conclusion

The present paper provided an introduction to electromagnetic articulography and an overview of data collection procedures on the basis of reviewing 905 publications employing electromagnetic (midsagittal) articulography since 1987. In addition, we provided a detailed description of the procedure used in our own lab.

EMA data collection and analysis are time-consuming and technically demanding. Consequently, it is difficult to include a large number of participants. Compare, for example, the five participants that seem to be the norm in EMA research (see Section 2.5) with the 50 participants that would be needed for a study with 80% power and aimed at identifying effect sizes as low as Cohen’s d = 0.4 (Brysbaert, 2019). If testing 50 or more participants is not feasible, then individuals who participate in EMA studies should be carefully selected and the testing procedure should facilitate between- and within-speaker comparability. Reliable, accurate, and replicable sensor placement should therefore be ensured.
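The order of magnitude of the sample size mentioned above can be reproduced with a textbook approximation. The sketch below assumes a single within-participant contrast tested two-sided at α = 0.05 and uses the normal approximation to the power function; these are our assumptions for illustration and not necessarily the procedure behind the cited figure. Exact calculations based on the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_for_within_subject_effect(d, alpha=0.05, power=0.80):
    """Approximate N for a one-sample or paired test on a standardized effect d.

    Normal approximation: N = ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


print(n_for_within_subject_effect(0.4))  # prints 50
```

Because the required N scales with 1/d², halving the detectable effect size roughly quadruples the number of participants, which illustrates why small-sample EMA studies can only reliably detect large effects.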

As we demonstrated in our review, however, there is currently still a great variety of approaches used for EMA sensor preparation and placement. For example, while nearly all studies use a tongue tip sensor, frequently placing it ‘1 cm’ behind the anatomical tongue tip, researchers often do not specify how this distance from the tongue tip was measured (e.g., using a ruler as opposed to eyeballing) nor the position that the tongue was in (e.g., at rest inside the mouth, comfortably protruded, completely stretched). This can make a substantial difference, however. Based on our experience, a point that is 1 cm from the tip with the tongue at rest can be nearly 1.5 cm from the tip when the tongue is protruded. Another example of varying sensor placement strategies pertains to the ‘tongue back’ sensor, which is often placed an arbitrary number of centimetres from the tongue tip or as far back as comfortable and/or possible. Participants, however, are not comparable, as tongue sizes, oral cavities, and comfort levels can differ greatly. One strategy for solving this (also used within our lab) is to place the tongue back sensor where the /k/ (or another sound involving a posterior constriction) is made. In this way, the placement of the sensor makes sense from an articulatory perspective, which is missing from the other (more arbitrary) approaches.¹⁵ Other conundrums with intraoral sensor placements, unfortunately, are not as easily solvable. These, for example, include situations in which a speaker does not have enough gingival tissue for the placement of an incisor sensor, when the tongue of the speaker is too small to place the desired number of sensors, or when a speaker produces too much saliva, causing the sensor to fall off repeatedly.

As point tracking technology continues to improve, it is necessary to strive for better and more consistent methods of sensor adhesion, preparation, and placement. The aim is not to limit the creativity of researchers, but rather to ensure more comparable results in a field that usually relies on small sample sizes. It is our hope that this paper may serve as a starting point for further debate on the topic.

Additional File

The following additional file for this article can be found as follows.

Appendix

An .xlsx file, which includes all EMA studies that were collected as part of our literature review. The appendix contains information on the topic, studied population, and sensors in use. It also includes specific information on sensor placement strategies for tongue sensors. DOI: https://doi.org/10.5334/labphon.237.s1

Notes

1. Electromagnetic Articulography (EMA) used to be known as Electromagnetic Midsagittal Articulography (EMMA). While the ‘midsagittal’ part is not applicable anymore as the sensors are tracked in 3D, both spellings remain in use in the literature. Other alternative names include ‘(electromagnetic) articulometry’ and ‘electromagnetometry.’ The device can be called an EMA, an articulograph, an articulometer, or (especially in the early years) a magnetometer.
2. The predecessor to the articulographs was the x-ray microbeam, which tracked six pellets on the tongue and teeth (Kiritani, Itoh, & Fujimura, 1975).
3. Please note that the term ‘moderate-strength’ is used here as the field is strong enough to cause interference with various devices (to the extent of corrupting the data, not harming the participant), but not nearly as strong as, for example, the field in an MRI chamber.
4. Our literature review underwent three separate stages, going from 247 publications (first draft) to 626 publications (second draft) and finally to 905 publications (final publication). For the first draft of this paper, we collected publications from five international peer-reviewed journals (namely the Journal of Laboratory Phonology; The Journal of the Acoustical Society of America; the Journal of Phonetics; the Journal of Speech, Language, and Hearing Research; and Clinical Linguistics and Phonetics) as well as conference abstracts from the International Congress of the Phonetic Sciences, which led us to identify 247 publications. On the basis of reviewers’ comments, we decided to perform a more extensive literature review for the second draft of the paper. We used the search terms ‘electromagnetic articulography’ and ‘electromagnetic midsagittal articulography’ on Google Scholar, which led us to identify 626 publications. In the second round of revisions, however, a reviewer (justly) pointed out that ‘articulometry’ is a frequent term that should be included. We therefore finally used the search terms described in Section 3 of this paper (namely, ‘articulography,’ ‘articulograph,’ ‘articulometry,’ and ‘articulometer’), excluding the search terms we had looked for previously for the second draft. We did not discard any publications at any stage of the process.
5. As electromagnetic articulography was pioneered in Germany, many early papers are written in German.
6. Researchers refer to both ‘biteplate’ and ‘biteplane’ recordings.
7. The company Ellman International, Inc., seems to have been acquired by Cynosure, Inc., in 2004 (Cynosure, Inc., 2014, para. 1) and some products were discontinued.
8. Following Seikel, Drumright, and Huddock (2020, Ch. 6), the tongue consists of the tongue tip or apex (i.e., the anterior-most portion of the tongue), the tongue body (i.e., the portion of the tongue that is found within the oral cavity and makes up about two thirds of the tongue surface), and the tongue root or base (i.e., the part of the tongue that resides in the oropharynx). The superior surface of the tongue is dorsal (also called the tongue dorsum), and the undersurface is ventral. The median sulcus divides the tongue into left and right sides.
9. Unfortunately, Patem et al. (2018) do not specify how their manual annotators defined ‘tongue base,’ but it is presumed that it refers to the point where the tongue meets the floor of the mouth.
10. Coffee, especially, leaves a brown coating on the tongue, which is not optimal for sensor placement.
11. In principle, three (or even two) reference sensors are enough to correct head movement. However, we (as many other researchers) use one additional sensor as a backup in case one of the reference sensors malfunctions. We do not use the NDI 6DOF sensor (containing two sensors with a specific distance and orientation towards each other) which may be used to automatically correct for head movement, but use separate reference sensors instead, as it is beneficial to maximize the difference between the reference sensors to minimize the influence of noise from the reference sensors on the rotation.
12. We use the dental mask for adults but often avoid it for children, as they do not yet have such a strong ‘germ reflex’ and we noticed it makes them feel uncomfortable.
13. This procedure is similar to the procedure used by Brunner, Hoole, and Perrier (2011b). However, we use the colour transfer applicator to mark the spot where the participant produces their /k/. Brunner et al. (2011b) used an oral disinfectant with a strong purple colouring agent and asked the participant to close their mouth and push their tongue (neutral position) against the hard palate. The colour mark was thus transferred to the tongue dorsum.
14. During our sensor placement, we did notice that intraoral sensors were more difficult to adhere to those participants who produced more saliva. However, as we could not objectively measure salivary flow rates, we cannot accurately report on the relationship between saliva production and sensor adhesiveness.
15. One centimetre behind the tongue tip is also somewhat arbitrary, however it seems to be a good compromise between seeing (and measuring) tongue tip movements and not overly impeding participants’ speech.

Acknowledgements

We would like to thank the editors and two anonymous reviewers for comments that helped us improve the paper substantially. Importantly, we would like to acknowledge that many parts of the EMA approach used in our lab are based on borrowing successful approaches from other labs. We would particularly like to thank Mark Tiede for demonstrating his procedure at Haskins Laboratories and for commenting on an earlier version of this paper. Furthermore, we would like to thank all other researchers with whom we have discussed issues and exchanged experiences regarding EMA studies, including June Sun, Fabian Tomaschek, Marianne Pouplier, Michael Proctor, and Stefanie Keulen.

We would further like to acknowledge funding from the Dutch Research Organisation (NWO) to Martijn Wieling (grants no. 019.2011.3.110.016, 016.144.049 and PGW.19.034), and the International Macquarie University Research Excellence Scholarship (iMQRES) grant awarded to Jidde Jacobi.

Competing Interests

The authors have no competing interests to declare.

References

Alvarez, G., Dias, F. J., Lezcano, M. F., Arias, A., & Fuentes, R. (2019). A Novel Three-Dimensional Analysis of Tongue Movement During Water and Saliva Deglutition: A Preliminary Study on Swallowing Patterns. Dysphagia, 34, 397–406. DOI: http://doi.org/10.1007/s00455-018-9953-0

Aron, M., Berger, M.-O., Kerrien, E., Wrobel-Dautcourt, B., Potard, B., & Laprie, Y. (2016). Multimodal acquisition of articulatory data: Geometrical and temporal registration. JASA, 139(2), 636–648. DOI: http://doi.org/10.1121/1.4940666

Badin, P., Tarabalka, Y., Elisei, F., & Bailly, G. (2010). Can you ‘read’ tongue movements? Evaluation of the contribution of tongue display to speech understanding. Speech Communication, 52, 493–503. DOI: http://doi.org/10.1016/j.specom.2010.03.002

Bakst, S., & Johnson, K. (2018). Modeling the effect of palate shape on the articulatory-acoustics mapping. JASA Express Letters, 144(1), EL71–EL75. DOI: http://doi.org/10.1121/1.5048043

Ball, M. J., Gracco, V., & Stone, M. (2001). A Comparison of Imaging Techniques for the Investigation of Normal and Disordered Speech Production. Advances in Speech Language Pathology, 3(1), 13–24. DOI: http://doi.org/10.3109/14417040109003705

Bartle-Meyer, C. J., Goozée, J. V., & Murdoch, B. E. (2009). Kinematic investigation of lingual movement in words of increasing length in acquired apraxia of speech. Clinical Linguistics and Phonetics, 23(2), 93–121. DOI: http://doi.org/10.1080/02699200802564284

Benus, S., & Gafos, A. I. (2007). Articulatory characteristics of Hungarian ‘transparent’ vowels. Journal of Phonetics, 35, 271–300. DOI: http://doi.org/10.1016/j.wocn.2006.11.002

Berry, J. J. (2011). Accuracy of the NDI Wave Speech Research System. JSLHR, 54, 1295–1301. DOI: http://doi.org/10.1044/1092-4388(2011/10-0226)

Berry, J., Kolb, A., Schroeder, J., & Johnson, M. T. (2017). Jaw Rotation in Dysarthria Measured with a Single Electromagnetic Articulography Sensor. American Journal of Speech-Language Pathology, 26(2S), 596–610. DOI: http://doi.org/10.1044/2017_AJSLP-16-0104

Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., & Yvert, B. (2016). Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces. PLoS Comput Biol, 12(11), e1005119. DOI: http://doi.org/10.1371/journal.pcbi.1005119

Branderud, P. (1985). Movetrack – a movement tracking system. Proceedings of the French-Swedish Symposium on Speech, Grenoble, France, pp. 113–122.

Brunner, J., Fuchs, S., & Perrier, P. (2009). On the relationship between palate shape and articulatory behavior. JASA, 125, 3936–3949. DOI: http://doi.org/10.1121/1.3125313

Brunner, J., Fuchs, S., & Perrier, P. (2011a). Supralaryngeal control in Korean velar stops. Journal of Phonetics, 39, 178–195. DOI: http://doi.org/10.1016/j.wocn.2011.01.003

Brunner, J., Hoole, P., & Perrier, P. (2011b). Adaptation strategies in perturbed /s/. Clinical Linguistics and Phonetics, 25(8), 705–724. DOI: http://doi.org/10.3109/02699206.2011.553699

Brysbaert, M. (2019). How Many Participants Do We Have to Include in Properly Powered Experiments? A Tutorial of Power Analysis with Reference Tables. Journal of Cognition, 2(1), art. 16. DOI: http://doi.org/10.5334/joc.72

Bückins, A., Greisbach, R., & Hermes, A. (2018). Larynx movement in the production of Georgian ejective sounds. In Challenges in Analysis and Processing of Spontaneous Speech, 127–138. DOI: http://doi.org/10.18135/CAPSS.127

Bukmaier, V., & Harrington, J. (2016). The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change. Journal of the International Phonetic Association, 46(3), 311–329. DOI: http://doi.org/10.1017/S0025100316000062

Cai, Z., Qin, X., Cai, D., Li, M., Liu, X., & Zhong, H. (2018). The DKU-JNU-EMA electromagnetic articulography database on Mandarin and Chinese dialects with tandem feature based acoustic-to-articulatory inversion. ISCSLP 2018 – Proceedings, 235–239. DOI: http://doi.org/10.1109/ISCSLP.2018.8706629

Canevari, C., Badino, L., & Fadiga, L. (2015). A new Italian dataset of parallel acoustic and articulatory data. INTERSPEECH 2015, 2152–2156.

Carignan, C., Shosted, R., Shih, C., & Rong, P. (2011). Articulatory compensation for nasality: An EMA study of lingual position during nasalized vowels. Journal of Phonetics, 39(4), 668–682. DOI: http://doi.org/10.1016/j.wocn.2011.07.005

Carstens Medizinelektronik GmbH. (2006). AG500 Manual. Retrieved from http://www.ag500.de/

Carstens Medizinelektronik GmbH. (2014). AG501 Manual. Retrieved from https://www.ag500.de/

Cheng, H. Y., Murdoch, B. E., Goozée, J. V., & Scott, D. (2007). Physiologic development of tongue-jaw coordination from childhood to adulthood. JSLHR, 50(2), 352–360. DOI: http://doi.org/10.1044/1092-4388(2007/025)

Cler, G. J., Lee, J. C., Mittelman, T., Stepp, C. E., & Bohland, J. W. (2017). Kinematic analysis of speech sound sequencing errors induced by delayed auditory feedback. JSLHR, 60(6, special issue), 1695–1711. DOI: http://doi.org/10.1044/2017_JSLHR-S-16-0234

Crose, B., Kuk, F., & Bindeballe, H. (2011). Digital Wireless Hearing Aids, Part 4: Interference. Hearing Review, 18(13), 30–39. Retrieved from www.hearingreview.com

Cynosure, Inc. (2014, September 8). Cynosure Acquires Assets of RS Medical Device Manufacturer Ellman International, Inc [Press release]. Retrieved from https://prnewswire.com

Demange, S., & Ouni, S. (2011). Continuous Episodic Memory Based Speech Recognition Using Articulatory Dynamics. Proceedings of INTERSPEECH 2011, 2305–2308.

Didirková, I., & Hirsch, F. (2019). A two-case study of coarticulation in stuttered speech. An articulatory approach. Clinical Linguistics & Phonetics. DOI: http://doi.org/10.1080/02699206.2019.1660913

Dromey, C., Hunter, E., & Nissen, S. L. (2018). Speech adaptation to kinematic recording sensors: Perceptual and acoustic findings. JSLHR, 61(3), 593–603. DOI: http://doi.org/10.1044/2017_JSLHR-S-17-0169

Earnest, M. M., & Max, L. (2003). En Route to the Three-Dimensional Registration and Analysis of Speech Movements: Instrumental Techniques for the Study of Articulatory Kinematics. Contemporary Issues in Communication Science and Disorders, 30, 5–25. DOI: http://doi.org/10.1044/cicsd_30_S_5

Electromagnetic Articulograph. (2019). Highest-precision Electromagnetic Articulography (EMA): 3D recording of articulatory orofacial movements. Retrieved from www.articulograph.de

Engelke, W., Engelke, D., & Schwestka, R. (1990). Clinical and instrumental examination of tongue motor function [in German]. Deutsche Zahnarztliche Zeitschrift, 45(7), S11–6.

Engelke, W., Hoch, G., Bruns, T., & Striebeck, M. (1996). Simultaneous Evaluation of Articulatory Velopharyngeal Function under Different Dynamic Conditions with EMA and Videoendoscopy. Folia Phoniatrica et Logopaedica, 48(2), 65–77. DOI: http://doi.org/10.1159/000266387

Engelke, W., Schönle, P. W., Kring, R. A., & Richter, C. (1989). Electromagnetic articulography (EMA) studies on orofacial movement functions [in German]. Deutsche Zahnarztliche Zeitschrift, 44(8), 618–622.

Flink, H., Bergdahl, M., Tegelberg, A., Rosenblad, A., & Lagerlöf, F. (2008). Prevalence of hyposalivation in relation to general health, body mass index and remaining teeth in different age groups of adults. Community Dentistry and Oral Epidemiology, 36(6), 523–531. DOI: http://doi.org/10.1111/j.1600-0528.2008.00432.x

Friedman, J. H., Brown, R. G., Comella, C., Garber, C. E., Krupp, L. B., Lou, J.-S., Marsh, L., Nail, L., Shulman, L., & Taylor, C. B. (2007). Fatigue in Parkinson’s disease: A review. Movement Disorders, 22(3), 297–308. DOI: http://doi.org/10.1002/mds.21240

Fuchs, S. (2005). Articulatory correlates of the voicing contrast in alveolar obstruent production in German (Doctoral thesis, Centre for General Linguistics, Berlin, Germany). Deutsche National Bibliothek. https://d-nb.info/105944173X/34. DOI: http://doi.org/10.21248/zaspil.41.2005.268

Fuentes, R., Dias, F., Alvarez, G., Lezcano, M. F., Farfan, C., Astete, N., & Arias, A. (2018). Application of 3D Electromagnetic Articulography in Dentistry: Mastication and Deglutition Analysis. Protocol Report. International Journal of Odontostomatology, 12(1), 105–112. DOI: http://doi.org/10.4067/S0718-381X2018000100105

Gafos, A., Kirov, C., & Shaw, J. (2010). Guidelines for using mview. Retrieved from: http://www.haskins.yale.edu/staff/gafos_downloads/ArtA3DEMA.pdf

Geng, C., Turk, A., Scobbie, J. M., …, & Wiegand, R. (2013). Recording speech articulation in dialogue: Evaluating a synchronized double electromagnetic articulography setup. Journal of Phonetics, 41(6), 421–431. DOI: http://doi.org/10.1016/j.wocn.2013.07.002

Gibbon, F. (2008). Instrumental analysis of articulation in speech impairment. In M. J. Ball, M. R. Perkins, N. Müller, & S. Howard (Eds.), Handbook of Clinical Phonetics and Linguistics (pp. 311–331). DOI: http://doi.org/10.1002/9781444301007

Gibert, G., Olsen, K. N., Leung, Y., & Stevens, C. J. (2015). Transforming an embodied conversational agent into an efficient talking head: From keyframe-based animation to multimodal concatenation synthesis. Computational Cognitive Science, 1(1), 1–12. DOI: http://doi.org/10.1186/s40469-015-0007-8

Girin, L., Hueber, T., & Alameda-Pineda, X. (2017). Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(3), 662–673. DOI: http://doi.org/10.1109/TASLP.2017.2651398

Goozée, J. V., Murdoch, B. E., Theodoros, D. G., & Stokes, P. D. (2000). Kinematic analysis of tongue movements in dysarthria following traumatic brain injury using electromagnetic articulography. Brain Injury, 14(2), 153–174. DOI: http://doi.org/10.1080/026990500120817

Goozée, J., Murdoch, B., Ozanne, A., Cheng, Y., Hill, A., & Gibbon, F. (2007). Lingual kinematics and coordination in speech-disordered children exhibiting differentiated versus undifferentiated lingual gestures. International Journal of Language and Communication Disorders, 42(6), 703–724. DOI: http://doi.org/10.1080/13682820601104960

Harper, S., Lee, S., Goldstein, L., & Byrd, D. (2018). Simultaneous electromagnetic articulography and electroglottography data acquisition of natural speech. JASA, 144(5), EL380–EL385. DOI: http://doi.org/10.1121/1.5066349

Hartinger, M., & Mooshammer, C. (2008). Articulatory variability in cluttering. Folia Phoniatrica et Logopaedica, 60, 64–72. DOI: http://doi.org/10.1159/000114647

Hasegawa-Johnson, M. (1998). Electromagnetic exposure safety of the Carstens Articulograph AG100. JASA, 104, 2529–2532. DOI: http://doi.org/10.1121/1.423775

Henriques, R. N., & van Lieshout, P. (2013). A Comparison of Methods for Decoupling Tongue and Lower Lip from Jaw Movements in 3D Articulography. JSLHR, 56(5), 1503–1516. DOI: http://doi.org/10.1044/1092-4388(2013/12-0016)

Hermes, A., Mücke, D., Thies, T., & Barbe, M. T. (2019). Coordination patterns in Essential Tremor patients with Deep Brain Stimulation: Syllables with low and high complexity. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10(1). DOI: http://doi.org/10.5334/labphon.141

Hiiemae, K. M., & Palmer, J. B. (2003). Tongue movements in feeding and speech. Critical Reviews in Oral Biology & Medicine, 14(6), 413–429. DOI: http://doi.org/10.1177/154411130301400604

Hirai, T., Tanaka, O., Koshino, H., Takasaki, H., Hashikawa, Y., Yajima, T., & Matai, N. (1989). Aging and tongue skill. Ultrasound (motion-mode) evaluation [in Japanese]. Nihon Hotetsu Shika Gakkai Zasshi, 33, 457–465. DOI: http://doi.org/10.2186/jjps.33.457

Hoenig, J. F., & Schoener, W. F. (1992). Radiological survey of the cervical spine in cleft lip and palate. Dentomaxillofacial Radiology, 21(1), 36–39. DOI: http://doi.org/10.1259/dmfr.21.1.1397450

Höhne, J., Schönle, P., Conrad, B., Veldschoten, H., Wenig, P., Faghouri, H., Sandner, N., & Hong, G. (1987). Direct measurement of vocal tract shape – articulography. In Proceedings of the European Conference on Speech Technology, 2230–2232.

Hoke, P., Tiede, M., Grender, J., Klukowska, M., Peters, J., & Carr, G. (2019). Using Electromagnetic Articulography to Measure Denture Micromovement during Chewing with and without Denture Adhesive. Journal of Prosthodontics, 28(1), e252–e258. DOI: http://doi.org/10.1111/jopr.12679

Hoole, P. (2012). Phil Hoole’s matlab software for EMA processing. Available from: https://www.phonetik.uni-muenchen.de/~hoole/articmanual/index.html

Hoole, P., & Gfoerer, S. (1990). Electromagnetic articulography as a tool in the study of lingual coarticulation. JASA, S123. DOI: http://doi.org/10.1121/1.2027902

Hoole, P., Mooshammer, C., & Tillmann, H. G. (1994). Kinematic analysis of vowel production in German. In Proceedings of ICSLP94.

Hoole, P., & Nguyen, N. (1999). Electromagnetic Articulography. In W. J. Hardcastle (Ed.), Coarticulation: Theory, Data and Techniques. Cambridge, UK: Cambridge University Press, pp. 260–269. DOI: http://doi.org/10.1017/CBO9780511486395.013

Hoole, P., & Zierdt, A. (2010). Five-dimensional articulography. In B. Maassen & P. H. H. M. van Lieshout (Eds.), Speech Motor Control: New Developments in Basic and Applied Research (pp. 331–349). Oxford, UK: Oxford University Press. DOI: http://doi.org/10.1093/acprof:oso/9780199235797.003.0020

Hopkin, G. B. (1967). Neonatal and Adult Tongue Dimensions. The Angle Orthodontist, 37(2), 132–133.

Horn, H., Kühnast, K., Axmann-Krcmar, D., & Göz, G. (2004). Influence of Orofacial Dysfunctions on Spatial and Temporal Dimensions of Swallowing Movements. Journal of Orofacial Orthopedics, 65(5), 376–388. DOI: http://doi.org/10.1007/s00056-004-0315-1

Howson, P., & Kochetov, A. (2015). An EMA examination of liquids in Czech. In Proceedings of ICPhS 2015.

Howson, P., Kochetov, A., & van Lieshout, P. (2015). Examination of the grooving patterns of the Czech trill-fricative. Journal of Phonetics, 49, 117–129. DOI: http://doi.org/10.1016/j.wocn.2015.01.002

Inoue, H., Ono, K., Masuda, W., Morimoto, Y., Tanaka, T., Yokota, M., & Inenaga, K. (2006). Gender difference in unstimulated whole saliva flow rate and salivary gland sizes. Archives of Oral Biology, 51, 1055–1060. DOI: http://doi.org/10.1016/j.archoralbio.2006.06.010

Jaeger, M., & Hoole, P. (2011). Articulatory factors influencing regressive place assimilation across word boundaries in German. Journal of Phonetics, 39, 413–428. DOI: http://doi.org/10.1016/j.wocn.2011.03.002

Ji, A., Berry, J. J., & Johnson, M. T. (2013). Vowel production in Mandarin accented English and American English: Kinematic and acoustic data from the Marquette University Mandarin accented English corpus. In Proceedings of Meetings on Acoustics, 19(2013). DOI: http://doi.org/10.1121/1.4800290

Ji, A., Berry, J. J., & Johnson, M. T. (2014). The electromagnetic articulography Mandarin accented English (EMA-MAE) corpus of acoustic and 3D articulatory kinematic data. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings, 7719–7723. DOI: http://doi.org/10.1109/ICASSP.2014.6855102

Joglar, J. A., Nguyen, C., Garst, D. M., & Katz, W. F. (2009). Safety of Electromagnetic Articulography in Patients with Pacemakers and Implantable Cardioverter-Defibrillators. JSLHR, 52(4), 1082–1087. DOI: http://doi.org/10.1044/1092-4388(2009/08-0028)

Katz, W. F., & Bharadwaj, S. (2001). Coarticulation in fricative-vowel syllables produced by children and adults: A preliminary report. Clinical Linguistics and Phonetics, 15(1–2), 139–143. DOI: http://doi.org/10.3109/02699200109167646

Katz, W. F., Bharadwaj, S. V., Gabbert, G. J., Loizou, P. C., Tobey, E. A., & Poroy, O. (2003). EMA compatibility of the Clarion 1.2 cochlear implant system. Acoustic Research Letters Online, 4, 100–105. DOI: http://doi.org/10.1121/1.1591712

Katz, W. F., Carter, G. C., & Levitt, J. S. (2007). Treating buccofacial apraxia using augmented kinematic feedback. Aphasiology, 21(12), 1230–1247. DOI: http://doi.org/10.1080/02687030600591161

Katz, W. F., Mehta, S., & Wood, M. (2017). Using electromagnetic articulography with a tongue lateral sensor to discriminate manner of articulation. JASA, 141(1), EL57–EL63. DOI: http://doi.org/10.1121/1.4973907

Katz, W. F., Mehta, S., & Wood, M. (2018). Effects of syllable position and vowel context on Japanese /r/: Kinematic and perceptual data. Acoust. Sci. & Tech., 39(2), 130–137. DOI: http://doi.org/10.1250/ast.39.130

Kearney, E., Haworth, B., Scholl, J., Faloutsos, P., Baljko, M., & Yunusova, Y. (2018). Treating speech movement hypokinesia in Parkinson’s disease: Does movement size matter? JSLHR, 61(11), 2703–2721. DOI: http://doi.org/10.1044/2018_JSLHR-S-17-0439

Kim, J., Lammert, A. C., Ghosh, P. K., & Narayanan, S. S. (2014). Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging. JASA, 135(2), EL115–EL121. DOI: http://doi.org/10.1121/1.4862880

King, S. A., & Parent, R. E. (2001). A 3D parametric tongue model for animated speech. J. Visual. Comput. Animat., 12, 112–115. DOI: http://doi.org/10.1002/vis.249

Kiritani, S., Itoh, K., & Fujimura, O. (1975). Tongue-pellet tracking by a computer-controlled x-ray microbeam system. JASA, 57(6), 1516–1520. DOI: http://doi.org/10.1121/1.380593

Kochetov, A. (2020). Research methods in articulatory phonetics I: Introduction and studying oral gestures. Language and Linguistics Compass, 2020, e12368. DOI: http://doi.org/10.1111/lnc3.12368

Kolb, A. (2015). Software Tools and Analysis Methods for the Use of Electromagnetic Articulography Data in Speech Research (Master’s thesis, Marquette University, Milwaukee, Wisconsin). Marquette University e-publications. https://epublications.marquette.edu/theses_open/291/

Krivokapić, J., Tiede, M. K., & Tyrone, M. E. (2017). A Kinematic Study of Prosodic Structure in Articulatory and Manual Gestures: Results from a Novel Method of Data Collection. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 8(1), 1–26. DOI: http://doi.org/10.5334/labphon.75

Kröger, B. J., Pouplier, M., & Tiede, M. K. (2008). An evaluation of the Aurora system as a flesh-point tracking tool for speech production research. JSLHR, 51(4), 914–921. DOI: http://doi.org/10.1044/1092-4388(2008/067)

Kroos, C. (2012). Evaluation of the measurement precision in three-dimensional Electromagnetic Articulography (Carstens AG500). Journal of Phonetics, 40, 453–465. DOI: http://doi.org/10.1016/j.wocn.2012.03.002

Kroos, C., Bundgaard-Nielsen, R. L., & Best, C. T. (2012). Exploring nonlinear relationships between speech face motion and tongue movements using mutual information. In International Speech Production Seminar 2014, Köln, Germany, pp. 237–240.

Kühnert, B., & Hoole, P. (2004). Speaker-specific kinematic properties of alveolar reductions in English and German. Clinical Linguistics & Phonetics, 18(6), 559–575. DOI: http://doi.org/10.1080/02699200420002268853

Kullaa-Mikkonen, A., Mikkonen, M., & Kotilainen, R. (1982). Prevalence of different morphologic forms of the human tongue in young Finns. Oral Surgery, Oral Medicine, Oral Pathology, 53(2), 152–156. DOI: http://doi.org/10.1016/0030-4220(82)90281-X

Ladefoged, P., & Maddieson, I. (1996). The Sounds of the World’s Languages. Oxford: Blackwell.

Lammert, A., Proctor, M., & Narayanan, S. (2013). Morphological variation in the adult hard palate and posterior pharyngeal wall. JSLHR, 56, 521–530. DOI: http://doi.org/10.1044/1092-4388(2012/12-0059)

Lee, J., & Bell, M. (2018). Articulatory range of movement in individuals with dysarthria secondary to amyotrophic lateral sclerosis. American Journal of Speech-Language Pathology, 27(3), 996–1009. DOI: http://doi.org/10.1044/2018_AJSLP-17-0064

Lobsang, G., Lu, W., Honda, K., Wei, J., Guan, W., Fang, Q., & Dang, J. (2016). Tibetan vowel analysis with a multi-modal Mandarin-Tibetan speech corpus. APSIPA 2016. DOI: http://doi.org/10.1109/APSIPA.2016.7820776

Maeda, S., Berger, M., Engwall, O., Laprie, Y., Maragos, P., Potard, B., & Schoentgen, J. (2006). Acoustic-to-articulatory inversion: Methods and Acquisition of articulatory data (Report on Special Targeted Research Project).

Mahne, A., El-Haddad, G., Alavi, A., Houseni, M., Moonis, G., Mong, A., Hernandez-Pampaloni, M., & Torigian, D. A. (2007). Assessment of Age-Related Morphological and Functional Changes of Selected Structures of the Head and Neck by Computed Tomography, Magnetic Resonance Imaging, and Positron Emission Tomography. Seminars in Nuclear Medicine, 37(2), 88–102. DOI: http://doi.org/10.1053/j.semnuclmed.2006.10.003

Maurer, D., Gröne, B., Landis, T., Hoch, G., & Schönle, P. W. (1993). Re-examination of the relation between the vocal tract and the vowel sound with electromagnetic articulography (EMA) in vocalizations. Clinical Linguistics and Phonetics, 7(2), 129–143. DOI: http://doi.org/10.3109/02699209308985550

McClean, M. D., Tasko, S. M., & Runyan, C. M. (2004). Orofacial movements associated with fluent speech in persons who stutter. JSLHR, 47(2), 294–303. DOI: http://doi.org/10.1044/1092-4388(2004/024)

McNeil, M. R., Katz, W. F., Fossett, T. R. D., Garst, D. M., Szuminsky, N. J., Carter, G., & Lim, K. Y. (2010). Effects of Online Augmented Kinematic and Perceptual Feedback on Treatment of Speech Movements in Apraxia of Speech. Folia Phoniatrica et Logopaedica, 62(3), 127–133. DOI: http://doi.org/10.1159/000287211

Meenakshi, N., & Ghosh, P. K. (2018). Reconstruction of articulatory movements during neutral speech from those during whispered speech. JASA, 143(6), 3352–3364. DOI: http://doi.org/10.1121/1.5039750

Meenakshi, N., Yarra, C., Yamini, B. K., & Ghosh, P. K. (2014). Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording. In Proceedings of INTERSPEECH 2014 (pp. 935–939).

Mefferd, A. S. (2017). Tongue- and jaw-specific contributions to acoustic vowel contrast changes in the diphthong /ai/ in response to slow, loud, and clear speech. JSLHR, 60(11), 3144–3158. DOI: http://doi.org/10.1044/2017_JSLHR-S-17-0114

Mefferd, A. S. (2019). Effects of speaking rate, loudness, and clarity modifications on kinematic endpoint variability. Clinical Linguistics and Phonetics, 33(6), 570–585. DOI: http://doi.org/10.1080/02699206.2019.1566401

Mefferd, A. S., & Dietrich, M. S. (2019). Tongue- and Jaw-Specific Articulatory Underpinnings of Reduced and Enhanced Acoustic Vowel Contrast in Talkers with Parkinson’s Disease. JSLHR, 62(7), 2118–2132. DOI: http://doi.org/10.1044/2019_JSLHR-S-MSC18-18-0192

Mennen, I., Scobbie, J. M., de Leeuw, E., Schaeffler, S., & Schaeffler, F. (2010). Measuring language-specific phonetic settings. Second Language Research, 26(1), 13–41. DOI: http://doi.org/10.1177/0267658309337617

Mitra, V., Sivaraman, G., Nam, H., Espy-Wilson, C., Saltzman, E., & Tiede, M. (2017). Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition. Speech Communication, 89, 103–112. DOI: http://doi.org/10.1016/j.specom.2017.03.003

Moen, I., Gram Simonsen, H., & Lindstad, A. M. (2004). An electronic database of Norwegian speech sounds: Clinical aspects. Journal of Multilingual Communication Disorders, 2(1), 43–49. DOI: http://doi.org/10.1080/14769670310001616624

Mooshammer, C., Hoole, P., & Geumann, A. (2006). Interarticulator cohesion within coronal consonant production. JASA, 120(2), 1028–1039. DOI: http://doi.org/10.1121/1.2208430

Mooshammer, C., Hoole, P., & Geumann, A. (2007). Jaw and Order. Language and Speech, 50(2), 145–176. DOI: http://doi.org/10.1177/00238309070500020101

Mooshammer, C., Tiede, M., Shattuck-Hufnagel, S., & Goldstein, L. (2019). Towards the Quantification of Peggy Babcock: Speech Errors and Their Position within the Word. Phonetica, 76(5), 363–396. DOI: http://doi.org/10.1159/000494140

Mücke, D., Hermes, A., Roettger, T. B., Becker, J., Niemann, H., Gembek, T. A., Timmermann, L., Visser-Vandewalle, V., Fink, G. R., Grice, M., & Barbe, M. T. (2018). The effects of Thalamic Deep Brain Stimulation on speech dynamics in patients with Essential Tremor: an articulographic study. PLoS One, 13(1). DOI: http://doi.org/10.1371/journal.pone.0191359

Murdoch, B. E. (2011). Physiological investigation of dysarthria: Recent advances. International Journal of Speech-Language Pathology, 13(1), 28–35. DOI: http://doi.org/10.3109/17549507.2010.487919

Narayanan, S., Toutios, A., Ramanarayanan, V., …, & Proctor, M. (2014). Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). JASA, 136, 1307–1311. DOI: http://doi.org/10.1121/1.4890284

Navazesh, M., Mulligan, R. A., Kipnis, V., Denny, P. A., & Denny, P. C. (1992). Comparison of Whole Saliva Flow Rates and Mucin Concentrations in Healthy Caucasian Young and Aged Adults. Journal of Dental Research, 71(6), 1275–1278. DOI: http://doi.org/10.1177/00220345920710060201

Neufeld, C., & van Lieshout, P. (2014). Tongue kinematics in palate relative coordinate spaces for electromagnetic articulography. JASA, 135, 352–361. DOI: http://doi.org/10.1121/1.4836515

Nijland, L., Maassen, B., Hulstijn, W., & Peters, H. (2004). Speech motor coordination in Dutch-speaking children with DAS studied with EMMA. Journal of Multilingual Communication Disorders, 2(1), 50–60. DOI: http://doi.org/10.1080/1476967031000091015

Northern Digital Inc. (2009, rev. 2016). Wave User Guide. Retrieved from http://support.ndigital.com

Northern Digital Inc. (2019). Vox-EMA System User Guide. Retrieved from http://support.ndigital.com

Northern Digital Inc. (2020, June). NDI Company Update – June 2020. Retrieved from https://www.ndigital.com/

Okadome, T., & Honda, M. (2001). Generation of articulatory movements by using a kinematic triphone model. JASA, 110(1), 453–463. DOI: http://doi.org/10.1121/1.1377633

Oliver, R. G., & Evans, S. P. (1986). Tongue size, oral cavity size and speech. The Angle Orthodontist, 56, 234–243.

Patem, A. K., Illa, A., Afshan, A., & Ghosh, P. K. (2018). Optimal sensor placement in electromagnetic articulography recording for speech production study. Computer Speech & Language, 47, 157–174. DOI: http://doi.org/10.1016/j.csl.2017.07.008

Perkell, J. S., Cohen, M. H., Svirsky, M. A., Matthies, M. L., Garabieta, I., & Jackson, M. T. T. (1992). Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. JASA, 92(6), 3078–3096. DOI: http://doi.org/10.1121/1.404204

Peyron, M., Mioche, L., Renon, P., & Abouelkaram, S. (1996). Masticatory jaw movement recordings: A new method to investigate food texture. Food Quality and Preference, 7(3–4), 229–237. DOI: http://doi.org/10.1016/S0950-3293(96)00014-6

Rebernik, T., Jacobi, J., Tiede, M., & Wieling, M. (in revision). Accuracy assessment of two electromagnetic articulographs: NDI Wave and NDI Vox.

Reddihough, D. S., & Johnson, H. (1999). Assessment and Management of Saliva Control Problems in Children and Adults with Neurological Impairment. Journal of Developmental and Physical Disabilities, 11, 17–24. DOI: http://doi.org/10.1023/A:1021804500520

Richmond, K., Hoole, P., & King, S. (2011). Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus. In Proceedings of INTERSPEECH 2011, Florence, 1505–1508.

Rochon, M., & Pompino-Marschall, B. (1999). The articulation of secondarily palatalized coronals in Polish. In Proceedings of ICPhS 1999, San Francisco, 1897–1900.

Rong, P., Loucks, T., Kim, H., & Hasegawa-Johnson, M. (2012). Relationship between kinematics, F2 slope and speech intelligibility in dysarthria due to cerebral palsy. Clinical Linguistics & Phonetics, 26(9). DOI: http://doi.org/10.3109/02699206.2012.706686

Rudy, K., & Yunusova, Y. (2013). The effect of anatomic factors on tongue position variability during consonants. JSLHR, 56(1), 137–149. DOI: http://doi.org/10.1044/1092-4388(2012/11-0218)

Rudzicz, F., Namasivayam, A. K., & Wolff, T. (2012). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46(4), 523–541. DOI: http://doi.org/10.1007/s10579-011-9145-0

Savariaux, C., Badin, P., Samson, A., & Gerber, S. (2017). A comparative study of the precision of Carstens and Northern Digital Instruments Electromagnetic Articulographs. JSLHR, 60, 322–340. DOI: http://doi.org/10.1044/2016_JSLHR-S-15-0223

Schneider, G., & Otto, K. (2012). In vitro and in vivo studies on the use of Histoacryl® as a soft tissue glue. European Archives of Oto-Rhino-Laryngology, 269, 1783–1789. DOI: http://doi.org/10.1007/s00405-011-1868-4

Schönle, P. W., Gräbe, K., Wenig, P., Höhne, J., Schrader, J., & Conrad, B. (1987). Electromagnetic articulography: Use of alternating magnetic fields for tracking movements of multiple points inside and outside the vocal tract. Brain and Language, 31, 26–35. DOI: http://doi.org/10.1016/0093-934X(87)90058-7

Schönle, P. W., Müller, C., & Wenig, P. (1989). Real-time analysis of orofacial movements with the aid of electromagnetic articulography [in German]. Biomedizinische Technik, 34(6), 126–130. DOI: http://doi.org/10.1515/bmte.1989.34.6.126

Schötz, S., Frid, J., & Löfqvist, A. (2013). Development of speech motor control: Lip movement variability. JASA, 133(6), 4210–4217. DOI: http://doi.org/10.1121/1.4802649

Seikel, J. A., Drumright, D. G., & Hudock, D. J. (2020). Anatomy & Physiology for Speech, Language, and Hearing. San Diego: Plural Publishing.

Shellikeri, S., Green, J. R., Kulkarni, M., Rong, P., Martino, R., Zinman, L., & Yunusova, Y. (2016). Speech movement measures as markers of bulbar disease in Amyotrophic Lateral Sclerosis. JSLHR, 59(5), 887–899. DOI: http://doi.org/10.1044/2016_JSLHR-S-15-0238

Shosted, R. K., Carignan, C., & Rong, P. (2011). Estimating vertical larynx position using EMA. In Proceedings of ISSP 2011, 139–146.

Sigona, F., Stella, M., Stella, A., Bernardini, P., Fivela, B. G., & Grimaldi, M. (2018). Assessing the Position Tracking Reliability of Carstens’ AG500 and AG501 Electromagnetic Articulographs during Constrained Movements and Speech Tasks. Speech Communication, 104, 73–88. DOI: http://doi.org/10.1016/j.specom.2018.10.001

Simonsen, H. G., Moen, I., & Cowen, S. (2008). Norwegian retroflex stops in a cross linguistic perspective. Journal of Phonetics, 36(2), 385–405. DOI: http://doi.org/10.1016/j.wocn.2008.01.001

Sivaraman, G., Espy-Wilson, C., & Wieling, M. (2017). Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages. In Proceedings of INTERSPEECH 2017, 974–978. DOI: http://doi.org/10.21437/Interspeech.2017-260

Smith, S., & Aasen, R. (1992). The effects of electromagnetic fields on cardiac pacemakers. IEEE Transactions on Broadcasting, 38(2), 136–139. DOI: http://doi.org/10.1109/11.142666

Steele, C. M. (2015). The blind scientists and the elephant of swallowing: A review of instrumental perspectives on swallowing physiology. Journal of Texture Studies, 46, 122–137. DOI: http://doi.org/10.1111/jtxs.12101

Steele, C. M., & van Lieshout, P. (2009). Tongue Movements During Water Swallowing in Healthy Young and Older Adults. JSLHR, 52(5), 1255–1267. DOI: http://doi.org/10.1044/1092-4388(2009/08-0131)

Steele, C. M., & van Lieshout, P. H. H. M. (2004). Use of Electromagnetic Midsagittal Articulography in the Study of Swallowing. JSLHR, 47(2), 342–352. DOI: http://doi.org/10.1044/1092-4388(2004/027)

Steele, C. M., van Lieshout, P. H. H. M., & Pelletier, C. A. (2012). The Influence of Stimulus Taste and Chemesthesis on Tongue Movement Timing in Swallowing. JSLHR, 55, 262–275. DOI: http://doi.org/10.1044/1092-4388(2011/11-0012)

Stella, M., Stella, A., Sigona, F., Bernardini, P., Grimaldi, M., & Fivela, B. G. (2013). Electromagnetic articulography with AG500 and AG501. In Proceedings of Interspeech 2013, Lyon, France, pp. 1316–1320.

Stella, M., Stella, A., Grimaldi, M., & Fivela, B. G. (2012). Numerical instabilities and three-dimensional electromagnetic articulography. JASA, 132(6), 3941–3949. DOI: http://doi.org/10.1121/1.4763549

Stone, M. (2010). Laboratory Techniques for Investigating Speech Articulation. In W. J. Hardcastle, J. Laver & F. E. Gibbon (Eds.), The Handbook of Phonetic Sciences (second edition, pp. 7–38). DOI: http://doi.org/10.1002/9781444317251.ch1

Stone, M., Woo, J., Lee, J., Poole, T., Seagraves, A., Chung, M., Kim, E., Murano, E. Z., Prince, J. L., & Blemker, S. S. (2018). Structure and variability in human tongue muscle anatomy. Comput Methods Biomech Biomed Eng Imaging Vis, 6(5), 499–507. DOI: http://doi.org/10.1080/21681163.2016.1162752

Suemitsu, A., Dang, J., Ito, T., & Tiede, M. (2015). A real-time articulatory visual feedback approach with target presentation for second language pronunciation learning. JASA, 138(4), EL382–EL387. DOI: http://doi.org/10.1121/1.4931827

Tabain, M. (2003). Effects of prosodic boundary on /aC/ sequences: Articulatory results. JASA, 113(5), 2834–2849. DOI: http://doi.org/10.1121/1.1564013

Tasko, S. M., & McClean, M. D. (2004). Variations in Articulatory Movement with Changes in Speech Task. JSLHR, 47, 85–100. DOI: http://doi.org/10.1044/1092-4388(2004/008)

Thibeault, M., Ménard, L., Baum, S. R., Richard, G., & McFarland, D. H. (2011). Articulatory and acoustic adaptation to palatal perturbation. JASA, 129, 2112–2120. DOI: http://doi.org/10.1121/1.3557030

Thompson, A., & Kim, Y. (2019). Relation of second formant trajectories to tongue kinematics. JASA, 145(4), EL323–EL328. DOI: http://doi.org/10.1121/1.5099163

Tiede, M. (2005). MVIEW: Software for visualization and analysis of concurrently recorded movement data. New Haven, CT: Haskins Laboratories.

Tiede, M., Bundgaard-Nielsen, R., Kroos, C., Gibert, G., Attina, V., Kasisopa, B., Vatikiotis-Bateson, E., & Best, C. (2010). Speech articulator movements recorded from facing talkers using two electromagnetic articulometer systems simultaneously. In Proceedings of Meetings on Acoustics 11. DOI: http://doi.org/10.1121/1.3508805

Tiede, M., Chen, W., & Whalen, D. H. (2019). Taiwanese Mandarin sibilant contrasts investigated using coregistered EMA and ultrasound. In Proceedings of ICPhS 2019.

Tiede, M., Espy-Wilson, C. Y., Goldenberg, D., Mitra, V., Nam, H., & Sivaraman, G. (2017). Quantifying kinematic aspects of reduction in a contrasting rate production task. JASA, 141(5), 3580–3580. DOI: http://doi.org/10.1121/1.4987629

Tognola, G., Parazzini, M., Sibella, F., Paglialonga, A., & Ravazzani, P. (2007). Electromagnetic interference and cochlear implants. Annali dell’Istituto Superiore di Sanita, 43(3), 241–247.

Tong, E., & Ng, M. L. (2011). Interaction between lexical tone and labial movement in Cantonese bilabial plosive production. In Proceedings of ICPhS 2011.

Trudeau-Fisette, P., Tiede, M., & Ménard, L. (2017). Compensations to auditory feedback perturbations in congenitally blind and sighted speakers: Acoustic and articulatory data. PLoS One, 12(7), e0180300. DOI: http://doi.org/10.1371/journal.pone.0180300

van Lieshout, P. H. H. M. (2007). The use of Electro-Magnetic Midsagittal Articulography in oral motor research. In E. Padrós-Serrat (Ed.), Bases Diagnosticas, Terapeuticas Y Posturales Del Funcionalismo Craneofacial [Diagnostic, therapeutic and postural basis of craniofacial functionalism] (pp. 1140–1156). Ripano Editorial Medica.

van Lieshout, P. H. H. M., Rutjes, C. A. W., & Spauwen, P. H. M. (2002). The Dynamics of Interlip Coupling in Speakers with a Repaired Unilateral Cleft-Lip History. JSLHR, 45(1), 5–19. DOI: http://doi.org/10.1044/1092-4388(2002/001)

Vorperian, H. K., Kent, R. D., Lindstrom, M. J., Kalina, C. M., Gentry, L. R., & Yandell, B. S. (2005). Development of vocal tract length during early childhood: A magnetic resonance imaging study. JASA, 117(1), 338–350. DOI: http://doi.org/10.1121/1.1835958

Wang, J., Samal, A., Green, J. R., & Rudzicz, F. (2012). Whole-word recognition from articulatory movements for silent speech interfaces. In Proceedings of INTERSPEECH 2012. DOI: http://doi.org/10.1109/ICASSP.2012.6289039

Wang, J., Samal, A., Rong, P., & Green, J. R. (2016). An optimal set of flesh points on tongue and lips for speech-movement classification. JSLHR, 59, 15–26. DOI: http://doi.org/10.1044/2015_JSLHR-S-14-0112

Weinberger, S. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu

West, P. (1999). The extent of coarticulation of English liquids: An acoustic and articulatory study. International Congress of Phonetics, 1901–1904. Retrieved from http://www.phon.ox.ac.uk/files/people/west/icphswest.pdf

Westbury, J. R. (1994). On coordinate systems and the representation of articulatory movements. JASA, 95, 2271–2273. DOI: http://doi.org/10.1121/1.408638

Whalen, D. H., Iskarous, K., Tiede, M. K., Ostry, D. J., Lehnert-LeHouillier, H., Vatikiotis-Bateson, E., & Hailey, D. S. (2005). The Haskins Optically Corrected Ultrasound System (HOCUS). JSLHR, 48, 543–553. DOI: http://doi.org/10.1044/1092-4388(2005/037)

Whelton, H. (2012). Introduction: The anatomy and physiology of salivary glands. In M. Edgar, C. Dawes and D. O’Mullane (Eds.), Saliva and oral health (4th Ed., pp. 1–17). Comberton, UK: Stephen Hancocks Limited.

Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics, 70, 86–116. DOI: http://doi.org/10.1016/j.wocn.2018.03.002

Wieling, M., Tomaschek, F., Arnold, D., Tiede, M., Bröker, F., Thiele, S., Wood, S. N., & Baayen, H. (2016). Investigating dialectal differences using articulography. Journal of Phonetics, 59, 122–143. DOI: http://doi.org/10.1016/j.wocn.2016.09.004

Wieling, M., Veenstra, P., Adank, P., & Tiede, M. (2017). Articulatory differences between L1 and L2 speakers of English. In Proceedings of ISSP11.

Wieling, M., Veenstra, P., Adank, P., Weber, A., & Tiede, M. K. (2015). Comparing L1 and L2 speakers using articulography. In Proceedings of ICPhS 2015.

Wrench, A. (2000). A Multichannel Articulatory Database and its Application for Automatic Speech Recognition. In Proceedings of 5th Seminar of Speech Production, 305–308.

Yunusova, Y., Green, J. R., & Mefferd, A. (2009). Accuracy Assessment for AG500, Electromagnetic Articulograph. JSLHR, 52(2), 547–555. DOI: http://doi.org/10.1044/1092-4388(2008/07-0218)

Yunusova, Y., Kearney, E., Kulkarni, M., Haworth, B., Baljko, M., & Faloutsos, P. (2017). Game-based augmented visual feedback for enlarging speech movements in Parkinson’s disease. JSLHR, 60(6S), 1818–1825. DOI: http://doi.org/10.1044/2017_JSLHR-S-16-0233

Yunusova, Y., Rosenthal, J. S., Rudy, K., Baljko, M., & Sakalogiannakis, J. (2012). Positional targets for lingual consonants defined using electromagnetic articulography. JASA, 132(2), 1027–1038. DOI: http://doi.org/10.1121/1.4733542

Zhang, Y., Jones, P. L., & Jetley, R. (2010). A hazard analysis for a generic insulin infusion pump. Journal of Diabetes Science and Technology, 4(2), 263–283. DOI: http://doi.org/10.1177/193229681000400207

Accepted: 2020-12-10
Published: 2021-03-01
