There is increasing evidence that language sounds which appear to be ‘the same sound’ are in some cases actually distinct from each other, with consequences for phonological analysis. The classification of sound categories is of great importance to understanding the phonological organization in a language, and close attention to small phonetic differences has contributed to the development of both Lexical Phonology (Mohanan, 1982; Kiparsky, 1982) and Articulatory Phonology (Browman & Goldstein, 1986, 1992).
The effect of sameness may arise from categorical perception (Liberman et al., 1957; Harnad, 2003); the fact remains that native speakers do not necessarily attend to all distinctions in speech sounds. For example, sometimes segments with different phonetic properties are classed together as a single sound, such as the different articulations of American English /ɹ/ (Delattre & Freeman, 1968; Mielke et al., 2010; Archangeli et al., 2011; Mielke et al., 2016) and the different acoustics of American English /s/ (Baker et al., 2011). Another class of relevant examples is neutralization, both incomplete and complete. Most striking are instances of incomplete neutralization, where two sounds from different sources were considered to be the same but were later revealed to be slightly different under close phonetic examination (see Port, 1996 and Yu, 2007 for summaries of the issues; final devoicing examples are found in Port & O’Dell, 1985, Warner et al., 2004; Winter & Röttger, 2011 among others; English [l] in Lee-Kim et al., 2013; tonal near neutralization in Myers & Tsay, 2008; Cheng et al., 2013; incomplete nasal place assimilation in Stephenson & Harrington, 2002; coronal-velar nasal-obstruent heteromorphemic sequences in English in Barry, 1991, and Korean heteromorphemic palatalization in Sung, 2015). Incomplete neutralizations contrast with complete neutralization, for example complete neutralization of manner for coronals in Korean codas (Kim & Jongman, 1996) and of pre-velar nasals in Italian (Celata et al., 2013). Sometimes in the same language both complete and incomplete neutralization are found, depending on the context: Nasal-obstruent place assimilation in Spanish is complete with a following stop, but it is partial with a following fricative (Kochetov & Colantoni, 2011); /s/+/j/ in English varies depending on intervening boundary type (Zsiga, 1995).
Such examples raise a question about ‘exceptions’ to otherwise general phonological patterns when a sound appears to have anomalous behavior: Is this truly an exception, inconsistent with the general pattern, or does closer phonetic examination suggest that in fact the sound was misperceived and there is a systematic pattern? To illustrate, the earliest description of Okpẹ vowels (Hoffmann, 1973) is used as an example of absolute neutralization (Kenstowicz & Kisseberth, 1979): It appears that /ɪ/ and /e/ neutralize to [e], /ʊ/ and /o/ neutralize to [o]. However, Omamor (1973) includes a pilot study suggesting that the high retracted and mid advanced vowel pairs have slightly different formant values—in which case this is not absolute neutralization. Similarly, Mutaka (1995) assumes that Kinande short low vowels are exempt from an otherwise general tongue root advancement harmony, being retracted even in advanced environments; ultrasound examination (Gick et al., 2006) reveals that low vowels have advanced tongue root in advanced contexts. In short, all vowels participate in tongue root harmony and the Kinande pattern is completely symmetric.
Another potential example comes in the phenomenon called nasal substitution, occurring in Austronesian languages, including the Malayo-Polynesian languages Sasak and Javanese. Nasal substitution refers to morphological pairing of obstruent-initial and homorganic nasal-initial forms—except that [s] pairs with a sound described as [ɲ] in both Sasak and Javanese. Our question is whether closer inspection reveals that [s] and the nasal morphologically related to [s] are in fact homorganic sounds, despite the description of them as heterorganic sounds.
Nasal substitution occurs in many Austronesian languages, with similar correspondences across languages. In certain morphological contexts, words with initial voiceless obstruents appear with a homorganic nasal instead of the obstruent (labial with labial, dental/alveolar with dental/alveolar, velar with velar), a pattern known as nasal substitution (De Guzman, 1978; Pater, 1999, 2001; Blust, 2004), as in Table 1a. For words with initial voiced obstruents, a homorganic nasal surfaces along with the obstruent, as in Table 1b. Our focus is on the Table 1a pattern, the voiceless obstruents and the corresponding homorganic nasals.1
What is less straightforward—yet familiar across different languages in this family—is the behavior of word-initial orthographic ‘s.’ Since ‘s’ is typically classed with the other dental/alveolar consonants, we would expect it to pair with [n] in this paradigm. However, it pairs with orthographic ‘ny’ instead of with ‘n,’ illustrated in Table 2 where the nasal correspondent of [s] is represented as [Ns].2
The contrast between the behavior of [s] and the behavior of other voiceless obstruents appears to break the ‘homorganic nasal’ pattern. Our question is whether this is indeed the case, arguing in favor of an abstract pattern, i.e., a relation between two sounds which is not grounded completely in their phonetic properties. The alternative is that the homorganic nasal pattern is concrete, i.e., the relation between the two sounds is fully grounded in their phonetic properties. In an abstract relation, the places of articulation of [s] and of [Ns] would be quite different from each other while if the relation is concrete, the articulation of [s] and [Ns] would be relatively similar to each other. In either case, there remains the question of what the place of articulation is for each of these sounds; possibilities are shown in Table 3.
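| || ||[t, n]||other||[ʨ, ɲ]|
|abstract, heterorganic||a.||[s]|| ||[Ns]|
| ||b.|| ||[s]||[Ns]|
| ||c.||[s]||[Ns]|| |
|concrete, homorganic||d.||[s, Ns]|| || |
| ||e.|| || ||[s, Ns]|
| ||f.|| ||[s, Ns]|| |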
Under the abstract hypothesis, the two sounds are related morphologically, but the homorganic relation found with other segments does not hold with [s] and its corresponding nasal [Ns]. We expect that the tongue positions of [s] and [Ns] would be quite different from each other, and comparable to the way that heterorganic tongue positions differ from each other. The place of the two sounds might correspond to the perceived place, with [s] homorganic to [t, n] (a dental/alveolar [s]; Dart, 1991) and [Ns] homorganic to [ʨ, ɲ] (Table 3a). Alternatively, [s] might be distinct from [t, n] (e.g., [s] is postalveolar in the phonological analysis of Mester, 1986) or [Ns] might be distinct from [ʨ, ɲ] (Table 3b, c respectively).
Under the concrete hypothesis, articulatory similarity between [s] and [Ns] is driving the alternation. Thus, we expect that [s], like other voiceless obstruents, would be homorganic with its nasal counterpart and their tongue positions quite similar to each other. If [s] and [Ns] are homorganic, they could class with [t, n], with [ʨ, ɲ], or with a third place of articulation (Table 3d, e, f respectively). The abstract and concrete hypotheses are summarized in Table 4.
|a.||Abstract Hypothesis||[s] ≠ [Ns]||articulatory dissimilarity|
|b.||Concrete Hypothesis||[s] = [Ns]||articulatory similarity|
We set out to answer the “Abstract or concrete?” question and to address the related place of articulation issues based on ultrasound data of speakers of both Sasak and Javanese.
2. Language background
Sasak is the primary local language of Lombok, Indonesia, with speakers estimated at 2 million (Clynes, 1995) and 2.5 million (Marli, 2015). Sasak, along with Balinese, Sumbawa, Malayic, and Chamic, is within the Malayo-Polynesian sub-group Malayo-Sumbawan (Adelaar, 2005). Sasak is described as having four (Jacq, 1998) or five (Austin, 2003) major dialects. (Austin, 2003 reports that the informal names for the dialects relate to how each group pronounces the deictic words for ‘like this’ and ‘like that’: Ngenó-Ngené (central northeast, central east, and central west coasts of Lombok), Menó-Mené (central Lombok), Ngotó-Ngeté (northeastern Lombok), Ngenó-Mené, also known as Kutó-Kuté (north Lombok), and Meriaq-Meriku (south central Lombok). Jacq, 1998 does not include Ngenó-Mené as a dialect.) The dialects with the widest geographical distribution are Ngenó-Ngené and Menó-Mené, which are the only varieties used by Sasak speakers in this study.
Javanese is the most-spoken regional language of Indonesia and the most-spoken language of the Austronesian language family with approximately 75 million speakers. It is found along the northwest coast of Java (Banten, Krawang, Cirebon) and in the central and eastern areas of this island. Outside of Java, it is used in diasporic communities in neighboring provinces of Sumatra, Kalimantan, and Sulawesi, as well as in Suriname and New Caledonia. Three dialects are usually distinguished (western, central, and eastern) (Ras, 1985), and the western dialect is further divided into seven subdialects (Nothofer, 1980). Our participants are all from the central and eastern dialects, which have not yet been studied in much detail (Nothofer, 2006).
Nasal substitution in Sasak appears on verbs “used when the Patient-like argument is non-referential” according to Austin (2013, p. 41). There are many other factors determining the distribution of oral- or nasal-initial verb forms; see Austin (2013). Similarly, in Javanese, nasal substitution appears on verbs and relates to argument structure: Sato (2008, p. 53) calls nasal substitution the ‘active voice morphology’ and shows that it is necessary in basic transitive clauses, but does not occur in Wh-questions or passives. See also Sato (2015); see Herawati et al. (2016) for discussion of nasal substitution and denominal verbs.
As for the sounds, nasal substitution refers to related pairs of words, typically one which begins with a voiceless oral obstruent and the other with a nasal homorganic to that obstruent. (Voiced obstruents, sonorants, and vowel-initial words have their own patterns; see Clynes, 1995 and Austin, 2013 for Sasak; Dudas, 1976; Robson, 1992; and Lee, 2001 for Javanese, and Pater, 1999, 2001 for the Austronesian pattern in general).
To put nasal substitution in context, the Sasak and Javanese consonant inventories are shown in Table 5, based on Clynes (1995) and Archangeli et al. (2016) for Sasak3 and Dudas (1976) for Javanese. In these tables, we describe sounds c [ʨ], j [ʥ], ny [ɲ], and y [j] as ‘postalveolar/palatal’ due to the lack of agreement in the literature as to the precise location of constriction described for these consonantal sounds. Both languages distinguish bilabial, postalveolar, and velar consonants. Sasak has a single dental/alveolar category. Javanese has both dental and alveolar consonants (Dudas, 1976 [citing Horne, 1961; Hayward & Mulijono, 1991]), also described as a dental/retroflex contrast (Suharno, 1982; Robson, 1992; Adisasmito-Smith, 2004; Graff & Jaeger, 2009); ‘s’ is classed with dentals by all sources. Javanese nasal substitution is described as resulting in a dental nasal [n̪] regardless of whether the corresponding sound is a dental [t̪] or an alveolar (or retroflex) [ṯ/ʈ]. Because of the challenges of imaging the tip of the tongue with ultrasound, we did not use stimuli with initial alveolar/retroflex stops in the Javanese study.4
Table 5. Consonant inventories for Sasak (above) & Javanese (below). The consonants that are our focus here are boxed in these tables.
The languages are similar in that each has only one sibilant, which is one of the sounds targeted in this study. It is possible that there is more variety of articulation for the sibilant because there is no sibilant contrast to be maintained (e.g., there is no [ʃ] alongside the [s]). Clynes (1995) classes Sasak [s] together with [ʨ, ʥ]. Mester (1986) views Javanese /s/ as a postalveolar consonant, and Robson (1992) states that the Javanese ‘s’ is similar to that of English, but “sometimes is heard as approaching sh” (p. 12).5 We did not perceive this fluctuation ourselves, in either language. These different classifications, along with nasal substitution apparently relating ‘s’ and a palatal nasal, raise questions about the phonetic nature of the sibilant in both languages: Could it be articulated somewhere between a dental/alveolar sound and a postalveolar/palatal sound, or is it indeed postalveolar/palatal? On the other hand, because Sasak has only one stop category in the dental/alveolar region while Javanese has two, we might expect concomitant differences between the two languages in the nasal counterparts to [s].
In order to carry out this study, we collected and analyzed ultrasound tongue imaging data. Sasak data were collected at the Mataram Lingua Franca Institute in Lombok, Indonesia, and Javanese data were collected at the University of Hong Kong.
The procedure for collecting and processing data in the two languages is largely the same. Differences in methodology arose because the two studies were carried out independently, and we saw the value of combining them only after the data were collected. Analysis methods are as similar as possible given the differences in number of stimuli and number of repetitions per stimulus in each language.6 We present the basic methodology here, along with ways in which the procedures for the two data sets differed.
For the Sasak part of the study, there were 11 participants who all reported speaking Sasak exclusively until elementary school; all continued to use Sasak on a daily basis throughout their lives. All participants also reported fluency in Bahasa Indonesia and have learned English as a third language. Ages ranged from 19 to 37 (average 24.4); 4 were female and 7 were male. Of these, 8 self-identified as speakers of the Menó-Mené (M-M) dialect while 3 self-identified as Ngenó-Ngené (Ng-Ng) speakers. Data from 2 additional speakers were omitted due to poor quality of the ultrasound images.
For Javanese, 8 female native speakers (and no male speakers) were recorded.7 All participants reported speaking Javanese (either the eastern or central dialect) on a daily basis until moving to Hong Kong, and all also reported fluency in Bahasa Indonesia. Some learned English as a third language, while others instead learned Cantonese. Ages ranged between 23 and 39 (average 31). The experiment was conducted either in English or in Cantonese, with occasional explanation in Javanese or Bahasa Indonesia by an assistant bilingual in Javanese and Bahasa Indonesia. (Data from two additional Javanese speakers were omitted due to poor imaging quality in the ultrasound signal.)
Information about each Sasak- and Javanese-speaking participant’s gender, age, and native dialect appears in Table 6.
To examine the relationship between the place of oral and nasal sounds, we identified items with initial coronal voiceless consonants along with their morphologically-related nasal-initial forms. These initial consonants either had a known place of articulation, that is, either dental/alveolar or postalveolar, or they were ambiguous in place ([s] and [Ns]), shown in Table 7, with examples in the rightmost column.
|Non-nasal forms||Related nasal forms||Examples (Sasak)|
|Postalveolar||ʨ||ɲ||ʨaplɔk||ɲaplɔk||‘take s.o.’s property’|
In selecting stimuli, only morphologically-related forms were included, with either [a] or [ə] in the first syllable. The vowels [ə] and [a] were chosen in order to minimize coarticulatory effects of the following vowel on the consonant. Non-high, central vowel contexts were chosen because high vowels with front or back tongue position (e.g., [i] or [u]) typically show stronger influences on tongue shape and position during consonantal constriction than do other types of vowels (Öhman, 1966; Zharkova & Hewlett, 2009).
For Sasak, one word with each vowel was identified for each of the 6 consonants, resulting in 12 stimuli; an additional [s]-initial pair was included so there are 14 stimuli for Sasak. In one case, the vowel [ɛ] was inadvertently used instead of [ə]: [tɛmbaʔ]/[nɛmbaʔ] ‘shoot.’ Orthographically, the symbol ‘e’ is used for both [ə] and [ɛ/e], leading to confusion about the desired stimuli from the orthographic prompt.
For Javanese, stimuli were selected in a similar way, with either [a] or [ə] present in the first syllable of each target item. For this language, 10 items were identified for each of the 6 target consonants, resulting in a total of 60 Javanese target stimuli. A full list of target Sasak and Javanese stimulus items is presented in the appendix.
Each data collection session began with an explanation of the study and the data collection methods. The participants were seated in front of a display laptop, which was used to present visual prompts. Two posable camera arms (Manfrotto 143 Magic Arm) were adjusted to provide a stable headrest for the participants and to minimize head movement throughout the collection session. A third arm fixed the position of the ultrasound sensor along the centerline of the lower jaw at a location where the full midsagittal contour of the tongue imaged most clearly. This setup is shown in Figure 1, where a close-up of the head-to-probe stabilization method is shown in the image on the right.
When stabilization arm adjustments were complete, participants were asked whether they were willing to continue with the study. On agreement, each was asked to sip water slowly through a straw, in order to create an image of the palate. Once a good palate image was obtained, participants were instructed to produce each target stimulus from the dedicated display laptop’s screen, and prompts were advanced for each participant by an experimenter. Throughout the task, another experimenter monitored the imaging quality during collection to ensure that good ultrasound images were obtained.
Participants were asked to read each of the target word items, which appeared on the screen of the display laptop, one at a time. In the case of Sasak words, target items were produced in isolation, whereas for Javanese, target words were presented in the carrier phrase Kata ____ ‘(the) word (is)___,’ in order to make speakers produce a preceding [a] vowel immediately before the target initial-consonant sounds.
Stimuli were presented in a randomized order that was unique to each participant. For Sasak speakers, randomization was performed within each of six stimulus blocks, with each block containing one iteration of each target word. This resulted in six productions of each item and a total of up to 84 token productions per session for Sasak. For Javanese speakers, randomization was performed within one large block, in which each target item appeared two times, resulting in two productions of each target item and a total of up to 120 token productions per session for Javanese. The target number of prompts and repetitions for each sound in each language is summarized in Table 8.
|initial C||V context||sounds||items||reps||total||sounds||items||reps||total|
|total tokens per participant||84||total tokens per participant||120|
Because the collection procedures for Sasak and Javanese were designed independently of each other and carried out separately, the number of target items for Javanese was much larger than that for Sasak in this study. On the other hand, the number of iterations of each item was larger in Sasak than in Javanese. The consequence is that each Javanese session yielded roughly 43% more target tokens than each Sasak session (up to 120 versus up to 84).
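The two presentation schemes and the resulting token counts can be illustrated with a minimal sketch. The item labels, function names, and seeds below are hypothetical placeholders, not the actual stimuli or presentation software; only the item counts (14 and 60) and repetition counts (6 and 2) come from the description above.

```python
import random

def blocked_order(items, n_blocks, seed):
    """Sasak-style order: n_blocks blocks, each a fresh shuffle of all items."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = list(items)
        rng.shuffle(block)          # randomize within the block only
        order.extend(block)
    return order

def single_block_order(items, reps, seed):
    """Javanese-style order: one large block with every item repeated reps times."""
    rng = random.Random(seed)
    block = list(items) * reps
    rng.shuffle(block)              # randomize across the whole session
    return block

# Placeholder labels standing in for the real word lists (assumption).
sasak_items = [f"sasak_{i:02d}" for i in range(14)]
javanese_items = [f"jav_{i:02d}" for i in range(60)]

sasak_order = blocked_order(sasak_items, n_blocks=6, seed=1)
javanese_order = single_block_order(javanese_items, reps=2, seed=1)

sasak_total = len(sasak_order)          # 14 items x 6 blocks
javanese_total = len(javanese_order)    # 60 items x 2 repetitions
ratio = javanese_total / sasak_total    # tokens per session, Javanese relative to Sasak
```

In the blocked scheme every run of 14 prompts contains each item exactly once, whereas in the single-block scheme the two repetitions of an item may fall anywhere in the session.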
For all recording sessions, the ultrasound images were collected using a 2–4 MHz convex ultrasound sensor (Telemed MC4-2R20N) coupled with a Telemed ClarUs-EXT portable, ultrasonic beam-former connected to a high-performance laptop that functioned as a machine dedicated to audio- and video-data collection. The ultrasound images were constructed and displayed using Echo Wave II ultrasound imaging software (Telemed 2015) at approximately 60 frames per second. On-screen renderings of these images were collected using a combination of desktop-display software (XSplit Broadcaster: SplitmediaLabs, 2015) and real-time on-screen video-capture software (Fraps: Beepa, 2015) at a stable rate of 60 frames per second.8 Each session involved under 30 minutes of recording.
Audio was captured with an over-the-ear condenser microphone. In order to synchronize audio and ultrasound video, for Sasak, one experimenter produced a series of 6–14 tokens of the voiceless post-alveolar click [k͡!] at the end of each speaker’s video and audio recordings immediately before stopping those recordings. The Javanese recordings lacked these click productions, and thus, in order to synchronize each ultrasound video to its corresponding audio signal, 10 instances of the release of the unaspirated voiceless velar stop [k], present at the beginning of each item’s carrier phrase Kata _______, were selected at random throughout each recording and analyzed instead of clicks. Video-to-audio synchronization was achieved by determining the mean temporal lag between the onset time of the acoustic release burst of each stop (voiceless post-alveolar click [k͡!] for Sasak; velar plosive [k] for Javanese) and the time of the ultrasound frame immediately prior to visible articulatory release.9 In most cases, the ultrasound video signal had a consistent lag of between 1 and 3 seconds after the audio, and for each ultrasound video, frame times were subsequently readjusted in order to align them temporally with corresponding acoustic events in the audio signal. Post-collection and post-alignment, single ultrasound image frames were extracted from the video recordings as PNG-format image file sequences using digital video playback software (QuickTime Pro: Apple 2010). The recordings for 4 Javanese speakers (J2, J5, J6, and J7) were inadvertently halted mid-collection and thus their productions were recorded in multiple video files. However, these speakers did not move out of position when the recordings were halted, and in these cases, video-to-audio synchronization simply required the calculation of lag for each individual video file relative to its corresponding audio signal. (See manual alignment techniques in Miller & Finch, 2011.)
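The lag-based alignment described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used in the study: the function names and the example time stamps are hypothetical, and the lag is taken to be the video time minus the audio time of each synchronization event.

```python
from statistics import mean

def mean_lag(audio_burst_times, video_release_times):
    """Mean of (video time - audio time) over paired sync events, in seconds.

    audio_burst_times: onset times of the acoustic release bursts.
    video_release_times: times of the ultrasound frames immediately
    before the visible articulatory release, in the same event order.
    """
    if len(audio_burst_times) != len(video_release_times):
        raise ValueError("each audio event needs exactly one video event")
    return mean(v - a for a, v in zip(audio_burst_times, video_release_times))

def align_frame_times(frame_times, lag):
    """Shift video frame times onto the audio time line by removing the lag."""
    return [t - lag for t in frame_times]

# Hypothetical sync events for one video file (values are made up).
audio_events = [10.0, 20.0, 30.0]
video_events = [12.5, 22.5, 32.5]

lag = mean_lag(audio_events, video_events)
raw_frame_times = [i / 60.0 for i in range(600)]    # 60 frames per second
aligned_frame_times = align_frame_times(raw_frame_times, lag)
```

For speakers whose session was split across several video files, the same two functions would be applied once per file, each with its own set of synchronization events.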
For the Sasak recordings, collected in Lombok, environmental factors raised the noise level in each audio recording: roosters crowing at random times, mosque calls to prayer, a pre-school promotion ceremony, passing motor-scooters, and reverberant recording rooms. The condenser microphone, positioned close to participants’ mouths and set to a low gain, served to improve the signal-to-noise ratio in the audio signal. These disruptions had at most a minor impact on this study, since the primary focus is the articulatory gestures associated with the sounds of interest rather than their acoustic properties, and they were not severe enough to prevent aural identification of items. The Javanese recordings were collected in a quiet room at the University of Hong Kong and did not have such issues.
Analysis involved four steps: (i) identifying frames to analyze, (ii) assigning coordinates to tongue contours, (iii) quantifying the distance between contours, (iv) statistically modeling the distribution of distance values across conditions.
Frames were identified through the audio recording, using Praat (Boersma & Weenink, 2015). Details follow about how frames were identified for different sound types.
- Oral stops and affricates [t, ʨ]. For oral stops and affricates, the frame of interest was defined as the last frame before the release of the oral stop constriction, which was identified from the corresponding waveform and spectrogram. The extracted ultrasound frame was assumed to represent a full stop constriction (i.e., the achievement of a full postalveolar constriction gesture) because the temporal distance between ultrasound frames (16.67 ms) was shorter than the interval during which the tongue maintains a stop constriction prior to release. The frames of interest for affricates were determined in the same manner because the oral-stop portion of such sounds contained a complete oral constriction at the location for the affricates.
- Nasal stops [n, ɲ, Ns]. For nasal stops, the extracted frame was the frame closest to the acoustic midpoint of the interval of nasal-stop constriction, as determined from the waveform and spectrogram. In Sasak, where words were collected in isolation, the nasal stop was preceded by silence and the onset of nasalization was identified as the onset of the vocal fold vibration in the acoustic waveform. For the end of the nasal, and in Javanese where the nasal stop was preceded by a vowel due to the carrier phrase, the nasal stop boundaries were identified by the loss of vowel formant structure and a significant decrease in acoustic intensity in the acoustic waveform and spectrogram.
- Fricative [s]. For fricatives, the frame of interest was identified as the frame closest to the acoustic midpoint of the frication associated with the [s] articulation. The onset and offset of frication were determined by the presence of aperiodic noise in the waveform. The frame closest to the midpoint of the frication interval was assumed to best represent the [s] articulation because the corresponding acoustic pattern in the spectrogram was most characteristic of [s] at the center of the fricative.
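The two frame-selection rules above (last frame before release for stops and affricates; frame nearest the acoustic midpoint for nasals and the fricative) can be sketched in a few lines. The function names and example times are hypothetical, and the sketch assumes that frame times have already been aligned to the audio and that the acoustic landmarks were annotated by hand.

```python
from bisect import bisect_left

FRAME_RATE = 60.0   # frames per second, i.e. about 16.67 ms between frames

def make_frame_times(n_frames, fps=FRAME_RATE):
    """Time stamps (s) of n_frames consecutive frames starting at 0."""
    return [i / fps for i in range(n_frames)]

def last_frame_before(frame_times, release_time):
    """Index of the last frame strictly before the release (stops, affricates)."""
    idx = bisect_left(frame_times, release_time) - 1
    if idx < 0:
        raise ValueError("no frame precedes the release time")
    return idx

def frame_nearest_midpoint(frame_times, onset, offset):
    """Index of the frame closest to the interval midpoint (nasals, [s])."""
    midpoint = (onset + offset) / 2.0
    return min(range(len(frame_times)), key=lambda i: abs(frame_times[i] - midpoint))

frames = make_frame_times(120)                      # two seconds of video
stop_frame = last_frame_before(frames, 0.51)        # release annotated at 0.51 s
nasal_frame = frame_nearest_midpoint(frames, 1.0, 1.2)
```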
The next step was to convert the images into coordinates corresponding to the tongue surface as shown in the image. EdgeTrak software (Li et al., 2005) was used to determine the boundary edges (corresponding to the surface of the tongue) and to fit smoothed spline curves to the boundaries; edges were hand-corrected as needed. The data for each spline were exported as a set of 100 equidistant coordinate points.
Although continuous attempts were made to ensure that the collected ultrasonic data did not contain any head movement relative to the transducer, comparisons of 9–10 successive traces of each speaker’s palate collected throughout the ultrasound recordings indicated that head movement occurred during the scanning for 5 talkers (S9, S10, J6, J7, and J10). Based on the palate data, the approximate moment of each significant shift in head position during production was identified, and palate contours affected by the movement were adjusted via spatial translation until the anterior portion of these affected palate contours was situated in the same region as that of the other palate traces. Then all sound contours of interest for these participants that were also affected by the head movement were adjusted in the same manner as the adjusted palate traces in order to correct for head movement in the spline data. This adjustment resulted in an overall reduction in variation in spatial position of the tongue splines in the data for the 5 talkers with head movement. An example of all palate and tongue contour data for participant S9 before adjustment and after adjustment is given in Figure 2.
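One way to picture the translation-based correction is sketched below. This is an assumption-laden illustration rather than the procedure actually used: it estimates the offset automatically from the centroids of the anterior palate points, takes the anterior end to be the last points of each contour, and uses hypothetical function names and made-up coordinates.

```python
def centroid(points):
    """Mean x and mean y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def translation_to_reference(reference_palate, shifted_palate, n_anterior=20):
    """Offset (dx, dy) that moves the shifted palate's anterior portion
    onto the reference palate's anterior portion.

    Assumption: the anterior end is stored last in each contour.
    """
    ref_x, ref_y = centroid(reference_palate[-n_anterior:])
    sh_x, sh_y = centroid(shifted_palate[-n_anterior:])
    return (ref_x - sh_x, ref_y - sh_y)

def translate(points, offset):
    """Apply the same rigid translation to every point of a contour."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]

# Made-up contours: a flat reference palate and a copy displaced by head movement.
reference_palate = [(float(i), 50.0) for i in range(100)]
shifted_palate = translate(reference_palate, (3.0, -2.0))
shifted_tongue = [(float(i) + 3.0, 28.0) for i in range(100)]

offset = translation_to_reference(reference_palate, shifted_palate)
corrected_palate = translate(shifted_palate, offset)
corrected_tongue = translate(shifted_tongue, offset)   # same correction as the palate
```

The key design point carried over from the description is that the offset is derived from the palate traces only and is then applied unchanged to the affected tongue contours.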
In order to derive measures of similarity/dissimilarity between production tokens, root mean squared distances (RMSDs) between tongue contour pairs were calculated. Mean squared distance and RMSD values are used for various purposes in ultrasound tongue position research, e.g., comparing native and non-native speakers’ articulations (Li et al., 2005; Davidson, 2005; Berry et al., 2012), understanding coarticulation (Irfana & Sreedevi, 2016), and evaluating the accuracy of edge-detection algorithms (Roussos et al., 2009; Fasel & Berry, 2010; Csapó & Lulich, 2015). For our purposes, low RMSD values indicate that two sound tokens were articulated with high similarity, whereas high RMSDs indicate that the tokens were articulated with quite distinct lingual contours. RMSDs were calculated from the distances between the two contours of each token pair, measured along each integer-valued angle (relative to the origin) that was shared by both contours. The origin was defined as the point of intersection between the lines representing the leftmost and rightmost boundaries of the ultrasonic image for each talker, and the origin’s location depended solely on the scan settings (scan frequency, scan depth, and field of view) used in the Echo Wave II software. The RMSD calculation procedure is depicted in Figure 3. The distance at each angle was squared, the squared distances were summed and divided by the total number of angles, and the square root of this value was taken as the RMSD measure for the token pairing.
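The RMSD computation can be sketched as below. The sketch assumes that each contour has already been resampled into a mapping from integer angle (degrees, relative to the scan origin) to distance from the origin (mm); that representation, the function name, and the example values are illustrative assumptions rather than the study's actual data format.

```python
import math

def rmsd(contour_a, contour_b):
    """Root mean squared distance between two contours.

    Each contour maps an integer angle to the contour's distance from
    the origin along that angle. Only angles present in both contours
    contribute to the measure.
    """
    shared_angles = sorted(set(contour_a) & set(contour_b))
    if not shared_angles:
        raise ValueError("the two contours share no angles")
    squared = [(contour_a[t] - contour_b[t]) ** 2 for t in shared_angles]
    return math.sqrt(sum(squared) / len(squared))

# Made-up contours that overlap between 50 and 120 degrees.
token_1 = {theta: 60.0 for theta in range(40, 121)}
token_2 = {theta: 63.0 for theta in range(50, 131)}

distance = rmsd(token_1, token_2)   # constant 3 mm offset on shared angles
identical = rmsd(token_1, token_1)
```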
For each spline pair, no measures were taken at angles (θ) that intersected with only a single trace, as shown at both ends of the images in Figure 3. This method contrasts with the mean Euclidean distance algorithm used in Zharkova and Hewlett (2009), which calculates the arithmetic mean of the shortest distances of all points along each spline to all points along the paired spline. Where there was a mismatch in spline length along the x-dimension (the θ-dimension in our method), as seen in Figure 3, the Zharkova and Hewlett approach could overestimate mean distance between splines because points at the extreme ends of each spline would be measured as having longer distances to their nearest point along the paired spline. Our method, on the other hand, ignores measures at angles (θ) where both contours were not present in the spline data, so overestimation of distance between spline pairs at the extreme ends is less likely here than in Zharkova and Hewlett. Since our aim is to determine the degree of similarity/difference (or degree of homorganicity/heterorganicity) between contours, we chose a method that would not exaggerate differences between contours in cases where they simply differed in length along the θ-dimension rather than in articulatory place of constriction.
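The effect of a length mismatch on the two measures can be illustrated with a toy comparison. The nearest-point function below is only a simplified reading of the description above, not Zharkova and Hewlett's implementation, and the contour representation (integer angle mapped to distance from the origin) and all values are assumptions.

```python
import math

def to_xy(contour):
    """Convert {angle in degrees: radius} to a list of (x, y) points."""
    return [(r * math.cos(math.radians(t)), r * math.sin(math.radians(t)))
            for t, r in sorted(contour.items())]

def mean_nearest_distance(contour_a, contour_b):
    """Mean of each point's distance to its nearest point on the other contour."""
    pts_a, pts_b = to_xy(contour_a), to_xy(contour_b)

    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]

    distances = one_way(pts_a, pts_b) + one_way(pts_b, pts_a)
    return sum(distances) / len(distances)

def shared_angle_rmsd(contour_a, contour_b):
    """RMSD over the angles at which both contours are present."""
    shared = sorted(set(contour_a) & set(contour_b))
    if not shared:
        raise ValueError("the two contours share no angles")
    return math.sqrt(sum((contour_a[t] - contour_b[t]) ** 2 for t in shared) / len(shared))

# Same tongue shape, but one trace extends further at both ends.
short_trace = {theta: 60.0 for theta in range(60, 101)}
long_trace = {theta: 60.0 for theta in range(40, 121)}

nearest = mean_nearest_distance(short_trace, long_trace)
shared = shared_angle_rmsd(short_trace, long_trace)
```

Here the shared-angle measure is zero because the traces coincide wherever both exist, while the nearest-point mean is positive solely because of the unmatched ends of the longer trace.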
RMSD data for each language were submitted to linear mixed-effects regression (LMER) models using the lmer() function in the lmerTest package (Kuznetsova et al., 2012) in R statistical software (R Core Team, 2016). Both languages’ LMER models contained fixed effects of Place (homorganic, heterorganic, ambiguous) and Nasality (shared, contrastive) and random slopes and intercepts for factor levels within Subject. Word was omitted as a random effect because the number of tokens per item in the Javanese data set was small (2 iterations per word), and the addition of this factor did not improve the fit of the model for either language. Estimates of RMSD values from these LMER models allowed for comparisons of magnitude of difference between sound pairs according to Place and Nasality conditions. The number of RMSD values per speaker and per language and condition are reported in Tables 9 and 10.
|participant||no. of splines||mean degrees||participant||no. of splines||mean degrees|
In order to determine the significance of the above observations, we carried out LMERs on RMSD differences for various classes of sounds: The RMSD between two tongue splines serves as a measure of similarity of articulatory tongue position. We divided the relevant sound pairs into six categories, shown in Table 11. ‘Homorganic’ refers to sounds with the same place of articulation, dental/alveolar or postalveolar, while ‘heterorganic’ is a cross between these two. These comparisons give a measure for RMSD for homorganic and heterorganic sounds. The ‘ambiguous’ category contains the sounds tested by this study, [s] and [Ns]. ‘Nasality-same’ ($\text{nasality}_s$) means either both sounds are oral or both sounds are nasal; ‘nasality-contrastive’ ($\text{nasality}_c$) means that the two sounds in a pair disagree for nasal/oral articulation.10
|Nasality||Homorganic||Heterorganic||Ambiguous|
|s(ame)||[t]–[t], [n]–[n], [ʨ]–[ʨ], [ɲ]–[ɲ]||[t]–[ʨ], [n]–[ɲ]||[s]–[s], [Ns]–[Ns]|
|c(ontrastive)||[t]–[n], [ʨ]–[ɲ]||[t]–[ɲ], [ʨ]–[n]||[s]–[Ns]|
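The six-way categorization can be stated as a small lookup, sketched here in Python. The place and nasality labels follow the grouping given in the text; the dictionary and function names are hypothetical, and the treatment of mixed pairs (an ambiguous sound paired with an unambiguous one) is an assumption made only because such pairs fall outside Table 11.

```python
# Place and nasality labels for the six sounds, following the text's grouping.
PLACE = {"t": "dental/alveolar", "n": "dental/alveolar",
         "ʨ": "postalveolar", "ɲ": "postalveolar",
         "s": "ambiguous", "Ns": "ambiguous"}
NASAL = {"t": False, "ʨ": False, "s": False,
         "n": True, "ɲ": True, "Ns": True}

def classify_pair(a, b):
    """Return (place category, nasality category) for a pair of sounds.

    Pairs mixing an ambiguous sound with an unambiguous one (e.g. s-t)
    are not part of Table 11, so their place category is None.
    """
    nasality = "same" if NASAL[a] == NASAL[b] else "contrastive"
    ambiguous = [PLACE[x] == "ambiguous" for x in (a, b)]
    if all(ambiguous):
        place = "ambiguous"
    elif any(ambiguous):
        place = None
    elif PLACE[a] == PLACE[b]:
        place = "homorganic"
    else:
        place = "heterorganic"
    return place, nasality

for pair in [("t", "n"), ("t", "ɲ"), ("s", "Ns"), ("ʨ", "ʨ")]:
    print(pair, classify_pair(*pair))
```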
Using the categories from Table 11, we are able to make our predictions explicit. Our first focus is on the categories homorganic and heterorganic, whether oral or nasal. We use the homorganic class to establish reasonable RMSDs for sounds made with the same place of articulation (same or similar articulatory position of the tongue). RMSDs are predicted to be small in this case because homorganic sounds are made with the same place of articulation by definition. In contrast, we expect large RMSDs when the two sounds have different places of articulation, the heterorganic condition. Putting these together, we expect $\mathrm{RMSD}_{\text{homorganic}}$ to be smaller than $\mathrm{RMSD}_{\text{heterorganic}}$.
When nasality is added in, we expect homorganic sounds with contrastive nasality still to have a small RMSD, though slightly larger than when the two sounds are truly identical, because the nasal/oral contrast introduces differences in manner of articulation. Our four expectations are summarized in Table 12.¹¹
|a.||Homorganic||same||The estimated RMSD is relatively small.|
|contrastive||The estimated RMSD is relatively small, but larger than in the homorganic-same case.|
|b.||Heterorganic||same||The estimated RMSD is relatively large.|
|contrastive||The estimated RMSD is relatively large.|
|c.||Summary||$\mathrm{RMSD}_{\text{homorganic-s}} < \mathrm{RMSD}_{\text{homorganic-c}} \ll \mathrm{RMSD}_{\text{heterorganic}}$|
The box plot in Figure 4 shows the distribution of RMSDs for both the unambiguous sounds and the ambiguous sounds. Focusing on the unambiguous sounds, we see that the RMSDs for homorganic sounds are small, well below 5 mm, with a lower mean when sounds are identical (i.e., the nasality-s case; Sasak: 2.5 mm, Javanese: 1.6 mm) than when nasality differs (nasality-c; Sasak: 3.4 mm, Javanese: 3.0 mm), exactly as expected: $\mathrm{RMSD}_{\text{homorganic-s}} < \mathrm{RMSD}_{\text{homorganic-c}}$. (This difference is significant, as seen in Table 14.) In contrast, the means for heterorganic sounds are above 5 mm for both Sasak and Javanese, whether there is a nasality contrast or not, again as expected: $\mathrm{RMSD}_{\text{homorganic-s,c}} \ll \mathrm{RMSD}_{\text{heterorganic}}$. RMSDs pattern as expected; we now have a measure to use in quantifying comparisons involving ambiguous sounds.
In particular, we can now understand the predictions of the abstract and concrete hypotheses in terms of RMSD. Under the abstract hypothesis, [s] is heterorganic to [Ns], shown by a relatively large RMSD. Under the concrete hypothesis, [s] is homorganic with [Ns], shown by an RMSD similar to that of homorganic-contrastive pairs, since [s] and [Ns] have different nasality values; see Table 13. Under both hypotheses, the self-comparison values ($\mathrm{RMSD}_{\text{ambiguous-s}}$), i.e., [s] compared with [s] and [Ns] compared with [Ns], are expected to be small.
|Abstract||[s]–[s], [Ns]–[Ns]||homorganic||The estimated RMSD is relatively small.|
|[s]–[Ns]||heterorganic||The estimated RMSD is relatively large.|
|Summary||$\mathrm{RMSD}_{\text{ambiguous-same}} = \mathrm{RMSD}_{\text{homorganic-same}}$|
|$\mathrm{RMSD}_{\text{ambiguous-same}} < \mathrm{RMSD}_{\text{ambiguous-contrastive}}$|
|Concrete||[s]–[s], [Ns]–[Ns]||homorganic||The estimated RMSD is relatively small.|
|[s]–[Ns]||homorganic||The estimated RMSD is relatively small.|
|Summary||$\mathrm{RMSD}_{\text{ambiguous-same}} = \mathrm{RMSD}_{\text{homorganic-same}}$|
|$\mathrm{RMSD}_{\text{ambiguous-contrastive}} = \mathrm{RMSD}_{\text{homorganic-contrastive}}$|
Table 14 evaluates the predictions from Table 13 in terms of estimated RMSD values for different comparisons, using LMERs to make those comparisons. The Place effects show that homorganic sounds have a significantly smaller estimated RMSD than do heterorganic sounds (Sasak same nasality: 2.5 mm < 6.4 mm, p < 0.0001; Sasak contrastive nasality: 3.4 mm < 6.5 mm, p < 0.0001; Javanese same nasality: 1.6 mm < 4.3 mm, p < 0.0001; Javanese contrastive nasality: 3.0 mm < 4.5 mm, p < 0.0001), with a general difference of approximately 3 mm. This is consistent with the prediction for place differences. In contrast, according to the LMER results for both languages, estimated RMSD values for homorganic sounds with same nasality (i.e., [t] compared with [t], [n] with [n], [ʨ] with [ʨ], [ɲ] with [ɲ]) and for ambiguous sounds with same nasality (i.e., [s] with [s], [Ns] with [Ns]) differ by a much smaller margin (Sasak: homorganic-s 2.5 mm > ambiguous-s 2.2 mm, p < 0.0001; Javanese: homorganic-s 1.6 mm = ambiguous-s 1.6 mm, p = 0.354), consistent with the predictions pertaining to ambiguous-same sounds.
Table 14
LMER estimates of RMSD values (in mm) for unambiguous and ambiguous sound classes in Sasak (top) and Javanese (bottom). “Est. 1” shows RMSD estimates for the lefthand side of each prediction; “Est. 2” shows RMSD estimates for the righthand side of each comparison. The box highlights the p-value that is not significant. Sounds are categorized by place and nasality as in Table 11.
|Prediction||Est. 1||Est. 2||S.E.||t-value||p-value|
|Place||homorganic-s << heterorganic-s||2.50||6.36||0.04||87.55||<0.0001|
|homorganic-c << heterorganic-c||3.42||6.47||0.04||70.89||<0.0001|
|homorganic-s = ambiguous-s||2.50||2.24||0.04||6.04||<0.0001|
|Nasality||homorganic-s < homorganic-c||2.50||3.42||0.04||20.80||<0.0001|
|heterorganic-s < heterorganic-c||6.36||6.47||0.04||2.644||0.0082|
|ambiguous-s << ambiguous-c||2.24||9.49||0.04||176.262||<0.0001|
|Prediction||Est. 1||Est. 2||S.E.||t-value||p-value|
|Place||homorganic-s << heterorganic-s||1.63||4.29||0.03||104.43||<0.0001|
|homorganic-c << heterorganic-c||2.97||4.54||0.03||62.38||<0.0001|
|homorganic-s = ambiguous-s||1.63||1.60||0.03||0.93||0.354|
|Nasality||homorganic-s < homorganic-c||1.63||2.97||0.03||52.48||<0.0001|
|heterorganic-s < heterorganic-c||4.29||4.54||0.03||9.75||<0.0001|
|ambiguous-s << ambiguous-c||1.60||6.91||0.04||140.05||<0.0001|
Turning to the effect of nasality, we see that the RMSD estimates for homorganic comparisons with the same nasality are smaller than those for homorganic comparisons with contrastive nasality (Sasak: 2.5 mm < 3.4 mm, p < 0.0001; Javanese: 1.6 mm < 3.0 mm, p < 0.0001), although the RMSD value for homorganic-contrastive pairings is still relatively small. Heterorganic-same and heterorganic-contrastive pairings were similar to each other in RMSD estimates but still differed significantly (Sasak: 6.4 mm < 6.5 mm, p = 0.0082; Javanese: 4.3 mm < 4.5 mm, p < 0.0001), and the corresponding RMSD values in these conditions were relatively large. Importantly, RMSD estimates for ambiguous sounds with contrastive nasality (ambiguous-c) were large (Sasak: 9.5 mm; Javanese: 6.9 mm) and differed drastically from RMSD estimates for ambiguous sounds with matching nasality (Sasak: 9.5 mm >> 2.2 mm, p < 0.0001; Javanese: 6.9 mm >> 1.6 mm, p < 0.0001). We conclude that [s] and [Ns] exhibit the greatest distinctness in lingual contour shape and place of articulation in both languages, exactly as predicted under the abstract hypothesis.
The answer to the abstract/concrete question raises the issue of whether [s] is a dental/alveolar sound and [Ns] is postalveolar, or whether one or the other is something else. In order to test whether [s] and [Ns] are homorganic with other dental/alveolar and postalveolar sounds respectively, we revised our earlier LMER models to target these two ambiguous sounds, including pairwise comparisons between each of the ambiguous sounds [s, Ns] and each of the unambiguous sounds [t, n, ʨ, ɲ] (a separate model was generated for each ambiguous sound, with only relevant sound comparisons included in the data set). The revised models provide RMSD estimates for each sound-sound pair as well as p-values for comparisons with the corresponding unambiguous homorganic and heterorganic categories given in Table 11: homorganic-s, homorganic-c, heterorganic-s, and heterorganic-c. RMSD estimates and p-values from the revised LMER models are reported in Table 15. The closer the RMSD estimate for a given sound pair is to the RMSD of one of the unambiguous categories, the more likely it is that the pair is a member of that category. If a p-value indicates that the RMSD comparison for the ambiguous sound does not differ from the RMSD for the corresponding unambiguous homorganic comparison, then the place-ambiguous sound in the pair is homorganic with the other sound in the pair.
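The closeness criterion can be sketched as a simple nearest-reference rule in Python. This is only an illustration of the reasoning: the function name is hypothetical, the choice of category-level reference estimates from Table 14 is an assumption, and the classification in the text rests on LMER p-values as well as on distance between estimates.

```python
def nearest_category(pair_rmsd, homorganic_rmsd, heterorganic_rmsd):
    """Label a pair by whichever reference estimate its RMSD is closer to."""
    to_homorganic = abs(pair_rmsd - homorganic_rmsd)
    to_heterorganic = abs(pair_rmsd - heterorganic_rmsd)
    return "homorganic" if to_homorganic <= to_heterorganic else "heterorganic"

# Javanese [Ns]-[ɲ] (1.9 mm, as reported in the text) against the Javanese
# same-nasality reference estimates from Table 14.
print(nearest_category(1.9, homorganic_rmsd=1.63, heterorganic_rmsd=4.29))

# Sasak [s]-[Ns] (ambiguous-c, 9.49 mm in Table 14) against the Sasak
# contrastive-nasality reference estimates.
print(nearest_category(9.49, homorganic_rmsd=3.42, heterorganic_rmsd=6.47))
```

Under these assumptions the first pair falls with the homorganic reference and the second with the heterorganic reference.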
|Homorganic comparison||Heterorganic comparison|
In all comparisons shown in Table 15, the RMSD estimates for [s] compared with either [t] or [n], and for [Ns] compared with either [ʨ] or [ɲ], are similar to the RMSD estimates for corresponding homorganic comparisons (columns 4–6, Table 15) and much smaller than the estimates for corresponding heterorganic comparisons (columns 7–9, Table 15). In general, RMSD estimates for comparisons of [s] to both [t] and [n] and of [Ns] to both [ɲ] and [ʨ] are within the range of homorganic contrasts (an RMSD near 3 mm), rather than in the range of heterorganic contrasts (an RMSD at or above 6 mm). In two cases, comparisons between Sasak [Ns]–[ɲ] and Javanese [Ns]–[ʨ] with their corresponding homorganic pairs [ɲ]–[ɲ] and [ɲ]–[ʨ] result in large p-values (p = 0.618 and p = 0.190, respectively), indicating no significant difference between RMSDs for these specific sound pairs and their unambiguous homorganic analogues. In Javanese, RMSDs for [Ns]–[ɲ] differ significantly from those for the known homorganic pair [ɲ]–[ɲ] (p = 0.0468), though to a lesser extent than for other presumably homorganic comparisons, and the RMSD estimate (1.9 mm) is even smaller than most other homorganic RMSDs. These relations are shown visually in Figure 5, in which presumed homorganic pairings, [s]–[t], [s]–[n], [Ns]–[ɲ], and [Ns]–[ʨ] (middle and leftmost orange shapes in each plot), are more similar to corresponding homorganic-same and -contrastive groups (gray shapes) than to their heterorganic-same or -contrastive counterparts (green shapes) in terms of mean RMSD between contours. In each case for both languages, the comparison between [s] and [Ns] (rightmost orange shapes) has a larger mean RMSD value than any other homorganic or heterorganic comparison.
These results are consistent with the dental/alveolar/postalveolar hypothesis, that, in both languages, [s] is a member of the known dental/alveolar class along with [t] and [n] and that [Ns] is a member of the postalveolar/palatal class along with [ʨ] and [ɲ].
Comparison of lingual contours in ultrasound images for the relevant sounds shows that in these languages, the nasal substitution pattern relating a voiceless obstruent with its homorganic nasal stop breaks down with [s] and [Ns]. As illustrated by the contour plots and Smoothing Spline ANOVAs (SS-ANOVAs; Davidson, 2006)¹² for S2 in Figure 6, these two sounds have different articulations: [s] is an anterior, dental/alveolar sound in both Sasak and Javanese, with an articulation similar to that of [t] and [n], while [ɲ], whether related to [s] or to [ʨ], is a postalveolar sound like [ʨ]. This pattern is borne out across all talkers in both languages. Given the morphological pairing of [s] with [ɲ] and the attested variation among possible articulations for [s] sounds (Dart, 1991), we would not have been surprised to find a postalveolar [s] in these languages, with an articulation more similar to [ɲ]. However, this is not the case, nor is it the case that [s] and [Ns] share a single articulatory configuration that contrasts with both dental/alveolar and postalveolar consonants. Instead, there is an abstract morphophonological relation between a dental/alveolar [s] and a derived postalveolar/palatal nasal [ɲ], just as impressionistic accounts lead us to expect.
Further examination of the splines and SS-ANOVAs provides a better understanding of the articulation of these consonants in the two languages. Reviewing these images shows that nearly all talkers (10 out of 11 Sasak, 8 out of 8 Javanese) articulated the postalveolar sounds [ʨ] and [ɲ], as seen in Figure 6, with somewhat dissimilar articulatory positions: [ʨ] was articulated with a tongue-blade constriction posterior to dental/alveolar [t, s, n] but without full raising of the tongue body to the hard palate, whereas [ɲ], whether derived from [ʨ] or [s], was articulated similarly with the tongue blade but with a complete palatal occlusion between the tongue body and hard palate, at or slightly anterior to the constriction locations for palatal sounds [j] and [i] in the two languages. Thus, [ɲ] in Sasak and in Javanese is characterized as having a longer region of palatal constriction than [ʨ], and this articulation is consistent with that observed in electropalatographic data for Peninsular Spanish alveolo-palatal ñ [ɲ] (Martínez Celdrán & Fernández Planas, 2007; Fernández Planas, 2009; Shosted et al., 2012).
These articulatory relations align most closely with the first abstract, heterorganic hypothesis in Table 3a, and are summarized in Table 16. Furthermore, we see that not only is the relation between [s] and [Ns] abstract, so too is the relation between [ʨ] and [ɲ]: They are not entirely homorganic sounds.
That these articulatory patterns obtain in a pair of related languages suggests that the anomalous pairing of [s] and [ɲ] with respect to nasal substitution is stable, despite being an abstract linguistic relation. The pairing appears to have resisted pressure to move towards a concrete relation that would, over time, serve to regularize the pattern. Archangeli et al. (2012) find the same sort of stability in Bantu vowel harmony: Bantu height harmony occurs primarily in verbs, where sequences of mid vowels are preferred to mid-high sequences, with the exception of the permitted sequence [e…u]. The same pattern is found in nouns to a lesser degree—with the exception that even [e…u] occurs less often than expected. The difference is that in verbs, the [e…u] sequence arises across a morpheme boundary, and so any changes in this sequence disrupt an entire morphological paradigm while with the nouns, gradual item-by-item change is possible. To recast Archangeli et al. (2012) in terms of the current discussion, every nasal substitution form relating [s] and [ɲ] constitutes pressure against reanalyzing any single instance relating [s] and [ɲ]. Such pressure predicts that anomalous patterns that enter a morphophonological relation (through whatever means) are likely to remain in a language.
In addition to finding a lack of within-paradigm regularity in our study (the concrete-homorganic hypothesis in Table 3d–f), we also find little evidence of even partial assimilation. Although [s] is clearly articulated in the dental/alveolar region with the tongue tip/blade, we do not observe articulations of its nasal counterpart ([Ns]) that are more similar to that of [s] than would be expected from a palatal nasal sound. Comparisons between [Ns] and [ɲ] result in RMSD values similar to (or, in the case of Javanese, smaller than) those for comparisons within [ɲ] contours alone, indicating that [Ns] differs from [ɲ] no more than productions of [ɲ] differ from one another. Thus, it is not the case that [s], as the derivational source of [Ns], influences the articulation of its nasal substitution counterpart with its anterior, apical/laminal articulation, and in fact the RMSD values between [s] and its nasal counterpart [Ns] are among the largest values observed in the entire data set for each language.
This work uses RMSD calculations to determine whether two sounds are homorganic or heterorganic in articulatory place, giving a quantified comparison of tongue-contour data from ultrasound images. RMSDs provide individual numerical values to represent the magnitudes of spatial difference within each contour comparison, and these values can be used for further comparison. Moreover, when coupled with a mixed-effects linear regression model, RMSD values from multiple talkers can be compared within a single analysis by treating Speaker as a random effect. This kind of comparison enables us to identify patterns that hold generally across speakers of a given language. Statistical comparisons of RMSD values across test conditions of interest allow for an evaluation of similarity or difference between those conditions, adding to the similarity tools proposed in Mielke (2012).
To conclude, this study has shown that there is an abstract morphophonological relation in the nasal substitution paradigm in both Sasak and Javanese. Our analysis of ultrasound images supports [s/ɲ] nasal substitution as an abstract relation within what appeared to be an otherwise general and concrete homorganic pattern: We show that in both languages, [s] is articulated quite similarly to the dental/alveolar sounds and that the morphologically-related [Ns] is indistinguishable from [ɲ]. Further examination of the data shows that the voiceless affricate is alveolo-palatal [ʨ], which corresponds to palatal [ɲ] in the same paradigm, again an abstract relation. Thus, the pattern is not as homorganic as impressionistic analysis suggests. We suggest that the pattern has resisted regularization because it is a robust morphophonological relation, held in place by morphological paradigm pressure.
Finally, we introduce the RMSD/LMER method for comparing tongue contours, pooling comparisons from multiple subjects in order to better understand general patterns within each language.