Elsevier

Journal of Phonetics

Volume 52, September 2015, Pages 152-169

Research Article
Development of phonetic variants (allophones) in 2-year-olds learning American English: A study of alveolar stop /t, d/ codas

https://doi.org/10.1016/j.wocn.2015.06.003

Highlights

  • We examined three phonetic variants (unreleased, flapped, glottalized) of /t, d/.

  • English-learning 2-year-olds produced the three variants less often than adults.

  • This suggests that it takes time for children to learn to produce phonetic variants.

Abstract

This study examined the emergence of the phonetic variants (often called allophones) of alveolar phonemes in the speech production of 2-year-olds. Our specific question was: Does the child start by producing a “canonical” form of a phoneme (e.g., /t/ with a clear closure and a release burst), only later learning to produce its other phonetic variants (e.g., unreleased stop, flap, and glottal stop)? Or, does the child start by producing the appropriate phonetic variants in the appropriate contexts and only later learn that they are phonetic variants of the same phoneme? In order to address this question, we investigated the production of three phonetic variants (unreleased stop, flap, and glottal stop) of the alveolar stop codas /t, d/ in the spontaneous speech of 6 American-English-speaking mother–child dyads, using both acoustic and perceptual coding. The results showed that 2-year-old children produced all three variants significantly less often than their mothers, and produced acoustic cues to canonical /t, d/ more often. This supports the view that young children start out by producing a fully articulated canonical variant of a phoneme in contexts where an adult would produce non-canonical forms. The implications of these findings for early phonological representations are discussed.

Introduction

One of the central questions in child language development is what early phonological representations look like and how these develop over time. Because sounds are the building blocks of words, this question is crucial to understanding the emergence and development of words in children. The purpose of the present study was to shed light on the nature and the organization of early phonological representations by examining 2-year-olds' production of the phonetic variants (sometimes called allophones) of phonemes.

It is generally agreed by adult native speakers of English that there is a certain relationship between the sounds spelled with t in words like hat, writer, and fountain, i.e., that they are all variants of the same sound, represented here by the same orthographic letter. In linguistic terms, they are all phonetic implementations of the phonological category of voiceless alveolar stop consonant /t/, which can be released or unreleased, flapped or not flapped, or glottalized or not glottalized. For children, however, this relationship might not be obvious in the early stages of language learning; instead it might be something that is acquired later during the course of language development. Furthermore, although some phonetic variants can be in free variation, the choice of variant is often dependent on its environment. Therefore, in order to sound natural while speaking, an individual must learn the correct use of variants in the appropriate phonological contexts. These observations pose an important question: During development, when are the phonetic variants of a phoneme acquired and used appropriately?

There is only limited information about the development of phonetic variants in children, as most studies to date have focused on children's production and perception at the phonemic level, rather than at the phonetic level. Although there is a substantial body of literature examining the characteristics of allophones in adults' speech production, there are only a handful of studies exploring their use in young children. Therefore, in the present study we aimed to examine the use of phonetic variants in children around 2 years of age, focusing on the production of the alveolar stop codas /t, d/. In general, the realization of coda stops tends to vary much more than that of onset stops. We chose to examine alveolar stop codas because these are especially rich in phonetic variants in English. The allophones that we investigated are: the unreleased stop [t̚], [d̚], the flap [ɾ], and the glottal stop [ʔ]. In the present study, we focused on the following specific questions: When do children first start producing these three phonetic variants of alveolar stop codas? Do children begin by producing a “canonical” variant of a phoneme (e.g., /t/ with a clear closure and a release) and only later learn to produce the various phonetic variants in their appropriate contexts? Or do they start by producing various other non-canonical phonetic variants and only later learn how these are related to a particular phoneme? To address these questions, we examined whether children's production of phonetic variants differs from that of adults, i.e., how often the different variants are used and in what way the acoustic shape of children's productions differs from that of adults. To give a preview, the 2-year-olds in the present study produced all three phonetic variants of alveolar stop codas less frequently than adults. In the discussion, we explore the implications of these findings for the nature of early phonological representations, especially considering children's developing articulatory skills and the role of speech input to children.

The rest of this introduction consists of two parts. In the first part, we review the literature on the characteristics of phonetic variants of alveolar stops in adult speech. Once we know what children are targeting, we will be better equipped to understand the path of phonetic development. In the second part of the introduction, we review the existing studies on the development of phonetic variants in children.

However, before we proceed, we would like to clarify one point. Although we will be using terms like released/unreleased, flapped/unflapped, and glottalized/unglottalized throughout the paper, we believe that phonetic variants are not binary or contrastive in terms of articulation and acoustics. Rather, we believe that these should be understood in terms of degree or amount of release, flap and glottalization, although they could be perceived categorically by the listener. For example, even when the listener judges that an alveolar stop /t/ was produced as unreleased [t̚], it could be that /t/ is not fully unreleased articulatorily. Therefore, we suggest that phonetic variants should not necessarily be thought of as belonging to one category vs. another category in terms of articulation and acoustics, but rather as lying at some point along a continuum of acoustic cues to the features of that phonemic category. To reflect this, we employed acoustic measurements in our analysis when determining the presence of a phonetic variant, conducting perceptual judgments of phonetic variants only when necessary.

In this section, we review the literature on the articulatory, acoustic, and perceptual characteristics of different variants of the alveolar stops /t, d/ in adult speech. We primarily discuss three variants: unreleased stops, flaps, and glottalized stops, as they are the focus of our study. According to Zue and Laferriere (1979), the articulatory gestures used in the production of a flap involve the tongue tip making brief contact with the alveolar ridge, followed by an immediate release. The tongue tip can either make an up-and-down movement or a front-and-back movement to touch the alveolar ridge, and the closure can be complete or partial. These articulatory movements produce a wide variety of acoustic realizations. For example, in the case of partial closure, sometimes noise will be generated at the constriction, resulting in a voiced fricative. Flaps are generally quite short in duration, often with little or no sign of a release burst. Also, flaps are more likely to occur within a word if the vowel following the phonological stop is reduced or unstressed, as in butter (although flapping can occur across a word boundary in other stress contexts, as in knit a lot; see Fukaya & Byrd, 2005).

Byrd (1993) provides a broad picture of the characteristics of American English stops using data from the TIMIT database, which includes sentences read by 630 American speakers from a range of geographical locations. The data set included 54,384 oral and nasal stops, affricates, oral and nasal flaps, and glottal stops, which Byrd described and analyzed for frequency of occurrence, mean segmental durations, voice onset times, and certain effects of voicing, place, word position, and speaker gender. Concerning the characteristics of oral stops, 24,414 closures and 21,847 releases were found in the TIMIT transcription. In sentence-final position, where the environment (i.e., a following silence) was more controlled than in sentence-medial position, bilabial stops were released 49.5% of the time, alveolar stops 57% of the time, and velar stops 83.11% of the time. Also transcribed in the TIMIT database were flaps, which were classified as oral or nasal. A total of 4980 flaps occurred: 3649 oral flaps (e.g., water) and 1331 nasal flaps (e.g., suit in). Mean duration of flaps was 29 ms, and there was no difference in mean duration between oral and nasal flaps. As for glottal stops, there were 4834 in the TIMIT database. Word-initial glottal stops made up 49% of the total, word-medial glottal stops constituted 6%, word-final glottal stops constituted 16%, and unaffiliated glottal stops (glottal stops occurring between two vowels at a word boundary) were 29% of the total.

De Jong (1998) provides an investigation into the nature of flapping of /t, d/ in American English. In particular, he evaluated two models of flapping: the traditional model in which flapping is considered as a categorical switch from stop to flap in a specified linguistic environment, and another model in which flapping is produced as a by-product of articulatory changes related to the prosodic structure. To address the issue, the data were collected from three speakers using the X-ray microbeam systems, which tracked the location and motion of radio-opaque pellets attached to the articulators during speech. De Jong (1998) showed several acoustic correlates that distinguished the perceptually transcribed flaps from the tokens transcribed as stops. In general, flaps had shorter occlusion durations, shorter voice onset times and a greater percentage of voicing during closure. In addition, flaps were similar to /d/ in that both had longer preceding vowel durations than /t/, and also similar to /t/ in that both had smaller changes in F2 than /d/. Articulatory measures showed that the tongue body was more retracted during flaps than during stops. In terms of the location and movement of the tongue tip/blade, flaps were more similar to /d/.

De Jong's (1998) results showed that all four transcribers were quite consistent in transcribing the presence/absence of flaps. This suggested that flapping across a word boundary might be a perceptually categorical phenomenon. However, for speakers, production of flaps appeared to be optional or inconsistent across a word boundary. These findings suggest a model in which a gradient articulatory change results in quantized acoustic outputs, giving rise to the consistent transcriptions of either flaps or stops. This challenges both the categorical rule account and the prosodic by-product account. If this is the case, from a speaker's point of view, there is no need for a rule that demands a specific production of a flap before an unstressed vowel. Rather, what is important for the speaker is to understand the segmental and prosodic conventions of the language sufficiently to know when a salient consonant release is needed.

Further acoustic and perceptual analyses of flaps were carried out by Herd, Jongman, and Sereno (2010), who examined whether /t/ and /d/ are neutralized in flapping environments. When the authors examined the distribution of /t/ or /d/ duration, which included both closure and release burst, they found a clear separation of durations between flapped and fully articulated stops for all speakers; consistent with previous studies, flaps were shorter than fully articulated stops. Therefore, they were able to determine within speaker cut-off values for flaps; tokens below the cut-off value were considered flapped and those above were considered unflapped. The average flap cut-off value across 20 speakers was 56 ms, with a range of 43–69 ms. The average flap duration was 32 ms, with a range of 24 to 41 ms. When this method was applied, word-medial /t/ and /d/ were found to be flapped 76% and 99% of the time, respectively.
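The cut-off procedure described above can be illustrated with a minimal sketch. It assumes a single threshold equal to the reported 56 ms average (Herd et al. determined a separate cut-off for each speaker), treats a token exactly at the threshold as unflapped (the source only specifies "below" and "above"), and uses made-up durations; the function names are hypothetical and are not taken from the original study.

```python
from typing import Iterable

# Reported average cut-off across 20 speakers (Herd et al., 2010).
# Assumption: one shared threshold; the study used per-speaker values.
AVERAGE_CUTOFF_MS = 56.0


def classify_token(duration_ms: float, cutoff_ms: float = AVERAGE_CUTOFF_MS) -> str:
    """Label a /t, d/ token by its duration (closure plus release burst).

    Tokens below the cut-off count as flapped. Tokens at or above it count
    as unflapped (the treatment of exact ties is an assumption).
    """
    return "flapped" if duration_ms < cutoff_ms else "unflapped"


def flap_rate(durations_ms: Iterable[float], cutoff_ms: float = AVERAGE_CUTOFF_MS) -> float:
    """Return the proportion of tokens classified as flapped."""
    durations = list(durations_ms)
    if not durations:
        raise ValueError("at least one token duration is required")
    flapped = sum(1 for d in durations if classify_token(d, cutoff_ms) == "flapped")
    return flapped / len(durations)


# Hypothetical durations in ms, for illustration only (not data from the study).
example_durations_ms = [28.0, 32.0, 41.0, 60.0, 75.0]

if __name__ == "__main__":
    for d in example_durations_ms:
        print(d, classify_token(d))
    print("flap rate:", flap_rate(example_durations_ms))
```

With these hypothetical durations, three of the five tokens fall below 56 ms, so the flap rate is 0.6. A per-speaker analysis would simply pass each speaker's own cut-off as `cutoff_ms`.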

One of the three phonetic variants we examined was glottalization. We review some of the studies of the acoustic and perceptual characteristics of glottalization below. Acoustically, glottalization is often characterized by irregularly spaced pitch periods and characteristics such as low amplitude, low F0, and breathiness (Pierrehumbert & Talkin, 1992). The frequency of occurrence of glottalization is known to be affected by various factors including the position of the word within the utterance and intonational phrase (Pierrehumbert & Talkin, 1992; Dilley et al., 1996), gender (Byrd, 1994), and dialect (Docherty & Foulkes, 1995).

To understand the factors and articulatory mechanisms involved in glottalization, Redi and Shattuck-Hufnagel (2001) examined glottalizations at phrase boundaries in both medial and final positions within an utterance. Of the two corpora they examined, the first, called the Labnews, consisted of read speech produced by 6 professional radio announcers (from the BU FM news corpus) and 4 non-professionals, all native speakers of American English. The second corpus, called the ABC corpus, included read speech produced by four non-professional speakers. In both corpora, the frequency of occurrence of glottalization differed greatly from individual to individual. Also, not all speakers produced the same acoustic characteristics, and the results were inconclusive about whether males or females produced more glottalizations. On the other hand, the frequency of occurrence of glottalization was significantly affected by position within the utterance; speakers produced glottalizations more frequently in utterance-final positions compared to utterance-medial positions.

As the above review indicates, the alveolar stop codas in English are rich in phonetic variants, each with its own characteristic acoustic pattern. At the same time, the review of the literature suggests that, despite numerous studies on flap production, it is challenging to fully characterize flaps (for relevant discussion, see Herd et al., 2010). In the next section, we review the literature on the development of phonetic variants in young children, where much less is known.

To our knowledge, there are only a few studies that have examined children's speech production at the level of phonetic variants. Klein, Altman, and Tate (1998) examined how closure duration in young children's speech affects the adult listener's perception of flap. To this end, 34 adults listened to audio recordings of two children, a male and female, ages between 30 and 48 months. The target words, which contained medial /t, d/ in flap contexts (e.g., water, feeding), were taken from a series of monthly recordings made for a longitudinal study on the acquisition of /t, d/ allophones. The tokens were chosen based on closure duration, and were placed in one of five successive closure-duration categories increasing by 20 ms at each step; tokens in these categories had closures of 3–17 ms, 23–37 ms, 43–57 ms, 63–77 ms, and 83–97 ms, respectively.

The results showed that tokens were less likely to be judged to be flaps by adult listeners as closure duration increased. Flap duration of 3–17 ms produced the greatest percentage of flap judgments (at 75%); 23–37 ms ranked second (at 61%); 43–57 ms and 63–77 ms were perceived as flaps 50% of the time, and 83–97 ms had the lowest percentage of flap judgments (at 32%). In sum, children's medial /t, d/ productions were most often judged as flaps when their closure durations were at or under 37 ms. This is consistent with the range of closure duration for adult flaps from Zue and Laferriere (1979), 10–40 ms, and suggests that a listener is most likely to identify an alveolar stop as a flap when its closure duration is in this range, regardless of the age of the speaker. However, it is interesting to note that not all tokens in the shortest duration category were perceived as flaps; likewise, not all tokens in the longest duration category were judged as being stop-like /t, d/s. Therefore, there might be other acoustic parameters playing a role in addition to closure duration.
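The relationship between closure-duration category and flap judgments reported above can be laid out as a small lookup, sketched here under stated assumptions. The category bounds and percentages are the ones given in the text for Klein et al. (1998); the function name and the handling of durations that fall in the gaps between categories (e.g., 18–22 ms) are hypothetical choices made for illustration.

```python
from typing import Optional, Tuple

# (lower bound in ms, upper bound in ms, percent of tokens judged as flaps),
# as reported in the text for Klein, Altman, and Tate (1998).
DURATION_CATEGORIES = [
    (3, 17, 75),
    (23, 37, 61),
    (43, 57, 50),
    (63, 77, 50),
    (83, 97, 32),
]


def category_for(closure_ms: float) -> Optional[Tuple[int, int, int]]:
    """Return the category containing a closure duration.

    Returns None when the duration falls outside every category, including
    the gaps between categories (an assumption; the text does not say how
    such tokens were handled).
    """
    for lower, upper, percent_flap in DURATION_CATEGORIES:
        if lower <= closure_ms <= upper:
            return (lower, upper, percent_flap)
    return None


def judgments_never_increase() -> bool:
    """Check that flap judgments do not rise as closure duration increases."""
    percents = [percent for _, _, percent in DURATION_CATEGORIES]
    return all(a >= b for a, b in zip(percents, percents[1:]))


if __name__ == "__main__":
    print(category_for(30))   # a token in the 23-37 ms category
    print(category_for(20))   # a token in a gap between categories
    print(judgments_never_increase())
```

The check makes the pattern in the reported percentages explicit: flap judgments never increase from one category to the next, although the two middle categories are tied at 50%.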

Although Klein et al. (1998) showed the relationship between the production and perception of children's flaps, they did not address at what age children develop flaps. In a subsequent speech production study, Klein and Altman (2002) carried out a longitudinal study to examine the acquisition of medial /t, d/ allophones in bisyllabic words in 4 typically-developing children. For analysis purposes, the longitudinal data were consolidated into three sessions: (1) under 36 months, (2) from 36 to 47 months, (3) from 48 to 60 months. The allophones of /t, d/ examined included flaps (as in ladder), laterally-released forms (as in bottle), and nasally-released forms (as in button). Because the authors wanted to examine how the various phonetic and prosodic contexts affected the production of flaps, the flap contexts were further divided into three groups: before ‘y’ (as in kitty), before ‘er’ (as in butter) and before ‘ing’ (as in eating).

The phonetic transcriptions of the productions by the parents of the children revealed that the parents consistently produced the flap, and used both lateral release and nasal release. On the other hand, children's speech productions seldom exhibited lateral and nasal release, but importantly, flaps increased by about 15% with each additional session. By the last session (between 48 and 60 months), the children produced flaps in the adult flapping context about 50% of the time. In the flapping environment in children's productions, a ‘-ty’ ending facilitated flap production most strongly, while ‘-ding’ was found to be the least facilitative. Overall, there was much variability in the 2–5-year-olds' use of variants for /t, d/, suggesting that there are many factors that determine the rate and progress toward adult-like context-appropriate medial /t, d/ in young children.

Rimac and Smith (1984) compared word productions from 8 children (around 8 years of age) and 8 adults with the aim of examining the relationship between children's and adults' speech segment durations. Consistent with previous developmental studies of speech timing, they found that the 8-year-olds' segment durations were longer than those of adults, including both stressed vowels and flaps. Overall, Rimac and Smith (1984) found that acquisition of the flap is a gradual process. The authors suggest that this delay could be due to the fast movement of the tongue involved in the production of a flap, since it is known that children generally speak more slowly than adults. However, as they point out, it could also be attributed to non-motoric factors; for example, children may be unaware of the phonetic contexts in which flaps are typically produced. In sum, both Rimac and Smith (1984) and Klein and Altman (2002) suggest that children around 5–8 years of age are still learning to produce flaps in an adult-like manner.

In a study of acoustic cues to coda contrasts, Song, Demuth, and Shattuck-Hufnagel (2012) showed that English-learning children as young as 1;6 produced many adult-like cues to voicing (voiced vs. voiceless) and place (alveolar vs. velar) contrasts in coda stops. At the same time, there were two aspects of the children's production of these cues that were generally different from adults. Overall, 1;6-year-olds had more frequent releases of stops than adults. In addition, these children did not produce glottalization at the end of the vowel before the coda stop as often as adults, especially in sentence-medial position. These findings suggest that a canonical variant of the coda stop (e.g., /t/ with a clear closure and a release) may be more common than non-canonical variants in young children's speech production.

In addition to these studies of the acquisition of phonetic variants in speech production, evidence from some infant studies shows that awareness of phonetic variants in perception might be acquired early in life, well before their production is mastered. For example, Hohne and Jusczyk (1994) showed that two-month-olds were able to discriminate pairs like nitrate and night rate, which differ in phonetic realizations of /t/ and /ɹ/. This suggests that infants at this age are able to discriminate phonetic differences that could provide information about word boundaries. In a subsequent study, Jusczyk, Hohne, and Bauman (1999) further asked when infants develop sensitivity to the way that variants are systematically distributed within words, and when they begin to use these cues to segment words from fluent speech. The results showed that by 10.5 months of age, infants were able to take advantage of what the authors called allophonic cues to word boundaries in recognizing nitrates vs. night rates in fluent speech. The authors argue that, compared to other types of cues to segmentation that are utilized earlier in development (Newsome & Jusczyk, 1995; Saffran, Aslin, & Newport, 1996), the acquisition of allophonic cues might be later because the infant learner needs to be exposed to a sufficient number of instances of words before being able to learn the mapping between context-governed variants and the contexts in which they appear. More recently, there has been increasing interest in infants' ability to learn about allophonic variation on the basis of distributional information (Peperkamp & Dupoux, 2002; Peperkamp et al., 2003; White et al., 2008). These studies suggest that by the end of their first year of life, infants are sensitive to complementary distribution where phonological alternations occur, and can learn patterns of phonological alternations on the basis of the distributional cues in the input. This ability might be a prerequisite for producing context-appropriate phonetic variants.

Although these studies on the development of speech perception suggest that infants below one year of age might be sensitive to the distribution of phonetic variants in the input, to date only limited information is available on the development of phonetic variants in young children's speech production. Therefore, in the current study, we aimed to expand the existing literature in several directions. First, in contrast to previous speech production studies in children, which have primarily focused on the production of flaps, we investigated the production of three different variants for /t, d/: unreleased stops, flaps, and glottalized stops. By examining more than one kind of variant, we hoped to be able to draw more general conclusions about the path of development of phonetic variants in young children. Second, we examined the use of phonetic variants in children between 1;6 and 2;6 years of age, an age group that is younger than that found in most early speech production studies. Children produce their first words around one year of age (MacNeilage & Davis, 1990), and around 1;6 years of age, many children gain speed in word learning, a shift in rate of vocabulary development which is often referred to as the ‘vocabulary spurt’ (Goldfield & Reznick, 1990). Therefore, examining children in the age range 1;6–2;6 could provide valuable information on the initial processes of phonological development. Finally, we combined transcriptional methods and acoustic measures to compare the use of phonetic variants in children and adults, rather than relying on phonetic transcriptions alone.

Existing literature suggests that production of some variants of /t, d/, such as flaps, is articulatorily challenging for children; moreover, acquisition of adult skill with these variants also involves learning the phonetic and prosodic environments in which they occur. Therefore, we predicted that children would exhibit a late acquisition of the three phonetic variants, showing non-adult-like acoustic patterns at the beginning of the development process.

Section snippets

Database

The data examined in this study came from the Providence Corpus (Demuth, Culbertson, & Alter, 2006), a collection of spontaneous speech interactions between six mother–child pairs from the New England area (for further information and access to the corpus, see the Child Language Data Exchange System [CHILDES: http://childes.psy.cmu.edu/]). All six children (three boys, three girls) were typically developing, monolingual speakers of American English. Digital audio/video recordings were collected

Results

In the following three sets of analyses, we compared the frequency of occurrence of three non-canonical phonetic variants of /t, d/ (i.e., unreleased stop, flap, glottalized stop) between 2-year-olds and adults (mothers). Before we examine the results, we will first discuss the tokens labeled as no coda. Because these tokens were determined to have no acoustic or perceptual evidence of a realized coda, they were left out in the main analyses where we examined the use of variants in children's

Discussion

The goal of this study was to examine 2-year-olds' production of three phonetic variants of the alveolar coda stops /t, d/ (i.e., unreleased stop, flap, and glottalized stop). In particular, we were interested to know whether children would start by producing the more canonical form of the phoneme (i.e., /t, d/ with a clear closure and a release), or were able to produce its other phonetic variants from early on. The results from this study collectively suggest that young children produce the

Acknowledgments

We thank the children and their mothers for their participation. We also thank the members of the Child Language Lab at Brown University and the Phonetics Lab at the University of Wisconsin-Milwaukee (especially Elizabeth Alberswerth, Alana Dust, and Kacy Kreger) for research assistance. This work was supported by NIH Grant R01HD057606 to Demuth and Shattuck-Hufnagel.

Appendix A. Number of tokens broken up by participant and word type

Appendix B. Results broken up by participant and word type

References (55)

  • Bates, D. M., Maechler, M., & Bolker, B. (2012). lme4: Linear mixed-effects models using S4 classes. R package version...
  • R.H. Baayen (2008). Analyzing linguistic data: A practical introduction to statistics.
  • N. Bernstein Ratner (1984). Phonological rule usage in mother–child speech. Journal of Phonetics.
  • M.E. Beckman et al. Articulatory evidence for differentiating stress categories.
  • J.E. Bernthal et al. (1978). Intraoral air pressure during the production of /p/ and /b/ by children, youths, and adults. Journal of Speech and Hearing Research.
  • D. Burnham et al. (2002). What's new pussycat? On talking to animals and babies. Science.
  • D. Bolinger (1961). Contrastive accent and contrastive stress. Language.
  • D. Byrd (1993). 54,000 American stops. UCLA Working Papers in Phonetics.
  • K. Demuth. Markedness and the development of prosodic structure.
  • K. Demuth et al. (2006). Word-minimality, epenthesis and coda licensing in the early acquisition of English. Language and Speech.
  • K. Demuth et al. (2009). The prosodic (re)organization of children's early English articles. Journal of Child Language.
  • L.C. Dilley et al. (2014). Acoustic-phonetic variation in word-final alveolar consonants in infant-directed speech over the first two years. Journal of Child Language.
  • Docherty, G. J., & Foulkes, P. (1995). Acoustic profiling of glottal and glottalised variants of English stops. In...
  • A. Fernald et al. (1984). Expanded intonation contours in mothers' speech to newborns. Developmental Psychology.
  • P. Fikkert (1994). On the acquisition of prosodic structure.
  • T. Fukaya et al. (2005). An articulatory examination of word-final flapping at phrase edges and interiors. Journal of the International Phonetic Association.
  • B.A. Goldfield et al. (1990). Early lexical acquisition: Rate, content, and the vocabulary spurt. Journal of Child Language.