Elsevier

Journal of Phonetics

Volume 40, Issue 6, November 2012, Pages 745-763

The tongue in speech and feeding: Comparative articulatory modelling

https://doi.org/10.1016/j.wocn.2012.08.001

Abstract

Two major functions of the human vocal tract are feeding and speaking. As, ontogenetically and phylogenetically, feeding tasks precede speaking tasks, it has been hypothesised that the skilled movements of the orofacial articulators specific to speech may have evolved from feeding functions. Our study explores this hypothesis by proposing an original methodological approach. Vocal tract articulatory measurements on two male subjects have been recorded for speech and feeding by electromagnetic articulography. Two guided Principal Component Analysis (PCA) articulatory models of the jaw/tongue system have been built for speech and feeding tasks. The two articulatory models show similar reconstruction accuracy. The speech and feeding articulations have been reconstructed respectively from feeding and speech raw PCA models. Root mean square reconstruction errors show better capacity of the feeding model to be generalised to the other set of articulations than the speech model. Our study suggests therefore that the tested hypothesis cannot be excluded on articulatory grounds for our two cases and brings a new methodology into the discussion of the ontogenetic and phylogenetic origins of speech.

Highlights

► A methodology for testing whether speech movements evolve from feeding is proposed.
► Two articulatory models of the tongue for speech and feeding tasks have been built.
► The feeding model covers a wider range of articulations than the speech model.
► Results suggest that the hypothesis cannot be ruled out on articulatory grounds.

Introduction

Of all living species, human beings are the only ones that communicate through speech. The origin of speech, however, remains largely unexplained, although a large number of studies have been devoted to the evolution of speech production in humans (see e.g. MacNeilage (2008) for a review).

Three main functions can be ascribed to the human vocal tract: breathing, feeding and speaking. Whilst breathing does not require complex vocal tract articulation, the tasks of feeding and speaking are associated with more involved articulatory movements requiring significant coordination. Noting that both ontogenetically and phylogenetically feeding tasks precede speaking tasks, MacNeilage (1998) suggests, in his paper on the frame/content theory of the evolution of speech production, that the “articulatory cyclicity of speech evolves from ingestive cyclicities”. He describes speech as open–close mandibular alternations, which start at babbling onset, and whose cyclicity may have evolved from jaw cyclicity in feeding. Note, however, that this theory has been subject to debate and alternative origins have been proposed for the articulatory cyclicity of the jaw, such as call vocalisation (e.g. Andrew, 1998; Jürgens, 1998) or modulation of acoustic parameters (Ohala, 1998).

Following MacNeilage's theory and noting furthermore that feeding is a more fundamental activity than speech, Hiiemae et al. (2002) and Hiiemae and Palmer (2003) hypothesize that the tongue movements during speech may be “derived from the wide variety of tongue movements found in suckling and feeding”. Generalising this hypothesis, they explicitly suggest in their paper that the movements of speech might be a subset of those used in feeding (Hiiemae & Palmer, 2003).

Within this context, we present in this paper a first comparative articulatory modelling study of the movements of the tongue in speech and feeding. Our research aims to provide an original methodological framework to tackle this question from an articulatory modelling point of view. As detailed later, the articulator movements are characterised by way of articulatory modelling of the vocal tract for each task under study, i.e. speech and feeding. Note that this paper is a greatly expanded version of Serrurier, Barney, Badin, Boë, and Savariaux (2008). Note also that the question of the motor control system in speech vs. feeding and its implication for evolution is deliberately not considered in this study.

To set the scene for our study we summarise for the reader in the next two sections the movements of the vocal tract in speech and feeding and review the literature dedicated to the comparison between these movements. The following section is dedicated to the various measurement techniques adopted in feeding studies. The last section reviews articulatory modelling approaches and explains our motivations for such an approach for this study.

For a century, prototypical vowel and consonant tongue positions in speech have been widely studied and the range of shapes which can be voluntarily articulated has been extensively documented. These positions have been used to establish the International Phonetic Alphabet (IPA Handbook, 1999) with places of articulation which extend, as far as the tongue is concerned, from the incisors to the glottis via the hard palate, the velum, the pharynx and the larynx. The reader of the Journal of Phonetics can refer to phonetic text-books (cf. Catford, 1977; Ladefoged, 2001; Ladefoged and Maddieson, 1996; Laver, 1994) for detailed and systematic descriptions of the articulations in the world's languages. The variety of movements between target phonemes would take too long to describe, but various articulatory studies of the tongue have consistently reported that four to six independent articulatory components are necessary and sufficient to reproduce most of the phonemes, at least for French, English or German (see e.g. Badin and Serrurier, 2006b; Badin et al., 2002; Harshman et al., 1977; Hoole, 1999; Maeda, 1990). The degrees of freedom associated with the jaw position (open–closed), the tongue body position (back–front), the tongue dorsal shape (bunched–flattened) and the tongue tip position (raised–lowered) are generally acknowledged for western European languages. Other languages such as click languages may involve very different articulations beyond the scope of this study.

In contrast to speech, as emphasized by Hiiemae and Palmer (2003), the literature has often reported tongue movements in feeding in terms of continuous patterns of swallowing. Unlike speech, the swallowing process follows a regular path from the collection of the food at the lips to the passage of the food into the oesophagus. Many descriptions of the swallowing process exist in the literature. The reader can refer for instance to Logemann (1998) or Hiiemae and Palmer (2003) for a detailed description. The swallowing process is usually divided into three main phases: the oral phase, the pharyngeal phase and the oesophageal phase. The oral phase can be further sub-divided into two phases, the oral preparatory phase and the oral transport phase.

The oral preparatory phase consists of opening the mouth to collect the food, bringing food into the mouth, masticating and preparing a bolus ready to be transported towards the pharynx. For this, coordination of various extrinsic and intrinsic muscles of the lips, the jaw, the tongue and the cheeks is required. After several mastication cycles where the food is crushed, little by little, into small pieces and mixed with saliva, a bolus is formed in the middle of the mouth between the tongue and the hard palate. Viewed in the midsagittal plane, the bolus sits in a depression in the tongue at the end of this stage. Note that this phase also requires lateral movements of the tongue.

The oral transport phase starts when the bolus is compressed by the tongue against the hard palate. The bolus is then transported towards the pharynx under the action of the tongue. Logemann (1998) describes this phase as “an anterior to posterior rolling action of the midline of the tongue, with tongue elevation progressing sequentially more posteriorly to push the bolus backward” (p. 27). In addition, Wilson and Green (2006) report that the lingual propulsion requires a significant degree of coordination and functional independence among biomechanically coupled regions of the tongue and is characterized by the sequential elevation of the anterior, middle, and dorsal regions, respectively, and refer to this movement as a “propulsive wave”. A similar description was made by Green and Wang (2003) who reported that the tongue motion during swallowing propagates in a wavelike manner from apex to dorsum. The oral phase is generally considered as a voluntary motion throughout.

The reflex of swallowing starts with the pharyngeal phase. The velum rises until it makes contact with the pharyngeal walls to prevent any food entering the nasal passages, and the larynx closes to prevent food entering the trachea. The bolus is passed to the oesophagus by means of a sequence of constrictions and relaxations involving the base of the tongue, the pharyngeal walls, the larynx and the oesophagus walls. This phase might still affect to some extent the shape of the tongue in the mouth. Finally, the food is transported through the oesophagus towards the stomach during the oesophageal phase.

We might deduce that if the patterns of swallowing are similar across swallows and subjects, the complex underlying mechanisms involved may be described by a limited number of articulatory degrees of freedom for the tongue in feeding.

Regarding the jaw, an early comparative study was carried out by Gibbs and Messerman (1972). Recording the six degrees of freedom of the jaw during speech and feeding tasks, they observed overall less jaw motion during speech than during chewing. These results were confirmed by Ostry and Munhall (1994) by means of X-ray microbeam measurements. An extended study was carried out by Ostry, Vatikiotis-Bateson, and Gribble (1997) who recorded the six degrees of freedom of the jaw during speech and feeding tasks by means of an Optotrak and observed that in both behaviours there was evidence of independent motion in pitch and yaw (rotations around the left–right and the top–bottom axes) and horizontal and vertical positions, supporting the idea that motions in these degrees of freedom are independently controlled.

Regarding the tongue, only a small number of studies have focused on articulatory comparisons. As far as we know, the most significant study was conducted by Hiiemae and colleagues. Hiiemae et al. (2002) recorded 10 subjects by videofluorography during speaking and feeding. They concluded that the movements of two markers attached to the tongue in the midsagittal plane defined a speech spatial domain enclosed within a larger feeding domain. More precisely, the jaw and tongue marker movements in speech occurred within the sagittal domains used for feeding, though they noted that the hyoid domains were significantly different (Hiiemae & Palmer, 2003). Hiiemae and Palmer (2003) attempted to extend this study of spatial domains in speech and feeding to a comparison of the tongue movements in both tasks based on a detailed literature review. Martin (1991) conducted a very comprehensive comparative study of the tongue movements in swallowing and speech production based on X-ray microbeam measurements. A kinematic analysis of the data led her to claim (p. 314) that “the finding of similarity between certain elemental movements in speech and swallowing could be taken as support for the view that speech and swallowing involve similar oral movement components”, though she also suggested that “the kinematic similarities between speech and swallow-related movements may have been based on their shared biomechanical constraints” (Martin, 1991, p. 315) and stated that her “results suggest that the mechanisms underlying the temporal activation of different regions of the lingual surface are fundamentally different in speech and swallowing” (Martin, 1991, p. 315). In the context of ageing, Bennett, Van Lieshout, and Steele (2007) compared the dynamics of tongue movements in speech and feeding and suggested slower and more variable movements for swallowing than speaking. Note finally the work of Akgul, Kambhamettu, and Stone (1998) which shows various tongue shapes obtained for speech and feeding by means of ultrasound in their attempt to develop an automatic segmentation of the tongue contour.

Early studies on feeding were often carried out, for clinical purposes, by means of X-ray techniques available in the medical environment (see e.g. Abd-El-Malek, 1955; Mosher, 1927; Whillis, 1956). As reported by Abd-El-Malek (1955), these studies describe the tongue movements inside the vocal tract while feeding, although “it is clear that these complex movements are not yet fully understood”. Surprisingly, however, although the interest in such analysis and descriptions of the human vocal tract seems to have greatly increased in the last 10 years, most of the studies prior to that have been dedicated to the analysis of the jaw and tongue movements for nonhuman primates (see e.g. Hiiemae, Hayenga, & Reese, 1995). Several more recent studies have focused on the link between the jaw and the tongue (see Hiiemae and Palmer, 2001; Palmer et al., 1997; Steele and Van Lieshout, 2008) or between the jaw and the velum (Matsuo, Hiiemae, & Palmer, 2005) and the results have shown that the jaw movement is often closely related to the tongue and velum movements, but not systematically so. Recent studies devoted to the movements of the tongue in feeding have been performed by tracking 3 or 4 pellets glued on the tongue in the midsagittal plane by means of X-ray microbeam technology (see Green and Wang, 2003; Tasko et al., 2002; Wilson and Green, 2006) or electromagnetic articulography (see Steele & Van Lieshout, 2004). These studies describe the overall patterns of swallowing, with Wilson and Green (2006) suggesting that the movement patterns of the anterior tongue regions are distinct from those of the posterior tongue regions, in general agreement with Green and Wang (2003). Note finally the electromyographic study of Napadow, Chen, Wedeen, and Gilbert (1999) for swallowing. The reader can refer to Hiiemae and Palmer (2003), Wilson and Green (2006), or Steele and Van Lieshout (2008) for additional review of the literature on this topic.

Articulatory studies in feeding have mostly focused on the biomechanics of swallowing (see e.g. Crary et al., 2006; Dang et al., 2009; Felton et al., 2007; Napadow et al., 1999) or of the chewing mechanism (Hannam, Stavness, Lloyd, & Fels, 2008). In speech, unlike in feeding, an abundant literature has been devoted to the study of articulation for more than 50 years. In particular, in order to understand, evaluate and imitate the movements of the speech articulators, various models have been developed. In geometrical models, the shapes of the vocal tract midsagittal contours are represented by means of simple geometrical shapes (see e.g. Cohen et al., 1998; Coker, 1967; Mermelstein, 1973). In biomechanical models, the shape and movements of the articulators result from modelling the actions of the muscles and their control (see e.g. Dang and Honda, 2004; Gérard et al., 2003; Payan and Perrier, 1997). Finally, in functional, data-based, linear articulatory models, the shapes and positions of the articulators are modelled as weighted sums of a small number of principal shapes extracted through various statistical analyses (see e.g. Badin et al., 2002; Beautemps et al., 2001; Harshman et al., 1977; Maeda, 1990; Serrurier and Badin, 2008).

These studies have focused on either speech or feeding, but have never proposed any systematic comparison between the two types of task. Within the framework of speech evolution research, our objective is therefore to measure and compare the movements of the articulators in speech and feeding, based on the development of linear models of the articulators for each of the two tasks. Based on a functional approach (see e.g. Badin et al., 2002), modelling allows extraction from the data of the various independent movements of the vocal tract articulators produced during each task and evaluation of their contribution to the overall task. This linear articulatory modelling approach is above all motivated by the fact that it allows representation of the raw data by weighted linear combinations of a small number of components, though at the cost of some modelling error. This has the advantage of generalising the raw data, i.e. being able to interpolate articulations theoretically producible by the vocal tract for a given task (for which the recorded data constitute a sampling) even though they may not be present in the recorded data. Moreover, this approach provides a convenient means to assess the level of agreement or discrepancy between the models of speech and feeding, thus permitting systematic comparison of the two tasks.
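The linear representation described above can be written compactly. The notation below is an illustrative assumption rather than the authors' own: x_n is the vector of D measured coordinates for articulation n, x-bar is the mean articulation, b_k are the K extracted components, and a_{n,k} are the weights. The normalisation of the error is likewise only one plausible convention.

```latex
\mathbf{x}_n \approx \hat{\mathbf{x}}_n
  = \bar{\mathbf{x}} + \sum_{k=1}^{K} a_{n,k}\,\mathbf{b}_k ,
\qquad
\mathrm{RMSE}
  = \sqrt{\frac{1}{N D}\sum_{n=1}^{N}
      \left\lVert \mathbf{x}_n - \hat{\mathbf{x}}_n \right\rVert^{2}}
```

With K much smaller than D, the model trades a non-zero RMSE (the modelling error) for a compact description. Choosing weights that were never observed yields articulations that interpolate between the recorded ones.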

The literature review above has shown that, while there is an abundance of articulatory studies on speech, feeding has been very little studied from the point of view of articulatory modelling. The first objective of this study is thus to build an articulatory model of the tongue for feeding which, as far as we know, is the first such attempt in the literature. The second objective is to compare the geometric and articulatory spaces of the speech and feeding tasks, and thus to evaluate whether one set of movements can constitute a subset of the other. Such a modelling approach can provide a deeper understanding of the feeding mechanisms in relation to those of speech and a powerful tool for research progress in that field. Our experimental set-up and modelling approach are detailed in the next section.
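The subset question can be illustrated with a minimal sketch of cross-task reconstruction. It rests on several assumptions: the data are synthetic rather than articulographic, plain PCA stands in for the guided PCA used in the study, and every name (`fit_pca`, `rms_reconstruction_error`, the number of coordinates and components) is hypothetical. Synthetic "speech" varies along two directions and synthetic "feeding" along four, so the feeding model is expected to generalise better to speech than the reverse.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (mean, basis); basis rows are the leading principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def rms_reconstruction_error(X, mean, basis):
    """RMS error after projecting X onto the model subspace and back."""
    centred = X - mean
    reconstructed = centred @ basis.T @ basis
    return float(np.sqrt(np.mean((centred - reconstructed) ** 2)))


rng = np.random.default_rng(0)
n_coords = 12  # assumption: e.g. x/y coordinates of six hypothetical sensors

# Four orthonormal directions of variation shared by the synthetic tasks.
directions = np.linalg.qr(rng.normal(size=(n_coords, 4)))[0].T

# Synthetic "speech" uses two directions, synthetic "feeding" all four;
# a small isotropic measurement noise is added to both.
noise = 0.05
speech = rng.normal(size=(300, 2)) @ directions[:2]
speech += noise * rng.normal(size=speech.shape)
feeding = rng.normal(size=(300, 4)) @ directions
feeding += noise * rng.normal(size=feeding.shape)

# One model per task, with the same number of components.
speech_model = fit_pca(speech, n_components=4)
feeding_model = fit_pca(feeding, n_components=4)

# Cross-task reconstruction: each data set rebuilt from the other task's model.
err_speech_from_feeding = rms_reconstruction_error(speech, *feeding_model)
err_feeding_from_speech = rms_reconstruction_error(feeding, *speech_model)

print("speech from feeding model:", err_speech_from_feeding)
print("feeding from speech model:", err_feeding_from_speech)
```

In this toy setting the asymmetry between the two errors comes purely from how the synthetic data were constructed. With real recordings, the same comparison would only indicate whether one task's articulatory space is approximately contained in the other's.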

Modelling approach

Our objective is to build and to compare linear articulatory models of the tongue derived from speech tasks and from feeding tasks. As emphasized by Kelso, Saltzman, and Tuller (1986), the vocal tract organs are made of a large number of neuromuscular components that offer a potentially high dimensionality and which must be functionally coupled in order to produce relatively simple gestures. Thus, following the approach used by Beautemps et al. (2001), Badin et al. (2002) or Serrurier and Badin

Functional articulatory models of jaw and tongue in speech and feeding

In order to highlight the differences between strategies for speech and feeding tasks and to uncover the degrees of freedom of the tongue for the feeding task, we developed articulatory models of the jaw and tongue for each task for subject PB by means of PCA and multiple linear regressions. Following a method already described in the literature (see e.g. Badin et al., 2002), the procedure for developing articulatory models for speech has been applied here to both speech and feeding: (1) the

Comparison of the speech and feeding tasks

Functional articulatory models of the jaw and the tongue for the speech and feeding tasks for subject PB have been detailed in the first section. This has allowed highlighting of the differences between strategies for speech and feeding and better understanding of the complexity of feeding movements. We have undertaken the same analysis for subject AS and found comparable results. Our objective in the present section is to compare quantitatively the geometric and articulatory spaces of the two

Discussion

As mentioned in Section 2.3, our concern in designing the corpora has been to embrace as much as possible the normal tasks of speech and feeding. For speech, we used all articulations produced in three extreme vocalic contexts [a i u] with artificially sustained articulations, encompassing all the extreme target articulations of speech possibly missed during continuous speech, to ensure that the whole phonetic repertoire of the speakers was covered. Note however that neither specific emotion

Acknowledgements

This work formed part of the HandToMouth project funded by the European Commission under the NEST initiative. We would like to thank Morwenna Collins (University of Southampton, UK) for helpful discussions about the feeding corpus design and the feeding model. We are also in debt to Pascal van Lieshout (University of Toronto, Canada) for his pertinent comments on a previous version of this document, Kenneth de Jong for his pertinent editorial advice, Adam Baker for his long and detailed review

References (73)

  • Andrew, R. J. (1998). Cyclicity in speech derived from call repetition rather than from intrinsic cyclicity of ingestion. Comment on “The frame/content theory of evolution of speech production”. Behavioural and Brain Sciences.
  • Badin, P., et al. An audiovisual talking head for augmented speech generation: Models and animations based on a real speaker's articulatory data.
  • Badin, P., & Serrurier, A. (2006a). Three-dimensional linear modeling of tongue: Articulatory data and models. In H. C....
  • Badin, P., & Serrurier, A. (2006b). Three-dimensional modeling of speech organs: Articulatory data and models. In:...
  • Bailly, G., Elisei, F., Badin, P., & Savariaux, C. (2006). Degrees of freedom of facial movements in face-to-face...
  • Beautemps, D., et al. (2001). Linear degrees of freedom in speech production: Analysis of cineradio- and labio-film data and articulatory–acoustic modeling. Journal of the Acoustical Society of America.
  • Bennett, J. W., et al. (2007). Tongue control for speech and swallowing in healthy younger and older subjects. International Journal of Orofacial Myology.
  • Catford, J. C. (1977). Fundamental problems in phonetics.
  • Cohen, M. M., Beskow, J., & Massaro, D. W. (1998). Recent developments in facial animation: An inside view. In D....
  • Coker, C. H. (1967). Synthesis by rule from articulatory parameters. In Proceedings of the 1967 conference on speech...
  • Combescure, P. (1981). 20 listes de dix phrases phonétiquement équilibrées. Revue d'Acoustique.
  • Crary, M. A., et al. (2006). Biomechanical correlates of surface electromyography signals obtained during swallowing by healthy adults. Journal of Speech, Language, and Hearing Research.
  • Dang, J., et al. (2004). Construction and control of a physiological articulatory model. Journal of the Acoustical Society of America.
  • Dang, J., Honda, Y., & Buchaillard, S. (2009). Modeling of swallowing mechanism based on MRI observation. In...
  • Deese, J. (1984). Thought into speech: The psychology of a language.
  • Engwall, O. (2000). A 3D tongue model based on MRI data. In Proceedings of the 6th International Conference on Spoken...
  • Felton, S. M., et al. (2007). Mechanical basis for lingual deformation during the propulsive phase of swallowing as determined by phase-contrast magnetic resonance imaging. Journal of Applied Physiology.
  • Fuchs, S., et al. (2005). On the complex nature of speech kinematics. ZAS Papers in Linguistics.
  • Gérard, J.-M., et al. (2003). A 3D dynamical biomechanical tongue model to study speech motor control. Research Developments in Biomechanics.
  • Gibbs, C. H., et al. (1972). Jaw motion during speech. American Speech and Hearing Association Reports.
  • Green, J. R., et al. (2003). Tongue-surface movement patterns during speech and swallowing. Journal of the Acoustical Society of America.
  • Harshman, R., et al. (1977). Factor analysis of tongue shape. Journal of the Acoustical Society of America.
  • Hiiemae, K. M., et al. (2001). Tongue–jaw linkages: The mechanisms of feeding revisited. Bulletin of the Museum of Comparative Zoology.
  • Hiiemae, K. M., et al. (2003). Tongue movements in feeding and speech. Critical Reviews in Oral Biology and Medicine.
  • Hoole, P. (1999). On the lingual organization of the German vowel system. Journal of the Acoustical Society of America.
  • Hoole, P., et al. (1997). Electromagnetic articulography in coarticulation research. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München (FIPKM).