DOI: 10.1145/3448748.3448796
research-article

A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese

Published: 21 March 2021

ABSTRACT

This paper presents a new speech-driven 3-D geometric tongue model. The 3-D tongue shape is controlled by control points on the 2-D midsagittal tongue curve, and speech-driven inverse estimation based on the model is evaluated against empirical data. X-ray 2-D vocal tract motion videos are annotated to obtain the midsagittal tongue motion, and static 3-D vocal tract shapes for 20 phonemes are collected with MRI to capture realistic 3-D tongue shapes. Mel-frequency cepstral coefficients (MFCCs) are computed from the videos as acoustic features and fed to an LSTM-RNN that predicts the movement of the control points. Three geometrically intuitive control points are selected to represent the midsagittal line of the tongue, which is calculated from them by linear regression. Cross-sections along the central lines of the tongues, whose height, width, and angle are predicted from the midsagittal line, are reconstructed with geometric curves, and each cross-section is then placed on the midsagittal line to obtain the overall predicted moving grid of the 3-D tongue. In this model, acoustic features are mapped directly to realistic tongue motion to preserve more articulatory detail, the control points are intuitive enough for non-experts to control the model, and the predicted geometric tongue shapes are comparable with realistic tongue dynamics. The speech-driven prediction is evaluated against the realistic data, and the results indicate that the proposed method is feasible.
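
The step from three control points to a full midsagittal line can be illustrated with a minimal sketch of ordinary least-squares regression. This is not the authors' implementation: the function names, the number of contour points `K`, the bias term, and the synthetic data are all assumptions made purely for illustration, and the abstract does not state how the regression was actually set up.

```python
import numpy as np


def fit_contour_regression(control_points, contours):
    """Fit a linear map from control points to midsagittal contours.

    control_points: (n_frames, 6) array, three (x, y) control points per frame.
    contours:       (n_frames, 2*K) array, K (x, y) contour points per frame.
    Returns W of shape (7, 2*K); the last row is a bias term (an assumption).
    """
    ones = np.ones((control_points.shape[0], 1))
    X = np.hstack([control_points, ones])
    W, *_ = np.linalg.lstsq(X, contours, rcond=None)
    return W


def predict_contour(W, control_points):
    """Predict flattened contour coordinates for each frame of control points."""
    ones = np.ones((control_points.shape[0], 1))
    return np.hstack([control_points, ones]) @ W


# Synthetic, noiseless data (hypothetical; stands in for annotated X-ray frames).
rng = np.random.default_rng(0)
n_frames, K = 200, 15
true_W = rng.normal(size=(7, 2 * K))
cp = rng.normal(size=(n_frames, 6))
contours = np.hstack([cp, np.ones((n_frames, 1))]) @ true_W

W = fit_contour_regression(cp, contours)
pred = predict_contour(W, cp[:5])
```

In a pipeline like the one described, the control points passed to `predict_contour` would come from the acoustic model rather than from annotation, so the regression only needs to be fitted once on the annotated frames.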

      • Published in

        BIC 2021: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing
        January 2021
        445 pages
        ISBN:9781450390002
        DOI:10.1145/3448748

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited