A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese

Authors:
Changwei Liang
Department of Chinese Language and Literature, Peking University, Beijing, China

Jiangping Kong
Department of Chinese Language and Literature, Peking University, Beijing, China

Xiyu Wu
Department of Chinese Language and Literature, Peking University, Beijing, China, and Department of Chinese Language and Literature and Center for Chinese Linguistics, Peking University, Beijing, P.R. China

BIC 2021: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing, January 2021, Pages 297–302
https://doi.org/10.1145/3448748.3448796

Published: 21 March 2021

ABSTRACT

In this paper, a new speech-driven 3-D geometric tongue model is constructed. The 3-D tongue shape is controlled by control points on the 2-D midsagittal tongue curve, and speech-driven inverse estimation based on the model is evaluated against empirical data. X-ray 2-D vocal tract motion videos are annotated for midsagittal tongue motion, and static 3-D vocal tract shapes for 20 phonemes are collected with MRI to capture realistic 3-D tongue shapes. MFCCs are calculated from the videos as acoustic features and are then fed to an LSTM-RNN to predict the movement of the control points of the tongue shape. Three geometrically intuitive control points are selected to represent the midsagittal line of the tongue, which is calculated from them through linear regression. Cross-sections along the central line of the tongue, whose height, width and angle are predicted from the midsagittal line, are reconstructed with geometric curves, and each cross-section is then placed on the midsagittal line to obtain the overall predicted moving grid of the 3-D tongue. In this model, acoustic features are mapped directly to realistic tongue motion to preserve more articulatory detail, the control points are intuitive enough for non-experts to control the model, and the predicted geometric tongue shapes are comparable with realistic tongue dynamics. The speech-driven prediction of the proposed method is evaluated against the realistic data, and the results indicate that the method is feasible.
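The last geometric step of the abstract, placing a cross-section on each point of the midsagittal line to form a 3-D grid, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cross_section` and `build_tongue_grid`, the parabolic arch used as the "geometric curve", the axis layout, and the angle convention are all assumptions made for illustration, and the width, height and angle values are made up rather than predicted from a midsagittal line as in the paper.

```python
import math

def cross_section(y0, z0, width, height, angle, n=9):
    """Sample n points of a parabolic arch centred on one midsagittal point.

    Assumed axes: x = left/right, y = front/back, z = up/down, with the
    midsagittal plane at x = 0. `angle` (radians) tilts the direction in
    which the edges drop, measured from straight down. Requires n >= 2
    and width > 0.
    """
    points = []
    for i in range(n):
        x = -width / 2 + width * i / (n - 1)  # lateral position
        t = 2 * x / width                     # -1 at left edge, +1 at right edge
        drop = height * t * t                 # 0 on the midline, `height` at the edges
        y = y0 + drop * math.sin(angle)
        z = z0 - drop * math.cos(angle)
        points.append((x, y, z))
    return points

def build_tongue_grid(midline, widths, heights, angles, n=9):
    """Place one cross-section on every midsagittal point (y, z)."""
    return [
        cross_section(y, z, w, h, a, n)
        for (y, z), w, h, a in zip(midline, widths, heights, angles)
    ]

# Two made-up midsagittal points with made-up cross-section parameters.
grid = build_tongue_grid(
    midline=[(0.0, 0.0), (1.0, 0.5)],
    widths=[2.0, 4.0],
    heights=[1.0, 1.0],
    angles=[0.0, 0.0],
    n=5,
)
```

Each row of `grid` is one cross-section, and its middle point lies on the midsagittal line. Running the same construction on every predicted frame would give a moving grid of the kind the abstract describes.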

Index Terms

1. Applied computing
  1. Arts and humanities
  2. Life and medical sciences

Published in

BIC 2021: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing
January 2021
445 pages
ISBN:9781450390002
DOI:10.1145/3448748

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3-D tongue model
Mandarin Chinese
realistic dynamics
speech-driven
Qualifiers
- research-article
- Research
- Refereed limited