Abstract
This study describes a novel method for constructing a geometric articulatory model from magnetic resonance imaging (MRI) data that takes the physiological boundaries of the speech apparatus into account. Two improvements were made to the modeling process: (i) images taken from different viewpoints are combined to improve the accuracy of outline annotation, and (ii) the meshes of the speech organs are modeled with reference to their anatomical structures. Both qualitative and quantitative evaluations indicated that the proposed method outperforms the conventional method. Linear component analysis was then applied to the speech-organ meshes obtained for different articulations to extract the control parameters; each speech organ can be described by three or fewer control parameters. After reconstruction, the average error between the model and the real data was less than 1.0 mm. This is also the first attempt to construct a 3D vocal tract model based on Chinese MRI data, and it should facilitate both theoretical study and practical applications related to Chinese speech production.
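The abstract reports that a small number of linear control parameters per organ suffices to rebuild the meshes with sub-millimetre average error. The general idea can be sketched as follows, assuming plain principal component analysis (via SVD) on flattened vertex coordinates; the authors' exact linear component analysis is not specified in this abstract and may differ. The function names, the synthetic meshes, and the millimetre units are hypothetical, for illustration only.

```python
import numpy as np


def fit_linear_model(meshes, n_components):
    """Fit a PCA-style linear model to meshes shaped (n_articulations, n_vertices, 3).

    Plain PCA is an assumption here, not necessarily the paper's method.
    Returns the mean shape (flattened) and the top `n_components` basis rows.
    """
    n_art = meshes.shape[0]
    X = meshes.reshape(n_art, -1)      # one flattened mesh per articulation
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions of decreasing variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(mesh, mean, basis):
    """Project one mesh onto the basis, giving its control parameters."""
    return basis @ (mesh.reshape(-1) - mean)


def decode(params, mean, basis, n_vertices):
    """Rebuild vertex coordinates from control parameters."""
    return (mean + params @ basis).reshape(n_vertices, 3)


def mean_vertex_error(a, b):
    """Average Euclidean distance between corresponding vertices (input units)."""
    return float(np.linalg.norm(a - b, axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_art, n_vert = 20, 200
    # Hypothetical data: a rest shape moved by 3 hidden factors plus small noise,
    # with coordinates treated as millimetres
    rest = rng.normal(scale=10.0, size=(n_vert, 3))
    modes = rng.normal(size=(3, n_vert, 3))
    weights = rng.normal(scale=2.0, size=(n_art, 3))
    meshes = rest + np.einsum("ak,kvd->avd", weights, modes)
    meshes += rng.normal(scale=0.05, size=meshes.shape)

    mean, basis = fit_linear_model(meshes, n_components=3)
    errors = [
        mean_vertex_error(m, decode(encode(m, mean, basis), mean, basis, n_vert))
        for m in meshes
    ]
    print(f"mean reconstruction error: {np.mean(errors):.3f} mm")
```

Limiting the basis to three rows mirrors the "three control parameters or fewer" per organ; the residual error then measures how much shape variation those few parameters fail to capture.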
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61175016 and 61304250), the Key Fund project (No. 61233009), and the CASS Innovation Project "teaching pronunciation models for speech research".
About this article
Cite this article
Wei, J., Liu, J., Fang, Q. et al. A Novel Method for Constructing 3D Geometric Articulatory Models. J Sign Process Syst 82, 295–302 (2016). https://doi.org/10.1007/s11265-015-1002-8
DOI: https://doi.org/10.1007/s11265-015-1002-8