Abstract
This paper proposes an emotional text-driven 3D visual pronunciation system for Mandarin Chinese. First, articulatory features from a speech corpus recorded with Electro-Magnetic Articulography (EMA) are modeled with hidden Markov models (HMMs), using fully context-dependent modeling that exploits rich linguistic features. Second, because emotion is expressed more markedly in the articulatory domain, owing to the independence with which the articulators can be manipulated, the differences between articulatory movements under different emotions are investigated. Third, emotional speech is generated by adjusting speech parameters such as fundamental frequency (F0), duration, and intensity using Praat. While the generated emotional speech is played, the corresponding articulatory movements are synthesized simultaneously by the HMM prediction rules and used to drive the head mesh model in step with the speech. Experiments demonstrate that the system can synthesize accurate articulator animation synchronized with emotional speech at the phoneme level.
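The prosody-adjustment step described in the abstract can be illustrated with a minimal sketch. It assumes each emotion is represented by simple scaling rules for F0, duration, and intensity applied per phoneme, and that the adjusted durations are then converted to frame indices so that articulatory frames can be aligned with the speech. The names `Phone`, `EmotionRule`, `RULES`, `apply_emotion`, and `frame_boundaries`, the 5 ms frame shift, and all numeric values are hypothetical and are not taken from the paper; the paper performs the actual speech modification in Praat, which this sketch does not call.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Phone:
    label: str           # phoneme label
    duration_ms: float   # duration in milliseconds
    f0_hz: float         # mean F0 in Hz (0.0 for unvoiced segments)
    intensity_db: float  # mean intensity in dB


@dataclass(frozen=True)
class EmotionRule:
    f0_scale: float             # multiplicative change to F0
    duration_scale: float       # multiplicative change to duration
    intensity_offset_db: float  # additive change to intensity


# Hypothetical rule values for illustration only; not measured or reported data.
RULES: Dict[str, EmotionRule] = {
    "neutral": EmotionRule(1.0, 1.0, 0.0),
    "happy": EmotionRule(1.2, 0.9, 3.0),
    "sad": EmotionRule(0.9, 1.2, -3.0),
}


def apply_emotion(phones: List[Phone], emotion: str) -> List[Phone]:
    """Return a new phoneme sequence with the emotion's prosody rule applied."""
    rule = RULES[emotion]
    return [
        Phone(
            label=p.label,
            duration_ms=p.duration_ms * rule.duration_scale,
            f0_hz=p.f0_hz * rule.f0_scale,
            intensity_db=p.intensity_db + rule.intensity_offset_db,
        )
        for p in phones
    ]


def frame_boundaries(
    phones: List[Phone], frame_ms: float = 5.0
) -> List[Tuple[str, int, int]]:
    """Map each phoneme to (label, start_frame, end_frame) on a fixed frame grid.

    Articulatory trajectories generated at the same frame rate can then be
    played back in step with the adjusted speech, one phoneme at a time.
    """
    spans = []
    elapsed_ms = 0.0
    for p in phones:
        start = round(elapsed_ms / frame_ms)
        elapsed_ms += p.duration_ms
        end = round(elapsed_ms / frame_ms)
        spans.append((p.label, start, end))
    return spans
```

For example, passing a neutral phoneme sequence through `apply_emotion(phones, "sad")` lengthens every phoneme, and `frame_boundaries` then yields the frame ranges over which the articulatory animation for each phoneme would need to run to stay synchronized.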
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. 61572450 and No. 61303150), the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (No. A1501), the Fundamental Research Funds for the Central Universities (WK2350000002), and the Open Funding Project of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. BUAA-VR-16KF-12).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yu, L., Luo, C., Yu, J. (2016). An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 662. Springer, Singapore. https://doi.org/10.1007/978-981-10-3002-4_8
DOI: https://doi.org/10.1007/978-981-10-3002-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3001-7
Online ISBN: 978-981-10-3002-4
eBook Packages: Computer Science, Computer Science (R0)