
An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese

  • Conference paper
  • Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 662)

Abstract

This paper proposes an emotional text-driven 3D visual pronunciation system for Mandarin Chinese. First, based on an articulatory speech corpus collected by electromagnetic articulography (EMA), hidden Markov models (HMMs) are trained on the articulatory features, with fully context-dependent modeling achieved by exploiting the rich linguistic features of the input text. Second, since emotion is expressed more markedly in the articulatory domain, where the articulators can be manipulated independently, the differences between articulatory movements under different emotions are investigated. Third, emotional speech is generated by adjusting speech parameters such as fundamental frequency (F0), duration, and intensity in Praat. Finally, as the generated emotional speech is played, the corresponding articulatory movements are synthesized simultaneously from the trained HMMs and used to drive a 3D head mesh model in synchrony with the speech. Experiments demonstrate that the system synthesizes emotional speech with accurately synchronized articulator animation at the phoneme level.
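As a concrete illustration of the Praat-based adjustment step, the minimal sketch below modifies the F0, duration, and intensity of a neutral utterance. It uses parselmouth, a Python interface to Praat, rather than the Praat application the paper refers to; the function name `emotionalize`, the file names, and the scaling factors are illustrative assumptions, not values from the paper.

```python
# A sketch of emotion-dependent prosody modification via Praat commands,
# assuming the parselmouth package (pip install praat-parselmouth).
# The scaling factors below are illustrative, not the paper's values.
import parselmouth
from parselmouth.praat import call

def emotionalize(wav_path, f0_factor=1.2, duration_factor=0.9, intensity_db=72.0):
    """Resynthesize a neutral utterance with adjusted F0, duration, intensity."""
    snd = parselmouth.Sound(wav_path)

    # Build a Manipulation object (time step, pitch floor/ceiling in Hz).
    manipulation = call(snd, "To Manipulation", 0.01, 75, 600)

    # Scale F0: extract the pitch tier, multiply, and put it back.
    pitch_tier = call(manipulation, "Extract pitch tier")
    call(pitch_tier, "Multiply frequencies", snd.xmin, snd.xmax, f0_factor)
    call([pitch_tier, manipulation], "Replace pitch tier")

    # Scale duration with a flat duration tier (< 1 speeds speech up).
    duration_tier = call("Create DurationTier", "dur", snd.xmin, snd.xmax)
    call(duration_tier, "Add point", snd.xmin, duration_factor)
    call([duration_tier, manipulation], "Replace duration tier")

    # Overlap-add resynthesis, then set the overall intensity level (dB).
    out = call(manipulation, "Get resynthesis (overlap-add)")
    call(out, "Scale intensity", intensity_db)
    return out

# Example usage (hypothetical input/output files):
# emotionalize("neutral.wav").save("happy.wav", "WAV")
```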



Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61572450 and No. 61303150), the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (No. A1501), the Fundamental Research Funds for the Central Universities (WK2350000002), and the Open Funding Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. BUAA-VR-16KF-12).

Author information

Correspondence to Jun Yu.


Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yu, L., Luo, C., Yu, J. (2016). An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 662. Springer, Singapore. https://doi.org/10.1007/978-981-10-3002-4_8

  • DOI: https://doi.org/10.1007/978-981-10-3002-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3001-7

  • Online ISBN: 978-981-10-3002-4

  • eBook Packages: Computer Science; Computer Science (R0)
