Abstract
This paper presents a hidden Markov model (HMM)-based method for Mandarin-Tibetan bi-lingual emotional speech synthesis that uses speaker adaptive training with a Mandarin emotional speech corpus. First, a one-speaker Tibetan neutral speech corpus, a multi-speaker Mandarin neutral speech corpus, and a multi-speaker Mandarin emotional speech corpus are used to train a set of mixed-language average acoustic models of the target emotion by speaker adaptive training. Then, a one-speaker Mandarin neutral speech corpus or a one-speaker Tibetan neutral speech corpus is used to obtain a set of speaker-dependent acoustic models of the target emotion by applying the speaker adaptation transformation. Finally, Mandarin or Tibetan emotional speech is synthesized from the corresponding Mandarin or Tibetan speaker-dependent acoustic models of the target emotion. Subjective tests show that the average emotional mean opinion score is 4.14 for Tibetan and 4.26 for Mandarin, the average mean opinion score is 4.16 for Tibetan and 4.28 for Mandarin, and the average degradation opinion score is 4.28 for Tibetan and 4.24 for Mandarin. The proposed method can therefore synthesize both Tibetan and Mandarin speech with high naturalness and emotional expressiveness using only a Mandarin emotional training speech corpus.
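The adaptation step in the abstract can be illustrated with a minimal sketch. It assumes an MLLR-style affine transform of Gaussian mean vectors, which is one common form of speaker adaptation in HMM-based synthesis. The abstract does not say which transform the authors use, so the function names (`adapt_mean`, `adapt_model`) and the toy values below are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def adapt_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply an affine transform mu' = A * mu + b to one Gaussian mean."""
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("shape mismatch between A, b and mean")
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]


def adapt_model(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Adapt every state mean of an average-voice model with one shared transform."""
    return [adapt_mean(mu, A, b) for mu in means]


# Hypothetical 2-dimensional toy values (assumption, not data from the paper).
average_voice_means = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]

speaker_means = adapt_model(average_voice_means, A, b)
print(speaker_means)  # [[3.0, 0.0], [1.0, -1.5]]
```

In a real system the transform would be estimated from the target speaker's adaptation data and would typically be tied across groups of states rather than shared globally. The sketch only shows how an average-voice model is mapped to a speaker-dependent one once a transform is available.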
Acknowledgments
The research leading to these results was partly funded by the National Natural Science Foundation of China (Grant Nos. 11664036 and 61263036) and the Natural Science Foundation of Gansu (Grant No. 1506RJYA126).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, P., Yang, H., Gan, Z. (2017). Towards Realizing Mandarin-Tibetan Bi-lingual Emotional Speech Synthesis with Mandarin Emotional Training Corpus. In: Zou, B., Han, Q., Sun, G., Jing, W., Peng, X., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 728. Springer, Singapore. https://doi.org/10.1007/978-981-10-6388-6_11
DOI: https://doi.org/10.1007/978-981-10-6388-6_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6387-9
Online ISBN: 978-981-10-6388-6
eBook Packages: Computer Science (R0)