An HMM-Based Mandarin Chinese Text-To-Speech System

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

In this paper we present our Hidden Markov Model (HMM)-based Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese, or Putonghua (“the common spoken language”), is a tone language in which each of its more than 400 base syllables can carry up to five different lexical tone patterns. The segmental and supra-segmental information of these syllables is modeled by three corresponding HMMs: (1) spectral envelope and gain; (2) voiced/unvoiced decision and fundamental frequency; and (3) segment duration. The HMMs are trained on a read-speech database of 1,000 sentences recorded by a female speaker. The spectral information is derived from short-time LPC spectral analysis. Among the LPC-derived parameter sets, the Line Spectrum Pair (LSP) representation is most closely related to the natural resonances, or “formants”, of a speech sound, so it is selected to parameterize the spectral information. Furthermore, the tendency of LSPs to cluster around a spectral peak justifies augmenting them with their dynamic counterparts, in both time and frequency, for both HMM modeling and parameter trajectory synthesis. One hundred sentences synthesized by four LSP-based systems were subjectively evaluated in an AB comparison test. The listening test results show that LSPs combined with their dynamic counterparts in both time and frequency are preferred, because they yield higher synthesized speech quality.
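The abstract does not spell out the analysis settings, so the following is only a minimal sketch of the LSP front end under stated assumptions: autocorrelation-method LPC via Levinson-Durbin, an even LPC order of 10, 400-sample Hamming-windowed frames with a 200-sample hop at an assumed 16 kHz, and a toy two-tone-plus-noise signal. LSPs are taken as the angles in (0, π) of the roots of P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) − z^-(p+1) A(1/z). The "dynamic counterpart in time" is assumed to be the usual [−0.5, 0, 0.5] delta window, and the "dynamic counterpart in frequency" is interpreted here as the difference between adjacent-order LSPs within a frame (small gaps mark LSPs clustered around a formant); the paper's exact definitions may differ. All function names are hypothetical.

```python
import numpy as np


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one windowed analysis frame."""
    full = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1  # index of lag 0 in the 'full' output
    return full[mid : mid + order + 1]


def levinson_durbin(r, order):
    """LPC polynomial a = [1, a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient of stage i
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at every stage
    return a


def lpc_to_lsf(a):
    """Line spectral frequencies (radians, ascending) of a stable A(z)."""
    p = len(a) - 1
    assert p % 2 == 0, "this sketch assumes an even LPC order"
    ext = np.append(a, 0.0)  # coefficients of A(z), padded to degree p + 1
    roots = np.concatenate([np.roots(ext + ext[::-1]),   # P(z)
                            np.roots(ext - ext[::-1])])  # Q(z)
    # Keep one root of each conjugate pair; the trivial real roots at
    # z = -1 and z = +1 have zero imaginary part and are dropped.
    upper = roots[roots.imag > 1e-9]
    return np.sort(np.angle(upper))


def delta_time(feats):
    """Time delta with window [-0.5, 0, 0.5]; edge frames are replicated."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return 0.5 * (padded[2:] - padded[:-2])


def delta_freq(lsf):
    """Assumed frequency 'delta': gaps between adjacent-order LSFs."""
    return np.diff(lsf, axis=-1)


# Toy signal: two tones plus noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
fs, order, frame_len, hop = 16000, 10, 400, 200
n = np.arange(2000)
signal = (np.sin(2 * np.pi * 500 * n / fs)
          + 0.5 * np.sin(2 * np.pi * 1500 * n / fs)
          + 0.1 * rng.standard_normal(n.size))
window = np.hamming(frame_len)

lsf = np.array([
    lpc_to_lsf(levinson_durbin(
        autocorrelation(signal[s : s + frame_len] * window, order), order))
    for s in range(0, n.size - frame_len + 1, hop)
])
features = np.hstack([lsf, delta_time(lsf), delta_freq(lsf)])
print(features.shape)  # (9, 29)
```

Each row stacks 10 LSFs, their 10 time deltas and 9 adjacent-order gaps. Whether the system models these as one joint vector or as separate streams is not stated in the abstract.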

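For "parameter trajectory synthesis", HMM-based synthesizers commonly use the maximum-likelihood parameter generation of Tokuda et al. (2000): stack static and dynamic features as o = W c and solve (Wᵀ U⁻¹ W) c = Wᵀ U⁻¹ μ for the static trajectory c, where μ and U are the per-frame means and (diagonal) variances read off the aligned HMM states. The sketch below is not the paper's implementation; it assumes a single one-dimensional stream (for example one LSP dimension), HTS-style windows (static, [−0.5, 0, 0.5], [1, −2, 1]), clamped edge frames, and state durations already expanded into per-frame statistics. The names `WINDOWS`, `build_window_matrix` and `generate_trajectory` are hypothetical.

```python
import numpy as np

# Hypothetical HTS-style window set: static, delta, delta-delta.
WINDOWS = (
    np.array([0.0, 1.0, 0.0]),
    np.array([-0.5, 0.0, 0.5]),
    np.array([1.0, -2.0, 1.0]),
)


def build_window_matrix(num_frames, windows=WINDOWS):
    """Stack one T x T matrix per window; out-of-range frames are clamped."""
    blocks = []
    for win in windows:
        block = np.zeros((num_frames, num_frames))
        for t in range(num_frames):
            for offset, coef in zip((-1, 0, 1), win):
                tau = min(max(t + offset, 0), num_frames - 1)
                block[t, tau] += coef
        blocks.append(block)
    return np.vstack(blocks)  # shape (len(windows) * T, T), window-major rows


def generate_trajectory(means, variances, windows=WINDOWS):
    """Solve (W^T U^-1 W) c = W^T U^-1 mu for the static trajectory c.

    means, variances: arrays of shape (len(windows), T) with the per-frame
    Gaussian statistics of each window, already expanded from state durations.
    """
    num_frames = means.shape[1]
    w = build_window_matrix(num_frames, windows)
    prec = 1.0 / variances.reshape(-1)  # diagonal of U^-1, window-major order
    mu = means.reshape(-1)
    lhs = w.T @ (prec[:, None] * w)
    rhs = w.T @ (prec * mu)
    return np.linalg.solve(lhs, rhs)


# Toy example: two "states" of 5 frames each with static means 1.0 and 3.0,
# and zero means for the delta and delta-delta features.
T = 10
static_mean = np.concatenate([np.full(5, 1.0), np.full(5, 3.0)])
means = np.vstack([static_mean, np.zeros(T), np.zeros(T)])
variances = np.vstack([np.full(T, 0.1), np.full(T, 0.01), np.full(T, 0.01)])
c = generate_trajectory(means, variances)
print(np.round(c, 2))
```

The dynamic-feature terms pull the trajectory away from the piecewise-constant state means near the state boundary. Relaxing the delta and delta-delta variances (for example to 1e12) removes that constraint, and the solution falls back to the static means.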
Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qian, Y., Soong, F., Chen, Y., Chu, M. (2006). An HMM-Based Mandarin Chinese Text-To-Speech System. In: Huo, Q., Ma, B., Chng, E.S., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_26

  • DOI: https://doi.org/10.1007/11939993_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
