Statistical Approach to the Automatic Synthesis of Czech Speech

  • Conference paper
  • Published in: Text, Speech and Dialogue (TSD 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Abstract

This paper presents the use of multiple hidden Markov models (HMMs) to construct a Czech speech segment database (SSD), together with a speech synthesis method based on this inventory. HMMs are used to model triphones, and binary decision trees are applied to cluster the states of the triphone HMMs automatically. The clustered states are then used to segment the speech corpus automatically and to create an SSD. An SSD constructed in this way is assumed to enable more precise context modeling than was previously possible. Several speech synthesis techniques for building a concatenation-based synthesizer are discussed, with special attention paid to an MFCC-based, pitch-synchronous, residually excited approach.
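
As a small illustration of the triphone units mentioned in the abstract, the following sketch expands a phone sequence into context-dependent triphone labels. The `left-center+right` label notation, the `sil` boundary symbol, and the function name `to_triphones` are assumptions made for this example; the abstract does not specify the notation or phone set used by the authors.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into context-dependent triphone labels.

    Each phone is labeled with its left and right neighbors in the
    assumed form "left-center+right". The utterance is padded with a
    boundary symbol (assumed here to be "sil") on both sides.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Hypothetical phone sequence for a short word.
labels = to_triphones(["d", "o", "m"])
print(labels)  # ['sil-d+o', 'd-o+m', 'o-m+sil']
```

Because the number of distinct triphones grows quickly with the size of the phone set, many of them occur rarely in a corpus, which is one common motivation for clustering HMM states across triphones.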

This work was supported by project No. VS97159 of the Ministry of Education of the Czech Republic.

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matoušek, J., Psutka, J., Tychtl, Z. (1999). Statistical Approach to the Automatic Synthesis of Czech Speech. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_72

  • DOI: https://doi.org/10.1007/3-540-48239-3_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66494-9

  • Online ISBN: 978-3-540-48239-0

  • eBook Packages: Springer Book Archive
