Statistical Approach to the Automatic Synthesis of Czech Speech

Matoušek, Jindřich; Psutka, Josef; Tychtl, Zbyněk

doi:10.1007/3-540-48239-3_72

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

This paper presents the use of multiple Hidden Markov Models (HMMs) to construct a Czech speech segment database (SSD), together with a speech synthesizer based on this inventory. The HMMs model triphones, and binary decision trees cluster the states of the triphone HMMs automatically. The clustered states are then used to segment the speech corpus automatically and to create the SSD. An SSD built this way is expected to allow more precise context modeling than was previously possible. The paper discusses several speech synthesis techniques for building a concatenation-based synthesizer. It pays special attention to a pitch-synchronous, residual-excited approach based on mel-frequency cepstral coefficients (MFCCs).
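The abstract does not state how the decision trees are grown. A common choice in HMM-based systems is likelihood-based tree clustering of states, as in HTK-style tree-based state tying. At each node, this method picks the phonetic question whose yes/no split most increases the log-likelihood of the training data, where each pool of states is modeled by one shared Gaussian.

The sketch below assumes that criterion and diagonal covariances. It also uses a hypothetical `left-centre+right` triphone label format, made-up phone-class questions (`L-Nasal`, `L-Voiced`), illustrative thresholds and toy one-dimensional statistics. It illustrates the idea only and is not the authors' implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable

VAR_FLOOR = 1e-6  # guards against log(0) for degenerate pools


@dataclass
class StateStats:
    """Sufficient statistics of one triphone HMM state (diagonal Gaussian)."""
    name: str          # hypothetical "left-centre+right" triphone label
    occ: float         # state occupancy (sum of frame posteriors)
    mean: list[float]
    var: list[float]


def pooled_variance(states: list[StateStats]) -> tuple[float, list[float]]:
    """Occupancy-weighted merge of several states into one diagonal Gaussian."""
    occ = sum(s.occ for s in states)
    var = []
    for d in range(len(states[0].mean)):
        mean_d = sum(s.occ * s.mean[d] for s in states) / occ
        second = sum(s.occ * (s.var[d] + s.mean[d] ** 2) for s in states) / occ
        var.append(max(second - mean_d ** 2, VAR_FLOOR))
    return occ, var


def log_likelihood(states: list[StateStats]) -> float:
    """Approximate data log-likelihood if all `states` share one ML-fitted Gaussian."""
    occ, var = pooled_variance(states)
    dim = len(var)
    return -0.5 * occ * (dim * (1.0 + math.log(2.0 * math.pi)) + sum(math.log(v) for v in var))


def cluster(states: list[StateStats],
            questions: list[tuple[str, Callable[[str], bool]]],
            min_gain: float, min_occ: float) -> list[list[StateStats]]:
    """Greedy top-down binary splitting; returns the leaf clusters (tied states)."""
    parent = log_likelihood(states)
    best = None
    for label, ask in questions:
        yes = [s for s in states if ask(s.name)]
        no = [s for s in states if not ask(s.name)]
        if not yes or not no:
            continue  # question does not split this node
        if min(sum(s.occ for s in yes), sum(s.occ for s in no)) < min_occ:
            continue  # too little data to estimate a child reliably
        gain = log_likelihood(yes) + log_likelihood(no) - parent
        if best is None or gain > best[0]:
            best = (gain, label, yes, no)
    if best is None or best[0] < min_gain:
        return [states]  # stop: this node becomes one tied state
    _, _, yes, no = best
    return cluster(yes, questions, min_gain, min_occ) + cluster(no, questions, min_gain, min_occ)


def left_in(phones: set[str]) -> Callable[[str], bool]:
    """Question 'is the left context in `phones`?' (hypothetical phone classes)."""
    return lambda name: name.split("-")[0] in phones


# Toy data: four contexts of the centre phone "a" (illustrative numbers only).
states = [
    StateStats("p-a+t", 100.0, [0.0], [1.0]),
    StateStats("b-a+t", 100.0, [0.1], [1.0]),
    StateStats("m-a+t", 100.0, [3.0], [1.0]),
    StateStats("n-a+t", 100.0, [3.1], [1.0]),
]
questions = [("L-Nasal", left_in({"m", "n"})), ("L-Voiced", left_in({"b", "m", "n"}))]
clusters = cluster(states, questions, min_gain=10.0, min_occ=50.0)
print([[s.name for s in c] for c in clusters])
# -> [['m-a+t', 'n-a+t'], ['p-a+t', 'b-a+t']]
```

The `min_occ` guard and the `min_gain` threshold stop the tree before any leaf has too little data to train. Each resulting leaf is a tied state, and states like these are what the abstract describes using to segment the corpus automatically.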

This work was supported by project No. VS97159 of the Ministry of Education of the Czech Republic.

Author information

Authors and Affiliations

University of West Bohemia, Department of Cybernetics, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jindřich Matoušek, Josef Psutka & Zbyněk Tychtl

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Univerzitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Matoušek, J., Psutka, J., Tychtl, Z. (1999). Statistical Approach to the Automatic Synthesis of Czech Speech. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_72

DOI: https://doi.org/10.1007/3-540-48239-3_72
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
