Experiments on Reducing Footprint of Unit Selection TTS System

Hanzlíček, Zdeněk; Matoušek, Jindřich; Tihelka, Daniel

doi:10.1007/978-3-642-40585-3_32


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

Modern TTS systems based on the unit selection approach produce speech of very high quality, but their system demands are enormous. In particular, storage requirements are directly proportional to the size of the speech unit inventory from which units are selected during synthesis. This paper presents analysis and reduction experiments performed on two large speech corpora employed by a unit selection TTS system for the Czech language. A procedure is proposed for excluding utterances from the default speech corpus based on usage statistics of particular speech units. Excluding whole utterances was preferred over excluding individual speech units in order to preserve the fundamental feature of the unit selection method: selecting the longest possible sequences of speech units. Experiments were performed for several reduction levels. The resulting synthetic speech was evaluated using a proposed statistic based on the density of concatenation points, and speech quality was also assessed in listening tests. All reduced versions of the TTS system were rated as similar to or slightly worse than the baseline system.
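
The abstract names two ideas without giving formulas: pruning whole utterances according to how often their units are used, and measuring the density of concatenation points in the synthesized output. The following Python sketch illustrates one plausible reading of both. Everything in it is an assumption made for illustration, not the authors' procedure: the `(utterance_id, position)` unit representation, the mean-usage score, the `keep_ratio` parameter, and the definition of a concatenation point as a join between units that are not adjacent in the source corpus.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data model: a unit is (utterance_id, position_in_utterance).
Unit = Tuple[str, int]


def usage_counts(selected_sequences: List[List[Unit]]) -> Counter:
    """Count how often each corpus unit was selected while synthesizing test text."""
    counts: Counter = Counter()
    for seq in selected_sequences:
        counts.update(seq)
    return counts


def prune_utterances(utt_lengths: Dict[str, int], counts: Counter,
                     keep_ratio: float) -> List[str]:
    """Keep the most-used utterances until keep_ratio of all units is retained.

    Assumed score: mean selection count per unit of the utterance.
    Utterance lengths are assumed to be positive.
    """
    def score(utt: str) -> float:
        n = utt_lengths[utt]
        return sum(counts[(utt, i)] for i in range(n)) / n

    # Highest score first; ties are broken by utterance id for determinism.
    ranked = sorted(utt_lengths, key=lambda u: (-score(u), u))
    budget = keep_ratio * sum(utt_lengths.values())
    kept: List[str] = []
    total = 0
    for utt in ranked:
        if total >= budget:
            break
        kept.append(utt)
        total += utt_lengths[utt]
    return kept


def concatenation_density(seq: List[Unit]) -> float:
    """Fraction of joins whose two units are not contiguous in the corpus."""
    if len(seq) < 2:
        return 0.0
    joins = sum(
        1
        for (u1, p1), (u2, p2) in zip(seq, seq[1:])
        if not (u1 == u2 and p2 == p1 + 1)
    )
    return joins / (len(seq) - 1)
```

Under these assumptions, a lower density means the synthesizer is still finding long contiguous unit sequences, so comparing the density before and after pruning gives a rough indication of how much a given reduction level fragments the output.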

This work was supported by the Ministry of Industry and Trade of the Czech Republic, project No. MPO FR-TI1/518. The access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the program “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is highly appreciated.




Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Zdeněk Hanzlíček, Jindřich Matoušek & Daniel Tihelka


Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek


Cite this paper

Hanzlíček, Z., Matoušek, J., Tihelka, D. (2013). Experiments on Reducing Footprint of Unit Selection TTS System. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_32


DOI: https://doi.org/10.1007/978-3-642-40585-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science; Computer Science (R0)
