Optimal Number of States in HMM-Based Speech Synthesis

Hanzlíček, Zdeněk

doi:10.1007/978-3-319-64206-2_40


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

This paper deals with the use of models with a variable number of states in HMM-based speech synthesis. It also gives implementation details on using such models in systems built on the HTS toolkit, which cannot directly handle models with unequal numbers of states; a workaround that enables this functionality is proposed. A data-based method for determining the optimal number of states for individual models is proposed and experimentally tested on 4 large speech corpora. A preference listening test focused on local differences showed that listeners preferred the proposed system over the traditional system with 5-state models, even though the proposed system is smaller (its total number of states is lower).

This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme CESNET LM2015042, is greatly appreciated.


Notes

1. Praat: doing phonetics by computer, www.praat.org.
2. HMM-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp.
3. The detailed scheme of the training procedure is more complex; e.g., the reestimation and clustering of models are usually repeated twice.
4. A bug in the HTS toolkit ver. 2.2 (file HFB.c) had to be fixed to allow the use of 1-state models; without the fix, they did not work properly.
5. The names of HTS tools are stated here to specify the point of transition to 1-state models as precisely as possible.
6. However, the proposed methods are certainly not language-dependent.



Author information

Authors and Affiliations

Faculty of Applied Sciences, NTIS - New Technology for the Information Society, University of West Bohemia, Univerzitní 22, 306 14, Plzeň, Czech Republic
Zdeněk Hanzlíček


Corresponding author

Correspondence to Zdeněk Hanzlíček.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek


About this paper

Cite this paper

Hanzlíček, Z. (2017). Optimal Number of States in HMM-Based Speech Synthesis. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_40


DOI: https://doi.org/10.1007/978-3-319-64206-2_40
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
