Abstract
Unit selection-based text-to-speech (TTS) synthesis can generate high-quality speech. However, HMM-based text-to-speech synthesis (HTS) also has advantages, such as the absence of the spurious errors observed in unit selection and a small memory footprint. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims to improve on the quality of the baseline HTS system while keeping the memory footprint small. In A/B preference tests, listeners preferred the hybrid system over a state-of-the-art HTS baseline system.
This research is supported by TUBITAK, project no. 109E281.
Copyright information
© 2011 Springer-Verlag London Limited
About this paper
Cite this paper
Guner, E., Demiroglu, C. (2011). A Small Footprint Hybrid Statistical and Unit Selection Text-to-Speech Synthesis System for Turkish. In: Gelenbe, E., Lent, R., Sakellari, G. (eds) Computer and Information Sciences II. Springer, London. https://doi.org/10.1007/978-1-4471-2155-8_10
DOI: https://doi.org/10.1007/978-1-4471-2155-8_10
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2154-1
Online ISBN: 978-1-4471-2155-8
eBook Packages: Engineering, Engineering (R0)