Abstract
This paper concerns a limited-domain TTS system based on the concatenative method and presents an algorithm capable of extracting a minimal domain-oriented text corpus from real data of the given domain while still reaching maximum coverage of that domain. The proposed approach ensures that the fewest possible texts are extracted, containing the most common phrases and (possibly) all the words from the domain. At the same time, it ensures that appropriate phrase overlap is kept, which makes it possible to find smooth concatenation points in the overlapped regions and thus to reach high-quality synthesized speech. In addition, several recommendations that allow a speaker to record the corpus more fluently and comfortably are presented and discussed. Corpus building is tested and evaluated on several domains differing in size and nature; the authors present the results of the algorithm and demonstrate the advantages of using a domain-oriented corpus for speech synthesis.
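The abstract does not spell out the selection algorithm, so the following is only an illustrative sketch of the general idea of picking few texts that still cover every word of a domain. It uses a plain greedy set-cover heuristic, which is an assumption and not necessarily the authors' method; it also ignores phrase frequency and the phrase-overlap requirement mentioned above. The function name `greedy_select` and the sample sentences are hypothetical.

```python
def greedy_select(sentences: list[str]) -> list[str]:
    """Greedily pick sentences until every word seen in the input is covered.

    Assumption: words are whitespace-separated tokens compared case-insensitively.
    This is a generic set-cover heuristic, not the algorithm from the paper.
    """
    word_sets = [set(s.lower().split()) for s in sentences]
    uncovered = set().union(*word_sets)  # all distinct words in the domain data
    chosen: list[str] = []
    while uncovered:
        # Pick the sentence that adds the most still-uncovered words
        # (ties go to the earliest sentence).
        best = max(range(len(sentences)),
                   key=lambda i: len(word_sets[i] & uncovered))
        gain = word_sets[best] & uncovered
        if not gain:  # defensive guard; cannot trigger while uncovered is non-empty
            break
        chosen.append(sentences[best])
        uncovered -= gain
    return chosen


# Hypothetical domain data (e.g. transport announcements).
domain = [
    "the train to prague departs at five",
    "the train departs",
    "bus to brno departs at six",
]
selected = greedy_select(domain)
```

In this toy input the second sentence adds no new words once the first is chosen, so a word-coverage criterion alone would skip it. A realistic selection would additionally have to weight common phrases and preserve overlapping phrase contexts, as the abstract describes.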
This work was supported by the European Regional Development Fund (ERDF), project “New Technologies for Information Society” (NTIS), European Centre of Excellence, CZ.1.05/1.1.00/02.0090, the Technology Agency of the Czech Republic, project No. TA01030476 and SGS-2013-032.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jůzová, M., Tihelka, D. (2014). Minimum Text Corpus Selection for Limited Domain Speech Synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_48
DOI: https://doi.org/10.1007/978-3-319-10816-2_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)