Abstract
When we started working on the unit selection technique in ARTIC TTS, we had to decide which type of unit the system would use. Although the basic version of our TTS system is based on triphones, we chose diphones for unit selection. We did so mainly because we were concerned about how susceptible the unit selection technique is to segmentation inaccuracies, and because we had limited experience with the behaviour of the overall system. However, we also planned to examine the possibilities of using triphones. Now that the first version of our unit selection system is being built, this paper examines whether diphones bring a significant advantage over triphones, and whether there is a clear reason why one type of unit behaves better than the other.
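To make the difference between the two unit types concrete, the following minimal sketch turns a phone sequence into diphone and triphone labels. It is an illustration only and does not describe how ARTIC actually labels its units. The pause symbol `$`, the `a-b` diphone notation, the HTK-style `l-c+r` triphone notation, and the function names `to_diphones` and `to_triphones` are all hypothetical choices made for this example.

```python
from typing import List

PAUSE = "$"  # hypothetical symbol for the pause at utterance boundaries


def to_diphones(phones: List[str]) -> List[str]:
    """Diphone labels: each unit spans from the middle of one phone
    to the middle of the next, so N phones give N + 1 diphones
    once the sequence is padded with boundary pauses."""
    padded = [PAUSE] + phones + [PAUSE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def to_triphones(phones: List[str]) -> List[str]:
    """Triphone labels: each phone together with its left and right
    context (HTK-style l-c+r notation), one unit per phone."""
    padded = [PAUSE] + phones + [PAUSE]
    return [f"{l}-{c}+{r}" for l, c, r in zip(padded, padded[1:], padded[2:])]


if __name__ == "__main__":
    word = ["p", "e", "s"]  # Czech "pes" (dog), used purely for illustration
    print(to_diphones(word))   # ['$-p', 'p-e', 'e-s', 's-$']
    print(to_triphones(word))  # ['$-p+e', 'p-e+s', 'e-s+$']
```

With an inventory of N phones there are at most N² distinct diphones but up to N³ distinct triphones. A triphone inventory is therefore potentially much larger and sparser, which is one reason the choice between the two is not obvious.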
This research was supported by the Grant Agency of the Czech Republic, project no. GACR 102/06/P205, and by the Academy of Sciences of the Czech Republic, project no. 1ET101470416.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tihelka, D., Matoušek, J. (2006). Diphones vs. Triphones in Czech Unit Selection TTS. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol. 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_67
DOI: https://doi.org/10.1007/11846406_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)