Diphones vs. Triphones in Czech Unit Selection TTS

Tihelka, Daniel; Matoušek, Jindřich

doi:10.1007/11846406_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

When we began working on the unit selection technique in the ARTIC TTS system, we had to decide which unit type the system should use. Although the basic version of our TTS system is based on triphones, we chose diphones for unit selection, mainly because of our concerns about the susceptibility of the unit selection technique to segmentation inaccuracies and because of our limited experience with the overall system behaviour. However, we also planned to examine the use of triphones. As the first version of our unit selection system is currently being built, this paper examines whether diphones bring a significant advantage over triphones, and whether there is a clear reason why one unit type behaves better than the other.

This research was supported by the Grant Agency of the Czech Republic, project no. GACR 102/06/P205, and by the Academy of Sciences of the Czech Republic, project no. 1ET101470416.

Author information

Authors and Affiliations

Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Daniel Tihelka & Jindřich Matoušek

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Tihelka, D., Matoušek, J. (2006). Diphones vs. Triphones in Czech Unit Selection TTS. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_67

DOI: https://doi.org/10.1007/11846406_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)
