Diphones vs. Triphones in Czech Unit Selection TTS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Abstract

When we began working on the unit selection technique in the ARTIC TTS system, we first had to choose the unit type to be used within the system. Although the basic version of our TTS system is based on triphones, we decided to use diphones in unit selection, mainly because of our concerns about the susceptibility of the unit selection technique to segmentation inaccuracies and because of our limited experience with the overall system behaviour. However, we also planned to examine the possibility of using triphones. As the first version of our unit selection system is currently being built, this paper examines whether the use of diphones brings a significant advantage over the use of triphones, and whether there is a clear reason why one unit type behaves better than the other.

This research was supported by the Grant Agency of the Czech Republic, project no. GACR 102/06/P205, and by the Academy of Sciences of the Czech Republic, project no. 1ET101470416.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tihelka, D., Matoušek, J. (2006). Diphones vs. Triphones in Czech Unit Selection TTS. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_67

  • DOI: https://doi.org/10.1007/11846406_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39090-9

  • Online ISBN: 978-3-540-39091-6

  • eBook Packages: Computer Science, Computer Science (R0)
