On the Impact of Annotation Errors on Unit-Selection Speech Synthesis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

Unit selection is a very popular approach to speech synthesis. It is known for its ability to produce nearly natural-sounding synthetic speech, but also for its need for very large speech corpora. Unit selection is also known to be very sensitive to the quality of the source speech corpus from which the speech is synthesised, and to the quality of its textual, phonetic and prosodic annotations and indexation. Given the enormous size of current speech corpora, manual annotation of the corpora is a lengthy process, and even so, human annotators do make errors. In this paper, the impact of annotation errors on the quality of unit-selection-based synthetic speech is analysed. First, an analysis and categorisation of annotation errors is presented. Then, a speech synthesis experiment is described in which the same utterances were synthesised by unit-selection systems with and without annotation errors. The results of the experiment and the options for fixing the annotation errors are also discussed.

Support for this work was provided by the TA CR, project No. TA01030476, and by the European Regional Development Fund, project “New Technologies for Information Society”, European Centre of Excellence, ED1.1.00/02.0090.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matoušek, J., Tihelka, D., Šmídl, L. (2012). On the Impact of Annotation Errors on Unit-Selection Speech Synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_55

  • DOI: https://doi.org/10.1007/978-3-642-32790-2_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32789-6

  • Online ISBN: 978-3-642-32790-2

  • eBook Packages: Computer Science, Computer Science (R0)
