Abstract
Unit selection is a very popular approach to speech synthesis. It is known for its ability to produce nearly natural-sounding synthetic speech, but also for its need for very large speech corpora. Unit selection is also highly sensitive to the quality of the source speech corpus from which speech is synthesised, and in particular to the corpus's textual, phonetic and prosodic annotation and indexing. Given the enormous size of current speech corpora, manual annotation is a lengthy process, and even then human annotators make errors. In this paper, the impact of annotation errors on the quality of unit-selection-based synthetic speech is analysed. First, annotation errors are analysed and categorised. Then, a speech synthesis experiment is described in which the same utterances were synthesised by unit-selection systems with and without annotation errors. Finally, the results of the experiment and the options for fixing the annotation errors are discussed.
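Because unit selection picks recorded units by their annotated labels rather than by what was actually spoken, a single mislabelled unit can end up in the synthetic output. The following is a toy Python sketch of that effect, not the authors' system: the `Unit` class, the target and join cost functions, their weights and the F0 values are all hypothetical. It runs a Viterbi search over candidate units twice, once on an inventory containing an annotation error and once on the corrected inventory, loosely mirroring the with/without-errors comparison described above.

```python
from dataclasses import dataclass, replace

# Toy illustration only: costs, weights and F0 values are hypothetical.


@dataclass(frozen=True)
class Unit:
    uid: int
    label: str       # phone label from the (possibly erroneous) annotation
    true_phone: str  # phone actually spoken; invisible to the synthesiser
    f0: float        # mean F0 in Hz, used by the toy costs below


def target_cost(unit, phone, f0_target):
    # The search can only compare against the annotated label, not the audio.
    mismatch = 0.0 if unit.label == phone else 10.0
    return mismatch + abs(unit.f0 - f0_target) / 50.0


def join_cost(prev, cur):
    # Toy concatenation cost: penalise F0 jumps at the join.
    return abs(prev.f0 - cur.f0) / 50.0


def select_units(targets, inventory):
    """Viterbi search; targets is a list of (phone, f0) pairs."""
    # Candidates are units whose *annotated* label matches the target phone.
    cands = [[u for u in inventory if u.label == ph] or list(inventory)
             for ph, _ in targets]
    # best[i][j] = (cumulative cost, back-pointer) for candidate j at step i
    best = [[(target_cost(u, *targets[0]), None) for u in cands[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in cands[i]:
            tc = target_cost(u, *targets[i])
            row.append(min((best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                           for k, prev in enumerate(cands[i - 1])))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(cands[i][j])
        j = best[i][j][1]
    return path[::-1]


inventory = [
    Unit(0, "a", "a", 120.0),
    Unit(1, "a", "e", 110.0),  # annotation error: spoken /e/, labelled /a/
    Unit(2, "h", "h", 112.0),
    Unit(3, "o", "o", 115.0),
]
# Corrected annotation: every label matches what was actually spoken.
fixed = [replace(u, label=u.true_phone) for u in inventory]
targets = [("a", 110.0), ("h", 110.0), ("o", 110.0)]

with_errors = select_units(targets, inventory)
corrected = select_units(targets, fixed)
print([u.true_phone for u in with_errors])  # ['e', 'h', 'o']
print([u.true_phone for u in corrected])    # ['a', 'h', 'o']
```

In this toy setup the mislabelled unit wins because its F0 happens to fit the target best, so the listener hears /eho/ instead of /aho/; once the label is corrected, that unit is no longer a candidate for /a/.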
Support for this work was provided by the TA CR, project No. TA01030476, and by the European Regional Development Fund, project “New Technologies for Information Society”, European Centre of Excellence, ED1.1.00/02.0090.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matoušek, J., Tihelka, D., Šmídl, L. (2012). On the Impact of Annotation Errors on Unit-Selection Speech Synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol. 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_55
DOI: https://doi.org/10.1007/978-3-642-32790-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)