SVM-Based Detection of Misannotated Words in Read Speech Corpora

Matoušek, Jindřich; Tihelka, Daniel

doi:10.1007/978-3-642-40585-3_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper investigates the automatic detection of misannotated words in single-speaker read-speech corpora. A support vector machine (SVM) classifier is proposed to detect the misannotated words, and its performance is evaluated with respect to various word-level feature sets. The SVM classifier performs very well, achieving both high precision and high recall, with an F1 measure of almost 88%. This is a statistically significant improvement over a traditionally used outlier-based detection method.
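
The evaluation terms used in the abstract (precision, recall, F1, and a significance comparison between two detectors) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels, and the choice of an exact McNemar test are assumptions made for illustration, since the abstract does not say which statistical test was used. A label of 1 stands for a misannotated word, 0 for a correctly annotated one.

```python
from math import comb


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the 'positive' (misannotated) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two detectors on the same words.

    Only the discordant words matter: those where exactly one of the
    two detectors agrees with the reference label.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    a_only = sum(1 for t, a, b in triples if a == t and b != t)
    b_only = sum(1 for t, a, b in triples if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the detectors never disagree in correctness
    k = min(a_only, b_only)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data, not taken from the paper.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
detector_a = [1, 1, 0, 0, 0, 1, 0, 1]   # e.g. a classifier-based detector
detector_b = [0, 1, 0, 1, 0, 1, 0, 0]   # e.g. an outlier-based detector

print(precision_recall_f1(reference, detector_a))
print(precision_recall_f1(reference, detector_b))
print(mcnemar_exact_p(reference, detector_a, detector_b))
```

With only a handful of words the test has almost no power; a real comparison would run over the full set of annotated test words.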

This work was supported by the Technology Agency of the Czech Republic, project No. TA01030476, and by the European Regional Development Fund (ERDF), project “New Technologies for Information Society” (NTIS), European Centre of Excellence, ED1.1.00/02.0090. Access to the MetaCentrum clusters provided under the programme LM2010005 is highly appreciated.

Author information

Authors and Affiliations

Faculty of Applied Sciences, Dept. of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jindřich Matoušek & Daniel Tihelka

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

Cite this paper

Matoušek, J., Tihelka, D. (2013). SVM-Based Detection of Misannotated Words in Read Speech Corpora. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_58

DOI: https://doi.org/10.1007/978-3-642-40585-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
