Abstract
We compare two approaches to the automatic detection of annotation errors in single-speaker read-speech corpora used for speech synthesis: anomaly-based and classification-based detection. The two approaches differ principally in their training data: the classification-based approach must be trained on both correctly annotated and misannotated words, whereas the anomaly-based approach needs only correctly annotated words for training (plus a few misannotated words for validation). We show that both approaches lead to statistically comparable results when all available misannotated words are used during detector/classifier development. However, when fewer misannotated words are used, the anomaly detection framework clearly outperforms the classification-based approach. A final listening test showed that annotation error detection is effective in improving the quality of synthetic speech.
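The difference in training data between the two approaches can be made concrete with a toy Python sketch. It is not the authors' implementation. The two word-level features, the synthetic data, the diagonal-Gaussian anomaly model and the nearest-centroid classifier are all hypothetical stand-ins, chosen so that the example runs with the standard library alone. What it shows is structural. The anomaly detector is fitted on correctly annotated words only and uses a handful of misannotated words just to set its decision threshold. The classifier, by contrast, cannot be trained at all without labelled examples of both classes.

```python
# Toy sketch (hypothetical data and models, not the authors' implementation):
# anomaly-based vs classification-based detection of misannotated words.
import math
import random
from statistics import mean, pvariance


def make_words(n, misannotated, rng):
    """Hypothetical 2-D feature vectors for n words; misannotated words are shifted."""
    shift = 3.0 if misannotated else 0.0
    return [(rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)) for _ in range(n)]


def fit_gaussian(rows):
    """Fit an independent Gaussian (mean, variance) to each feature column."""
    return [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]


def log_likelihood(model, x):
    """Log-density of x under the diagonal Gaussian; low values look anomalous."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for (m, v), xi in zip(model, x))


def f1(pred, truth):
    """F1 score with 'misannotated' (True) as the positive class."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(model, val_x, val_y):
    """Choose the log-likelihood cut-off that maximises F1 on the validation set."""
    scores = [log_likelihood(model, x) for x in val_x]
    return max(scores, key=lambda t: f1([s <= t for s in scores], val_y))


def fit_centroids(xs, ys):
    """Nearest-centroid classifier: requires examples of BOTH classes."""
    good = [x for x, y in zip(xs, ys) if not y]
    bad = [x for x, y in zip(xs, ys) if y]
    if not good or not bad:
        raise ValueError("need both correctly annotated and misannotated words")
    return [mean(c) for c in zip(*good)], [mean(c) for c in zip(*bad)]


def classify(centroids, x):
    """Flag x as misannotated if it is closer to the misannotated centroid."""
    good, bad = centroids
    return math.dist(x, bad) < math.dist(x, good)


rng = random.Random(0)
normal_train = make_words(500, False, rng)

# Anomaly detection: fit on correctly annotated words only,
# then use a few misannotated words solely to set the threshold.
model = fit_gaussian(normal_train)
val_x = make_words(50, False, rng) + make_words(5, True, rng)
val_y = [False] * 50 + [True] * 5
threshold = tune_threshold(model, val_x, val_y)

test_x = make_words(200, False, rng) + make_words(20, True, rng)
test_y = [False] * 200 + [True] * 20
anomaly_pred = [log_likelihood(model, x) <= threshold for x in test_x]

# Classification: the misannotated words must be part of the training set.
train_x = normal_train + make_words(5, True, rng)
train_y = [False] * 500 + [True] * 5
centroids = fit_centroids(train_x, train_y)
class_pred = [classify(centroids, x) for x in test_x]

print(f"anomaly-based F1: {f1(anomaly_pred, test_y):.2f}")
print(f"classification-based F1: {f1(class_pred, test_y):.2f}")
```

The synthetic classes are easy to separate, so this script is not meant to reproduce the paper's finding that anomaly detection wins when few misannotated words are available. It only illustrates which labelled data each approach consumes during training and validation.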
Acknowledgments
This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S. Access to the MetaCentrum clusters, provided under the programme LM2015042, is greatly appreciated.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Matoušek, J., Tihelka, D. (2017). Annotation Error Detection: Anomaly Detection vs. Classification. In: Karpov, A., Potapova, R., Mporas, I. (eds.) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol. 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_13
DOI: https://doi.org/10.1007/978-3-319-66429-3_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)