Abstract
Manual segmentation of speech databases still outperforms automatic segmentation algorithms, and the quality of the resulting synthetic voice depends on the accuracy of the phonetic segmentation. In this paper we describe a semi-automatic speech segmentation procedure in which a human expert manually places selected boundaries before the rest of the corpus is segmented automatically. A segmentation error predictor is designed, estimated, and then used to generate the sequence of boundaries that the expert annotates manually. The resulting error response curves are significantly better than those of random segmentation strategies. Results are presented for two different Polish corpora.
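The idea of ordering manual annotations by predicted error, and comparing the resulting error response curve with a random ordering, can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor itself is not reproduced, the function names and the toy numbers are hypothetical, and the sketch assumes that a manual correction removes the error of only the corrected boundary, whereas in the described procedure manual boundaries also guide the automatic segmentation of the rest of the corpus.

```python
import random


def predicted_order(predicted_errors):
    """Boundary indices sorted by predicted error, largest first (assumed strategy)."""
    return sorted(range(len(predicted_errors)),
                  key=lambda i: predicted_errors[i],
                  reverse=True)


def random_order(n, seed=0):
    """Baseline: boundaries are handed to the expert in random order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def error_response_curve(true_errors, order):
    """Mean absolute boundary error after 0, 1, 2, ... manual corrections.

    Simplifying assumption: correcting a boundary sets its own error to zero
    and leaves every other boundary unchanged.
    """
    remaining = [abs(e) for e in true_errors]
    n = len(remaining)
    total = sum(remaining)
    curve = [total / n]
    for idx in order:
        total -= remaining[idx]
        remaining[idx] = 0.0
        curve.append(total / n)
    return curve


# Toy data (hypothetical): automatic-segmentation errors in milliseconds
# and the corresponding outputs of some error predictor.
true_errors = [5.0, 40.0, 10.0, 25.0]
predicted = [8.0, 30.0, 12.0, 20.0]

guided = error_response_curve(true_errors, predicted_order(predicted))
baseline = error_response_curve(true_errors, random_order(len(true_errors)))
```

In this toy setting a curve that falls faster means fewer manual annotations are needed to reach a given mean error, which is the sense in which one annotation strategy can be compared with another.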
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szymański, M., Grocholewski, S. (2008). Error Prediction-Based Semi-automatic Segmentation of Speech Databases. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_69
DOI: https://doi.org/10.1007/978-3-540-87391-4_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)