Abstract
This paper investigates a lightly supervised acoustic model training method for Mandarin continuous speech recognition. Speech material with rough transcriptions, which provides light supervision for acoustic model training, is now available in various forms. In this work, the quality problems of such data are classified into two types: the first is non-speech and low-quality speech in the corpora, and the second is transcription errors. A framework is proposed to tackle the two types separately: speech recognition with a transcription-relevant language model is used to remove the first type, while recognition with a general language model provides candidate transcription errors, which are then checked by a final automatic verification process. The proposed framework was evaluated from two aspects: data quality improved significantly, and speech recognition results show a 21.88% relative reduction in character error rate (CER).
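As a rough illustration of the ideas in the abstract, the following Python sketch computes CER, the relative CER reduction, and a simple triage of training segments. It is not the authors' implementation: the `Segment` fields, the `triage` function, and the `drop_threshold` value are hypothetical, and using the CER between the rough transcription and the biased-LM hypothesis as the rejection criterion is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(ref), list(hyp)) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction of a new system over a baseline."""
    return (baseline_cer - new_cer) / baseline_cer


@dataclass
class Segment:
    # All field names are hypothetical, chosen for this sketch.
    rough_text: str   # rough transcription that accompanies the audio
    biased_hyp: str   # hypothesis decoded with the transcription-relevant LM
    general_hyp: str  # hypothesis decoded with the general LM


def triage(seg: Segment, drop_threshold: float = 0.5) -> str:
    """Assumed two-stage triage; the threshold is illustrative only."""
    # Stage 1: a large mismatch even with the biased LM suggests
    # non-speech or low-quality speech, so the segment is dropped.
    if cer(seg.rough_text, seg.biased_hyp) > drop_threshold:
        return "drop"
    # Stage 2: disagreement with the general-LM hypothesis marks a
    # candidate transcription error for later automatic verification.
    if seg.general_hyp != seg.rough_text:
        return "verify"
    return "keep"
```

For example, `relative_reduction(0.20, 0.15)` gives a relative reduction of about 25%; these figures are illustrative and are not taken from the paper's experiments.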
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, X., Pang, Z., Wu, X. (2013). Lightly Supervised Acoustic Model Training for Mandarin Continuous Speech Recognition. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_88
DOI: https://doi.org/10.1007/978-3-642-36669-7_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36668-0
Online ISBN: 978-3-642-36669-7
eBook Packages: Computer Science, Computer Science (R0)