Lightly Supervised Acoustic Model Training for Mandarin Continuous Speech Recognition

Li, Xiangang; Pang, Zaihu; Wu, Xihong

doi:10.1007/978-3-642-36669-7_88

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7751)

Included in the following conference series:

International Conference on Intelligent Science and Intelligent Data Engineering

Abstract

This paper investigates a lightly supervised acoustic model training method for Mandarin continuous speech recognition. Speech material with rough transcriptions, which provides light supervision for acoustic model training, is now available in various forms. In this work, the quality problems of such data are classified into two types: the first is non-speech and low-quality speech in the corpora, and the second is transcription errors. A framework is proposed to tackle the two types separately: speech recognition with a transcription-relevant language model is used to remove the first type, while recognition with a general language model provides candidate transcription errors, which are then checked by a final automatic verification process. The proposed framework was evaluated from two aspects: the data quality improved significantly, and the speech recognition results show a 21.88% relative reduction in character error rate (CER).
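
As a reading aid, the sketch below shows how a character error rate and a relative CER reduction are conventionally computed, and how a recognizer hypothesis could be compared against a rough transcription to flag utterances for verification. It is a minimal illustration, not the authors' implementation: the function names, the tuple layout, and the 0.2 threshold are assumptions, the abstract does not state the actual selection criterion, and the numbers in the usage lines are made up rather than taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, new_cer):
    """Relative CER reduction: (baseline - new) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer


def flag_candidates(utterances, threshold=0.2):
    """Split utterances by disagreement with the rough transcription.

    `utterances` is a list of (utt_id, rough_transcript, hypothesis)
    tuples. The threshold is a hypothetical value chosen for illustration.
    Returns (kept_ids, candidate_ids); candidates would go on to a
    separate verification step.
    """
    kept, candidates = [], []
    for utt_id, rough, hyp in utterances:
        target = candidates if cer(rough, hyp) > threshold else kept
        target.append(utt_id)
    return kept, candidates


# Illustrative values only, not results from the paper.
print(cer("今天天气很好", "今天天汽很好"))
print(relative_reduction(0.20, 0.15))
```

Here the rough transcription plays the role of the reference; a high CER against the hypothesis marks the utterance as a candidate rather than proving the transcription wrong, which is why a verification step follows in the framework described above.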

Author information

Authors and Affiliations

Speech and Hearing Research Center, Peking University, Beijing, 100871, China
Xiangang Li, Zaihu Pang & Xihong Wu
Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
Xiangang Li, Zaihu Pang & Xihong Wu
College of Computer Science and Technology, Jilin University, Changchun, 130012, China
Xihong Wu

Editor information

Editors and Affiliations

Nanjing University of Science & Technology, 210094, Nanjing, China
Jian Yang
Department of Psychology, Peking University, 100871, Beijing, China
Fang Fang
School of Automation, Southeast University, 210096, Nanjing, P.R. China
Changyin Sun

About this paper

Cite this paper

Li, X., Pang, Z., Wu, X. (2013). Lightly Supervised Acoustic Model Training for Mandarin Continuous Speech Recognition. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_88

DOI: https://doi.org/10.1007/978-3-642-36669-7_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36668-0
Online ISBN: 978-3-642-36669-7
eBook Packages: Computer Science, Computer Science (R0)
