Lightly Supervised Acoustic Model Training for Mandarin Continuous Speech Recognition

  • Conference paper
Intelligent Science and Intelligent Data Engineering (IScIDE 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7751)

Abstract

This paper investigates a lightly supervised acoustic model training method for Mandarin continuous speech recognition systems. Speech material with rough transcriptions, which provides light supervision for acoustic model training, is now available in various forms. In this work, the quality problems of such data are classified into two types: the first is non-speech and low-quality speech in the corpora, and the second is transcription errors. A framework is proposed to tackle the two types separately: speech recognition with a transcription-relevant language model is used to remove the first type, while recognition with a general language model provides candidate transcription errors, which are then checked by a final automatic verification process. The proposed framework was evaluated from two aspects: the data quality improved significantly, and the speech recognition results show a 21.88% relative reduction in character error rate (CER).
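
The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the recognizer outputs are assumed to be already available as plain strings, and the function names (`keep_utterance`, `candidate_errors`), the CER-threshold criterion, and the default `max_cer` value are hypothetical choices made only for illustration. The final automatic verification step is not modelled, because the abstract does not say how it works.

```python
from difflib import SequenceMatcher


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def keep_utterance(transcript: str, biased_hyp: str, max_cer: float = 0.5) -> bool:
    """Stage 1 (assumed criterion): keep an utterance only if a recognizer
    biased toward the transcript still roughly agrees with it. Large
    disagreement is taken as a sign of non-speech or low-quality speech."""
    return cer(transcript, biased_hyp) <= max_cer


def candidate_errors(transcript: str, general_hyp: str):
    """Stage 2: list the spans where a general-LM hypothesis differs from
    the rough transcript. These are only candidates for later verification."""
    matcher = SequenceMatcher(None, transcript, general_hyp, autojunk=False)
    return [(tag, transcript[i1:i2], general_hyp[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction: (baseline - improved) / baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up strings, not data from the paper.
    transcript = "今天天气很好"
    print(keep_utterance(transcript, "今天天气很好"))
    print(candidate_errors(transcript, "今天天气真好"))
    # Made-up error rates, chosen only to show the formula.
    print(relative_reduction(20.0, 15.0))
```

The sketch measures disagreement at the character level because the paper reports CER. A real system would more likely work on time-aligned recognizer output than on whole-utterance strings.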

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Pang, Z., Wu, X. (2013). Lightly Supervised Acoustic Model Training for Mandarin Continuous Speech Recognition. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_88

  • DOI: https://doi.org/10.1007/978-3-642-36669-7_88

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36668-0

  • Online ISBN: 978-3-642-36669-7

  • eBook Packages: Computer Science, Computer Science (R0)
