
A Comparative Study on Selecting Acoustic Modeling Units in Deep Neural Networks Based Large Vocabulary Chinese Speech Recognition

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8261)

Abstract

This paper compares the performance of different acoustic modeling units in deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) systems for Chinese. Recently, DNN-based acoustic modeling has achieved very competitive performance on many speech recognition tasks and has become the focus of current LVCSR research. Previous work has studied both context-independent and context-dependent DNN-based acoustic models. For Chinese, a syllabic language, the choice of basic modeling units in DNN-based LVCSR systems is an important issue. In this work, three basic modeling units are discussed and compared: syllables, Initials/Finals, and phones. Experimental results show that, in the DNN-based systems, context-dependent phones obtain the best performance, while context-independent syllables perform similarly to context-dependent Initials/Finals. In addition, how the number of clustered states affects the performance of DNN-based systems is also examined, revealing properties that differ from those of GMM-based systems.
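As a concrete illustration of the unit inventories compared above, the sketch below (not taken from the paper; the particular Initial/Final split and phone set shown are assumptions for illustration only) decomposes one Mandarin syllable at the three granularities and expands each context-independent sequence into triphone-style context-dependent labels, the kind of labels whose clustered states serve as training targets in a hybrid DNN-HMM system.

```python
# Illustrative sketch only (not from the paper): one Mandarin syllable, "zhong1",
# written at the three unit granularities compared in this work. The Initial/Final
# split and the phone-level decomposition below are assumptions for illustration.
SYLLABLE = ["zhong1"]            # context-independent tonal syllable as a single unit
INITIAL_FINAL = ["zh", "ong1"]   # Initial + tonal Final
PHONE = ["zh", "o", "ng"]        # assumed phone-level decomposition

def context_dependent(units, pad="sil"):
    """Expand a context-independent unit sequence into triphone-style
    left-center+right labels, as used for context-dependent models."""
    seq = [pad] + list(units) + [pad]
    return [f"{seq[i - 1]}-{seq[i]}+{seq[i + 1]}" for i in range(1, len(seq) - 1)]

for name, units in [("syllable", SYLLABLE),
                    ("Initial/Final", INITIAL_FINAL),
                    ("phone", PHONE)]:
    print(f"{name:>13} | CI units: {units}")
    print(f"{'':>13} | CD units: {context_dependent(units)}")
```

In a hybrid DNN-HMM system, the softmax output layer has one node per clustered (tied) state, so the choice of unit and the degree of state clustering together determine the output dimensionality whose effect on recognition performance the paper studies.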

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Yang, Y., Wu, X. (2013). A Comparative Study on Selecting Acoustic Modeling Units in Deep Neural Networks Based Large Vocabulary Chinese Speech Recognition. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds) Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science, vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_60

  • DOI: https://doi.org/10.1007/978-3-642-42057-3_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-42056-6

  • Online ISBN: 978-3-642-42057-3

  • eBook Packages: Computer Science, Computer Science (R0)
