Abstract
This paper compares the performance of different acoustic modeling units in deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) systems for Chinese. Recently, DNN-based acoustic modeling has achieved very competitive performance on many speech recognition tasks and has become a focus of current LVCSR research. Previous work has studied both context-independent and context-dependent DNN-based acoustic models. For Chinese, a syllabic language, the choice of basic modeling units in DNN-based LVCSR systems is an important issue. In this work, three basic modeling units are discussed and compared: syllables, Initial/Finals, and phones. Experimental results show that, in the DNN-based systems, context-dependent phones obtain the best performance, while context-independent syllables perform similarly to context-dependent Initial/Finals. In addition, how the number of clustered states affects the performance of DNN-based systems is also discussed; the results reveal some properties that differ from GMM-based systems.
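The hybrid setup the abstract describes, a feedforward network mapping acoustic feature frames to posteriors over the (possibly clustered) states of a chosen unit inventory, can be sketched as below. This is a minimal illustration, not the paper's implementation: the inventory sizes, network shapes, and feature dimension are all assumed values for the sake of the example.

```python
import numpy as np

# Illustrative (assumed) context-independent inventory sizes for Mandarin;
# context-dependent modeling multiplies these via state clustering.
UNITS = {"syllable": 408, "initial_final": 61, "phone": 35}

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class TinyDNN:
    """Toy feedforward acoustic model: feature frames -> per-frame state posteriors."""

    def __init__(self, n_in, n_hidden, n_states, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_states))
        self.b2 = np.zeros(n_states)

    def posteriors(self, frames):
        h = np.maximum(0.0, frames @ self.W1 + self.b1)  # ReLU hidden layer
        return softmax(h @ self.W2 + self.b2)            # one distribution per frame

# Example: score 4 frames of 13-dim features against a phone-unit output layer.
model = TinyDNN(n_in=13, n_hidden=32, n_states=UNITS["phone"])
post = model.posteriors(np.zeros((4, 13)))  # shape (4, 35), each row sums to 1
```

Changing `n_states` is all that distinguishes the three unit choices at the network's output layer; the comparison in the paper is about how that choice (and the number of clustered states behind it) trades off trainability against modeling resolution.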
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, X., Yang, Y., Wu, X. (2013). A Comparative Study on Selecting Acoustic Modeling Units in Deep Neural Networks Based Large Vocabulary Chinese Speech Recognition. In: Sun, C., Fang, F., Zhou, ZH., Yang, W., Liu, ZY. (eds) Intelligence Science and Big Data Engineering. IScIDE 2013. Lecture Notes in Computer Science, vol 8261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42057-3_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42056-6
Online ISBN: 978-3-642-42057-3