Acoustic Modeling with Deep Belief Networks for Russian Speech Recognition

Zulkarneev, Mikhail; Grigoryan, Ruben; Shamraev, Nikolay

doi:10.1007/978-3-319-01931-4_3

Mikhail Zulkarneev²²,
Ruben Grigoryan²² &
Nikolay Shamraev²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8113))

Included in the following conference series:

International Conference on Speech and Computer

1215 Accesses
3 Citations

Abstract

This paper presents continuous Russian speech recognition with deep belief networks in conjunction with HMM. Recognition is performed in two stages. In the first phase deep belief networks are used to calculate the phoneme state probability for feature vectors describing speech. In the second stage, these probabilities are used by Viterbi decoder for generating resulting sequence of words. Two-stage training procedure of deep belief networks is used based on restricted Boltzmann machines. In the first stage neural network is represented as a stack of restricted Boltzmann machines and sequential training is performed, when the previous machine output is the input to the next. After a rough adjustment of the weights second stage is performed using a back-propagation training procedure. The advantage of this method is that it allows usage of unlabeled data for training. It makes the training more robust and effective.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–285 (1989)
Article Google Scholar
Graves, A., Fernández, S., Schmidhuber, J.: Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 799–804. Springer, Heidelberg (2005)
Google Scholar
Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation 9(8), 1735–1780 (1997)
Article Google Scholar
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.-R., Jaitly, N., Senior, A.W., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep Neural Networks for Acoustic Modeling in Speech Recognition. Signal Processing Magazine (2012)
Google Scholar
Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Computation 18, 1527–1554 (2006)
Article MathSciNet MATH Google Scholar
Young, S.J., Russel, N.H., Thornton, J.H.S.: Token Passing: A Simple Conceptual Model for Connected Speech Recognition Systems, Cambridge University, technical report (1989)
Google Scholar
Young, S.J.: The HTK Book. Version 3.4 (2006)
Google Scholar
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation 14(8), 1711–1800 (2002)
Article MathSciNet Google Scholar
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: A CPU and GPU Math Expression Compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, June 30-July 3 (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

FSSI Research Institute “Spezvuzautomatika”, Rostov-on-Don, Russia
Mikhail Zulkarneev, Ruben Grigoryan & Nikolay Shamraev

Authors

Mikhail Zulkarneev
View author publications
You can also search for this author in PubMed Google Scholar
Ruben Grigoryan
View author publications
You can also search for this author in PubMed Google Scholar
Nikolay Shamraev
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zulkarneev, M., Grigoryan, R., Shamraev, N. (2013). Acoustic Modeling with Deep Belief Networks for Russian Speech Recognition. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science(), vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_3

Download citation

DOI: https://doi.org/10.1007/978-3-319-01931-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Acoustic Modeling with Deep Belief Networks for Russian Speech Recognition