
A Comparison of RNN LM and FLM for Russian Speech Recognition

  • Conference paper
Speech and Computer (SPECOM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Abstract

In this paper, we describe research on a recurrent neural network (RNN) language model (LM) for N-best list rescoring in automatic continuous Russian speech recognition and compare it with a factored language model (FLM). We trained RNN LMs with different numbers of units in the hidden layer. For FLM creation, we used five linguistic factors: word, lemma, stem, part-of-speech, and morphological tag. All models were trained on a text corpus of 350M words. We also performed linear interpolation of the RNN LM and the FLM with the baseline 3-gram LM. We achieved a relative WER reduction of 8% using the FLM and 14% using the RNN LM with respect to the baseline model.
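The abstract names the key mechanism only in passing: the RNN LM (or FLM) is linearly interpolated with the baseline 3-gram LM, and the resulting mixture is used to rescore the N-best lists produced by the recognizer. As a rough illustration, here is a minimal Python sketch of that kind of rescoring; the callables rnn_prob and ngram_prob, the interpolation weight lam, and the scale lm_scale are hypothetical placeholders, not APIs or values from the paper.

    import math

    def interpolated_logprob(words, rnn_prob, ngram_prob, lam=0.5):
        # Sentence log-probability under a linear interpolation of two LMs.
        # rnn_prob / ngram_prob are assumed to return P(word | history)
        # under the RNN LM and the baseline 3-gram LM, respectively.
        logprob = 0.0
        for i, w in enumerate(words):
            history = words[:i]
            # Interpolate per word, in the probability domain.
            p = lam * rnn_prob(w, history) + (1.0 - lam) * ngram_prob(w, history)
            logprob += math.log(p) if p > 0.0 else float("-inf")
        return logprob

    def rescore_nbest(nbest, rnn_prob, ngram_prob, lam=0.5, lm_scale=10.0):
        # Rerank an N-best list of (word_list, acoustic_logprob) pairs and
        # return the word sequence with the best combined score.
        return max(
            nbest,
            key=lambda h: h[1] + lm_scale * interpolated_logprob(
                h[0], rnn_prob, ngram_prob, lam),
        )[0]

In practice the interpolation weight and the LM scale are tuned on a development set; the equal weighting shown here is purely illustrative.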

Acknowledgments

This research is partially supported by the Council for Grants of the President of Russia (Projects No. MK-5209.2015.8 and MD-3035.2015.8), by the Russian Foundation for Basic Research (Projects No. 15-07-04415 and 15-07-04322), and by the Government of the Russian Federation (Grant No. 074-U01).

Author information

Correspondence to Irina Kipyatkova.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kipyatkova, I., Karpov, A. (2015). A Comparison of RNN LM and FLM for Russian Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_5

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7

  • eBook Packages: Computer Science (R0)
