A Comparison of Language Model Training Techniques in a Continuous Speech Recognition System for Serbian

Popović, Branislav; Pakoci, Edvin; Pekar, Darko

doi:10.1007/978-3-319-99579-3_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In this paper, a number of language model training techniques are examined and applied in a large vocabulary continuous speech recognition system for the Serbian language (more than 120,000 words), namely Mikolov and Yandex RNNLM, TensorFlow-based GPU approaches, and the CUED-RNNLM approach. The baseline acoustic model is a chain sub-sampled time-delay neural network, trained using cross-entropy training and a sequence-level objective function on a database of about 200 h of speech. The baseline language model is a 3-gram model trained on the training part of the database transcriptions and the Serbian journalistic corpus (about 600,000 utterances), using the SRILM toolkit and the Kneser-Ney smoothing method, with a pruning value of 10⁻⁷ (previous best). The results are analyzed in terms of word and character error rates and the perplexity of a given language model on the training and validation sets. A relative improvement of 22.4% (best word error rate of 7.25%) is obtained in comparison to the baseline language model.
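
The evaluation measures named in the abstract (word and character error rate, perplexity, relative improvement) can be illustrated with a minimal Python sketch. It is not the authors' scoring pipeline: the function names are hypothetical, error rates are computed with a plain Levenshtein alignment, the character error rate ignores spaces (an assumed convention), and perplexity is taken from per-token natural-log probabilities.

```python
import math

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def tokenize(text, unit):
    """Split into words, or into characters with spaces dropped (assumed convention)."""
    return text.split() if unit == "word" else list(text.replace(" ", ""))

def error_rate(refs, hyps, unit="word"):
    """Error rate in percent over paired reference/hypothesis strings."""
    errors = sum(edit_distance(tokenize(r, unit), tokenize(h, unit))
                 for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r, unit)) for r in refs)
    return 100.0 * errors / total

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def relative_improvement(baseline, new):
    """Relative reduction of an error rate, in percent."""
    return 100.0 * (baseline - new) / baseline
```

As a usage note, if the reported 22.4% relative improvement refers to word error rate, the best result of 7.25% implies a baseline of roughly 7.25 / (1 - 0.224) ≈ 9.3%. This figure is inferred from the abstract's numbers and is not stated in it.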

Acknowledgments

The work described in this paper was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project “Development of Dialogue Systems for Serbian and Other South Slavic Languages”, EUREKA project DANSPLAT, “A Platform for the Applications of Speech Technologies on Smartphones for the Languages of the Danube Region”, id E! 9944, and the Provincial Secretariat for Higher Education and Scientific Research, within the project “Central Audio-Library of the University of Novi Sad”, No. 114-451-2570/2016-02.

Author information

Authors and Affiliations

Department for Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, 21000, Novi Sad, Serbia
Branislav Popović, Edvin Pakoci & Darko Pekar
AlfaNum Speech Technologies, Bulevar Vojvode Stepe 40, 21000, Novi Sad, Serbia
Edvin Pakoci & Darko Pekar
Department for Music Production and Sound Design, Academy of Arts, Alfa BK University, Nemanjina 28, 11000, Belgrade, Serbia
Branislav Popović
Computer Programming Agency Code85 Odžaci, Ive Andrića 1A, 25250, Odžaci, Serbia
Branislav Popović

Corresponding author

Correspondence to Branislav Popović.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Popović, B., Pakoci, E., Pekar, D. (2018). A Comparison of Language Model Training Techniques in a Continuous Speech Recognition System for Serbian. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_54

DOI: https://doi.org/10.1007/978-3-319-99579-3_54
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
