
Global context-dependent recurrent neural network language model with sparse feature learning

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Recurrent neural network language models (RNNLMs) are an important class of language model. In recent years, context-dependent RNNLMs have become the most widely used variant, as they exploit additional information summarized from other sequences to access a larger context. However, when the sequences are mutually independent or randomly shuffled, these models cannot learn useful additional information, so no larger context is actually taken into account. To ensure that the model can obtain more contextual information in any case, a new language model is proposed in this paper. It captures the global context using only the words within the current sequence, incorporating all the words preceding and following the target, without resorting to additional information summarized from other sequences. The model consists of two main modules: a recurrent global context module that extracts the global contextual information of the target, and a sparse feature learning module that learns sparse features of all possible output words in order to distinguish the target word from the others at the output layer. The proposed model was evaluated on three language modeling tasks. Experimental results show that it improves perplexity, speeds up the convergence of the network and learns better word embeddings compared with other language models.
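The abstract only sketches the architecture, so the following toy numpy example is an illustration rather than the authors' implementation. It assumes one plausible reading of the two modules: the recurrent global context module is approximated by a forward RNN over the words before the target and a backward RNN over the words after it, both within the same sequence, and the sparse feature learning at the output layer is approximated by an L1 penalty on the output word feature matrix. All sizes, weights and the toy sentence below are invented for illustration.

```python
# Minimal sketch (not the authors' code) of the two ideas described in the
# abstract, under assumed shapes and a toy vocabulary:
#   1) a "global context" vector for each target position, built from a
#      forward pass over the preceding words and a backward pass over the
#      following words of the *same* sequence;
#   2) a sparsity (L1) penalty on the output-word feature matrix, as one
#      plausible reading of "sparse feature learning" at the output layer.
import numpy as np

rng = np.random.default_rng(0)
V, E, H = 50, 16, 32                  # vocabulary, embedding, hidden sizes (assumed)

emb   = rng.normal(0, 0.1, (V, E))    # input word embeddings
W_fwd = rng.normal(0, 0.1, (H, E))    # forward recurrent input weights
U_fwd = rng.normal(0, 0.1, (H, H))
W_bwd = rng.normal(0, 0.1, (H, E))    # backward recurrent input weights
U_bwd = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (V, 2 * H))  # output word features (encouraged to be sparse)

def rnn_pass(word_ids, W, U):
    """Simple tanh RNN over a list of word ids; returns the final state."""
    h = np.zeros(U.shape[0])
    for w in word_ids:
        h = np.tanh(W @ emb[w] + U @ h)
    return h

def global_context(sentence, t):
    """Context for target position t: a forward summary of the words before t
    and a backward summary of the words after t, within the current sentence."""
    h_left  = rnn_pass(sentence[:t], W_fwd, U_fwd)
    h_right = rnn_pass(sentence[t + 1:][::-1], W_bwd, U_bwd)
    return np.concatenate([h_left, h_right])

def predict(sentence, t):
    """Softmax distribution over the vocabulary for the target at position t."""
    scores = W_out @ global_context(sentence, t)
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()

def loss(sentence, t, l1=1e-3):
    """Cross-entropy for the target word plus an L1 sparsity penalty on the
    output feature matrix (the 'sparse feature learning' term)."""
    p = predict(sentence, t)
    return -np.log(p[sentence[t]]) + l1 * np.abs(W_out).sum()

sent = rng.integers(0, V, size=8).tolist()
print("loss at position 3:", loss(sent, 3))
```

In this reading, the prediction for each target position depends on every other word of the current sequence, so no information from other sequences is needed, and the L1 term pushes the output word features toward sparsity so that words are easier to tell apart at the output layer.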



Notes

  1. https://www.microsoft.com/en-us/research/project/msr-sentence-completion-challenge/.


Acknowledgements

This work was supported by the Fok Ying Tung Education Foundation (Grant 151068), the National Natural Science Foundation of China (Grant 61332002) and the Foundation for Youth Science and Technology Innovation Research Team of Sichuan Province (Grant 2016TD0018).

Author information

Corresponding author

Correspondence to Lei Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest related to this work.


About this article


Cite this article

Deng, H., Zhang, L. & Wang, L. Global context-dependent recurrent neural network language model with sparse feature learning. Neural Comput & Applic 31 (Suppl 2), 999–1011 (2019). https://doi.org/10.1007/s00521-017-3065-x

