Abstract
Recurrent neural network language models (RNNLMs) are an important class of language model. In recent years, context-dependent RNNLMs have been the most widely used variant because they exploit additional information summarized from other sequences to access a larger context. However, when the sequences are mutually independent or randomly shuffled, these models cannot learn useful additional information, so no larger context is actually taken into account. To ensure that the model can obtain more contextual information in any case, a new language model is proposed in this paper. It captures the global context using only the words within the current sequence, incorporating all the preceding and following words of the target, without resorting to additional information summarized from other sequences. The model consists of two main modules: a recurrent global context module that extracts the global contextual information of the target, and a sparse feature learning module that learns sparse features of all possible output words to distinguish the target word from the others at the output layer. The proposed model was evaluated on three language modeling tasks. Experimental results show that it improves perplexity, speeds up the convergence of the network, and learns better word embeddings than other language models.
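To make the two-module design concrete, the following is a minimal sketch, not the authors' implementation. It assumes the recurrent global context module can be approximated by a bidirectional LSTM run over the current sequence, so the representation of each target position combines its preceding and following words while excluding the target itself, and it approximates sparse feature learning at the output layer by an L1 penalty on a learned output-word feature matrix. All class, function, and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextLM(nn.Module):
    """Sketch: bidirectional global context + L1-sparse output word features."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Recurrent global context: summarizes all preceding and following
        # words of each position within the current sequence.
        self.context_rnn = nn.LSTM(emb_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
        # Output-word feature matrix; the L1 term below encourages sparsity.
        self.out_features = nn.Linear(2 * hidden_dim, vocab_size, bias=False)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices of the current sequence
        h, _ = self.context_rnn(self.embed(tokens))      # (B, T, 2H)
        h_fwd, h_bwd = h.chunk(2, dim=-1)                # forward / backward halves
        # Context for position t: forward state over words < t and backward
        # state over words > t, so the target word itself is excluded.
        pad_f = h_fwd.new_zeros(h_fwd[:, :1].shape)
        pad_b = h_bwd.new_zeros(h_bwd[:, :1].shape)
        ctx = torch.cat([torch.cat([pad_f, h_fwd[:, :-1]], dim=1),
                         torch.cat([h_bwd[:, 1:], pad_b], dim=1)], dim=-1)
        return self.out_features(ctx)                    # (B, T, vocab)

    def sparsity_penalty(self):
        # L1 norm over the output-word feature matrix
        return self.out_features.weight.abs().mean()


# Toy usage: predict each word from its full within-sequence context,
# adding the sparsity penalty to the cross-entropy loss.
if __name__ == "__main__":
    model = GlobalContextLM(vocab_size=1000)
    tokens = torch.randint(0, 1000, (4, 20))
    logits = model(tokens)
    loss = F.cross_entropy(logits.view(-1, 1000), tokens.view(-1))
    loss = loss + 1e-4 * model.sparsity_penalty()
    loss.backward()
```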
Acknowledgements
This work was supported by the Fok Ying Tung Education Foundation (Grant 151068), the National Natural Science Foundation of China (Grant 61332002), and the Foundation for Youth Science and Technology Innovation Research Team of Sichuan Province (Grant 2016TD0018).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest related to this work.
About this article
Cite this article
Deng, H., Zhang, L. & Wang, L. Global context-dependent recurrent neural network language model with sparse feature learning. Neural Comput & Applic 31 (Suppl 2), 999–1011 (2019). https://doi.org/10.1007/s00521-017-3065-x