Abstract
Recurrent neural network language models (RNNLMs) are an important class of language model. In recent years, context-dependent RNNLMs have been the most widely used variant because they exploit additional information summarized from other sequences to access a larger context. However, when the sequences are mutually independent or randomly shuffled, these models cannot learn useful additional information, so no larger context is actually taken into account. To ensure that the model can obtain more contextual information in any case, a new language model is proposed in this paper. It captures the global context using only the words within the current sequence, incorporating all the preceding and following words of the target, without resorting to additional information summarized from other sequences. The model consists of two main modules: a recurrent global context module that extracts the global contextual information of the target, and a sparse feature learning module that learns sparse features of all possible output words to distinguish the target word from the others at the output layer. The proposed model was evaluated on three language modeling tasks. Experimental results show that it improves perplexity, speeds up the convergence of the network, and learns better word embeddings than other language models.
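To make the two-module design concrete, the following is a minimal sketch, not the authors' implementation. It assumes the recurrent global context module can be approximated by a bidirectional LSTM run over the current sequence, so the representation of each target position combines its preceding and following words while excluding the target itself, and it approximates sparse feature learning at the output layer by an L1 penalty on a learned output-word feature matrix. All class, function, and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextLM(nn.Module):
    """Sketch: bidirectional global context + L1-sparse output word features."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Recurrent global context: summarizes all preceding and following
        # words of each position within the current sequence.
        self.context_rnn = nn.LSTM(emb_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
        # Output-word feature matrix; the L1 term below encourages sparsity.
        self.out_features = nn.Linear(2 * hidden_dim, vocab_size, bias=False)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices of the current sequence
        h, _ = self.context_rnn(self.embed(tokens))      # (B, T, 2H)
        h_fwd, h_bwd = h.chunk(2, dim=-1)                # forward / backward halves
        # Context for position t: forward state over words < t and backward
        # state over words > t, so the target word itself is excluded.
        pad_f = h_fwd.new_zeros(h_fwd[:, :1].shape)
        pad_b = h_bwd.new_zeros(h_bwd[:, :1].shape)
        ctx = torch.cat([torch.cat([pad_f, h_fwd[:, :-1]], dim=1),
                         torch.cat([h_bwd[:, 1:], pad_b], dim=1)], dim=-1)
        return self.out_features(ctx)                    # (B, T, vocab)

    def sparsity_penalty(self):
        # L1 norm over the output-word feature matrix
        return self.out_features.weight.abs().mean()


# Toy usage: predict each word from its full within-sequence context,
# adding the sparsity penalty to the cross-entropy loss.
if __name__ == "__main__":
    model = GlobalContextLM(vocab_size=1000)
    tokens = torch.randint(0, 1000, (4, 20))
    logits = model(tokens)
    loss = F.cross_entropy(logits.view(-1, 1000), tokens.view(-1))
    loss = loss + 1e-4 * model.sparsity_penalty()
    loss.backward()
```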
Acknowledgements
This work was supported by the Fok Ying Tung Education Foundation (Grant 151068), the National Natural Science Foundation of China (Grant 61332002), and the Foundation for Youth Science and Technology Innovation Research Team of Sichuan Province (Grant 2016TD0018).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest related to this work.
About this article
Cite this article
Deng, H., Zhang, L. & Wang, L. Global context-dependent recurrent neural network language model with sparse feature learning. Neural Comput & Applic 31 (Suppl 2), 999–1011 (2019). https://doi.org/10.1007/s00521-017-3065-x