Abstract
Language models (LMs) play an essential role in natural language processing tasks. Given a context, a language model predicts the next word. However, as the history grows longer, a single hidden vector may not be large enough to store all of the relevant information. In this paper, we propose a deep attentive structured language model (DAS LM), which extends the Long Short-Term Memory (LSTM) neural network with an attention mechanism. By taking part-of-speech (POS) tags as an alternative input, the model can extract relations between a word and its context. Our model is evaluated on the Penn Treebank, Chinese short-message, and Swb-Fisher corpora. Language modeling experiments show that our model achieves significant improvements over the conventional LSTM language model.
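To make the general idea concrete, below is a minimal sketch of an attention-augmented LSTM language model that embeds both words and POS tags and attends over all previous hidden states before predicting the next word. It is written in PyTorch; the class name, dimensions, and dot-product attention are illustrative assumptions, not the exact DAS LM architecture described in the paper.

```python
# Minimal sketch: LSTM LM with word + POS-tag inputs and dot-product attention
# over the hidden-state history. Illustrative only; NOT the authors' exact DAS LM.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveLSTMLM(nn.Module):
    def __init__(self, vocab_size, pos_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        # Word and POS embeddings are concatenated before entering the LSTM.
        self.lstm = nn.LSTM(2 * emb_dim, hidden_dim, batch_first=True)
        # Prediction uses both the current state and the attention context.
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, words, pos_tags):
        # words, pos_tags: (batch, seq_len)
        x = torch.cat([self.word_emb(words), self.pos_emb(pos_tags)], dim=-1)
        h, _ = self.lstm(x)                            # (batch, seq_len, hidden)
        outputs = []
        for t in range(h.size(1)):
            query = h[:, t]                            # current hidden state
            keys = h[:, : t + 1]                       # history up to step t
            scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)
            alpha = F.softmax(scores, dim=-1)          # attention over history
            context = torch.bmm(alpha.unsqueeze(1), keys).squeeze(1)
            outputs.append(self.out(torch.cat([query, context], dim=-1)))
        return torch.stack(outputs, dim=1)             # (batch, seq_len, vocab)


if __name__ == "__main__":
    model = AttentiveLSTMLM(vocab_size=10000, pos_size=45)
    words = torch.randint(0, 10000, (2, 20))
    tags = torch.randint(0, 45, (2, 20))
    print(model(words, tags).shape)                    # torch.Size([2, 20, 10000])
```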
Notes
1. The LSTMN language model was implemented by us; the perplexity (PPL) reported in the original paper is 108.
References
Bengio, Y., et al.: A neural probabilistic language model. J. Mach. Learn. Res. 3(February), 1137–1155 (2003)
Mikolov, T., et al.: Recurrent neural network based language model. In: Interspeech, vol. 2 (2010)
Mikolov, T., et al.: Extensions of recurrent neural network language model. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2011)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: Interspeech (2012)
Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733 (2016)
Chelba, C., Jelinek, F.: Structured language modeling for speech recognition. arXiv preprint cs/0001023 (2000)
Hochreiter, S., et al.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (2001)
Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. arXiv preprint arXiv:1410.5401 (2014)
Sukhbaatar, S., Weston, J., Fergus, R.: End-to-end memory networks. In: Advances in Neural Information Processing Systems (2015)
Cho, K., Courville, A., Bengio, Y.: Describing multimedia content using attention-based encoder-decoder networks. IEEE Trans. Multimedia 17(11), 1875–1886 (2015)
Feng, S., Liu, S., Li, M., et al.: Implicit distortion and fertility models for attention-based encoder-decoder NMT model. arXiv preprint arXiv:1601.03317 (2016)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Part of speech. https://en.wikipedia.org/wiki/Part_of_speech
Wang, P., et al.: A unified tagging solution: bidirectional LSTM recurrent neural network with word embedding. arXiv preprint arXiv:1511.00215 (2015)
Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
The Stanford Natural Language Processing Group. https://nlp.stanford.edu/software/stanford-dependencies.shtml
Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014)
Acknowledgement
This work was supported by the Shanghai Sailing Program No. 16YF1405300, the China NSFC projects (Nos. 61573241 and 61603252) and the Interdisciplinary Program (14JCZ03) of Shanghai Jiao Tong University in China. Experiments have been carried out on the PI supercomputer at Shanghai Jiao Tong University.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Cao, D., Yu, K. (2017). Deep Attentive Structured Language Model Based on LSTM. In: Sun, Y., Lu, H., Zhang, L., Yang, J., Huang, H. (eds) Intelligence Science and Big Data Engineering. IScIDE 2017. Lecture Notes in Computer Science(), vol 10559. Springer, Cham. https://doi.org/10.1007/978-3-319-67777-4_15
DOI: https://doi.org/10.1007/978-3-319-67777-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67776-7
Online ISBN: 978-3-319-67777-4