Enhancing the Recurrent Neural Networks with Positional Gates for Sentence Representation

  • Conference paper
  • In: Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11301)

Abstract

Recurrent neural networks (RNNs) with attention mechanisms have shown strong performance on answer selection in recent years. Most previous attention mechanisms generate the attentive weights only after all the hidden states have been obtained, so the contextual information from the other sentence is not exploited while the internal hidden states are being generated. In this paper, we propose a position-gated RNN (PG-RNN) model, which merges the positional contextual information of the question words into the generation of the inner hidden states. Specifically, we first design a positional interaction monitor to detect and measure the positional influence of each question word within the answer sentence. We then present a positional gating mechanism and embed it into the RNN so that it automatically absorbs the positional contextual information when updating the hidden states. Experiments on two benchmark datasets, TREC-QA and WikiQA, show the clear advantages of our proposed model. In particular, we achieve new state-of-the-art performance on both datasets.
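The abstract names two components, a positional interaction monitor and a positional gate inside the recurrence, but this page does not reproduce the paper's equations. The sketch below is therefore a minimal, hypothetical PyTorch rendering of the idea, not the authors' implementation: it assumes the monitor scores each answer word against a mean-pooled question vector with a bilinear form, and that the resulting gate interpolates between the previous hidden state and a GRU update. All class and variable names are illustrative.

```python
import torch
import torch.nn as nn


class PositionGatedRNNCell(nn.Module):
    """Hypothetical PG-RNN-style cell (illustrative, not the paper's equations).

    A GRU cell whose per-step update is scaled by a positional gate derived
    from the interaction between the current answer word and the question.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.gru = nn.GRUCell(input_size, hidden_size)
        # "Positional interaction monitor" (assumed form): a bilinear score
        # between the current answer-word embedding and a pooled question vector.
        self.monitor = nn.Bilinear(input_size, input_size, 1)

    def forward(self, x_t, h_prev, q_vec):
        # Gate in (0, 1): how much question context influences this time step.
        g_t = torch.sigmoid(self.monitor(x_t, q_vec))
        h_new = self.gru(x_t, h_prev)
        # Interpolate between keeping the old state and taking the update,
        # so question context acts *during* hidden-state generation.
        return g_t * h_new + (1.0 - g_t) * h_prev


# Usage: run the cell over an answer sentence, one word at a time.
emb_dim, hidden_dim, ans_len = 50, 64, 12
cell = PositionGatedRNNCell(emb_dim, hidden_dim)
q_vec = torch.randn(1, 7, emb_dim).mean(dim=1)   # mean-pooled question, (1, 50)
answer = torch.randn(ans_len, 1, emb_dim)        # answer word embeddings
h = torch.zeros(1, hidden_dim)
for t in range(ans_len):
    h = cell(answer[t], h, q_vec)
print(h.shape)  # torch.Size([1, 64])
```

The point this mirrors from the abstract is architectural: the question-aware gate operates inside the recurrence, at every hidden-state update, rather than reweighting the hidden states after the whole sequence has been encoded, as conventional attention does.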



Acknowledgements

We thank all reviewers who provided thoughtful and constructive comments on this paper. The second author is the corresponding author. This research is funded by the National Natural Science Foundation of China (No. 61572193). The computation was performed at the Supercomputer Center of East China Normal University.

Author information

Corresponding author

Correspondence to Wenxin Hu.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Song, Y., Hu, W., Chen, Q., Hu, Q., He, L. (2018). Enhancing the Recurrent Neural Networks with Positional Gates for Sentence Representation. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11301. Springer, Cham. https://doi.org/10.1007/978-3-030-04167-0_46

  • DOI: https://doi.org/10.1007/978-3-030-04167-0_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04166-3

  • Online ISBN: 978-3-030-04167-0

  • eBook Packages: Computer Science, Computer Science (R0)
