DOI: 10.1145/3234944.3234947

Entire Information Attentive GRU for Text Representation

Published: 10 September 2018

ABSTRACT

Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), have been widely used for sequence representation. However, RNNs neglect variational information and long-term dependencies. In this paper, we propose a new neural network structure that extracts a comprehensive sequence embedding by exploiting the entire representation of the sequence. Unlike previous work that applies an attention mechanism after all GRU steps, we add the entire representation to the input of the GRU, so the model takes the entire information of the sequence into account at every step. We propose three strategies for adding the entire information: the Convolutional Neural Network (CNN) based attentive GRU (CBAG), the GRU inner attentive GRU (GIAG), and the pre-trained GRU inner attentive GRU (Pre-GIAG). To evaluate the proposed methods, we conduct extensive experiments on a benchmark sentiment classification dataset. The results show that our models significantly outperform state-of-the-art baselines.
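
The central idea is to feed an "entire" representation of the whole sequence into the GRU input at every time step, instead of applying attention only after the last step. The abstract does not give the exact architectures of CBAG, GIAG, or Pre-GIAG, so the following is only a minimal PyTorch-style sketch of the general idea, loosely in the spirit of the CNN-based CBAG variant; the class name EntireInfoAttentiveGRU, the max-pooled CNN summary, and all hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class EntireInfoAttentiveGRU(nn.Module):
    """Sketch: a GRU whose input at every step is the word embedding
    concatenated with a global ("entire") summary of the sequence."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # CNN used to build the entire-sequence summary (CBAG-like; our assumption).
        self.conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size, padding=kernel_size // 2)
        # The GRU cell sees [word embedding ; entire summary] at every step.
        self.cell = nn.GRUCell(emb_dim + hidden_dim, hidden_dim)

    def forward(self, token_ids):                       # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)                      # (batch, seq_len, emb_dim)
        # Entire representation: CNN over the embeddings, max-pooled over time.
        conv_out = torch.relu(self.conv(emb.transpose(1, 2)))  # (batch, hidden_dim, seq_len)
        entire = conv_out.max(dim=2).values              # (batch, hidden_dim)

        h = emb.new_zeros(emb.size(0), entire.size(1))   # initial hidden state
        for t in range(emb.size(1)):
            step_input = torch.cat([emb[:, t, :], entire], dim=1)
            h = self.cell(step_input, h)
        return h                                         # final sequence embedding


# Usage: embed a toy batch of two length-5 sequences.
model = EntireInfoAttentiveGRU(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 5))
print(model(tokens).shape)  # torch.Size([2, 128])

In the GIAG and Pre-GIAG variants described in the abstract, the entire summary would presumably come from a (possibly pre-trained) GRU pass over the sequence rather than a CNN, but the per-step concatenation idea is the same.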


          • Published in

            ICTIR '18: Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval
            September 2018
            238 pages
            ISBN: 9781450356565
            DOI: 10.1145/3234944

            Copyright © 2018 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • short-paper

            Acceptance Rates

            • ICTIR '18 Paper Acceptance Rate: 19 of 47 submissions, 40%
            • Overall Acceptance Rate: 209 of 482 submissions, 43%
