Entire Information Attentive GRU for Text Representation

ABSTRACT
Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have been widely used for sequence representation. However, RNNs tend to neglect the variational information within a sequence and struggle to capture long-term dependencies. In this paper, we propose a new neural network structure that extracts a comprehensive sequence embedding by incorporating a representation of the entire sequence. Unlike previous work that applies an attention mechanism after all GRU steps, we add the entire-sequence representation to the input of the GRU, so the model takes the full sequence into consideration at every step. We provide three strategies for adding this entire information: the Convolutional Neural Network (CNN) based attentive GRU (CBAG), the GRU inner attentive GRU (GIAG), and the pre-trained GRU inner attentive GRU (Pre-GIAG). To evaluate the proposed methods, we conduct extensive experiments on a benchmark sentiment classification dataset. The experimental results show that our models significantly outperform state-of-the-art baselines.
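To make the core idea concrete, the sketch below shows one way a CBAG-style variant could look in PyTorch: a CNN with max-pooling over time produces a global summary of the sequence, and that summary is concatenated with each token embedding before every GRU cell update, so each step conditions on the entire sequence. This is a minimal sketch under our own assumptions; the class name EntireInfoGRU, the layer sizes, and the choice of max-pooling are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class EntireInfoGRU(nn.Module):
    """Illustrative CBAG-style GRU: every step also sees a global
    CNN-derived summary of the whole input sequence."""

    def __init__(self, embed_dim, hidden_dim, kernel_size=3):
        super().__init__()
        # CNN that produces the "entire representation" of the sequence
        self.conv = nn.Conv1d(embed_dim, hidden_dim,
                              kernel_size, padding=kernel_size // 2)
        # GRU cell whose input is [token embedding; global summary]
        self.cell = nn.GRUCell(embed_dim + hidden_dim, hidden_dim)

    def forward(self, x):  # x: (batch, seq_len, embed_dim)
        # Global summary: convolve over time, then max-pool over time
        g = self.conv(x.transpose(1, 2)).max(dim=2).values  # (batch, hidden_dim)
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        states = []
        for t in range(x.size(1)):
            # Each update conditions on the entire-sequence summary g
            h = self.cell(torch.cat([x[:, t, :], g], dim=-1), h)
            states.append(h)
        return torch.stack(states, dim=1), h  # all states, final state
```

Under the same reading, a GIAG-style variant would replace the CNN summary with the final hidden state of a first-pass GRU over the sequence, and Pre-GIAG would use a pre-trained GRU for that first pass.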