Abstract
In this paper, we propose bi-attention, multi-layer attention, and an attention-mechanism-and-convolutional-neural-network-based text representation and classification model (ACNN). The bi-attention uses two attention mechanisms to learn two context vectors: a forward RNN with attention learns the forward context vector \(\overrightarrow {\mathbf {c}}\), and a backward RNN with attention learns the backward context vector \(\overleftarrow {\mathbf {c}}\); concatenating \(\overrightarrow {\mathbf {c}}\) and \(\overleftarrow {\mathbf {c}}\) yields the context vector c. The multi-layer attention is a stack of bi-attention layers. In the ACNN, the context vector c is obtained by the bi-attention, a convolution operation is then performed on c, and a max-pooling operation reduces the dimensionality, converting the text into a low-dimensional sentence vector m. Finally, a softmax classifier is used for text classification. We test our model on 8 benchmark text classification datasets, and our model achieves better than or comparable performance to state-of-the-art methods.
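To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture as described above, not the authors' implementation: the class names, GRU cells, layer sizes, and in particular the choice to keep per-step attention-weighted states (so the convolution has a sequence to slide over) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttention(nn.Module):
    """Forward and backward RNNs, each with its own attention mechanism;
    the two attended outputs are concatenated into the context vector c."""
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.fwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.bwd_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fwd_score = nn.Linear(hidden_dim, 1)  # attention scorer (forward)
        self.bwd_score = nn.Linear(hidden_dim, 1)  # attention scorer (backward)

    @staticmethod
    def _attend(states, scorer):
        # states: (batch, seq_len, hidden); softmax over the time dimension.
        # Assumption: keep per-step weighted states (rather than summing them)
        # so the convolution below has a sequence to operate on.
        weights = F.softmax(scorer(states), dim=1)   # (batch, seq_len, 1)
        return states * weights

    def forward(self, x):                            # x: (batch, seq_len, embed)
        h_fwd, _ = self.fwd_rnn(x)                               # left-to-right
        h_bwd, _ = self.bwd_rnn(torch.flip(x, dims=[1]))         # right-to-left
        h_bwd = torch.flip(h_bwd, dims=[1])                      # realign steps
        c_fwd = self._attend(h_fwd, self.fwd_score)
        c_bwd = self._attend(h_bwd, self.bwd_score)
        return torch.cat([c_fwd, c_bwd], dim=-1)     # context vector c

class ACNN(nn.Module):
    """Bi-attention -> convolution -> max-pooling -> softmax classifier."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64,
                 num_filters=100, kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bi_attn = BiAttention(embed_dim, hidden_dim)
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        c = self.bi_attn(self.embed(tokens))         # (batch, seq_len, 2*hidden)
        f = F.relu(self.conv(c.transpose(1, 2)))     # convolve over time steps
        m = f.max(dim=2).values                      # max-pooling -> vector m
        return F.log_softmax(self.fc(m), dim=-1)     # softmax classification
```

Under these assumptions, the multi-layer attention would correspond to stacking several BiAttention layers, with each layer consuming the previous layer's output of width 2 × hidden_dim.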







Acknowledgements
This research was supported by the National Natural Science Foundation of China [NSFC61572005], the Fundamental Research Funds for the Central Universities [2016JBM080], and Key Projects of Science and Technology Research of Hebei Province Higher Education [ZD2017304].
Cite this article
Liu, T., Yu, S., Xu, B. et al. Recurrent networks with attention and convolutional networks for sentence representation and classification. Appl Intell 48, 3797–3806 (2018). https://doi.org/10.1007/s10489-018-1176-4