Abstract
Sentence embedding is a powerful tool in many natural language processing subfields, such as sentiment analysis, natural language inference, and question classification. However, previous work simply aggregates the final states output by the top layer of a multi-layer encoder, using average pooling or max pooling, as the final sentence representation. Average pooling is simple and fast for summarizing the overall meaning of a sentence, but it may discard significant latent semantic features, since information is transformed as it flows through the layers. In this paper, we propose a new dynamic interaction method for improving the final sentence representation. It makes the states of the last layer more useful to the subsequent classification layer by introducing a constraint derived from the states of the previous layers; the constraint is the product of a dynamic interaction between the states of the intermediate layers and those of the upper-most layer. Experiments show that our method surpasses prior state-of-the-art sentence embedding methods on four datasets.
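To make the abstract's idea concrete, the following is a minimal NumPy sketch of one plausible reading of the dynamic interaction step. The function names, the dot-product scoring, the averaging over intermediate layers, and the multiplicative gating are all assumptions for illustration; the abstract only states that intermediate-layer states constrain the top-layer states through a product-style interaction before pooling.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_interaction(layers):
    """layers: list of (seq_len, dim) state matrices, bottom layer first.

    Returns a (dim,) sentence vector: the top-layer states, modulated by
    a constraint computed from the intermediate layers, then mean-pooled.
    """
    top = layers[-1]                              # states of the upper-most layer
    constraint = np.zeros_like(top)
    for inter in layers[:-1]:                     # states of intermediate layers
        # Interaction scores between each top-layer state and all
        # intermediate-layer states of this layer.
        scores = softmax(top @ inter.T, axis=-1)  # (seq_len, seq_len)
        constraint += scores @ inter              # re-weighted intermediate states
    constraint /= max(len(layers) - 1, 1)         # average across layers
    enhanced = top * (1.0 + constraint)           # product-style constraint
    return enhanced.mean(axis=0)                  # average pooling -> sentence vector

rng = np.random.default_rng(0)
states = [rng.standard_normal((5, 8)) for _ in range(3)]  # 3 layers, length-5 sentence
vec = dynamic_interaction(states)
print(vec.shape)  # (8,)
```

In this sketch the final representation has the same dimensionality as a plain mean-pooled top layer, so it could drop into an existing classification layer unchanged.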
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 11771152) and the Science and Technology Foundation of Guangdong Province (Nos. 2015B010128008 and 2015B010109006).
Cite this article
Xie, J., Li, Y., Sun, Q. et al. Enhancing sentence embedding with dynamic interaction. Appl Intell 49, 3283–3292 (2019). https://doi.org/10.1007/s10489-019-01456-x