Enhancing sentence embedding with dynamic interaction


Abstract

Sentence embedding is a powerful tool in many natural language processing subfields, such as sentiment analysis, natural language inference, and question classification. However, previous work simply aggregates the final states, i.e., the output of the top layer of a multi-layer encoder, with average pooling or max pooling to obtain the final sentence representation. Average pooling is simple and fast for summarizing the overall meaning of a sentence, but it may discard significant latent semantic features, given that information flows through multiple layers. In this paper, we propose a new dynamic interaction method for improving the final sentence representation. It makes the states of the last layer more conducive to the subsequent classification layer by introducing a constraint derived from the states of the previous layers. The constraint is the product of dynamic interaction between the states of the intermediate layers and the states of the upper-most layer. Experiments show that our method surpasses prior state-of-the-art sentence embedding methods on four datasets.
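To make the idea concrete, here is a minimal PyTorch sketch of one way such a scheme could look. It assumes a stacked BiLSTM encoder in which each intermediate layer's states gate the top-layer states before pooling; the layer count, the sigmoid-gate form, and all names (e.g., `DynamicInteractionEncoder`) are our own illustrative choices under these assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class DynamicInteractionEncoder(nn.Module):
    """Illustrative sketch: a stacked BiLSTM whose intermediate-layer
    states gate the top-layer states before pooling (hypothetical,
    not the paper's exact model)."""

    def __init__(self, emb_dim=300, hidden=300, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(emb_dim if i == 0 else 2 * hidden, hidden,
                    batch_first=True, bidirectional=True)
            for i in range(num_layers)
        )
        # Learned gate that lets lower-layer states constrain the top layer.
        self.gate = nn.Linear(4 * hidden, 2 * hidden)

    def forward(self, x):                     # x: (batch, seq_len, emb_dim)
        states, h = [], x
        for lstm in self.layers:
            h, _ = lstm(h)                    # (batch, seq_len, 2 * hidden)
            states.append(h)
        top = states[-1]
        # "Dynamic interaction": each intermediate layer contributes a
        # gated correction to the top-layer states.
        for inter in states[:-1]:
            g = torch.sigmoid(self.gate(torch.cat([top, inter], dim=-1)))
            top = g * top + (1 - g) * inter
        # Final sentence embedding: average and max pooling, concatenated.
        avg = top.mean(dim=1)
        mx, _ = top.max(dim=1)
        return torch.cat([avg, mx], dim=-1)   # (batch, 4 * hidden)
```

For example, `DynamicInteractionEncoder()(torch.randn(8, 20, 300))` returns a batch of eight 1200-dimensional sentence embeddings (average and max pooling of the 600-dimensional gated top-layer states, concatenated).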



Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 11771152) and the Science and Technology Foundation of Guangdong Province (Nos. 2015B010128008 and 2015B010109006).

Author information


Corresponding author

Correspondence to Yongjun Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xie, J., Li, Y., Sun, Q. et al. Enhancing sentence embedding with dynamic interaction. Appl Intell 49, 3283–3292 (2019). https://doi.org/10.1007/s10489-019-01456-x
