Deep bi-directional interaction network for sentence matching

Applied Intelligence

Abstract

The goal of sentence matching is to determine the semantic relation between two sentences, a task that underlies many downstream applications in natural language processing, such as question answering and information retrieval. Recent studies that use attention mechanisms to align the elements of two sentences have shown promising results in capturing semantic similarity and relevance. Most existing methods focus on the design of multi-layer attention networks; however, several critical issues remain unresolved: 1) higher attention layers are easily affected by error propagation because they rely on the alignment results of preceding attention layers; 2) models risk losing low-layer semantic features as network depth increases; and 3) existing approaches to capturing global matching information incur high computational complexity during training. To address these issues, we propose a Deep Bi-Directional Interaction Network (DBDIN), which captures semantic relatedness from two directions, each direction employing multiple attention-based interaction units. Specifically, the attention of each interaction unit repeatedly focuses on the original representation of the other sentence for semantic alignment; attending to a fixed semantic representation alleviates the error propagation problem. We then design a deep fusion mechanism that aggregates and propagates attention information from low layers to high layers, effectively retaining low-layer semantic features for subsequent interactions. Finally, we introduce a self-attention mechanism to enhance global matching information at lower model complexity. We conduct experiments on natural language inference and paraphrase identification tasks with three benchmark datasets: SNLI, SciTail and Quora. Experimental results demonstrate that our method achieves significant improvements over baseline systems without using any external knowledge. Additionally, we conduct an interpretability study to reveal how our deep interaction network with attention benefits sentence matching, which provides a reference for future model design. Ablation studies and visualization analyses further verify that our model better captures interactive information between two sentences, and that the proposed components indeed help model semantic relations more precisely.
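
To make the architecture described in the abstract concrete, the sketch below illustrates the first two ideas, cross-attention against a fixed original encoding of the other sentence and deep fusion of low- and high-layer features, as one interaction unit in PyTorch. All names and details here (InteractionUnit, the concatenated fusion features, the residual connection) are illustrative assumptions on our part, not the authors' released implementation.

    # Minimal sketch of one attention-based interaction unit (one direction).
    # Hypothetical reconstruction from the abstract; not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InteractionUnit(nn.Module):
        """Sentence A's evolving representation attends to the FIXED
        original encoding of sentence B, then fuses the aligned features."""
        def __init__(self, d):
            super().__init__()
            self.proj = nn.Linear(d, d)
            # deep fusion over [a; align; a - align; a * align]
            self.fuse = nn.Linear(4 * d, d)

        def forward(self, a, b_orig, b_mask):
            # a:      (batch, len_a, d)  current representation of sentence A
            # b_orig: (batch, len_b, d)  fixed original encoding of sentence B
            # b_mask: (batch, len_b)     True at real tokens, False at padding
            scores = torch.matmul(self.proj(a), b_orig.transpose(1, 2))
            scores = scores.masked_fill(~b_mask.unsqueeze(1), float('-inf'))
            align = torch.matmul(F.softmax(scores, dim=-1), b_orig)
            fused = torch.relu(
                self.fuse(torch.cat([a, align, a - align, a * align], dim=-1)))
            return fused + a  # residual path helps retain low-layer semantics

    # Usage: stack several units per direction; every unit still attends to
    # b_orig, never to a previous unit's alignment output.
    unit = InteractionUnit(d=8)
    a = torch.randn(2, 5, 8)                    # sentence A: 5 tokens
    b = torch.randn(2, 7, 8)                    # sentence B: 7 tokens
    mask = torch.ones(2, 7, dtype=torch.bool)
    out = unit(a, b, mask)                      # (2, 5, 8)

Because every unit attends to the same fixed b_orig rather than to a previous unit's alignment output, an early misalignment cannot cascade upward, which is how the abstract's fixed-representation design alleviates error propagation.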

Funding

The research work described in this paper has been supported by the National Natural Science Foundation of China (Nos. 61876198, 61976015, 61370130 and 61976016), the Beijing Municipal Natural Science Foundation (No. 4172047), and the International Science and Technology Cooperation Program of China (No. K11F100010).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Mingtong Liu, Yujie Zhang, Jinan Xu and Yufeng Chen. The first draft of the manuscript was written by Mingtong Liu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yujie Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, M., Zhang, Y., Xu, J. et al. Deep bi-directional interaction network for sentence matching. Appl Intell 51, 4305–4329 (2021). https://doi.org/10.1007/s10489-020-02156-7
