Abstract
Sentence matching aims to determine the semantic relation between two sentences, which underlies many downstream tasks in natural language processing, such as question answering and information retrieval. Recent studies that use attention mechanisms to align the elements of two sentences have shown promising results in capturing semantic similarity and relevance. Most existing methods focus on the design of multi-layer attention networks; however, several critical issues remain unresolved: 1) higher attention layers are easily affected by error propagation because they rely on the alignment results of preceding attention layers; 2) models risk losing low-layer semantic features as network depth increases; and 3) capturing global matching information incurs high computational complexity during model training. To address these issues, we propose a Deep Bi-Directional Interaction Network (DBDIN), which captures semantic relatedness from two directions, with each direction employing multiple attention-based interaction units. Specifically, the attention of each interaction unit repeatedly focuses on the original representation of the other sentence for semantic alignment, which alleviates the error propagation problem by attending to a fixed semantic representation. We then design a deep fusion mechanism to aggregate and propagate attention information from low layers to high layers, which effectively retains low-layer semantic features for subsequent interactions. Finally, we introduce a self-attention mechanism to enhance global matching information with lower model complexity. We conduct experiments on natural language inference and paraphrase identification tasks with three benchmark datasets: SNLI, SciTail, and Quora. Experimental results demonstrate that our proposed method achieves significant improvements over baseline systems without using any external knowledge.
Additionally, we conduct an interpretability study to reveal how our deep interaction network with attention benefits sentence matching, which provides a reference for future model design. Ablation studies and visualization analyses further verify that our model better captures interactive information between two sentences, and that the proposed components indeed help model semantic relations more precisely.
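The core mechanism described above can be illustrated with a minimal numerical sketch. The code below is an illustrative approximation only, not the paper's implementation: each interaction unit attends to a *fixed* original representation of the other sentence (rather than to the output of the previous attention layer), and a simple concatenation stands in for the deep fusion that aggregates attention outputs across layers. All function names, the bilinear scoring form, and the residual update are assumptions introduced here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_unit(h, fixed, W):
    """One attention-based interaction unit: the current representation h of
    sentence A attends to the fixed original representation of sentence B,
    so alignment errors from earlier units do not propagate into the keys."""
    scores = h @ W @ fixed.T            # (len_a, len_b) alignment scores
    attn = softmax(scores, axis=-1)     # each token of A aligns over B
    return attn @ fixed                 # (len_a, d) aligned semantics of B

def deep_fusion(layer_outputs):
    """Aggregate attention outputs from all lower layers; concatenation is
    one simple fusion choice used here for illustration."""
    return np.concatenate(layer_outputs, axis=-1)

rng = np.random.default_rng(0)
d, len_a, len_b, n_units = 4, 3, 5, 2
a = rng.standard_normal((len_a, d))        # sentence A token representations
b_fixed = rng.standard_normal((len_b, d))  # original (fixed) representation of B
W = rng.standard_normal((d, d))            # bilinear attention parameters

outputs, h = [], a
for _ in range(n_units):
    aligned = interaction_unit(h, b_fixed, W)
    outputs.append(aligned)
    h = h + aligned                        # residual update feeds the next unit

fused = deep_fusion(outputs)               # (len_a, n_units * d)
print(fused.shape)
```

In the full model this one-directional stack would be mirrored (B attending to A) and followed by self-attention over the fused representation to capture global matching information.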
Funding
The research work described in this paper has been supported by the National Natural Science Foundation of China (No. 61876198, 61976015, 61370130 and 61976016), the Beijing Municipal Natural Science Foundation (No. 4172047), and the International Science and Technology Cooperation Program of China (No. K11F100010).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Mingtong Liu, Yujie Zhang, Jinan Xu and Yufeng Chen. The first draft of the manuscript was written by Mingtong Liu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Liu, M., Zhang, Y., Xu, J. et al. Deep bi-directional interaction network for sentence matching. Appl Intell 51, 4305–4329 (2021). https://doi.org/10.1007/s10489-020-02156-7