
VSCA: A Sentence Matching Model Incorporating Visual Perception

Cognitive Computation (2023) 15:323–336

Abstract

Stacking multiple layers of attention networks can significantly improve a model’s performance. However, it also increases the model’s time and space complexity and makes it difficult for the model to capture fine-grained information from the underlying features. We propose a novel sentence matching model (VSCA) with a new attention mechanism based on the variational autoencoder (VAE): the contextual information in the sentences is exploited to construct a basic attention feature map, which is combined with a VAE to generate multiple sets of related attention feature maps for fusion. Furthermore, VSCA introduces a spatial attention mechanism inspired by visual perception to capture multilevel semantic information. Experimental results show that the proposed model outperforms pretrained models such as BERT on the LCQMC dataset and performs well on the PAWS-X dataset. Our work consists of two parts. The first part compares the proposed sentence matching model with state-of-the-art pretrained models such as BERT. The second part investigates the application of VAEs and spatial attention mechanisms in NLP. The results on the related datasets show that the proposed method performs well and that VSCA captures rich attentional information and fine-grained detail with lower time and space complexity. This work provides insights into the application of VAEs and spatial attention mechanisms in NLP.
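
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the idea, not the authors' exact architecture: a basic attention feature map is built from two sentence representations, a small VAE samples several related attention maps from it, the sampled maps are fused with the basic map, and a CBAM-style spatial attention is applied. All module names, dimensions, the number of samples, and the fusion step are illustrative assumptions.

```python
# Illustrative sketch only: the real VSCA layer sizes, fusion scheme, and training
# objective (e.g., the KL term for the VAE) are not reproduced here.
import torch
import torch.nn as nn


class AttentionMapVAE(nn.Module):
    """Encodes an La x Lb attention map and decodes several sampled variants."""

    def __init__(self, map_size, latent_dim=64):
        super().__init__()
        self.enc = nn.Linear(map_size, 2 * latent_dim)  # outputs mean and log-variance
        self.dec = nn.Linear(latent_dim, map_size)

    def forward(self, attn_map, n_samples=4):
        flat = attn_map.flatten(1)                       # (B, La*Lb)
        mu, logvar = self.enc(flat).chunk(2, dim=-1)
        maps = []
        for _ in range(n_samples):                       # reparameterization trick
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            maps.append(self.dec(z).view_as(attn_map))
        return torch.stack(maps, dim=1)                  # (B, n_samples, La, Lb)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over a stack of attention feature maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                # x: (B, C, La, Lb)
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


# Toy usage with random token representations for two sentences.
B, La, Lb, d = 2, 12, 12, 128
a, b = torch.randn(B, La, d), torch.randn(B, Lb, d)
base_map = torch.einsum("bid,bjd->bij", a, b) / d ** 0.5          # basic attention feature map
vae = AttentionMapVAE(map_size=La * Lb)
fused = torch.cat([base_map.unsqueeze(1), vae(base_map)], dim=1)  # fuse basic + sampled maps
out = SpatialAttention()(fused)                                   # (B, 5, La, Lb)
```

In the full model, the fused maps would feed the subsequent matching and classification layers, and the VAE would be trained with a KL regularization term; those details are omitted in this sketch.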


Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. https://github.com/liuhuanyong/ChineseEmbedding

  2. https://github.com/YJiangcm/Chinese-sentence-pair-modeling


Funding

This research was funded by the National Natural Science Foundation of China (61966035), the International Cooperation Project of the Autonomous Region Science and Technology Department “Data-driven Sino-Russian Cloud Computing Sharing Platform Construction” (2020E01023), the Autonomous Region Natural Science Foundation: Government Affairs Multi-source Heterogeneous Data Fusion and Key Technologies for Mining Research (2021D01C083), and the Autonomous Region Science and Technology Program Youth Science Fund Project: Research on the Risk Assessment Method of Comprehensive Energy System Operation Based on Deep Neural Networks (2022D01C83).

Author information

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all the individual participants included in the study.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Z., Xiao, G., Qian, Y. et al. VSCA: A Sentence Matching Model Incorporating Visual Perception. Cogn Comput 15, 323–336 (2023). https://doi.org/10.1007/s12559-022-10074-8

