
VSCA: A Sentence Matching Model Incorporating Visual Perception

Cognitive Computation (2023) 15:323–336

Abstract

Stacking multiple layers of attention networks can significantly improve a model’s performance. However, it also increases the model’s time and space complexity and makes it difficult for the model to capture fine-grained information from the underlying features. We propose a novel sentence matching model (VSCA) with a new attention mechanism based on the variational autoencoder (VAE): the contextual information in the sentences is exploited to construct a basic attention feature map, which is combined with a VAE to generate multiple sets of related attention feature maps for fusion. Furthermore, VSCA introduces a spatial attention mechanism inspired by visual perception to capture multilevel semantic information. Experimental results show that the proposed model outperforms pretrained models such as BERT on the LCQMC dataset and performs well on the PAWS-X dataset. Our work consists of two parts. The first part compares the proposed sentence matching model with state-of-the-art pretrained models such as BERT. The second part investigates the application of VAEs and spatial attention mechanisms in NLP. The results on the related datasets show that the proposed method performs well and that VSCA captures rich attentional information and fine-grained detail with lower time and space complexity. This work provides insights into the application of VAEs and spatial attention mechanisms in NLP.
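
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the idea, not the authors' exact architecture: a basic attention feature map is built from two sentence representations, a small VAE samples several related attention maps from it, the sampled maps are fused with the basic map, and a CBAM-style spatial attention is applied. All module names, dimensions, the number of samples, and the fusion step are illustrative assumptions.

```python
# Illustrative sketch only: the real VSCA layer sizes, fusion scheme, and training
# objective (e.g., the KL term for the VAE) are not reproduced here.
import torch
import torch.nn as nn


class AttentionMapVAE(nn.Module):
    """Encodes an La x Lb attention map and decodes several sampled variants."""

    def __init__(self, map_size, latent_dim=64):
        super().__init__()
        self.enc = nn.Linear(map_size, 2 * latent_dim)  # outputs mean and log-variance
        self.dec = nn.Linear(latent_dim, map_size)

    def forward(self, attn_map, n_samples=4):
        flat = attn_map.flatten(1)                       # (B, La*Lb)
        mu, logvar = self.enc(flat).chunk(2, dim=-1)
        maps = []
        for _ in range(n_samples):                       # reparameterization trick
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            maps.append(self.dec(z).view_as(attn_map))
        return torch.stack(maps, dim=1)                  # (B, n_samples, La, Lb)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over a stack of attention feature maps."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                # x: (B, C, La, Lb)
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


# Toy usage with random token representations for two sentences.
B, La, Lb, d = 2, 12, 12, 128
a, b = torch.randn(B, La, d), torch.randn(B, Lb, d)
base_map = torch.einsum("bid,bjd->bij", a, b) / d ** 0.5          # basic attention feature map
vae = AttentionMapVAE(map_size=La * Lb)
fused = torch.cat([base_map.unsqueeze(1), vae(base_map)], dim=1)  # fuse basic + sampled maps
out = SpatialAttention()(fused)                                   # (B, 5, La, Lb)
```

In the full model, the fused maps would feed the subsequent matching and classification layers, and the VAE would be trained with a KL regularization term; those details are omitted in this sketch.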


Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

  1. https://github.com/liuhuanyong/ChineseEmbedding

  2. https://github.com/YJiangcm/Chinese-sentence-pair-modeling


Funding

This research was funded by the National Natural Science Foundation of China (61966035), the International Cooperation Project of the Autonomous Region Science and Technology Department “Data-driven Sino-Russian Cloud Computing Sharing Platform Construction” (2020E01023), the Autonomous Region Natural Science Foundation: Government Affairs Multi-source Heterogeneous Data Fusion and Key Technologies for Mining Research (2021D01C083), and the Autonomous Region Science and Technology Program Youth Science Fund Project: Research on the Risk Assessment Method of Comprehensive Energy System Operation Based on Deep Neural Networks (2022D01C83).

Author information

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all the individual participants included in the study.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, Z., Xiao, G., Qian, Y. et al. VSCA: A Sentence Matching Model Incorporating Visual Perception. Cogn Comput 15, 323–336 (2023). https://doi.org/10.1007/s12559-022-10074-8

