
Siamese BERT Architecture Model with attention mechanism for Textual Semantic Similarity

Multimedia Tools and Applications

Abstract

Textual semantic similarity is a crucial part of text-matching tasks and has a wide range of applications in natural language processing (NLP), such as search engines, question-answering systems, information retrieval, and natural language inference. Although a variety of approaches to textual semantic similarity exist, many fail to learn a semantic representation that captures a sentence or text well, and they ignore the fact that different words contribute to the meaning of the whole sentence to different degrees. This paper therefore proposes a Siamese BERT network model for textual semantic similarity. First, we use a BERT model to obtain the semantic features of each word in a sentence, and we exploit the merits of the Siamese architecture: the two branches share the same encoder and feature weights, which reduces the number of training parameters. We then apply an attention mechanism to derive higher-level semantic features. Finally, the similarity between two sentences is computed either from the distance between their high-level semantic representations or by concatenating those representations as input to a classifier. We evaluate the network on three semantic similarity datasets, on which it outperforms other approaches.
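To make the pipeline concrete, the following is a minimal sketch in PyTorch, assuming the Hugging Face transformers library and a bert-base-uncased checkpoint. The additive attention-pooling layer and all hyperparameters are illustrative assumptions rather than the authors' exact configuration; the sketch only shows the overall shape of the model: a shared BERT encoder, attention pooling over token features, and a cosine-similarity score.

    # A minimal sketch of the described architecture (not the authors' exact code):
    # a shared BERT encoder, additive attention pooling, and cosine similarity.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer


    class SiameseBertSimilarity(nn.Module):
        def __init__(self, model_name: str = "bert-base-uncased"):
            super().__init__()
            # One shared encoder: both sentences pass through the same weights,
            # which is what makes the network Siamese and avoids duplicating
            # the parameters of two independent encoders.
            self.encoder = AutoModel.from_pretrained(model_name)
            hidden = self.encoder.config.hidden_size
            # Additive attention over token features (an assumption; the paper's
            # attention layer may differ): tokens that matter more for the
            # sentence meaning receive larger pooling weights.
            self.attention = nn.Sequential(
                nn.Linear(hidden, hidden),
                nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def encode(self, input_ids, attention_mask):
            # Token-level BERT features: (batch, seq_len, hidden).
            tokens = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            scores = self.attention(tokens).squeeze(-1)             # (batch, seq_len)
            scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
            weights = F.softmax(scores, dim=-1).unsqueeze(-1)       # (batch, seq_len, 1)
            # Attention-weighted sum -> one high-level vector per sentence.
            return (weights * tokens).sum(dim=1)                    # (batch, hidden)

        def forward(self, batch_a, batch_b):
            u = self.encode(**batch_a)
            v = self.encode(**batch_b)
            # Distance-based similarity; a concatenation head is the alternative.
            return F.cosine_similarity(u, v)


    if __name__ == "__main__":
        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = SiameseBertSimilarity().eval()
        enc = lambda s: {k: v for k, v in
                         tok(s, return_tensors="pt", padding=True).items()
                         if k in ("input_ids", "attention_mask")}
        with torch.no_grad():
            sim = model(enc(["A man is playing a guitar."]),
                        enc(["Someone performs music on a guitar."]))
        print(sim)  # cosine similarity in [-1, 1]

For classification-style training (for example, paraphrase labels), the two sentence vectors u and v could instead be concatenated, e.g. as (u, v, |u - v|), and passed to a softmax classifier, mirroring the concatenation option mentioned in the abstract.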


Data availability

The data generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010153002, the Key Program of NSFC-Guangdong Joint Funds under Grants U1701262 and U1801263, the National Natural Science Foundation of China under Grant 62002071, and the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069.

Author information


Corresponding author

Correspondence to Ruihao Li.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Li, R., Cheng, L., Wang, D. et al. Siamese BERT Architecture Model with attention mechanism for Textual Semantic Similarity. Multimed Tools Appl 82, 46673–46694 (2023). https://doi.org/10.1007/s11042-023-15509-4

