Abstract
Textual semantic similarity is a crucial part of text-matching tasks and has a wide range of applications in natural language processing (NLP), such as search engines, question-answering systems, information retrieval, and natural language inference. Although many approaches to textual semantic similarity exist, most fail to obtain a sentence representation that captures its meaning well, and they ignore that different words contribute to the meaning of the whole sentence to different degrees. This paper therefore proposes a Siamese BERT network model for textual semantic similarity. First, we use the BERT model to obtain the semantic features of each word in a sentence, and we exploit the merit of the Siamese architecture: both branches share the same encoder and feature weights, which reduces the number of training parameters. We then apply an attention mechanism to obtain higher-level semantic features. Finally, the similarity between two sentences is derived either by computing the distance between their high-level semantic representations or by concatenating them. We evaluate the network on three semantic similarity datasets, on which it outperforms other approaches.
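The pipeline described in the abstract (shared encoder, attention pooling, then a similarity score between the two pooled vectors) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: random vectors stand in for BERT token features, and the attention parameter `w` is a hypothetical stand-in for a learned weight. It shows only the shared-weight pooling and the distance-based scoring step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(token_embs, w):
    # One attention score per token, normalized with softmax,
    # then a weighted sum of token embeddings -> sentence vector.
    alpha = softmax(token_embs @ w)
    return alpha @ token_embs

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
w = rng.normal(size=8)                      # stand-in attention parameter
s1 = rng.normal(size=(5, 8))                # 5 tokens x 8-dim "BERT" features
s2 = s1 + 0.01 * rng.normal(size=(5, 8))    # near-duplicate of sentence 1
s3 = rng.normal(size=(6, 8))                # unrelated sentence

# The same pooling function (shared weights) is applied to every sentence,
# mirroring the Siamese setup in which both branches share one encoder.
v1, v2, v3 = (attention_pool(s, w) for s in (s1, s2, s3))
print(cosine(v1, v2), cosine(v1, v3))
```

Because both sides of the pair pass through identical parameters, the near-duplicate pair (s1, s2) receives a higher cosine score than the unrelated pair (s1, s3). The alternative scoring route mentioned in the abstract would instead concatenate v1 and v2 and feed them to a classifier.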
Data availability
The data generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010153002, the Key Program of NSFC-Guangdong Joint Funds under Grants U1701262 and U1801263, the National Natural Science Foundation of China under Grant 62002071, and the Guangdong Provincial Key Laboratory of Cyber-Physical Systems under Grant 2020B1212060069.
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, R., Cheng, L., Wang, D. et al. Siamese BERT Architecture Model with attention mechanism for Textual Semantic Similarity. Multimed Tools Appl 82, 46673–46694 (2023). https://doi.org/10.1007/s11042-023-15509-4