
Siamese BERT Architecture Model with attention mechanism for Textual Semantic Similarity

Multimedia Tools and Applications

Abstract

Textual semantic similarity is a crucial part of text-matching tasks and has a wide range of applications in natural language processing (NLP), such as search engines, question-answering systems, information retrieval, and natural language inference. Although a variety of approaches to textual semantic similarity exist, many fail to learn a semantic representation that captures a sentence or text well, and they ignore the fact that different words contribute to the meaning of the whole sentence to different degrees. This paper therefore proposes a Siamese BERT network model for textual semantic similarity. First, we use a BERT model to obtain the semantic features of each word in a sentence, and we exploit the merits of the Siamese architecture: the two branches share the same encoder and feature weights, which reduces the number of training parameters. We then apply an attention mechanism to derive higher-level semantic features. Finally, the similarity between two sentences is computed either from the distance between their high-level semantic representations or by concatenating those representations as input to a classifier. We evaluate the network on three semantic similarity datasets, on which it outperforms other approaches.
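To make the pipeline concrete, the following is a minimal sketch in PyTorch, assuming the Hugging Face transformers library and a bert-base-uncased checkpoint. The additive attention-pooling layer and all hyperparameters are illustrative assumptions rather than the authors' exact configuration; the sketch only shows the overall shape of the model: a shared BERT encoder, attention pooling over token features, and a cosine-similarity score.

    # A minimal sketch of the described architecture (not the authors' exact code):
    # a shared BERT encoder, additive attention pooling, and cosine similarity.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer


    class SiameseBertSimilarity(nn.Module):
        def __init__(self, model_name: str = "bert-base-uncased"):
            super().__init__()
            # One shared encoder: both sentences pass through the same weights,
            # which is what makes the network Siamese and avoids duplicating
            # the parameters of two independent encoders.
            self.encoder = AutoModel.from_pretrained(model_name)
            hidden = self.encoder.config.hidden_size
            # Additive attention over token features (an assumption; the paper's
            # attention layer may differ): tokens that matter more for the
            # sentence meaning receive larger pooling weights.
            self.attention = nn.Sequential(
                nn.Linear(hidden, hidden),
                nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def encode(self, input_ids, attention_mask):
            # Token-level BERT features: (batch, seq_len, hidden).
            tokens = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            scores = self.attention(tokens).squeeze(-1)             # (batch, seq_len)
            scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
            weights = F.softmax(scores, dim=-1).unsqueeze(-1)       # (batch, seq_len, 1)
            # Attention-weighted sum -> one high-level vector per sentence.
            return (weights * tokens).sum(dim=1)                    # (batch, hidden)

        def forward(self, batch_a, batch_b):
            u = self.encode(**batch_a)
            v = self.encode(**batch_b)
            # Distance-based similarity; a concatenation head is the alternative.
            return F.cosine_similarity(u, v)


    if __name__ == "__main__":
        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = SiameseBertSimilarity().eval()
        enc = lambda s: {k: v for k, v in
                         tok(s, return_tensors="pt", padding=True).items()
                         if k in ("input_ids", "attention_mask")}
        with torch.no_grad():
            sim = model(enc(["A man is playing a guitar."]),
                        enc(["Someone performs music on a guitar."]))
        print(sim)  # cosine similarity in [-1, 1]

For classification-style training (for example, paraphrase labels), the two sentence vectors u and v could instead be concatenated, e.g. as (u, v, |u - v|), and passed to a softmax classifier, mirroring the concatenation option mentioned in the abstract.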


Data availability

The data generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010153002, the Key Program of NSFC-Guangdong Joint Funds under Grants U1701262 and U1801263, the National Natural Science Foundation of China under Grant 62002071, and the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069.

Author information


Corresponding author

Correspondence to Ruihao Li.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Li, R., Cheng, L., Wang, D. et al. Siamese BERT Architecture Model with attention mechanism for Textual Semantic Similarity. Multimed Tools Appl 82, 46673–46694 (2023). https://doi.org/10.1007/s11042-023-15509-4

