
Text semantic matching with an enhanced sample building method based on contrastive learning

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

Text semantic matching aims to determine whether two pieces of text express the same meaning, and it has been widely applied in clinical terminology standardization, recommendation systems, and other scenarios. Recently, many methods have introduced the idea of contrastive learning to construct positive and negative sample pairs for text semantic matching tasks. These methods first construct positive samples by data augmentation and then treat the other samples within the same batch as negative samples. However, mainstream data augmentation methods such as dropout ignore the influence of sentence length and structure, while the word repetition method is relatively complex to implement. On the other hand, a sufficient number of negative samples is also crucial to the quality of model training. In this paper, we propose an enhanced sample building method (ESNCSE) to construct positive and negative samples for text semantic matching tasks. To generate positive sample pairs, we randomly insert punctuation marks into the original text, which adds noise simply and efficiently. To expand the number of negative samples without increasing the computational cost, we apply momentum contrast on top of the sentence embedding method with soft negative samples (SNCSE). Experimental results on text semantic similarity tasks show an average Spearman correlation coefficient of 79.74% for BERT-base and 80.64% for BERT-large.
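The punctuation-insertion step described above is straightforward to reproduce. The following is a minimal sketch of how such a positive-sample builder might look, in the spirit of AEDA [6]; the punctuation set, insertion ratio, and function name are illustrative assumptions rather than the authors' exact implementation:

```python
import random

# Candidate punctuation marks to inject as light noise (assumed set).
PUNCTUATION = [".", ",", ";", ":", "?", "!"]

def insert_punctuation(tokens, ratio=0.15, rng=random):
    """Return a noised copy of `tokens` with punctuation marks inserted
    at random positions; the input list is left unchanged."""
    n_insert = max(1, int(len(tokens) * ratio))
    noised = list(tokens)
    for _ in range(n_insert):
        pos = rng.randint(0, len(noised))  # randint is inclusive, so inserting at the end is allowed
        noised.insert(pos, rng.choice(PUNCTUATION))
    return noised

# Usage: a sentence and its noised copy form a positive pair.
sentence = "a man is playing a guitar".split()
positive = insert_punctuation(sentence)
print(" ".join(sentence), "=>", " ".join(positive))
```

Unlike dropout-based augmentation, this keeps every original token and only dilutes the sentence with semantically empty symbols, so the positive pair differs in surface form but not in meaning or length structure.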
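For the negative-sample side, momentum contrast maintains a queue of sentence embeddings produced by a slowly updated copy of the encoder, so past batches can be reused as extra negatives without additional forward passes. A minimal sketch follows, assuming PyTorch; the queue size, embedding dimension, and momentum coefficient are illustrative assumptions, not the paper's reported settings:

```python
import torch
import torch.nn.functional as F

class MomentumQueue:
    """FIFO queue of L2-normalized sentence embeddings reused as negatives."""
    def __init__(self, dim=768, size=2560):
        self.queue = F.normalize(torch.randn(size, dim), dim=1)  # placeholder init
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys):
        # Overwrite the oldest entries with the newest momentum-encoder outputs.
        keys = F.normalize(keys, dim=1)
        idx = torch.arange(self.ptr, self.ptr + keys.size(0)) % self.queue.size(0)
        self.queue[idx] = keys
        self.ptr = int((self.ptr + keys.size(0)) % self.queue.size(0))

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.995):
    # theta_k <- m * theta_k + (1 - m) * theta_q, applied after each training step.
    for p_q, p_k in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```

Because the queued embeddings are produced without gradients, enlarging the pool of negatives in this way adds memory but essentially no extra computation, which matches the motivation stated in the abstract.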


References

  1. Yuejun S, Zhiqiang L, Zhihao Y (2021) Standardization of clinical terminology based on BERT. Chin J Inf 35:47582


  2. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, pp 4171–4186

  3. Yan Y, Li R, Wang S, Zhang F, Xu W (2021) ConSERT: a contrastive framework for self-supervised sentence representation transfer. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers)

  4. Wu X, Gao C, Zang L, Han J, Wang Z, Hu S (2021) ESimCSE: enhanced sample building method for contrastive learning of unsupervised sentence embedding. arXiv:2109.04380 (arXiv preprint)

  5. Gao T, Yao X, Chen D (2021) SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 6894–6910

  6. Karimi A, Rossi L, Prati A (2021) AEDA: an easier data augmentation technique for text classification. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 2748–2754

  7. Wang H, Li Y, Huang Z, Dou Y, Kong L, Shao J (2022) SNCSE: contrastive learning for unsupervised sentence embedding with soft negative samples. arXiv:2201.05979 (arXiv preprint)

  8. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (arXiv preprint)

  9. Li B, Zhou H, He J, Wang M, Yang Y, Li L (2020) On the sentence embeddings from pre-trained language models. arXiv:2011.05864 (arXiv preprint)

  10. Su J, Cao J, Liu W, Ou Y (2021) Whitening sentence representations for better semantics and faster retrieval. arXiv:2103.15316 (arXiv preprint)

  11. Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G (2021) Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. arXiv:2107.13586 (arXiv preprint)

  12. van den Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv:1807.03748 (arXiv preprint)

  13. Agirre E, Diab M, Cer D, Gonzalez-Agirre A (2012) SemEval-2012 Task 6: a pilot on semantic textual similarity. In: First joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the main conference and the shared task: semantic textual similarity, pp 385–393

  14. Agirre E, Cer D, Diab M, Gonzalez-Agirre A, Guo W (2013) *SEM 2013 shared task: semantic textual similarity. In: Second joint conference on lexical and computational semantics (*SEM), volume 1: proceedings of the main conference and the shared task: semantic textual similarity, pp 32–43

  15. Agirre E, Banea C, Cardie C, Cer DM, Diab MT, Gonzalez-Agirre A, Guo W, Mihalcea R, Rigau G, Wiebe J (2014) SemEval-2014 Task 10: multilingual semantic textual similarity. In: SemEval@COLING, pp 81–91

  16. Agirre E, Banea C, Cardie C, Cer D, Diab M, Gonzalez-Agirre A, Guo W, Lopez-Gazpio I, Maritxalar M, Mihalcea R (2015) SemEval-2015 Task 2: semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), pp 252–263

  17. Agirre E, Banea C, Cer D, Diab M, Gonzalez-Agirre A, Mihalcea R, Rigau G, Wiebe J (2016) SemEval-2016 Task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 497–511

  18. Cer D, Diab M, Agirre E, Lopez-Gazpio I, Specia L (2017) SemEval-2017 Task 1: semantic textual similarity multilingual and cross-lingual focused evaluation. arXiv:1708.00055 (arXiv preprint)

  19. Marelli M, Menini S, Baroni M, Bentivogli L, Bernardi R, Zamparelli R (2014) A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of the 9th international conference on language resources and evaluation (LREC'14), pp 216–223


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos: 62176221, 62272398, 62276215), and the key research and development program of Sichuan Province, China (Nos: 2023YFG0354, 2022YFH0020).

Author information

Correspondence to Jie Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, L., Hu, J., Teng, F. et al. Text semantic matching with an enhanced sample building method based on contrastive learning. Int. J. Mach. Learn. & Cyber. 14, 3105–3112 (2023). https://doi.org/10.1007/s13042-023-01823-8
