Abstract
Semantic search is an important task whose objective is to find, for a given query, the relevant indexes in a database. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then incorporates them into the contrastive objective as regulators. These contrastive regulators can overcome overfitting issues and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the outperforming pre-trained model SRoBERTa. The results show that our method is more effective for learning a superior sentence representation. We then evaluate our approach on 2 challenging FAQ datasets, COUGH and FAQIR, which have long queries and indexes. The results of our experiments demonstrate that our method outperforms baseline methods.
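The idea of taking augmented representations into the contrastive objective as regulators can be sketched as follows. This is a minimal NumPy illustration assuming an InfoNCE-style loss with cosine similarity and a temperature; the function names and the exact form of the loss are illustrative stand-ins, not the paper's Eqs. 2–3.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def contrastive_loss(queries, positives, tau=0.05):
    # InfoNCE: each query's positive is the matching row; the other
    # positives in the batch act as in-batch negatives.
    sims = cosine_sim(queries, positives) / tau
    logits = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def regularized_contrastive_loss(queries, positives, regulators, tau=0.05):
    # regulators: alternative embeddings of the same sentences, e.g. produced
    # by differently perturbed models. Each set contributes its own
    # contrastive term, summed with the main objective.
    loss = contrastive_loss(queries, positives, tau)
    for reg in regulators:
        loss += contrastive_loss(queries, reg, tau)
    return loss
```

Each regulator set pulls the query embedding toward another plausible embedding of the same sentence while pushing it away from the other sentences in the batch, which is what discourages collapse onto a narrow (anisotropic) region of the embedding space.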
References
Arora, S., Liang, Y., Ma, T.: A simple but tough-to-beat baseline for sentence embeddings. In: 5th International Conference on Learning Representations, ICLR, Conference Track Proceedings. OpenReview.net, Toulon, France (2017)
Zhou, W., Ge, T., Wei, F., Zhou, M., Xu, K.: Scheduled DropHead: a regularization method for transformer models. In: Findings of the Association for Computational Linguistics: EMNLP, pp. 1971–1980. Association for Computational Linguistics, Online (2020)
Anaby-Tavor, A., et al.: Not enough data? Deep learning to the rescue! In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI, pp. 7383–7390 (2020)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Song, K., Tan, X., Qin, T., Lu, J., Liu, T.-Y.: MPNet: masked and permuted pre-training for language understanding. arXiv preprint arXiv:2004.09297v2 (2020)
Su, J., Cao, J., Liu, W., Ou, Y.: Whitening sentence representations for better semantics and faster retrieval. arXiv preprint arXiv:2103.15316 (2021)
Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 6894–6910 (2021)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 3980–3990 (2019)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 1735–1742. IEEE, New York, NY, USA (2006)
Wang, T., Isola, P.: Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International Conference on Machine Learning, ICML, pp. 9929–9939 (2020)
Rahutomo, F., Kitasuka, T., Aritsugi, M.: Semantic cosine similarity. In: The 7th International Student Conference on Advanced Science and Technology, ICAST, vol. 4 (2012)
Ethayarajh, K.: How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pp. 55–65. Association for Computational Linguistics, Hong Kong, China (2019)
Li, B., Zhou, H., He, J., Wang, M., Yang, Y., Li, L.: On the sentence embeddings from pre-trained language models. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 9119–9130 (2020)
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 670–680 (2017)
Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A.: SemEval-2012 task 6: a pilot on semantic textual similarity. In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, SemEval, vol. 1: Proceedings of the Main Conference and the Shared Task, vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 385–393. Association for Computational Linguistics, Montreal, Canada (2012)
Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W.: *SEM 2013 shared task: semantic textual similarity. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pp. 32–43. Association for Computational Linguistics, Atlanta, Georgia, USA (2013)
Agirre, E., et al.: SemEval-2014 task 10: multilingual semantic textual similarity. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 81–91. Association for Computational Linguistics, Dublin, Ireland (2014)
Agirre, E., et al.: SemEval-2015 task 2: semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252–263. Association for Computational Linguistics, Denver, Colorado, USA (2015)
Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 497–511. Association for Computational Linguistics, San Diego, California, USA (2016)
Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval, pp. 1–14. Association for Computational Linguistics, Vancouver, Canada (2017)
Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., Zamparelli, R.: A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC2014, pp. 216–223. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)
Zhang, X., Sun, H., Yue, X., Jesrani, E., Lin, S., Sun, H.: COUGH: a challenge dataset and models for COVID-19 FAQ retrieval. arXiv preprint arXiv:2010.12800v1 (2020)
Karan, M., Snajder, J.: Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval. Expert Syst. Appl. 91, 418–433 (2018)
Merikoski, J.K.: On the trace and the sum of elements of a matrix. Linear Algebra Appl. 60, 216–223 (1984). https://doi.org/10.1016/0024-3795(84)90078-8
A Appendix
A.1 Training Details
The base model is \(SRoberta_{base}\) [8]. We download its weights from SentenceTransformers. \(SRoberta_{base}\) is trained on MNLI and SNLI, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances.
For STS Tasks. We carry out a grid search over batch size \(\in [32,64]\) and \(\phi \in [[0.01],[0.01,0.02],[0.01,0.02,0.03],[0.01,0.02,0.03,0.04],[0.01,0.02,0.03,0.04,0.05]]\). We first fine-tune \(SRoberta_{base}\) with different \(\phi \) on MNLI and SNLI with Eq. 2, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances. The purpose of this step is to train several entropy models that can generate different semantically similar embeddings for every sentence. Since the pre-trained model has already been trained on these datasets, the training process converges quickly. We use early stopping to end training if the loss does not decrease within 3 steps. We then continue to fine-tune the pre-trained \(SRoberta_{base}\) by feeding the augmented embeddings, together with the positive and entailment instances of MNLI and SNLI, into the contrastive objective of Eq. 3. We train this model for 1 epoch and evaluate it after every 10% of samples on the development set of STS-B by Spearman correlation. With a batch size of 64 and \(\phi =[0.01,0.02,0.03,0.04]\), our model achieves the best accuracy.
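The model-selection loop above can be sketched as follows. This is a minimal illustration: `train_step` and `dev_score` are hypothetical stand-ins for the actual fine-tuning and development-set evaluation routines, and the loss-based early stopping mirrors the 3-step patience described above.

```python
import itertools

def fine_tune_with_early_stopping(train_step, dev_loss, max_steps=1000, patience=3):
    # Stop as soon as the development loss has not improved for `patience` steps.
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        train_step()
        loss = dev_loss()
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

def grid_search(train_fn, batch_sizes=(32, 64),
                phis=([0.01], [0.01, 0.02], [0.01, 0.02, 0.03],
                      [0.01, 0.02, 0.03, 0.04],
                      [0.01, 0.02, 0.03, 0.04, 0.05])):
    # train_fn(batch_size, phi) -> dev-set score (Spearman correlation or MAP);
    # keep the configuration with the highest score.
    best_score, best_cfg = float("-inf"), None
    for bs, phi in itertools.product(batch_sizes, phis):
        score = train_fn(bs, phi)
        if score > best_score:
            best_score, best_cfg = score, (bs, phi)
    return best_cfg, best_score
```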
For FAQ Datasets. We implement RCL on the same pre-trained model as above and apply the same grid search for batch size and \(\phi \). We first fine-tune an initial model based on the pre-trained \(SRoberta_{base}\) for several epochs with Eq. 1 and save the model with the highest MAP value on the development dataset. Based on this initial model, we continue to fine-tune the entropy models with different \(\phi \) with Eq. 2. Finally, we fine-tune the final retrieval model based on the pre-trained \(SRoberta_{base}\) with the augmented embeddings for several epochs, saving the model with the highest MAP value on the development dataset. With a batch size of 64 and \(\phi =[0.01,0.02,0.03,0.04]\), the retrieval model achieves the best accuracy.
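Model selection on the FAQ datasets relies on mean average precision (MAP). A minimal sketch of that metric, assuming binary relevance labels over each query's ranked list of retrieved indexes:

```python
def average_precision(ranked_relevance):
    # ranked_relevance: 0/1 relevance flags for one query, in ranked order.
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(all_rankings):
    # all_rankings: one ranked relevance list per query in the dev set.
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)
```

MAP rewards rankings that place relevant indexes near the top, which is why it is a natural selection criterion for FAQ retrieval with long queries and indexes.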
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tan, M., Rolland, A., Tian, A. (2022). Regularized Contrastive Learning of Semantic Search. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science(), vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_10
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8