
Regularized Contrastive Learning of Semantic Search

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2022)

Abstract

Semantic search is an important task whose objective is to find the relevant index in a database for a given query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models because of their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning (RCL), which helps transformer-based models learn a better representation of sentences. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regulators. These contrastive regulators can overcome overfitting and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the strong pre-trained model SRoBERTa; the results show that our method learns a superior sentence representation. We then evaluate it on 2 challenging FAQ datasets, Cough and Faqir, which have long queries and indexes, and our method outperforms the baseline methods.
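To make the role of the contrastive regulators concrete, the sketch below shows one plausible form of such an objective in PyTorch: a standard in-batch contrastive loss augmented with extra terms in which each augmented embedding of a sentence acts as an additional positive against the same in-batch negatives. This is an illustration under our own assumptions, not the paper's exact Eq. 3; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def regularized_contrastive_loss(query_emb, pos_emb, aug_embs, temperature=0.05):
    """query_emb, pos_emb: (B, d) tensors; aug_embs: list of K (B, d) tensors."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    # Standard in-batch contrastive term: diagonal entries are the positives.
    logits = q @ p.t() / temperature                      # (B, B)
    labels = torch.arange(q.size(0), device=q.device)
    loss = F.cross_entropy(logits, labels)
    # Regulator terms: each augmented view of the sentence is treated as an
    # additional positive, contrasted against the same in-batch negatives.
    for a in aug_embs:
        a = F.normalize(a, dim=-1)
        reg_logits = q @ a.t() / temperature
        loss = loss + F.cross_entropy(reg_logits, labels)
    return loss / (1 + len(aug_embs))
```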


Notes

  1. https://github.com/sunlab-osu/covid-faq.

  2. https://takelab.fer.hr/data/faqir/.

References

  1. Arora, S., Liang, Y., Ma, T.: A simple but tough-to-beat baseline for sentence embeddings. In: 5th International Conference on Learning Representations, ICLR, Conference Track Proceedings. OpenReview.net, Toulon, France (2017)

  2. Zhou, W., Ge, T., Wei, F., Zhou, M., Xu, K.: Scheduled DropHead: a regularization method for transformer models. In: Findings of the Association for Computational Linguistics: EMNLP, pp. 1971–1980. Association for Computational Linguistics, Online (2020)

  3. Anaby-Tavor, A., et al.: Not enough data? Deep learning to the rescue! In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI, pp. 7383–7390 (2020)

  4. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)

  5. Song, K., Tan, X., Qin, T., Lu, J., Liu, T.-Y.: MPNet: masked and permuted pre-training for language understanding. arXiv:2004.09297v2 (2020)

  6. Su, J., Cao, J., Liu, W., Ou, Y.: Whitening sentence representations for better semantics and faster retrieval. arXiv:2103.15316 (2021)

  7. Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 6894–6910 (2021)

  8. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 3980–3990 (2019)

  9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)

  10. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 1735–1742. IEEE, New York, NY, USA (2006)

  11. Wang, T., Isola, P.: Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International Conference on Machine Learning, ICML, pp. 9929–9939 (2020)

  12. Rahutomo, F., Kitasuka, T., Aritsugi, M.: Semantic cosine similarity. In: The 7th International Student Conference on Advanced Science and Technology, ICAST, vol. 4 (2012)

  13. Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pp. 55–65. Association for Computational Linguistics, Hong Kong, China (2019)

  14. Li, B., Zhou, H., He, J., Wang, M., Yang, Y., Li, L.: On the sentence embeddings from pre-trained language models. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 9119–9130 (2020)

  15. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 670–680 (2017)

  16. Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A.: SemEval-2012 task 6: a pilot on semantic textual similarity. In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, vols. 1–2, pp. 385–393. Association for Computational Linguistics, Montreal, Canada (2012)

  17. Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W.: *SEM 2013 shared task: semantic textual similarity. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 1, pp. 32–43. Association for Computational Linguistics, Atlanta, Georgia, USA (2013)

  18. Agirre, E., et al.: SemEval-2014 task 10: multilingual semantic textual similarity. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 81–91. Association for Computational Linguistics, Dublin, Ireland (2014)

  19. Agirre, E., et al.: SemEval-2015 task 2: semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252–263. Association for Computational Linguistics, Denver, Colorado, USA (2015)

  20. Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 497–511. Association for Computational Linguistics, San Diego, California, USA (2016)

  21. Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval, pp. 1–14. Association for Computational Linguistics, Vancouver, Canada (2017)

  22. Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., Zamparelli, R.: A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, pp. 216–223. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)

  23. Zhang, X., Sun, H., Yue, X., Jesrani, E., Lin, S., Sun, H.: COUGH: a challenge dataset and models for COVID-19 FAQ retrieval. arXiv:2010.12800v1 (2020)

  24. Karan, M., Šnajder, J.: Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval. Expert Syst. Appl. 91, 418–433 (2018)

  25. Merikoski, J.K.: On the trace and the sum of elements of a matrix. Linear Algebra Appl. 60, 177–185 (1984). https://doi.org/10.1016/0024-3795(84)90078-8


Author information


Corresponding author

Correspondence to Mingxi Tan.


A Appendix

A.1 Training Details

The base model is \(SRoBERTa_{base}\) [8]. We download its weights from SentenceTransformers. \(SRoBERTa_{base}\) is trained on MNLI and SNLI, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances.
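For context, the pre-trained weights can be loaded through the sentence-transformers library. The checkpoint name below is an assumption: nli-roberta-base-v2 is the SentenceTransformers RoBERTa-base model trained on MNLI and SNLI, which matches the description above, but the paper does not name the exact checkpoint.

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint: RoBERTa-base fine-tuned on MNLI/SNLI by SentenceTransformers.
model = SentenceTransformer("nli-roberta-base-v2")
embeddings = model.encode(["How do I reset my password?"])
print(embeddings.shape)  # (1, 768)
```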

For STS Tasks. We carry out a grid search over batch size \(\in \{32, 64\}\) and \(\phi \in \{[0.01], [0.01,0.02], [0.01,0.02,0.03], [0.01,0.02,0.03,0.04], [0.01,0.02,0.03,0.04,0.05]\}\). We first fine-tune \(SRoBERTa_{base}\) with different \(\phi\) on MNLI and SNLI with Eq. 2, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances. The purpose of this step is to train several entropy models that can each generate a different, semantically similar embedding for every sentence. Since the pre-trained model has already been trained on these datasets, training converges quickly; we use early stopping to end training if the loss does not decrease within 3 steps. We then continue to fine-tune the pre-trained \(SRoBERTa_{base}\) by taking the augmented embeddings together with the entailment (positive) instances of MNLI and SNLI into the contrastive objective of Eq. 3. We train this model for 1 epoch, evaluating it on the development set of STS-B every 10% of the training samples by Spearman correlation. The model achieves its best accuracy with batch size 64 and \(\phi = [0.01, 0.02, 0.03, 0.04]\).
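The overall search procedure can be sketched as follows. The helpers train_entropy_models, train_retrieval_model, and evaluate_stsb_dev_spearman are hypothetical placeholders for the Eq. 2 fine-tuning, the Eq. 3 fine-tuning, and the STS-B dev evaluation described above; only the grid itself comes from the paper.

```python
from itertools import product

# Hypothetical helpers standing in for the procedures described above:
#   train_entropy_models(phi, bs)        -- Eq. 2 fine-tuning, one model per phi value
#   train_retrieval_model(models, bs)    -- Eq. 3 fine-tuning with augmented embeddings
#   evaluate_stsb_dev_spearman(model)    -- Spearman correlation on the STS-B dev set

PHI_GRID = [
    [0.01],
    [0.01, 0.02],
    [0.01, 0.02, 0.03],
    [0.01, 0.02, 0.03, 0.04],
    [0.01, 0.02, 0.03, 0.04, 0.05],
]

best = {"spearman": -1.0}
for batch_size, phi in product([32, 64], PHI_GRID):
    entropy_models = train_entropy_models(phi, batch_size)
    retrieval_model = train_retrieval_model(entropy_models, batch_size)
    score = evaluate_stsb_dev_spearman(retrieval_model)
    if score > best["spearman"]:
        best = {"spearman": score, "batch_size": batch_size, "phi": phi}

print(best)  # reported optimum: batch_size=64, phi=[0.01, 0.02, 0.03, 0.04]
```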

For FAQ Datasets. We implement RCL on the same pre-trained model as above and apply the same grid search over batch size and \(\phi\). We first fine-tune an initial model from the pre-trained \(SRoBERTa_{base}\) for several epochs with Eq. 1, saving the model with the highest MAP on the development set. From this initial model, we continue to fine-tune the entropy models with different \(\phi\) using Eq. 2. Finally, we fine-tune the retrieval model from the pre-trained \(SRoBERTa_{base}\) with the augmented embeddings for several epochs, again saving the model with the highest MAP on the development set. The retrieval model achieves its best accuracy with batch size 64 and \(\phi = [0.01, 0.02, 0.03, 0.04]\).
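Mean average precision (MAP), the model-selection criterion used here, can be computed from cosine similarities between query and index embeddings as in the self-contained sketch below; the retrieval model itself is abstracted into precomputed embedding matrices.

```python
import numpy as np

def average_precision(ranked_relevance):
    """ranked_relevance: binary sequence, 1 if the doc at that rank is relevant."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(query_embs, index_embs, relevance):
    """relevance[i][j] = 1 if index j is relevant to query i."""
    # Cosine similarity via normalized dot products.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = q @ d.T
    aps = []
    for i, row in enumerate(sims):
        order = np.argsort(-row)  # best match first
        aps.append(average_precision([relevance[i][j] for j in order]))
    return float(np.mean(aps))
```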


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tan, M., Rolland, A., Tian, A. (2022). Regularized Contrastive Learning of Semantic Search. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol. 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_10


  • DOI: https://doi.org/10.1007/978-3-031-17120-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
