Abstract
Semantic search is an important task whose objective is to find, for a given query, the relevant indexes in a database. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then incorporates them into the contrastive objective as regulators. These contrastive regulators can overcome overfitting issues and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the outperforming pre-trained model SRoBERTa. The results show that our method is more effective for learning a superior sentence representation. We then evaluate our approach on 2 challenging FAQ datasets, COUGH and FAQIR, which have long queries and indexes. The results of our experiments demonstrate that our method outperforms baseline methods.
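The idea of taking augmented representations into the contrastive objective as regulators can be sketched as follows. This is a minimal NumPy illustration assuming an InfoNCE-style loss with cosine similarity and a temperature; the function names and the exact form of the loss are illustrative stand-ins, not the paper's Eqs. 2–3.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def contrastive_loss(queries, positives, tau=0.05):
    # InfoNCE: each query's positive is the matching row; the other
    # positives in the batch act as in-batch negatives.
    sims = cosine_sim(queries, positives) / tau
    logits = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def regularized_contrastive_loss(queries, positives, regulators, tau=0.05):
    # regulators: alternative embeddings of the same sentences, e.g. produced
    # by differently perturbed models. Each set contributes its own
    # contrastive term, summed with the main objective.
    loss = contrastive_loss(queries, positives, tau)
    for reg in regulators:
        loss += contrastive_loss(queries, reg, tau)
    return loss
```

Each regulator set pulls the query embedding toward another plausible embedding of the same sentence while pushing it away from the other sentences in the batch, which is what discourages collapse onto a narrow (anisotropic) region of the embedding space.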
References
Arora, S., Liang, Y., Ma, T.: A simple but tough-to-beat baseline for sentence embeddings. In: 5th International Conference on Learning Representations, ICLR, Conference Track Proceedings. OpenReview.net, Toulon, France (2017)
Zhou, W., Ge, T., Wei, F., Zhou, M., Xu, K.: Scheduled DropHead: a regularization method for transformer models. In: Findings of the Association for Computational Linguistics: EMNLP, pp. 1971–1980. Association for Computational Linguistics, Online (2020)
Anaby-Tavor, A., et al.: Not enough data? Deep learning to the rescue! In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI, pp. 7383–7390 (2020)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Song, K., Tan, X., Qin, T., Lu, J., Liu, T.-Y.: MPNet: masked and permuted pre-training for language understanding. arXiv preprint arXiv:2004.09297v2 (2020)
Su, J., Cao, J., Liu, W., Ou, Y.: Whitening sentence representations for better semantics and faster retrieval. arXiv preprint arXiv:2103.15316 (2021)
Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 6894–6910 (2021)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 3980–3990 (2019)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 1735–1742. IEEE, New York, NY, USA (2006)
Wang, T., Isola, P.: Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International Conference on Machine Learning, ICML, pp. 9929–9939 (2020)
Rahutomo, F., Kitasuka, T., Aritsugi, M.: Semantic cosine similarity. In: The 7th International Student Conference on Advanced Science and Technology, ICAST, vol. 4 (2012)
Ethayarajh, K.: How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pp. 55–65. Association for Computational Linguistics, Hong Kong, China (2019)
Li, B., Zhou, H., He, J., Wang, M., Yang, Y., Li, L.: On the sentence embeddings from pre-trained language models. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 9119–9130 (2020)
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: Empirical Methods in Natural Language Processing, EMNLP, pp. 670–680 (2017)
Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A.: SemEval-2012 task 6: a pilot on semantic textual similarity. In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, SemEval, vol. 1: Proceedings of the Main Conference and the Shared Task, vol. 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pp. 385–393. Association for Computational Linguistics, Montreal, Canada (2012)
Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W.: *SEM 2013 shared task: semantic textual similarity. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pp. 32–43. Association for Computational Linguistics, Atlanta, Georgia, USA (2013)
Agirre, E., et al.: SemEval-2014 task 10: multilingual semantic textual similarity. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 81–91. Association for Computational Linguistics, Dublin, Ireland (2014)
Agirre, E., et al.: SemEval-2015 task 2: semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252–263. Association for Computational Linguistics, Denver, Colorado, USA (2015)
Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 497–511. Association for Computational Linguistics, San Diego, California, USA (2016)
Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval, pp. 1–14. Association for Computational Linguistics, Vancouver, Canada (2017)
Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., Zamparelli, R.: A SICK cure for the evaluation of compositional distributional semantic models. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC2014, pp. 216–223. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)
Zhang, X., Sun, H., Yue, X., Jesrani, E., Lin, S., Sun, H.: COUGH: a challenge dataset and models for COVID-19 FAQ retrieval. arXiv preprint arXiv:2010.12800v1 (2020)
Karan, M., Snajder, J.: Paraphrase-focused learning to rank for domain-specific frequently asked questions retrieval. Expert Syst. Appl. 91, 418–433 (2018)
Merikoski, J.K.: On the trace and the sum of elements of a matrix. Linear Algebra Appl. 60, 216–223 (1984). https://doi.org/10.1016/0024-3795(84)90078-8
A Appendix
A.1 Training Details
The base model is \(SRoberta_{base}\) [8]. We download its weights from SentenceTransformers. \(SRoberta_{base}\) is trained on MNLI and SNLI, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances.
For STS Tasks. We carry out a grid search over batch size \(\in [32,64]\) and \(\phi \in [[0.01],[0.01,0.02],[0.01,0.02,0.03],[0.01,0.02,0.03,0.04],[0.01,0.02,0.03,0.04,0.05]]\). We first fine-tune \(SRoberta_{base}\) with different \(\phi \) on MNLI and SNLI with Eq. 2, using the entailment pairs as positive instances and the contradiction pairs as hard negative instances. The purpose of this step is to train several entropy models that can generate different semantically similar embeddings for every sentence. Since the pre-trained model has already been trained on these datasets, the training process converges quickly. We use early stopping to end training if the loss does not decrease within 3 steps. We then continue to fine-tune the pre-trained \(SRoberta_{base}\) by feeding the augmented embeddings, together with the positive and entailment instances of MNLI and SNLI, into the contrastive objective of Eq. 3. We train this model for 1 epoch and evaluate it after every 10% of samples on the development set of STS-B by Spearman correlation. With a batch size of 64 and \(\phi =[0.01,0.02,0.03,0.04]\), our model achieves the best accuracy.
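The model-selection loop above can be sketched as follows. This is a minimal illustration: `train_step` and `dev_score` are hypothetical stand-ins for the actual fine-tuning and development-set evaluation routines, and the loss-based early stopping mirrors the 3-step patience described above.

```python
import itertools

def fine_tune_with_early_stopping(train_step, dev_loss, max_steps=1000, patience=3):
    # Stop as soon as the development loss has not improved for `patience` steps.
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        train_step()
        loss = dev_loss()
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

def grid_search(train_fn, batch_sizes=(32, 64),
                phis=([0.01], [0.01, 0.02], [0.01, 0.02, 0.03],
                      [0.01, 0.02, 0.03, 0.04],
                      [0.01, 0.02, 0.03, 0.04, 0.05])):
    # train_fn(batch_size, phi) -> dev-set score (Spearman correlation or MAP);
    # keep the configuration with the highest score.
    best_score, best_cfg = float("-inf"), None
    for bs, phi in itertools.product(batch_sizes, phis):
        score = train_fn(bs, phi)
        if score > best_score:
            best_score, best_cfg = score, (bs, phi)
    return best_cfg, best_score
```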
For FAQ Datasets. We implement RCL on the same pre-trained model as above and apply the same grid search for batch size and \(\phi \). We first fine-tune an initial model based on the pre-trained \(SRoberta_{base}\) for several epochs with Eq. 1 and save the model with the highest MAP value on the development dataset. Based on this initial model, we continue to fine-tune the entropy models with different \(\phi \) with Eq. 2. Finally, we fine-tune the final retrieval model based on the pre-trained \(SRoberta_{base}\) with the augmented embeddings for several epochs, saving the model with the highest MAP value on the development dataset. With a batch size of 64 and \(\phi =[0.01,0.02,0.03,0.04]\), the retrieval model achieves the best accuracy.
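Model selection on the FAQ datasets relies on mean average precision (MAP). A minimal sketch of that metric, assuming binary relevance labels over each query's ranked list of retrieved indexes:

```python
def average_precision(ranked_relevance):
    # ranked_relevance: 0/1 relevance flags for one query, in ranked order.
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(all_rankings):
    # all_rankings: one ranked relevance list per query in the dev set.
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)
```

MAP rewards rankings that place relevant indexes near the top, which is why it is a natural selection criterion for FAQ retrieval with long queries and indexes.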
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tan, M., Rolland, A., Tian, A. (2022). Regularized Contrastive Learning of Semantic Search. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science(), vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_10
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8