Abstract
Contrastive learning methods have been widely applied to text classification. However, the training objective of traditional contrastive learning differs from that of classification, so introducing contrastive learning may weaken the clustering structure of the latent space. At the same time, training classifiers with the cross-entropy loss leads the model to learn only discrete features and limits its learning ability. This paper proposes a general two-stage contrastive training framework that integrates a novel contrastive learning method with classification training. We introduce a constraint term into the contrastive learning objective so that the model learns an appropriate amount of instance-level knowledge, which facilitates the subsequent classification task. In the classification training stage, we present a novel classification loss that concentrates on correct class predictions while strengthening the associations between incorrect and correct classes, enabling the model to learn continuous features and better exploit the implicit relationships between classes. In addition, we optimize the weight initialization of the classifier to improve classification performance. Experiments on various text classification benchmarks and several challenging few-shot classification tasks demonstrate the effectiveness of the proposed method.
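The abstract does not spell out the exact form of the constraint term, the classification loss, or the initialization scheme, so the PyTorch sketch below only illustrates the two-stage idea under stated assumptions: stage one uses a supervised contrastive loss with an assumed centroid-based constraint, and stage two uses an assumed soft-target classification loss. The function names and the hyperparameters tau, lam, and alpha are hypothetical, not the paper's definitions.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.1):
    # Standard supervised contrastive loss (Khosla et al., 2020) over
    # L2-normalized features z (batch x dim) with integer class labels.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                              # exclude self-pairs
    logits_mask = 1.0 - torch.eye(len(z), device=z.device)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(1, keepdim=True) + 1e-12)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_count).mean()

def constrained_contrastive_loss(z, labels, lam=0.1):
    # Stage 1: contrastive objective plus a constraint term. Here the
    # constraint is ASSUMED to pull each feature toward its class centroid,
    # so instance-level learning does not erase the clustering structure.
    base = supervised_contrastive_loss(z, labels)
    classes = labels.unique()                               # sorted class ids
    centroids = torch.stack([z[labels == c].mean(0) for c in classes])
    idx = torch.searchsorted(classes, labels)               # label -> centroid row
    constraint = (z - centroids[idx]).pow(2).sum(1).mean()
    return base + lam * constraint

def soft_classification_loss(logits, labels, alpha=0.9):
    # Stage 2: keeps most probability mass on the correct class while also
    # assigning some to the incorrect classes; implemented as a soft-target
    # cross-entropy, an assumed stand-in for the paper's actual loss.
    num_classes = logits.size(1)
    soft = torch.full_like(logits, (1 - alpha) / max(num_classes - 1, 1))
    soft.scatter_(1, labels.unsqueeze(1), alpha)
    return -(soft * F.log_softmax(logits, dim=1)).sum(1).mean()

Under these assumptions, stage one would train the encoder (e.g., on [CLS] or mean-pooled representations) with constrained_contrastive_loss, and stage two would fine-tune the classifier head with soft_classification_loss.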





Data availability
The datasets generated or analyzed during this study are available at https://github.com/alexa/dialoglue and https://git.uwaterloo.ca/jimmylin/Castor-data/-/tree/master/datasets. The source code for this study is available at https://github.com/shuaizujiaofu/GenerCTC/tree/main
Funding
This work is supported by the Key Cooperation Project of the Chongqing Municipal Education Commission (Grant No. HZ2021008) and the Research Project of Graduate Education and Teaching Reform of the Chongqing Municipal Education Commission (Grant No. yjg223087).
Author information
Contributions
Jianjun Lei contributed to conceptualization, methodology, discussion, and writing–review and editing. Sida Chen was involved in conceptualization, methodology, experiments, and writing–original draft. Ying Wang performed conceptualization, methodology, investigation, data curation, and writing–review and editing.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
The manuscript was reviewed and approved for publication by all authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lei, J., Chen, S. & Wang, Y. GenerCTC: a general two-stage contrastive training framework for text classification. J Supercomput 81, 101 (2025). https://doi.org/10.1007/s11227-024-06628-2