GenerCTC: a general two-stage contrastive training framework for text classification

Published in The Journal of Supercomputing

Abstract

Contrastive learning methods have been widely applied to text classification. However, the training objective of traditional contrastive learning differs from that of classification, so introducing contrastive learning may weaken clustering structures in the latent space. At the same time, relying on cross-entropy loss for classification training leads the model to learn only discrete features and limits its learning ability. This paper proposes a general two-stage contrastive training framework that integrates novel contrastive learning and classification training methods. In the contrastive stage, we introduce a constraint term into the contrastive training objective so that the model learns an appropriate level of instance-level knowledge, which facilitates the subsequent classification task. In the classification training stage, we present a novel classification loss function that concentrates on correct class predictions and strengthens the associations between incorrect and correct classes, enabling the model to learn continuous features and better explore the implicit relationships between classes. In addition, we optimize the weight initialization of the classifier to improve classification performance. Experiments on various text classification benchmarks and several challenging few-shot classification tasks demonstrate the effectiveness of the proposed method.
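The abstract describes the two-stage design only at a high level. As a rough illustration of how such a pipeline can be organized, the following PyTorch sketch pairs a supervised contrastive stage carrying an added constraint term with a classification stage whose loss combines cross-entropy with a term over the incorrect classes. Everything here is an assumption made for demonstration: the constraint form, the relation term, the hyper-parameters (tau, lambda_c, beta) and the function names are not taken from the paper, whose actual implementation is available in the repository linked under Data availability.

# Illustrative sketch only (assumed names and hyper-parameters, not the paper's code).
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.1):
    # Standard supervised contrastive loss over L2-normalized embeddings z of shape (batch, dim).
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                              # a sample is not its own positive
    sim = sim - torch.eye(len(z), device=z.device) * 1e9    # drop self-similarity from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    n_pos = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(n_pos).mean()

def stage1_loss(z, labels, lambda_c=0.1):
    # Stage 1 (assumed form): contrastive loss plus a constraint term. The constraint is
    # sketched as a pull toward in-batch class centroids, limiting how far purely
    # instance-level training can scatter a class; the paper's constraint may differ.
    classes = labels.unique()
    centroids = torch.stack([z[labels == c].mean(0) for c in classes])
    idx = torch.searchsorted(classes, labels)
    constraint = (z - centroids[idx]).pow(2).sum(1).mean()
    return supervised_contrastive_loss(z, labels) + lambda_c * constraint

def stage2_loss(logits, labels, beta=0.1):
    # Stage 2 (assumed form): cross-entropy on the correct class plus a term computed over
    # the incorrect classes, standing in for a loss that also models how the wrong classes
    # relate to the correct one.
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    wrong = probs.scatter(1, labels.unsqueeze(1), 0.0)       # zero out the correct class
    relation = -(wrong * wrong.clamp(min=1e-8).log()).sum(1).mean()
    return ce + beta * relation

Under this sketch, stage 1 would optimize stage1_loss on encoder embeddings of labeled batches, and stage 2 would optimize stage2_loss on classifier logits, optionally after initializing classifier weights from class mean embeddings (one common choice; the paper's initialization scheme may differ).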


Data availability

The datasets generated or analyzed during this study are available at https://github.com/alexa/dialoglue and https://git.uwaterloo.ca/jimmylin/Castor-data/-/tree/master/datasets. The source code for this study is available at https://github.com/shuaizujiaofu/GenerCTC/tree/main.


Funding

This work is supported by the Key Cooperation Project of the Chongqing Municipal Education Commission (Grant No. HZ2021008) and the Research Project of Graduate Education and Teaching Reform of the Chongqing Municipal Education Commission (Grant No. yjg223087).

Author information


Contributions

Jianjun Lei contributed to conceptualization, methodology, discussion, and writing–review and editing. Sida Chen was involved in conceptualization, methodology, experiments, and writing–original draft. Ying Wang performed conceptualization, methodology, investigation, data curation, and writing–review and editing.

Corresponding author

Correspondence to Jianjun Lei.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

The manuscript was reviewed and approved for publication by all authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lei, J., Chen, S. & Wang, Y. GenerCTC: a general two-stage contrastive training framework for text classification. J Supercomput 81, 101 (2025). https://doi.org/10.1007/s11227-024-06628-2

