GenerCTC: a general two-stage contrastive training framework for text classification

Published in The Journal of Supercomputing

Abstract

Contrastive learning methods have been widely applied to text classification. However, the training objective of traditional contrastive learning differs from that of classification, so introducing contrastive learning may weaken clustering structures in the latent space. At the same time, relying on cross-entropy loss for classification training leads the model to learn only discrete features and limits its learning ability. This paper proposes a general two-stage contrastive training framework that integrates novel contrastive learning and classification training methods. In the contrastive stage, we introduce a constraint term into the contrastive training objective so that the model learns an appropriate level of instance-level knowledge, which facilitates the subsequent classification task. In the classification training stage, we present a novel classification loss function that concentrates on correct class predictions and strengthens the associations between incorrect and correct classes, enabling the model to learn continuous features and better explore the implicit relationships between classes. In addition, we optimize the weight initialization of the classifier to improve classification performance. Experiments on various text classification benchmarks and several challenging few-shot classification tasks demonstrate the effectiveness of the proposed method.
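The abstract describes the two-stage design only at a high level. As a rough illustration of how such a pipeline can be organized, the following PyTorch sketch pairs a supervised contrastive stage carrying an added constraint term with a classification stage whose loss combines cross-entropy with a term over the incorrect classes. Everything here is an assumption made for demonstration: the constraint form, the relation term, the hyper-parameters (tau, lambda_c, beta) and the function names are not taken from the paper, whose actual implementation is available in the repository linked under Data availability.

# Illustrative sketch only (assumed names and hyper-parameters, not the paper's code).
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.1):
    # Standard supervised contrastive loss over L2-normalized embeddings z of shape (batch, dim).
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                              # a sample is not its own positive
    sim = sim - torch.eye(len(z), device=z.device) * 1e9    # drop self-similarity from the denominator
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    n_pos = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(n_pos).mean()

def stage1_loss(z, labels, lambda_c=0.1):
    # Stage 1 (assumed form): contrastive loss plus a constraint term. The constraint is
    # sketched as a pull toward in-batch class centroids, limiting how far purely
    # instance-level training can scatter a class; the paper's constraint may differ.
    classes = labels.unique()
    centroids = torch.stack([z[labels == c].mean(0) for c in classes])
    idx = torch.searchsorted(classes, labels)
    constraint = (z - centroids[idx]).pow(2).sum(1).mean()
    return supervised_contrastive_loss(z, labels) + lambda_c * constraint

def stage2_loss(logits, labels, beta=0.1):
    # Stage 2 (assumed form): cross-entropy on the correct class plus a term computed over
    # the incorrect classes, standing in for a loss that also models how the wrong classes
    # relate to the correct one.
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    wrong = probs.scatter(1, labels.unsqueeze(1), 0.0)       # zero out the correct class
    relation = -(wrong * wrong.clamp(min=1e-8).log()).sum(1).mean()
    return ce + beta * relation

Under this sketch, stage 1 would optimize stage1_loss on encoder embeddings of labeled batches, and stage 2 would optimize stage2_loss on classifier logits, optionally after initializing classifier weights from class mean embeddings (one common choice; the paper's initialization scheme may differ).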


Data availability

The datasets generated or analyzed during this study are available at https://github.com/alexa/dialoglue and https://git.uwaterloo.ca/jimmylin/Castor-data/-/tree/master/datasets. The source code for this study is available at https://github.com/shuaizujiaofu/GenerCTC/tree/main.


Funding

This work is supported by the Key Cooperation Project of the Chongqing Municipal Education Commission (Grant No. HZ2021008) and the Research Project of Graduate Education and Teaching Reform of the Chongqing Municipal Education Commission (Grant No. yjg223087).

Author information


Contributions

Jianjun Lei contributed to conceptualization, methodology, discussion, and writing–review and editing. Sida Chen was involved in conceptualization, methodology, experiments, and writing–original draft. Ying Wang performed conceptualization, methodology, investigation, data curation, and writing–review and editing.

Corresponding author

Correspondence to Jianjun Lei.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

The manuscript was reviewed and approved for publication by all authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lei, J., Chen, S. & Wang, Y. GenerCTC: a general two-stage contrastive training framework for text classification. J Supercomput 81, 101 (2025). https://doi.org/10.1007/s11227-024-06628-2

