DOI: 10.1145/3589334.3645668

DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification

Published: 13 May 2024

Abstract

Text classification is a fundamental task in web content mining. Although existing supervised contrastive learning (SCL) approaches combined with pre-trained language models (PLMs) achieve leading performance in text classification, they lack a principled theoretical foundation. Theoretically motivated by a derived lower bound on mutual information, we propose DualCL, a dual contrastive learning framework that satisfies three properties: it is parameter-free, augmentation-easy, and label-aware. DualCL generates classifier parameters from the PLM and simultaneously uses them for classification and as augmented views of the input text for supervised contrastive learning. Extensive experiments demonstrate that DualCL learns superior text representations and consistently outperforms baseline models.
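
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a DualCL-style training step. It is not the authors' implementation: the choice of a Hugging Face bert-base-uncased encoder, the label words prepended to the input, the temperature, the loss weighting, and the exact choice of positives are all illustrative assumptions. Only the overall idea comes from the abstract: label-token embeddings produced by the PLM act simultaneously as a per-example classifier and as label-aware augmented views for supervised contrastive learning, and minimizing the InfoNCE-style contrastive terms maximizes a lower bound on the mutual information between the paired views.

```python
# Minimal sketch (not the authors' code) of a DualCL-style training step.
# Assumptions: one label word per class is prepended to the text, its contextual
# embedding serves as that example's classifier parameter theta_c, and the [CLS]
# embedding serves as the text feature z. Label words, temperature, and the
# pairing of positives are illustrative choices, not taken from the paper.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

NUM_CLASSES = 2           # e.g. positive / negative sentiment
TEMPERATURE = 0.1         # softmax temperature for the contrastive terms

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
label_words = ["negative", "positive"]   # assumed to be single BERT wordpieces


def encode(texts):
    """Return z (text feature) and theta (per-example classifier parameters)."""
    # Prepend the label words so the encoder yields one contextual vector per class.
    augmented = [" ".join(label_words) + " " + t for t in texts]
    batch = tokenizer(augmented, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state                 # (B, T, H)
    z = F.normalize(hidden[:, 0], dim=-1)                       # [CLS] feature, (B, H)
    theta = F.normalize(hidden[:, 1:1 + NUM_CLASSES], dim=-1)   # label tokens, (B, C, H)
    return z, theta


def dualcl_loss(z, theta, labels):
    """Classification loss plus two supervised InfoNCE-style contrastive terms."""
    # Classification: the per-example theta acts as a softmax classifier for z.
    logits = torch.einsum("bch,bh->bc", theta, z)                # (B, C)
    ce = F.cross_entropy(logits, labels)

    # theta_y: the classifier row matching each example's gold label, used as a
    # label-aware augmented "view" of the text for supervised contrastive learning.
    theta_y = theta[torch.arange(z.size(0)), labels]             # (B, H)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)            # (B, B) positive mask
    eye = torch.eye(z.size(0), dtype=torch.bool)

    def supcon(anchor, view):
        # Supervised InfoNCE: pull same-label (anchor, view) pairs together.
        sim = anchor @ view.t() / TEMPERATURE
        logp = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
        pos = same & ~eye
        return -(logp * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

    # Dual contrastive terms: z against theta_y, and theta_y against z.
    return ce + supcon(z, theta_y) + supcon(theta_y, z)


# Example usage (illustrative):
texts = ["the movie was wonderful", "utterly boring and too long"]
labels = torch.tensor([1, 0])
z, theta = encode(texts)
loss = dualcl_loss(z, theta, labels)
loss.backward()
```

In this sketch the per-example theta doubles as the softmax classifier, so no separate classification head with its own weights is introduced, which is one plausible reading of the "parameter-free" property; the label tokens likewise supply the augmented views without any explicit text augmentation, matching "augmentation-easy".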

Supplemental Material

MP4 file: supplemental video


    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334


    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 13 May 2024


    Author Tags

    1. contrastive learning
    2. mutual information
    3. text classification

    Qualifiers

    • Research-article

    Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

