Abstract
Contrastive learning models have been successfully applied to representation learning for a variety of downstream tasks. The positive samples used in contrastive learning are typically derived from augmented data, a strategy that has improved performance on many computer vision tasks but remains underexploited in natural language processing tasks such as text classification; existing data augmentation methods have rarely been combined with contrastive learning in NLP. In this paper, we propose a Text Augmentation Contrastive Learning Representation model, TACLR, which combines easy text augmentation techniques (i.e., synonym replacement, random insertion, random swap, and random deletion) and the textMixup augmentation method with contrastive learning for text classification. Furthermore, we propose a unified method that flexibly adapts to supervised, semi-supervised, and unsupervised learning. Experimental results on five text classification datasets show that TACLR significantly improves classification accuracy. We also provide extensive ablation studies exploring the contribution of each component of our model. The source code of our work is publicly available at https://gitlab.com/models-for-paper/taclr.
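To make the setup concrete, the sketch below illustrates how augmented views of a sentence can form positive pairs for a SimCLR-style NT-Xent contrastive loss. This is a minimal illustration under assumptions, not the authors' TACLR implementation: the encoder, tokenization, deletion/swap rates, and temperature are placeholders, and TACLR's textMixup component and supervised/semi-supervised variants are omitted.

```python
# Minimal sketch (not the authors' code): two EDA-style operations produce
# augmented views of each sentence, and an NT-Xent loss pulls the two views
# of the same sentence together while pushing apart all other pairs.
import random
import torch
import torch.nn.functional as F

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p (one of the four EDA operations)."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]  # never return an empty text

def random_swap(tokens, n_swaps=1):
    """Swap two randomly chosen token positions n_swaps times."""
    if len(tokens) < 2:
        return tokens
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i])."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d) unit-norm embeddings
    sim = z @ z.t() / temperature                 # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    n = z1.size(0)
    # the positive for row i is row i+n, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Hypothetical usage: encoder(...) stands in for any sentence encoder
# (e.g., BERT followed by pooling and a projection head).
# view_a = [random_deletion(s.split()) for s in batch_sentences]
# view_b = [random_swap(s.split()) for s in batch_sentences]
# loss = nt_xent(encoder(view_a), encoder(view_b))
```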
About this article
Cite this article
Jia, O., Huang, H., Ren, J. et al. Contrastive learning with text augmentation for text classification. Appl Intell 53, 19522–19531 (2023). https://doi.org/10.1007/s10489-023-04453-3