Graph-based Text Classification by Contrastive Learning with Text-level Graph Augmentation

Published: 13 February 2024

Abstract

Text Classification (TC) is a fundamental task in the information retrieval community. Nowadays, mainstream TC methods are built on deep neural networks, which learn far more discriminative text features than traditional shallow learning methods. Among existing deep TC methods, those based on Graph Neural Networks (GNNs) have attracted particular attention due to their superior performance. Technically, GNN-based TC methods typically transform the full training dataset into a single graph of texts; however, they often neglect the dependencies between words and thus miss potential semantic information that may be significant for representing texts precisely. To solve this problem, we instead generate graphs of words, so as to capture word dependency information. Specifically, each text is translated into a graph of words in which neighboring words are linked. We learn the node features of words by a GNN-like procedure and then aggregate them into a graph feature that represents the text. To further improve the text representations, we introduce a contrastive learning regularization term: for each original text graph, we generate two augmented text graphs, and we constrain the representations of the two augmented graphs from the same text to be close and those from different texts to be far apart. We propose various techniques for generating the augmented graphs. Building on these ideas, we develop a novel deep TC model, namely Text-level Graph Networks with Contrastive Learning (TGNcl). We conduct a number of experiments to evaluate the proposed TGNcl model. The empirical results demonstrate that TGNcl outperforms existing state-of-the-art TC models.
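The pipeline described in the abstract — build a word graph per text, propagate node features, pool a graph-level representation, and apply a contrastive loss over two augmented views — can be sketched roughly as follows. This is an illustrative approximation, not the authors' TGNcl implementation: the sliding-window size, the mean-aggregation update, the edge-dropping augmentation, and the NT-Xent-style contrastive loss are all assumptions standing in for the paper's specific choices.

```python
import numpy as np

def build_text_graph(tokens, window=2):
    """Turn one text into a graph of words: nodes are unique words,
    and words within `window` positions of each other are linked."""
    vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}
    n = len(vocab)
    adj = np.zeros((n, n))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = vocab[w], vocab[tokens[j]]
            adj[a, b] = adj[b, a] = 1.0
    return vocab, adj

def graph_feature(adj, feats, hops=2):
    """GNN-like propagation (mean aggregation over neighbors plus self),
    followed by mean pooling into a single graph-level vector."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0  # +1 for the self term
    h = feats
    for _ in range(hops):
        h = (h + adj @ h) / deg
    return h.mean(axis=0)

def drop_edges(adj, rate, rng):
    """One possible graph augmentation: randomly drop edges,
    keeping the adjacency matrix symmetric."""
    keep = rng.random(adj.shape) >= rate
    keep = np.triu(keep, 1)
    return adj * (keep | keep.T)

def nt_xent(z1, z2, tau=0.5):
    """Contrastive loss: pull the two views of the same text together,
    push views of different texts apart. z1, z2: (B, d) view batches."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    B = len(z1)
    pos = np.concatenate([np.arange(B, 2 * B), np.arange(B)])
    logits = sim - sim.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(2 * B), pos].mean()
```

In a full model, `graph_feature` would be a trainable GNN and the contrastive term would be added to the supervised classification loss; here the pieces only demonstrate the data flow of one text through graph construction, augmentation, pooling, and the contrastive objective.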


Cited By

  • (2025) Few-shot Hierarchical Text Classification with Bidirectional Path Constraint by label weighting. Pattern Recognition Letters 190 (Apr 2025), 81–88. DOI: 10.1016/j.patrec.2025.01.025
  • (2024) Escaping the neutralization effect of modality features fusion in multimodal Fake News Detection. Information Fusion 111 (Nov 2024), 102500. DOI: 10.1016/j.inffus.2024.102500
  • (2024) A criteria-based classification model using augmentation and contrastive learning for analyzing imbalanced statement data. Heliyon 10, 12 (Jun 2024), e32929. DOI: 10.1016/j.heliyon.2024.e32929

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 4
May 2024
707 pages
EISSN: 1556-472X
DOI: 10.1145/3613622

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 February 2024
Online AM: 22 December 2023
Accepted: 15 December 2023
Received: 14 October 2022
Published in TKDD Volume 18, Issue 4


Author Tags

  1. Multi-label classification
  2. graph representation
  3. label correlation
  4. contrastive learning
  5. graph augmentation

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China (NSFC)

Article Metrics

  • Downloads (last 12 months): 694
  • Downloads (last 6 weeks): 52
Reflects downloads up to 28 Feb 2025

