
Hybrid sampling-based contrastive learning for imbalanced node classification

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Imbalanced node classification is a vital task because it arises in many real-world applications, such as financial fraud detection, anti-money laundering, and drug reaction prediction. However, many recent methods are designed for balanced graph-structured datasets and do not perform well on imbalanced data. To alleviate this problem, we propose a hybrid sampling-based contrastive learning method (HSCL) for imbalanced node classification. The core of our method is to adopt hybrid sampling within contrastive learning, that is, to undersample the majority classes and oversample the minority classes, so that samples from different classes are balanced during contrastive learning and a discriminative representation is obtained. HSCL has been evaluated extensively on five real-world datasets. Experimental results show that the proposed method outperforms other state-of-the-art methods.
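As an illustration only, the following is a minimal sketch (in PyTorch) of the balanced-batch idea described above; it is not the authors' implementation. Node embeddings produced by any GNN encoder are drawn into class-balanced batches by undersampling majority classes and oversampling minority classes, and a standard supervised contrastive loss is applied to the balanced batch. The function names, the per-class batch size, and the temperature are assumptions.

import torch
import torch.nn.functional as F

def hybrid_balanced_batch(labels, per_class, generator=None):
    # Undersample majority classes and oversample minority classes so that
    # every class contributes exactly `per_class` node indices to the batch.
    batch = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if len(idx) >= per_class:
            # majority class: sample without replacement (undersampling)
            perm = torch.randperm(len(idx), generator=generator)[:per_class]
            batch.append(idx[perm])
        else:
            # minority class: sample with replacement (oversampling)
            rand = torch.randint(len(idx), (per_class,), generator=generator)
            batch.append(idx[rand])
    return torch.cat(batch)

def supervised_contrastive_loss(z, y, temperature=0.5):
    # Standard supervised contrastive loss (Khosla et al., 2020) over a
    # balanced batch of embeddings z with labels y.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)  # exclude self-contrast
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    loss = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Hypothetical usage with embeddings from any GNN encoder, e.g. a two-layer GCN:
# z_all = encoder(features, adj)                      # [num_nodes, d]
# idx = hybrid_balanced_batch(train_labels, per_class=32)
# loss = supervised_contrastive_loss(z_all[idx], train_labels[idx])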


Notes

  1. https://github.com/tkipf/gcn/tree/master/gcn/data.

  2. https://github.com/shchur/gnn-benchmark/raw/master/data/npz/ms_academic_cs.npz.

  3. https://github.com/shchur/gnn-benchmark/raw/master/data/npz/amazon_electronics_computers.npz.

  4. https://github.com/tkipf/pygcn.

  5. https://github.com/TianxiangZhao/GraphSmote.

  6. https://github.com/codeshareabc/DRGCN.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876103, U21A20473, and 61772323).

Author information


Corresponding author

Correspondence to Jiye Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cui, C., Wang, J., Wei, W. et al. Hybrid sampling-based contrastive learning for imbalanced node classification. Int. J. Mach. Learn. & Cyber. 14, 989–1001 (2023). https://doi.org/10.1007/s13042-022-01677-6
