Abstract
Imbalanced node classification is an important task because class imbalance arises in many real-world applications, such as financial fraud detection, anti-money laundering, and drug reaction prediction. However, many recent methods are designed for balanced graph-structured datasets and do not perform well on imbalanced data. To alleviate this problem, we propose a hybrid sampling-based contrastive learning method (HSCL) for imbalanced node classification. The core of our method is to apply hybrid sampling within contrastive learning, that is, to undersample majority classes and oversample minority classes, so that samples from different classes are balanced during contrastive learning and the learned representation is more discriminative. HSCL has been evaluated extensively on five real-world datasets. Experimental results show that the proposed method outperforms other state-of-the-art methods.
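
The hybrid sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' HSCL implementation: the function name `hybrid_sample`, the `per_class` target, and the use of random duplication for oversampling are all assumptions made for illustration, and the toy labels are synthetic. The sketch only shows how undersampling large classes and oversampling small ones yields an equal number of node indices per class, which could then be fed to a contrastive loss.

```python
import numpy as np


def hybrid_sample(labels, per_class, seed=0):
    """Return node indices with exactly `per_class` entries for every class.

    Hypothetical helper, not taken from the paper. Classes with at least
    `per_class` nodes are undersampled without replacement; smaller classes
    keep all of their nodes and are topped up by random duplication.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) >= per_class:
            # Majority class: draw a subset without replacement.
            picked = rng.choice(idx, size=per_class, replace=False)
        else:
            # Minority class: keep every node, then duplicate random ones.
            extra = rng.choice(idx, size=per_class - len(idx), replace=True)
            picked = np.concatenate([idx, extra])
        chosen.append(picked)
    return np.concatenate(chosen)


# Synthetic, heavily imbalanced labels: 90 / 8 / 2 nodes per class.
labels = np.array([0] * 90 + [1] * 8 + [2] * 2)
balanced = hybrid_sample(labels, per_class=10)
counts = np.bincount(labels[balanced])
print(counts)
```

In this sketch every class contributes the same number of indices to `balanced`, so each class supplies equally many samples when pairs are formed. Random duplication is the simplest oversampling choice; a synthetic-sample scheme in the style of SMOTE would be a drop-in alternative.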





Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61876103, U21A20473, and 61772323).
About this article
Cite this article
Cui, C., Wang, J., Wei, W. et al. Hybrid sampling-based contrastive learning for imbalanced node classification. Int. J. Mach. Learn. & Cyber. 14, 989–1001 (2023). https://doi.org/10.1007/s13042-022-01677-6
DOI: https://doi.org/10.1007/s13042-022-01677-6