
Hybrid sampling-based contrastive learning for imbalanced node classification

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Imbalanced node classification is a vital task because it arises in many real-world applications, such as financial fraud detection, anti-money laundering, and drug reaction prediction. However, many recent methods are designed for balanced graph-structured datasets and do not perform well on imbalanced data. To alleviate this problem, we propose a hybrid sampling-based contrastive learning method (HSCL) for imbalanced node classification. The core of our method is to adopt hybrid sampling within contrastive learning, that is, to undersample the majority classes and oversample the minority classes, so that samples from different classes are balanced during contrastive learning and a discriminative representation is obtained. HSCL has been evaluated extensively on five real-world datasets. Experimental results show that the proposed method outperforms other state-of-the-art methods.
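As an illustration only, the following is a minimal sketch (in PyTorch) of the balanced-batch idea described above; it is not the authors' implementation. Node embeddings produced by any GNN encoder are drawn into class-balanced batches by undersampling majority classes and oversampling minority classes, and a standard supervised contrastive loss is applied to the balanced batch. The function names, the per-class batch size, and the temperature are assumptions.

import torch
import torch.nn.functional as F

def hybrid_balanced_batch(labels, per_class, generator=None):
    # Undersample majority classes and oversample minority classes so that
    # every class contributes exactly `per_class` node indices to the batch.
    batch = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if len(idx) >= per_class:
            # majority class: sample without replacement (undersampling)
            perm = torch.randperm(len(idx), generator=generator)[:per_class]
            batch.append(idx[perm])
        else:
            # minority class: sample with replacement (oversampling)
            rand = torch.randint(len(idx), (per_class,), generator=generator)
            batch.append(idx[rand])
    return torch.cat(batch)

def supervised_contrastive_loss(z, y, temperature=0.5):
    # Standard supervised contrastive loss (Khosla et al., 2020) over a
    # balanced batch of embeddings z with labels y.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)  # exclude self-contrast
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    loss = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Hypothetical usage with embeddings from any GNN encoder, e.g. a two-layer GCN:
# z_all = encoder(features, adj)                      # [num_nodes, d]
# idx = hybrid_balanced_batch(train_labels, per_class=32)
# loss = supervised_contrastive_loss(z_all[idx], train_labels[idx])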


Notes

  1. https://github.com/tkipf/gcn/tree/master/gcn/data.

  2. https://github.com/shchur/gnn-benchmark/raw/master/data/npz/ms_academic_cs.npz.

  3. https://github.com/shchur/gnn-benchmark/raw/master/data/npz/amazon_electronics_computers.npz.

  4. https://github.com/tkipf/pygcn.

  5. https://github.com/TianxiangZhao/GraphSmote.

  6. https://github.com/codeshareabc/DRGCN.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876103, U21A20473, and 61772323).

Author information


Corresponding author

Correspondence to Jiye Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cui, C., Wang, J., Wei, W. et al. Hybrid sampling-based contrastive learning for imbalanced node classification. Int. J. Mach. Learn. & Cyber. 14, 989–1001 (2023). https://doi.org/10.1007/s13042-022-01677-6
