
A novel graph oversampling framework for node classification in class-imbalanced graphs

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

Graph neural networks (GNNs) are a promising approach to analyzing graphs. Most existing GNNs adopt a class-balanced assumption and therefore handle class-imbalanced graphs poorly. Oversampling is an effective technique for alleviating class imbalance. However, most graph oversampling methods generate synthetic minority nodes and their edges after applying a GNN. They ignore the problem that the representations of the original and synthetic minority nodes are dominated by majority nodes, because the GNN aggregates neighbor information before oversampling takes place. In this paper, we propose a novel graph oversampling framework, termed distribution alignment-based oversampling for node classification in class-imbalanced graphs (Graph-DAO). Our framework generates synthetic minority nodes before the GNN to avoid the dominance of majority nodes caused by message passing. In addition, we introduce a distribution alignment method based on the sum-product network to learn more information about minority nodes. To the best of our knowledge, this is the first work to use the sum-product network to address the class-imbalance problem in node classification. Extensive experiments on four real-world datasets show that our method achieves the best results on the node classification task for class-imbalanced graphs.
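The abstract's core idea, synthesizing minority nodes in the input feature space before any GNN message passing rather than after it, can be illustrated with a minimal sketch. The snippet below is a hypothetical, SMOTE-style simplification rather than the authors' Graph-DAO implementation: it interpolates between randomly paired minority nodes in raw feature space, and it leaves out both edge generation for the synthetic nodes and the sum-product-network distribution alignment.

    import torch

    def oversample_minority(x, y, minority_class, n_new, alpha=0.5):
        # x: (N, F) node feature matrix; y: (N,) integer labels.
        # Returns augmented features and labels; edges for the synthetic nodes
        # are assumed to be produced by a separate edge generator (not shown).
        idx = (y == minority_class).nonzero(as_tuple=True)[0]   # minority node indices
        src = idx[torch.randint(len(idx), (n_new,))]             # random anchor nodes
        dst = idx[torch.randint(len(idx), (n_new,))]             # random reference nodes
        lam = alpha * torch.rand(n_new, 1)                       # interpolation weights
        x_new = x[src] + lam * (x[dst] - x[src])                 # synthetic minority features
        y_new = torch.full((n_new,), minority_class, dtype=y.dtype)
        return torch.cat([x, x_new], dim=0), torch.cat([y, y_new], dim=0)

The augmented features and labels would then be fed, together with the extended adjacency, into a standard GNN such as a GCN, so that neighbor aggregation happens only after oversampling, which is the ordering the paper argues for.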



Acknowledgements

This work was supported by National Key R&D Program of China (Grant No. 2021ZD0112500), National Natural Science Foundation of China (Grant Nos. U22A2098, 62172185, 62202200, 62206105), and Fundamental Research Funds for the Central Universities, JLU.

Author information


Corresponding authors

Correspondence to Xueyan Liu or Bo Yang.


About this article


Cite this article

Xia, R., Zhang, C., Zhang, Y. et al. A novel graph oversampling framework for node classification in class-imbalanced graphs. Sci. China Inf. Sci. 67, 162101 (2024). https://doi.org/10.1007/s11432-023-3897-2

