Abstract
Graph neural networks (GNNs) are a promising approach to analyzing graphs. Most existing GNNs assume balanced classes and therefore handle class-imbalanced graphs poorly. Oversampling is an effective way to alleviate class imbalance. However, most graph oversampling methods generate synthetic minority nodes and their edges after applying a GNN. They overlook that, because the GNN aggregates neighbor information before oversampling, the representations of both the original and the synthetic minority nodes are dominated by majority nodes. In this paper, we propose a novel graph oversampling framework, distribution alignment-based oversampling for node classification in class-imbalanced graphs (Graph-DAO). Our framework generates synthetic minority nodes before the GNN, which avoids the dominance of majority nodes caused by message passing. Additionally, we introduce a distribution alignment method based on the sum-product network to learn more information about minority nodes. To the best of our knowledge, this is the first work to use the sum-product network to address class imbalance in node classification. Extensive experiments on four real-world datasets show that our method achieves the best results on node classification for class-imbalanced graphs.
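The ordering argument in the abstract can be illustrated with a toy sketch. It is not the Graph-DAO implementation: plain SMOTE-style interpolation stands in for the paper's sum-product-network-based generation, one round of mean aggregation stands in for a GNN layer, and the six-node graph, its features, and the function names are hypothetical.

```python
import random


def mean_aggregate(features, neighbors, node):
    """One round of mean aggregation over a node and its neighbors
    (a stand-in for a single message-passing layer)."""
    group = [node] + neighbors[node]
    dim = len(features[node])
    return [sum(features[n][d] for n in group) / len(group) for d in range(dim)]


def smote_interpolate(x_a, x_b, rng):
    """SMOTE-style synthetic sample on the segment between two minority samples."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_a, x_b)]


# Hypothetical toy graph: nodes 0 and 1 are minority, nodes 2..5 are majority.
# Minority features are large in dimension 0, majority features in dimension 1.
features = {
    0: [1.0, 0.0], 1: [0.8, 0.2],
    2: [0.0, 1.0], 3: [0.1, 0.9], 4: [0.0, 0.8], 5: [0.2, 1.0],
}
neighbors = {0: [2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}

rng = random.Random(0)

# Oversampling before aggregation: the synthetic node is built from raw
# minority features, so it stays inside the minority region.
synthetic_before = smote_interpolate(features[0], features[1], rng)

# Oversampling after aggregation: node 0 has only majority neighbors, so its
# aggregated representation is pulled toward the majority region first.
agg0 = mean_aggregate(features, neighbors, 0)
agg1 = mean_aggregate(features, neighbors, 1)
synthetic_after = smote_interpolate(agg0, agg1, rng)

print("raw minority node 0:       ", features[0])
print("aggregated minority node 0:", agg0)
print("synthetic (before GNN):    ", synthetic_before)
print("synthetic (after GNN):     ", synthetic_after)
```

In this toy setting, the synthetic node interpolated from raw features keeps a strong minority signal in dimension 0, while the one interpolated from aggregated representations inherits the majority influence that message passing already mixed in.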
Acknowledgements
This work was supported by the National Key R&D Program of China (Grant No. 2021ZD0112500), the National Natural Science Foundation of China (Grant Nos. U22A2098, 62172185, 62202200, 62206105), and the Fundamental Research Funds for the Central Universities, JLU.
Cite this article
Xia, R., Zhang, C., Zhang, Y. et al. A novel graph oversampling framework for node classification in class-imbalanced graphs. Sci. China Inf. Sci. 67, 162101 (2024). https://doi.org/10.1007/s11432-023-3897-2
DOI: https://doi.org/10.1007/s11432-023-3897-2