GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13716)

Abstract

Class imbalance, where some classes have far fewer samples than others, is a serious problem that can degrade node classification. Most existing GNNs, however, assume that node samples are balanced across classes, so training a GNN classifier directly on raw data under-represents minority-class samples and yields sub-optimal performance. This paper proposes GraphMixup, a novel mixup-based framework for improving class-imbalanced node classification on graphs. Because directly performing mixup in the input space or embedding space may produce out-of-domain samples due to the extreme sparsity of minority classes, we construct semantic relation spaces that allow Feature Mixup to be performed at the semantic level. Moreover, we apply two context-based self-supervised techniques to capture both local and global information in the graph structure, and we propose Edge Mixup specifically to handle graph data. Finally, we develop a Reinforcement Mixup mechanism that adaptively determines how many samples mixup should generate for the minority classes. Extensive experiments on three real-world datasets show that GraphMixup yields encouraging results on class-imbalanced node classification. Code is available at: https://github.com/LirongWu/GraphMixup.
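
For readers unfamiliar with mixup, the following is a minimal sketch of the basic interpolation step used to oversample a minority class. It is not the GraphMixup implementation: it mixes raw feature vectors in the input space, which is exactly the setting the abstract warns may produce out-of-domain samples, whereas GraphMixup performs Feature Mixup in learned semantic relation spaces. The function name `mixup_minority`, its parameters, and the fixed `num_new` count are assumptions made for illustration; in the paper the number of generated samples is chosen adaptively by the Reinforcement Mixup mechanism.

```python
import numpy as np

def mixup_minority(features, labels, minority_class, num_new, alpha=1.0, seed=0):
    """Hypothetical helper: create `num_new` synthetic samples for
    `minority_class` by interpolating random pairs of its feature vectors."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(labels == minority_class)
    if idx.size == 0:
        raise ValueError("no samples of the requested minority class")
    # Pick two (possibly identical) minority samples for each synthetic sample.
    a = rng.choice(idx, size=num_new)
    b = rng.choice(idx, size=num_new)
    # Mixing coefficients drawn from Beta(alpha, alpha), as in standard mixup.
    lam = rng.beta(alpha, alpha, size=(num_new, 1))
    # Convex combination of the two parents; both share the same label.
    new_features = lam * features[a] + (1.0 - lam) * features[b]
    new_labels = np.full(num_new, minority_class, dtype=labels.dtype)
    return new_features, new_labels

# Toy data: three majority-class nodes (label 0), two minority-class nodes (label 1).
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [10.0, 10.0], [11.0, 9.0]])
y = np.array([0, 0, 0, 1, 1])
X_new, y_new = mixup_minority(X, y, minority_class=1, num_new=4)
```

Because each synthetic vector is a convex combination of two minority samples, it always lies on the segment between its parents. With very few minority samples, those segments can cross regions where no real data exists, which is the motivation the abstract gives for mixing at the semantic level instead.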

Acknowledgement

This work is supported by the Science and Technology Innovation 2030 - Major Project (No. 2021ZD0150100) and the National Natural Science Foundation of China (No. U21A20427).

Author information

Correspondence to Lirong Wu or Jun Xia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, L., Xia, J., Gao, Z., Lin, H., Tan, C., Li, S.Z. (2023). GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_32

  • DOI: https://doi.org/10.1007/978-3-031-26412-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science, Computer Science (R0)
