DOI: 10.1145/3573428.3573772
research-article

Imbalanced Nodes Classification for Graph Neural Networks Based on Valuable Sample Mining

Published: 15 March 2023

ABSTRACT

Node classification is an important task for graph neural networks, but most existing studies assume that the classes in the training data are balanced. In practice, class imbalance is widespread and can seriously degrade a model's performance, so reducing the adverse effects of imbalanced datasets on training is crucial. To this end, we construct a new loss function, FD-Loss, following the traditional algorithm-level approach to the imbalance problem. First, we propose a sample mismeasurement distance and use it to filter out edge-hard samples and simple samples according to the sample distribution. Then, we define weight coefficients from the mismeasurement distance and use them as the weighting term of the loss function, so that the loss focuses only on valuable samples. Experiments on several benchmarks demonstrate that our loss function effectively alleviates the node-class imbalance problem and improves classification accuracy by 4% over existing methods on the node classification task.
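The abstract does not give the exact form of FD-Loss, so the following is only a hypothetical sketch of a distance-weighted cross-entropy in the spirit described. The "mismeasurement distance" is assumed here to be the gap between a perfect prediction and the predicted probability of the true class, and the `low`/`high` filtering thresholds are invented for illustration; the paper's actual definitions may differ.

```python
import math

def fd_loss_sketch(probs, labels, low=0.1, high=0.9):
    """Illustrative distance-weighted loss (not the paper's exact FD-Loss).

    probs:  list of per-class probability lists, one row per node
    labels: list of true class indices
    """
    total, kept = 0.0, 0
    for p_row, y in zip(probs, labels):
        p_true = p_row[y]
        # Assumed "mismeasurement distance": shortfall of the predicted
        # probability of the true class from a perfect prediction.
        d = 1.0 - p_true
        # Keep only "valuable" samples: drop simple samples (d < low)
        # and edge-hard samples (d > high).
        if low <= d <= high:
            # Cross-entropy term weighted by the distance.
            total += d * -math.log(max(p_true, 1e-12))
            kept += 1
    # Average over the samples that were actually kept.
    return total / max(kept, 1)
```

Under this sketch, confidently classified nodes and extreme outliers contribute nothing to the gradient, so training capacity concentrates on the informative middle of the distribution.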


Published in

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, October 2022, 1999 pages
ISBN: 9781450397148
DOI: 10.1145/3573428

      Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rate: 508 of 972 submissions, 52% (overall)