ABSTRACT
Node classification is an important task for graph neural networks, but most existing studies assume that samples from different classes are balanced. In practice, class imbalance is widespread and can seriously degrade model performance, so reducing its adverse effects on training is crucial. We therefore propose FD-Loss, a new loss function built on the traditional algorithm-level approach to the imbalance problem. First, we define a sample mismeasurement distance and use it to filter out edge-hard samples and simple samples according to the sample distribution. We then derive weight coefficients from the mismeasurement distance and use them as the weighting term of the loss function, so that training focuses only on valuable samples. Experiments on several benchmarks demonstrate that FD-Loss effectively alleviates node-class imbalance and improves classification accuracy by 4% over existing methods on the node classification task.
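The abstract's recipe (measure how far each prediction is from its label, suppress simple and extreme edge-hard samples, and weight the cross-entropy by the remaining distances) can be sketched in plain Python. This is a hypothetical illustration, not the paper's published formula: the exact mismeasurement distance, the thresholds `lo`/`hi`, and the floor weight `eps` are all assumptions made for the sketch.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mismeasurement_distance(probs, label):
    # One plausible reading: how far the predicted probability of the
    # true class falls short of 1. (Assumed form, not the paper's.)
    return 1.0 - probs[label]

def fd_style_weight(d, lo=0.1, hi=0.9, eps=1e-3):
    # Down-weight "simple" samples (d < lo) and extreme "edge-hard"
    # samples (d > hi); keep full, distance-proportional weight for
    # the valuable middle band. Thresholds are illustrative.
    if d < lo or d > hi:
        return eps
    return d

def fd_style_loss(batch_logits, labels):
    """Weighted cross-entropy where each sample's weight comes from
    its mismeasurement distance."""
    total, norm = 0.0, 0.0
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        d = mismeasurement_distance(p, y)
        w = fd_style_weight(d)
        total += w * -math.log(max(p[y], 1e-12))
        norm += w
    return total / max(norm, 1e-12)
```

For example, a confidently correct sample (logits `[5.0, 0.0, 0.0]`, label `0`) gets distance well below `lo` and is nearly ignored, while a borderline sample (logits `[1.0, 0.0, 0.0]`) keeps its full weight, which is the "focus only on valuable samples" behavior the abstract describes.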
Index Terms
- Imbalanced Nodes Classification for Graph Neural Networks Based on Valuable Sample Mining