ALLIE: Active Learning on Large-scale Imbalanced Graphs

Published: 25 April 2022


Human labeling is time-consuming and costly. This problem is further exacerbated in scenarios with extremely imbalanced class labels, such as detecting fraudsters on websites. Active learning selects the most relevant examples for human labelers in order to improve model performance at a lower cost. However, existing active learning methods for graph data often assume that both the data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website.
We propose a novel framework, ALLIE, to address the challenge of active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to a 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it with detecting abuse. Our coarsening strategy reduces computation time by up to 38% on both proprietary and public datasets.
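
To illustrate why focal loss helps with a rare positive class, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The abstract does not state the exact variant or hyperparameters ALLIE uses, so the function names and the values gamma=2.0 and alpha=0.75 below are assumptions chosen only for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.75, eps=1e-12):
    """Standard binary focal loss for a single example.

    p     : predicted probability of the positive (rare) class
    y     : true label, 0 or 1
    gamma : focusing parameter (assumed value, not from the paper)
    alpha : weight on the positive class (assumed value, not from the paper)
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


# An easy majority-class node (true label 0, predicted 0.05 positive)
# versus a misclassified rare-class node (true label 1, predicted 0.05).
easy_negative = focal_loss(0.05, 0)
hard_positive = focal_loss(0.05, 1)
print(easy_negative, hard_positive)
```

In this sketch the well-classified majority example contributes a loss several orders of magnitude smaller than the misclassified rare example, so gradient updates are dominated by the hard, rare cases. With gamma=0 the expression reduces to alpha-weighted cross-entropy.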


WWW '22: Proceedings of the ACM Web Conference 2022
April 2022


Author Tags

1. Graph neural networks
2. active learning
3. fraud detection
4. reinforcement learning


WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

• (2025) Data-Centric Graph Learning: A Survey. IEEE Transactions on Big Data 11:1 (1-20). DOI: 10.1109/TBDATA.2024.3489412. Online publication date: Feb-2025
• (2025) Disentangled Active Learning on Graphs. Neural Networks 185 (107130). DOI: 10.1016/j.neunet.2025.107130. Online publication date: May-2025
• (2024) Uncertainty for active learning on graphs. Proceedings of the 41st International Conference on Machine Learning (14275-14307). DOI: 10.5555/3692070.3692641. Online publication date: 21-Jul-2024
• (2024) Neural Collapse Anchored Prompt Tuning for Generalizable Vision-Language Models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4631-4640). DOI: 10.1145/3637528.3671690. Online publication date: 25-Aug-2024
• (2024) When Imbalance Meets Imbalance: Structure-driven Learning for Imbalanced Graph Classification. Proceedings of the ACM Web Conference 2024 (905-913). DOI: 10.1145/3589334.3645629. Online publication date: 13-May-2024
• (2024) Anomaly Detection Service for Blockchain Transactions Using Minimal Substitution-Based Label Propagation. IEEE Transactions on Services Computing 17:5 (2054-2066). DOI: 10.1109/TSC.2024.3407601. Online publication date: Sep-2024
• (2024) Adaptive graph active learning with mutual information via policy learning. Expert Systems with Applications 255 (124773). DOI: 10.1016/j.eswa.2024.124773. Online publication date: Dec-2024
• (2023) Minimal Substitution-based Label Propagation for Anomalous Blockchain Detection. 2023 19th International Conference on Mobility, Sensing and Networking (MSN) (685-692). DOI: 10.1109/MSN60784.2023.00100. Online publication date: 14-Dec-2023
• (2023) Deep active learning for misinformation detection using geometric deep learning. Online Social Networks and Media 33 (100244). DOI: 10.1016/j.osnem.2023.100244. Online publication date: Jan-2023
• (2023) Boosting the Performance of Deployable Timestamped Directed GNNs via Time-Relaxed Sampling. Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (190-206). DOI: 10.1007/978-3-031-43427-3_12. Online publication date: 18-Sep-2023

