ABSTRACT
Malicious bots reside in networks and disrupt their stability, and graph neural networks (GNNs) have emerged as one of the most popular methods for detecting them. However, in most cases the underlying graphs are significantly class-imbalanced. Graph oversampling, which synthesizes nodes and edges, has recently been proposed to address this issue, but it still suffers from graph heterophily, leading to suboptimal performance. In this paper, we propose HOVER, which implements Homophilic Oversampling Via Edge Removal for bot detection on graphs. Instead of oversampling nodes and edges within the initial graph structure, HOVER uses a simple edge removal method with heuristic criteria to mitigate heterophily and learn distinguishable node embeddings. These embeddings are then used to oversample minority bots, producing a balanced class distribution without edge synthesis. Experiments on TON_IoT networks demonstrate the state-of-the-art performance of HOVER on bot detection under high graph heterophily and extreme class imbalance.
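The pipeline described above (prune edges to raise homophily, then oversample the minority class in embedding space without synthesizing edges) can be illustrated with a minimal sketch. The abstract does not state HOVER's heuristic criteria, so everything concrete here is an assumption for illustration: the cosine-similarity pruning rule, the `threshold` value, the function names, and the use of raw features in place of learned GNN embeddings. The oversampling step is SMOTE-style interpolation between random pairs of minority nodes, not the authors' exact procedure.

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    edges = np.asarray(edges)
    if len(edges) == 0:
        return 0.0
    return float(np.mean(labels[edges[:, 0]] == labels[edges[:, 1]]))


def remove_dissimilar_edges(edges, features, threshold=0.5):
    """Keep edges whose endpoint features have cosine similarity >= threshold.

    Assumed heuristic: the source only says "heuristic criteria".
    """
    edges = np.asarray(edges)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sims = np.sum(unit[edges[:, 0]] * unit[edges[:, 1]], axis=1)
    return edges[sims >= threshold]


def oversample_minority(embeddings, labels, minority=1, seed=0):
    """Balance a binary label set by interpolating between minority embeddings.

    No edges are created for the synthetic nodes.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(labels == minority)
    n_needed = int(np.sum(labels != minority)) - len(minority_idx)
    if n_needed <= 0 or len(minority_idx) == 0:
        return embeddings, labels
    a = embeddings[rng.choice(minority_idx, n_needed)]
    b = embeddings[rng.choice(minority_idx, n_needed)]
    t = rng.random((n_needed, 1))
    synthetic = a + t * (b - a)  # points on segments between minority pairs
    new_labels = np.full(n_needed, minority, dtype=labels.dtype)
    return np.vstack([embeddings, synthetic]), np.concatenate([labels, new_labels])


# Toy graph: nodes 0-3 are benign (label 0), nodes 4-5 are bots (label 1).
labels = np.array([0, 0, 0, 0, 1, 1])
features = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                     [0.8, 0.0], [0.0, 1.0], [0.1, 0.9]])
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (3, 5), (4, 5)]

pruned = remove_dissimilar_edges(edges, features)
print("homophily before:", edge_homophily(edges, labels))
print("homophily after: ", edge_homophily(pruned, labels))

# In a full pipeline a GNN trained on the pruned graph would supply embeddings;
# raw features stand in for them here.
balanced_x, balanced_y = oversample_minority(features, labels)
print("class counts after oversampling:", np.bincount(balanced_y))
```

In this toy graph the two benign-to-bot edges are the only ones that fall below the similarity threshold, so pruning removes exactly the heterophilic edges. Real traffic graphs will not separate this cleanly, and the threshold would need tuning.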
Index Terms
- HOVER: Homophilic Oversampling via Edge Removal for Class-Imbalanced Bot Detection on Graphs