short-paper

HOVER: Homophilic Oversampling via Edge Removal for Class-Imbalanced Bot Detection on Graphs

Authors:
Bradley Ashmore, Wright State University, Dayton, OH, USA (ORCID: 0009-0004-5411-4799)
Lingwei Chen, Wright State University, Dayton, OH, USA (ORCID: 0000-0003-1550-6170)

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, October 2023, Pages 3728–3732
https://doi.org/10.1145/3583780.3615264

Published: 21 October 2023

ABSTRACT

Malicious bots reside in networks and disrupt their stability, and graph neural networks (GNNs) have emerged as one of the most popular methods for detecting them. In most cases, however, these graphs are significantly class-imbalanced. Graph oversampling has recently been proposed to address this issue by synthesizing nodes and edges, but it still suffers from graph heterophily, which leads to suboptimal performance. In this paper, we propose HOVER, which implements Homophilic Oversampling Via Edge Removal for bot detection on graphs. Instead of oversampling nodes and edges within the initial graph structure, HOVER uses a simple edge removal method with heuristic criteria to mitigate heterophily and learn distinguishable node embeddings. These embeddings are then used to oversample minority bots, generating a balanced class distribution without edge synthesis. Experiments on TON_IoT networks demonstrate the state-of-the-art performance of HOVER on bot detection under high graph heterophily and extreme class imbalance.
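
The two ideas named in the abstract, removing edges to raise homophily and then oversampling the minority class in embedding space without synthesizing edges, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not state the heuristic criteria, so the cosine-similarity threshold below is an assumption, as are all function names (`edge_homophily`, `remove_dissimilar_edges`, `oversample_minority`) and the toy data. Raw node features stand in for GNN-learned embeddings, and the oversampler is a simplified SMOTE-style interpolation between random minority pairs rather than nearest neighbors.

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    if len(edges) == 0:
        return 0.0
    return float(np.mean(labels[edges[:, 0]] == labels[edges[:, 1]]))


def remove_dissimilar_edges(edges, features, threshold):
    """Keep edges whose endpoint features have cosine similarity >= threshold.

    ASSUMPTION: cosine similarity is a stand-in heuristic; the paper's
    actual removal criteria are not given in the abstract.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sims = np.sum(unit[edges[:, 0]] * unit[edges[:, 1]], axis=1)
    return edges[sims >= threshold]


def oversample_minority(embeddings, labels, minority, rng):
    """Add interpolated minority points until the two classes are balanced.

    Simplified SMOTE-style sampling for a binary problem: each synthetic
    point lies on the segment between two randomly chosen minority
    embeddings. No edges are created for the synthetic points.
    """
    minority_emb = embeddings[labels == minority]
    n_needed = int(np.sum(labels != minority)) - len(minority_emb)
    if n_needed <= 0 or len(minority_emb) < 2:
        return embeddings, labels
    i = rng.integers(0, len(minority_emb), size=n_needed)
    j = rng.integers(0, len(minority_emb), size=n_needed)
    lam = rng.random((n_needed, 1))
    synthetic = minority_emb[i] + lam * (minority_emb[j] - minority_emb[i])
    new_labels = np.full(n_needed, minority, dtype=labels.dtype)
    return np.vstack([embeddings, synthetic]), np.concatenate([labels, new_labels])


# Hypothetical toy graph: nodes 0-3 are benign (label 0), nodes 4-5 are bots (label 1).
features = np.array(
    [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.0], [0.0, 1.0], [0.1, 0.9]]
)
labels = np.array([0, 0, 0, 0, 1, 1])
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 4], [3, 5], [4, 5]])

print("homophily before:", edge_homophily(edges, labels))
pruned = remove_dissimilar_edges(edges, features, threshold=0.5)
print("homophily after: ", edge_homophily(pruned, labels))

rng = np.random.default_rng(0)
balanced_emb, balanced_labels = oversample_minority(features, labels, minority=1, rng=rng)
print("class counts after oversampling:", np.bincount(balanced_labels))
```

In this toy graph the two cross-class edges connect nodes with nearly orthogonal features, so the threshold removes exactly those edges. Real graphs are unlikely to separate this cleanly, and the threshold would need tuning.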

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Network security


Published in
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
General Chairs:
Ingo Frommholz, University of Wolverhampton, UK
Frank Hopfgartner, University of Koblenz, Germany
Mark Lee, University of Birmingham, UK
Michael Oakes, University of Birmingham, UK
Program Chairs:
Mounia Lalmas, Spotify, UK
Min Zhang, Tsinghua University, China
Rodrygo Santos, Federal University of Minas Gerais, Brazil
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bot detection
graph convolutional networks
homophily and heterophily
imbalanced classes
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
