research-article

Detecting Malicious Accounts in Online Developer Communities Using Deep Learning

Authors:

Pan Hui

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 1251 - 1260

https://doi.org/10.1145/3357384.3357971

Published: 03 November 2019

Abstract

Online developer communities like GitHub provide services such as distributed version control and task management, which allow a massive number of developers to collaborate online. However, the openness of these communities makes them vulnerable to different types of malicious attacks, since attackers can easily join and interact with legitimate users. In this work, we formulate the malicious account detection problem in online developer communities and propose GitSec, a deep learning-based solution for detecting malicious accounts. GitSec distinguishes malicious accounts from legitimate ones based on account profiles as well as dynamic activity characteristics. On the one hand, GitSec makes use of users' descriptive features from their profiles. On the other hand, GitSec processes users' dynamic behavioral data by constructing two user activity sequences and applying a parallel neural network design in which each network handles one sequence. An attention mechanism integrates the information generated by the parallel neural networks. The final judgement is made by a decision maker implemented as a supervised machine learning-based classifier. Using real-world data from GitHub users, our extensive evaluation shows that GitSec is an accurate detection system, with an F1-score of 0.922 and an AUC value of 0.940.
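
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sequence encoders, the attention scoring function, or the classifier, so everything below is an assumption made for illustration: the two vectors stand in for the outputs of the parallel networks, the hypothetical `attention_fuse` helper uses dot-product scoring against a hypothetical query vector, and `build_classifier_input` simply concatenates profile features with the fused vector. It is not GitSec's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    encodings: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Combine several same-length encodings into one vector.

    Assumption: dot-product attention with a single query vector.
    The paper's actual scoring function is not given in the abstract.
    """
    # One relevance score per encoding (dot product with the query).
    scores = [sum(q * h for q, h in zip(query, enc)) for enc in encodings]
    weights = softmax(scores)
    dim = len(encodings[0])
    # Weighted sum of the encodings, computed dimension by dimension.
    fused = [
        sum(w * enc[i] for w, enc in zip(weights, encodings))
        for i in range(dim)
    ]
    return fused, weights


def build_classifier_input(
    profile_features: Sequence[float], fused: Sequence[float]
) -> List[float]:
    """Concatenate static profile features with the fused activity vector."""
    return list(profile_features) + list(fused)


# Hypothetical outputs of the two parallel sequence networks.
seq_a = [0.2, -0.1, 0.7]
seq_b = [0.5, 0.3, -0.4]
# Hypothetical attention query (would be learned in a real model).
query = [1.0, 0.0, 1.0]

fused, weights = attention_fuse([seq_a, seq_b], query)
# Hypothetical profile features, e.g. follower count and repository count.
features = build_classifier_input([3.0, 12.0], fused)
```

The resulting `features` vector is what a downstream supervised classifier would consume; in a trained system the query and the encoders would be learned jointly rather than fixed by hand as they are here.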

Index Terms

Detecting Malicious Accounts in Online Developer Communities Using Deep Learning
1. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Social aspects of security and privacy

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs: Wenwu Zhu (Tsinghua University, China), Dacheng Tao (University of Massachusetts, USA), Xueqi Cheng (Institute of Computing Technology, CAS, China)

Program Chairs: Peng Cui (Tsinghua University, China), Elke Rundensteiner (Worcester Polytechnic Institute, USA), David Carmel (Amazon Research, USA), Qi He (LinkedIn, USA), Jeffrey Xu Yu (Chinese University of Hong Kong, China)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

CIKM '19

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 14
Total Downloads: 547
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 3

Reflects downloads up to 30 Jan 2025

Citations

Cited By

Yang Z, Xu B, Zhang J, Kang H, Shi J, He J, Lo D. (2024). Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering 50:4 (721-741). Online publication date: Apr-2024.
https://doi.org/10.1109/TSE.2024.3361661
Mandal S, Kanjilal A. (2024). Requirement-Driven Developer Recommendation Framework Based on Github Developer Social Network. Intelligent Computing Systems and Applications (231-246). Online publication date: 20-Sep-2024.
https://doi.org/10.1007/978-981-97-5412-0_17
Salloum S. (2024). Detecting Malicious Accounts in Cyberspace: Enhancing Security in ChatGPT and Beyond. Artificial Intelligence in Education: The Power and Dangers of ChatGPT in the Classroom (653-666). Online publication date: 30-Mar-2024.
https://doi.org/10.1007/978-3-031-52280-2_42
Zhang W, Zhang Y, Huang Y, Chen F, Wang J, Hu X. (2023). GUFAD: A Graph-based Unsupervised Fraud Account Detection Framework. Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application (401-406). Online publication date: 27-Oct-2023.
https://dl.acm.org/doi/10.1145/3650215.3650286
Gong Q, Liu Y, Zhang J, Chen Y, Li Q, Xiao Y, Wang X, Hui P. (2023). Detecting Malicious Accounts in Online Developer Communities Using Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35:10 (10633-10649). Online publication date: 1-Oct-2023.
https://doi.org/10.1109/TKDE.2023.3237838
Arin E, Kutlu M. (2023). Deep Learning Based Social Bot Detection on Twitter. IEEE Transactions on Information Forensics and Security 18 (1763-1772). Online publication date: 2023.
https://doi.org/10.1109/TIFS.2023.3254429
Benedetti G, Verderame L, Merlo A. (2023). A Preliminary Study of Privilege Life Cycle in Software Management Platform Automation Workflows. 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (21-28). Online publication date: Jul-2023.
https://doi.org/10.1109/EuroSPW59978.2023.00007
Sun Z, Ruan H, Cao Y, Chen Y, Wang X. (2022). Analysis and Prediction of the IPv6 Traffic over Campus Networks in Shanghai. Future Internet 14:12 (353). Online publication date: 27-Nov-2022.
https://doi.org/10.3390/fi14120353
Shareef S, Sridevi R, Raju V, Rao K. (2022). A Novel Framework for Secure Blockchain Transactions. 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC) (1311-1318). Online publication date: 9-May-2022.
https://doi.org/10.1109/ICAAIC53929.2022.9792758
Gao M, Chen Y, Gong Q, Wang X, Hui P. (2022). Understanding Scholar Social Networks: Taking SCHOLAT as an Example. Computer Supported Cooperative Work and Social Computing (326-339). Online publication date: 22-Jul-2022.
https://doi.org/10.1007/978-981-19-4549-6_25
