DOI: 10.1145/3357384.3357971 (research article, ACM conference proceedings, CIKM)

Detecting Malicious Accounts in Online Developer Communities Using Deep Learning

Published: 03 November 2019

Abstract

Online developer communities like GitHub provide services such as distributed version control and task management, which allow a massive number of developers to collaborate online. However, the openness of these communities makes them vulnerable to different types of malicious attacks, since attackers can easily join and interact with legitimate users. In this work, we formulate the malicious account detection problem in online developer communities and propose GitSec, a deep learning-based solution for detecting malicious accounts. GitSec distinguishes malicious accounts from legitimate ones based on account profiles as well as dynamic activity characteristics. On the one hand, GitSec makes use of users' descriptive features from their profiles. On the other hand, GitSec processes users' dynamic behavioral data by constructing two user activity sequences and applying a parallel neural network design that handles each sequence separately. An attention mechanism integrates the information generated by the parallel neural networks. The final judgement is made by a decision maker implemented as a supervised machine learning-based classifier. Based on real-world data of GitHub users, our extensive evaluations show that GitSec is an accurate detection system, with an F1-score of 0.922 and an AUC value of 0.940.
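The attention step described in the abstract can be illustrated with a small sketch. This is not GitSec's actual implementation: the abstract does not specify the attention formulation, so the dot-product scoring, the `context` vector, the example numbers, and the function names below are all assumptions made for illustration. The sketch takes one output vector per parallel network, weights the two with a softmax over their scores, and concatenates the fused vector with profile features to form the classifier input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_outputs, context):
    """Fuse equal-length branch vectors with dot-product attention.

    branch_outputs: one vector per parallel network (hypothetical values).
    context: scoring vector; learned in a real model, fixed here (assumption).
    Returns the weighted sum of the branch vectors and the attention weights.
    """
    scores = [sum(h_k * c_k for h_k, c_k in zip(h, context))
              for h in branch_outputs]
    weights = softmax(scores)
    dim = len(context)
    fused = [sum(w * h[k] for w, h in zip(weights, branch_outputs))
             for k in range(dim)]
    return fused, weights


# Hypothetical outputs of the two sequence networks for one account.
h_seq_a = [0.2, -0.1, 0.7]
h_seq_b = [0.5, 0.3, -0.4]
context = [1.0, 0.0, 1.0]

fused, weights = attention_fuse([h_seq_a, h_seq_b], context)

# Hypothetical descriptive profile features, appended to the fused vector
# before it is handed to a supervised classifier.
profile_features = [0.9, 0.1]
classifier_input = fused + profile_features
```

In this toy setting the first branch scores higher against `context`, so it receives the larger attention weight; in a trained model both the branch outputs and the scoring parameters would be learned from labeled accounts.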

    Published In

    CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
    November 2019
    3373 pages
    ISBN:9781450369763
    DOI:10.1145/3357384

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. malicious account detection
    3. online developer community
    4. social networks

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    CIKM '19

    Acceptance Rates

    CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 59
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 27 Jan 2025

    Citations

    Cited By

    • (2024) Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering 50:4 (721-741). DOI: 10.1109/TSE.2024.3361661. Online publication date: Apr-2024
    • (2024) Requirement-Driven Developer Recommendation Framework Based on Github Developer Social Network. Intelligent Computing Systems and Applications (231-246). DOI: 10.1007/978-981-97-5412-0_17. Online publication date: 20-Sep-2024
    • (2024) Detecting Malicious Accounts in Cyberspace: Enhancing Security in ChatGPT and Beyond. Artificial Intelligence in Education: The Power and Dangers of ChatGPT in the Classroom (653-666). DOI: 10.1007/978-3-031-52280-2_42. Online publication date: 30-Mar-2024
    • (2023) GUFAD: A Graph-based Unsupervised Fraud Account Detection Framework. Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application (401-406). DOI: 10.1145/3650215.3650286. Online publication date: 27-Oct-2023
    • (2023) Detecting Malicious Accounts in Online Developer Communities Using Deep Learning. IEEE Transactions on Knowledge and Data Engineering 35:10 (10633-10649). DOI: 10.1109/TKDE.2023.3237838. Online publication date: 1-Oct-2023
    • (2023) Deep Learning Based Social Bot Detection on Twitter. IEEE Transactions on Information Forensics and Security 18 (1763-1772). DOI: 10.1109/TIFS.2023.3254429. Online publication date: 2023
    • (2023) A Preliminary Study of Privilege Life Cycle in Software Management Platform Automation Workflows. 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (21-28). DOI: 10.1109/EuroSPW59978.2023.00007. Online publication date: Jul-2023
    • (2022) Analysis and Prediction of the IPv6 Traffic over Campus Networks in Shanghai. Future Internet 14:12 (353). DOI: 10.3390/fi14120353. Online publication date: 27-Nov-2022
    • (2022) A Novel Framework for Secure Blockchain Transactions. 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC) (1311-1318). DOI: 10.1109/ICAAIC53929.2022.9792758. Online publication date: 9-May-2022
    • (2022) Understanding Scholar Social Networks: Taking SCHOLAT as an Example. Computer Supported Cooperative Work and Social Computing (326-339). DOI: 10.1007/978-981-19-4549-6_25. Online publication date: 22-Jul-2022
