research-article
Public Access

Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework

Published: 03 November 2019

Abstract

Cybercriminals widely use online underground forums to exchange knowledge and trade illicit products and services, and these forums play a central role in the cybercriminal ecosystem. To combat evolving cybercrime, we propose and develop an intelligent system named iDetective that automates the analysis of underground forums to identify key players (i.e., users who play a vital role in the value chain). In iDetective, we first introduce an attributed heterogeneous information network (AHIN) for user representation and use a meta-path based approach to incorporate higher-level semantics and build relatedness among users in underground forums; we then propose Player2Vec to efficiently learn node (i.e., user) representations in the AHIN for key player identification. Player2Vec first maps the constructed AHIN to a multi-view network consisting of multiple single-view attributed graphs that encode the relatedness among users depicted by the different designed meta-paths; it then employs a graph convolutional network (GCN) to learn embeddings of each single-view attributed graph; finally, an attention mechanism fuses the embeddings learned from the different single-view attributed graphs into the final representations. Comprehensive experiments on data collected from different underground forums (i.e., Hack Forums and Nulled) validate the effectiveness of iDetective in key player identification in comparison with alternative approaches.
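The three Player2Vec stages named in the abstract (meta-path views, per-view GCN, attention fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the node types (`user_thread`, `user_tool`), the toy data, the random untrained weights, and the simple tanh-and-softmax attention are all assumptions made for illustration. The GCN propagation follows the commonly used symmetric normalization of Kipf and Welling.

```python
import numpy as np

def metapath_adjacency(incidence):
    """User-user adjacency induced by a two-hop meta-path (e.g. user-X-user)."""
    counts = incidence @ incidence.T       # shared intermediate nodes per user pair
    np.fill_diagonal(counts, 0)            # drop trivial self-paths
    return (counts > 0).astype(float)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)

def attention_fuse(view_embeddings, query):
    """Per-user softmax over views (assumed form), then a weighted sum."""
    stacked = np.stack(view_embeddings)            # (views, users, dim)
    scores = np.tanh(stacked) @ query              # (views, users)
    scores -= scores.max(axis=0, keepdims=True)    # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=0, keepdims=True)
    fused = (alpha[:, :, None] * stacked).sum(axis=0)
    return fused, alpha

# Hypothetical toy AHIN: 4 users, 3 threads, 2 traded tools, 5 user attributes.
rng = np.random.default_rng(0)
user_thread = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
user_tool = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
features = rng.normal(size=(4, 5))

# One single-view attributed graph per meta-path.
views = [metapath_adjacency(user_thread), metapath_adjacency(user_tool)]
weights = [rng.normal(size=(5, 3)) for _ in views]   # untrained, for shape only
embeddings = [gcn_layer(a, features, w) for a, w in zip(views, weights)]

fused, alpha = attention_fuse(embeddings, rng.normal(size=3))
print(fused.shape, alpha.sum(axis=0))
```

In a trained system the GCN weights and the attention query would be learned from labeled key players; here they are random, so only the data flow and tensor shapes are meaningful.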

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attributed heterogeneous information network
  2. key player identification
  3. network embedding
  4. underground forums

Qualifiers

  • Research-article

Conference

CIKM '19
Acceptance Rates

CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
