Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework

Authors:

Chuan Shi

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 549 - 558

https://doi.org/10.1145/3357384.3357876

Published: 03 November 2019

Abstract

Cybercriminals widely use online underground forums to exchange knowledge and trade illicit products and services, and these forums play a central role in the cybercriminal ecosystem. To combat evolving cybercrime, we propose and develop iDetective, an intelligent system that automates the analysis of underground forums to identify key players (i.e., users who play a vital role in the value chain). In iDetective, we first introduce an attributed heterogeneous information network (AHIN) for user representation and use a meta-path based approach to incorporate higher-level semantics and build relatedness over users in underground forums; we then propose Player2Vec to efficiently learn node (i.e., user) representations in the AHIN for key player identification. In Player2Vec, we first map the constructed AHIN to a multi-view network consisting of multiple single-view attributed graphs, each encoding the relatedness over users depicted by a different designed meta-path; we then employ a graph convolutional network (GCN) to learn embeddings of each single-view attributed graph; finally, an attention mechanism fuses the embeddings learned from the different single-view attributed graphs into the final representations. Comprehensive experiments on data collected from different underground forums (i.e., Hack Forums, Nulled) validate the effectiveness of iDetective in key player identification by comparison with alternative approaches.
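The pipeline outlined in the abstract (meta-path views, one GCN per view, attention fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' Player2Vec implementation. The incidence matrices, the commuting-matrix construction of each view, the single untrained GCN layer, the query-vector attention, and all function names below are assumptions made only for illustration; the abstract does not specify the actual meta-paths, network depth, or attention form.

```python
import numpy as np


def metapath_adjacency(incidence):
    """Build a user-user view from a hypothetical user-object incidence matrix.

    Two users are linked when they share at least one object, i.e. they are
    connected by a user-object-user meta-path (an assumed meta-path shape).
    """
    counts = incidence @ incidence.T          # shared-object counts per user pair
    adj = (counts > 0).astype(float)
    np.fill_diagonal(adj, 0.0)                # self-loops are added in gcn_layer
    return adj


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)


def attention_fuse(view_embeddings, query):
    """Fuse per-view embeddings with softmax weights (a generic attention form)."""
    stacked = np.stack(view_embeddings)              # (views, users, dim)
    scores = stacked.mean(axis=1) @ query            # one score per view
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    fused = np.tensordot(weights, stacked, axes=1)   # (users, dim)
    return fused, weights


# Toy data: 4 users, 3 threads, 2 products (all values are made up).
rng = np.random.default_rng(0)
user_thread = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
user_product = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
features = rng.normal(size=(4, 5))                   # user attribute vectors

views = [metapath_adjacency(user_thread), metapath_adjacency(user_product)]
layer_weights = [rng.normal(size=(5, 3)) for _ in views]   # untrained, random
embeddings = [gcn_layer(a, features, w) for a, w in zip(views, layer_weights)]
fused, attn = attention_fuse(embeddings, rng.normal(size=3))
```

In a trained system the layer weights and the attention query would be learned from labeled key players; here they are random, so only the shapes and the data flow are meaningful.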

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs:
Wenwu Zhu, Tsinghua University, China
Dacheng Tao, University of Massachusetts, USA
Xueqi Cheng, Institute of Computing Technology, CAS, China

Program Chairs:
Peng Cui, Tsinghua University, China
Elke Rundensteiner, Worcester Polytechnic Institute, USA
David Carmel, Amazon Research, USA
Qi He, LinkedIn, USA
Jeffrey Xu Yu, Chinese University of Hong Kong, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation
DoJ/NIJ

Conference

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
