ABSTRACT
Phishing scams pose a serious threat to the ecosystem of Ethereum, one of the largest blockchains in the world. This type of cyberattack has recently caused losses of millions of dollars. In this paper, we propose a Self-supervised IncrEmental deep Graph lEarning (SIEGE) model for the phishing scam detection problem on Ethereum. To overcome the data scalability challenge, we split the original Ethereum transaction data and construct a transaction graph for each split. Because very little labeled data is available, we resort to graph-based self-supervised learning. We design a spatial pretext task to learn high-quality node embeddings within a single graph split, as well as an incremental learning paradigm and a temporal pretext task to facilitate information flow between different graph splits. To evaluate the effectiveness of SIEGE, we gather a real-world dataset consisting of six months of Ethereum transaction records. The results demonstrate that our model consistently outperforms baseline approaches in both transductive and inductive settings.
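The data-splitting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the SIEGE implementation: the abstract does not state how splits are defined, so the fixed-width time window, the transaction tuple layout, and the function names `split_by_time`, `build_graph`, and `build_split_graphs` are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (sender, receiver, value_in_ether, unix_timestamp).
Transaction = Tuple[str, str, float, int]
# Directed weighted graph: sender -> receiver -> total value sent in the split.
Graph = Dict[str, Dict[str, float]]


def split_by_time(txs: Iterable[Transaction],
                  window_seconds: int) -> Dict[int, List[Transaction]]:
    """Group transactions into consecutive fixed-width time windows.

    The window index is measured from the earliest timestamp in the data.
    Splitting by time is an assumption, not a detail given in the abstract.
    """
    tx_list = list(txs)
    if not tx_list:
        return {}
    start = min(tx[3] for tx in tx_list)
    splits: Dict[int, List[Transaction]] = defaultdict(list)
    for tx in tx_list:
        splits[(tx[3] - start) // window_seconds].append(tx)
    return dict(splits)


def build_graph(txs: Iterable[Transaction]) -> Graph:
    """Build one transaction graph: accounts are nodes, transfers are edges."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sender, receiver, value, _timestamp in txs:
        # Repeated transfers between the same pair are merged into one edge.
        graph[sender][receiver] += value
    return {sender: dict(receivers) for sender, receivers in graph.items()}


def build_split_graphs(txs: Iterable[Transaction],
                       window_seconds: int) -> List[Graph]:
    """Return one graph per non-empty time window, in chronological order."""
    splits = split_by_time(txs, window_seconds)
    return [build_graph(splits[index]) for index in sorted(splits)]
```

Each graph in the returned list is small enough to be processed on its own, which is the scalability motivation the abstract gives. Passing information from one split to the next is the role of the incremental learning paradigm and temporal pretext task, which this sketch does not attempt to model.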
Index Terms
- SIEGE: Self-Supervised Incremental Deep Graph Learning for Ethereum Phishing Scam Detection