Abstract
Ethereum is one of the largest blockchain programming platforms. Users in Ethereum are identified by addresses derived from public-private key pairs, which are difficult to connect to real-world identities. This pseudonymity has encouraged a variety of illegal activities. However, these addresses can be linked and identified by the functional roles that their transactions reveal. In this paper, we propose a methodology for predicting the functional roles of Ethereum addresses using machine learning. We build machine learning models that predict the functional role of an address from features derived from its transactional history over varying window sizes. We use a labeled dataset of 300 million transactions that are publicly available on the Ethereum blockchain. On the test data, the XGBoost classifier with an eleven-feature vector and a window size of 200 predicts the role of an unseen address with a best accuracy of 73%. We also trained and tested deep learning models on the dataset; the CNN model predicted the labels with 86% accuracy. Using the machine learning models, we also devise a measure of anonymity and compare it across unlabelled addresses. Further, to qualitatively validate our predictions, we discovered Ethereum addresses used on dark web pages and predicted their functional roles with our trained models. Most of these addresses behaved like Wallet_app, Shapeshift, and Mining, and this prediction was aligned with the background information extracted from the context in which each address was used on the dark web page.
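The idea of deriving features from an address's transactional history over a fixed window can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the eleven features, so the function name `window_features`, the transaction fields (`from`, `to`, `value`, `timestamp`), and the six features computed here are all assumptions chosen for illustration.

```python
from statistics import mean


def window_features(address, transactions, window_size=200):
    """Summarize the most recent `window_size` transactions touching `address`.

    Hypothetical feature set for illustration only; the paper's actual
    eleven features are not given in the abstract.
    """
    # Keep only transactions where the address is sender or receiver, oldest first.
    involved = sorted(
        (tx for tx in transactions if address in (tx["from"], tx["to"])),
        key=lambda tx: tx["timestamp"],
    )
    # Restrict the history to the last `window_size` transactions.
    window = involved[-window_size:]

    sent = [tx for tx in window if tx["from"] == address]
    received = [tx for tx in window if tx["to"] == address]
    counterparties = {tx["to"] for tx in sent} | {tx["from"] for tx in received}
    # Time gaps between consecutive transactions inside the window.
    gaps = [b["timestamp"] - a["timestamp"] for a, b in zip(window, window[1:])]

    return {
        "n_sent": len(sent),
        "n_received": len(received),
        "value_sent": sum(tx["value"] for tx in sent),
        "value_received": sum(tx["value"] for tx in received),
        "n_counterparties": len(counterparties),
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
    }


# Toy transaction list (made-up addresses and values, not real chain data).
toy_transactions = [
    {"from": "0xA", "to": "0xB", "value": 1.0, "timestamp": 100},
    {"from": "0xC", "to": "0xA", "value": 2.5, "timestamp": 160},
    {"from": "0xA", "to": "0xD", "value": 0.5, "timestamp": 280},
    {"from": "0xB", "to": "0xC", "value": 4.0, "timestamp": 300},
]

features = window_features("0xA", toy_transactions, window_size=200)
print(features)
```

A feature dictionary of this kind, computed per labeled address, could then be turned into a fixed-length vector and passed to a classifier such as XGBoost; varying `window_size` changes how much of the history each vector summarizes.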
Availability of data and materials
Data will be made available on request.
Code availability
The code repository with implementations and results is available on request.
Acknowledgements
This research project was partially funded by the Blockchain Research Lab at Information Technology University (ITU), Lahore, Pakistan.
Funding
This research project was partially funded by the National Center of Cyber Security (NCCS) Pakistan.
Author information
Contributions
Tania Saleem: Conceptualization, Data Curation, Software, Validation, Visualization, Writing - original draft, Writing - review & editing, Methodology, Investigation, Formal Analysis, Project Administration. Muhammad Ismael: Conceptualization, Methodology, Software, Data Curation, Validation, Visualization, Writing - review & editing. Muhammad Umar Janjua: Conceptualization, Resources, Funding acquisition, Methodology, Writing - review & editing, Supervision, Project Administration. Abdul Rehman Ali: Methodology, Software, Visualization, Writing - review & editing. Awab Aqib: Writing - original draft, Writing - review & editing, Methodology. Ali Ahmed: Methodology, Supervision, Writing - review & editing. Saeed-ul Hassan: Supervision, Writing - review & editing.
Ethics declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
On the basis of their contribution, all authors agree to participate.
Consent for publication
This publication has been consented to by all authors.
Conflict of interest/Competing interests
The authors have no conflicts of interest to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection: 3 - Track on Blockchain
Guest Editor: Haojin Zhu
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saleem, T., Ismaeel, M., Janjua, M.U. et al. Predicting functional roles of Ethereum blockchain addresses. Peer-to-Peer Netw. Appl. 16, 2985–3002 (2023). https://doi.org/10.1007/s12083-023-01553-2
DOI: https://doi.org/10.1007/s12083-023-01553-2