A novel hybrid feature fusion model for detecting phishing scam on Ethereum using deep neural network

https://doi.org/10.1016/j.eswa.2022.118463

Highlights

  • A more effective approach to extracting features from transaction records.

  • A model that integrates manual feature engineering and transaction record analysis.

  • Achieves a high F1-score in detecting phishing scam accounts on Ethereum.

  • Outperforms existing state-of-the-art methods.

Abstract

The development of blockchain technology has brought prosperity to the cryptocurrency market, but it has also made blockchain platforms a hotbed of crime. Phishing scams, one of the most rampant of these crimes, have caused huge economic losses to blockchain platforms and their users. To address this threat to the financial security of blockchain, this paper proposes a hybrid deep neural network model for detecting phishing scam accounts, named LBPS (LSTM-FCN and BP neural network-based Phishing Scam accounts detection model), and verifies its effectiveness on Ethereum. LBPS offers a novel approach to analysing transaction records: a BP neural network learns the implicit relationships among features extracted from transaction records, and an LSTM-FCN neural network captures temporal features from all transaction records of a target account. The experimental results demonstrate that the features selected in this paper can identify phishing scam accounts effectively. Moreover, LBPS outperforms existing methods and baseline models, achieving an F1-score of 97.86%.

Introduction

Blockchain is the underlying core technology of many cryptocurrencies, such as Bitcoin and Ethereum (Dan et al., 2020). According to coinmarketcap.com,1 there are currently over 10,000 cryptocurrencies (or tokens), with a total market capitalization of over 1.2 trillion dollars. The World Economic Forum has assessed the application of blockchain in financial scenarios and predicted that blockchain will reshape financial market infrastructure in many areas, including payments, insurance, deposits, and investment.2

However, as blockchain has become widely used, its security issues have gradually emerged (Li et al., 2020, Quamara and Singh, 2022). Security threats against cryptocurrency applications and crimes against blockchain platforms occur with high frequency. Among these crimes, phishing scams are particularly rampant on many blockchain platforms (Chen et al., 2020a). From 2016 to 2018, researchers discovered more than 2,000 phishing scam accounts on Ethereum that had defrauded nearly 40,000 accounts of over 36 million dollars' worth of cryptocurrency.3 The increase in economic crime raises public concern about the security and further development of blockchain and seriously affects the value of cryptocurrencies.

As a next-generation cryptocurrency and a platform for decentralized applications, Ethereum has driven significant innovation in blockchain technology. Ether, the cryptocurrency that powers Ethereum, is the second-largest cryptocurrency by market capitalization and is worth over 400 billion dollars. However, according to a report from Chainalysis,4 the number of payments received by scams on Ethereum increased by 48% in 2020. Furthermore, phishing scams have accounted for more than 50% of all cyber-crimes since 2017,5 making them the main threat to the financial security of Ethereum. An effective method for detecting phishing scam accounts on Ethereum is therefore urgently needed.

Compared with traditional phishing (Gupta et al., 2018), phishing scams on Ethereum differ in three aspects. Firstly, phishing scams on Ethereum target victims' cryptocurrency assets, whereas traditional phishing scams mostly target victims' private information (Chen et al., 2020a). Secondly, Ethereum transaction records are publicly accessible, which makes them easy for researchers to obtain and analyse. Finally, datasets for traditional phishing usually contain phishing websites (Almomani et al., 2022, Gupta et al., 2021, Rao and Pais, 2019), phishing emails (Alhogail and Alsabih, 2021, Salhi et al., 2021), and phishing SMS (Short Message Service) messages (Mishra & Soni, 2020), whereas the available data on Ethereum phishing scams contain only account labels. Therefore, a new dataset of Ethereum phishing scams has to be established.

Several studies have addressed phishing scam account detection on blockchain platforms such as Ethereum. Depending on how features are extracted, existing methods fall into two types: methods based on manual feature engineering and methods based on transaction record analysis. Methods based on manual feature engineering rely on appropriate, effective features extracted by hand and train a classifier with traditional machine learning models (Chen et al., 2020a, Farrugia et al., 2020). Methods based on transaction record analysis rely on an effective way to extract feature vectors from the transaction graph, such as graph embedding techniques (Chen et al., 2020b, Yuan et al., 2020a, Yuan et al., 2020b).

However, existing methods for detecting phishing scam accounts on Ethereum face two main challenges:

Firstly, no existing research integrates transaction record analysis and manual feature engineering into a single model for detecting phishing scam accounts. Transaction records reveal the behaviour of accounts, whereas manually engineered features reflect their state. Analysing only account features or only transaction records therefore makes it difficult to characterize accounts comprehensively and accurately, which hinders the accurate detection of phishing scam accounts.

Secondly, methods based on manual feature engineering and on transaction record analysis both need further enhancement to perform better. Methods based on manual feature engineering depend on features selected by hand, so the researcher's experience determines how effective those features are. In addition, some features are derived from a few basic quantities, for example the maximum, minimum, and mean of the original data (Farrugia et al., 2020, Wen et al., 2021). As a result, the features obtained are numerous but not very effective. Meanwhile, existing methods based on transaction record analysis mainly rely on graph embedding techniques, so the detection models focus on the topological characteristics of accounts and ignore their temporal characteristics.
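
Derived statistics of the kind described above can be illustrated with a short sketch. It assumes, hypothetically, that an account's transactions are available as (value, direction) pairs with direction "in" or "out"; this record layout and the feature names are illustrative and are not the schema used in the cited works.

```python
from statistics import mean


def derived_features(transactions):
    """Max/min/mean statistics of incoming and outgoing transfer values.

    `transactions` is a list of (value, direction) tuples, where direction is
    "in" or "out". This layout is a hypothetical assumption for illustration.
    """
    features = {}
    for direction in ("in", "out"):
        values = [v for v, d in transactions if d == direction]
        # Accounts with no transfers in one direction get zero-valued features.
        features[f"{direction}_count"] = len(values)
        features[f"{direction}_max"] = max(values) if values else 0.0
        features[f"{direction}_min"] = min(values) if values else 0.0
        features[f"{direction}_mean"] = mean(values) if values else 0.0
    return features


print(derived_features([(1.5, "in"), (0.5, "in"), (2.0, "out")]))
```

The max, min, and mean are all summaries of the same list of values, so they tend to be correlated; this shows how the feature count can grow without a matching gain in discriminative power.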

To address these challenges, this paper proposes LBPS, a model based on a hybrid deep neural network, to detect phishing scam accounts on Ethereum. LBPS integrates manual feature engineering and transaction record analysis to achieve comprehensive, effective feature extraction and better detection performance. First, the addresses of labelled phishing scam accounts are obtained from etherscan.io,6 and a transaction network of phishing scams on Ethereum is constructed from these labelled accounts and their transaction records. Then, transfer features and state features are extracted by manual feature engineering, and transaction features are extracted by transaction record analysis. Finally, a hybrid deep neural network based on LSTM-FCN and a BP neural network is constructed for further representation learning and phishing scam account detection. In the experiments, the effectiveness of the features is evaluated, and LBPS is compared with baseline models of existing methods.
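
The fusion idea can be sketched as an untrained, pure-Python forward pass. This is a hypothetical simplification, not the LBPS architecture: layer sizes are invented, biases are omitted, weights are random, and the FCN branch uses one convolution instead of the several convolutional blocks with batch normalisation found in the published LSTM-FCN design. It only shows how a BP-style branch over manual features and LSTM and convolutional branches over a transaction sequence can be concatenated before a sigmoid output.

```python
import math
import random


def rand_matrix(rng, rows, cols, scale=0.1):
    # Small random weights stand in for trained parameters (no training here).
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    # Matrix-vector product with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class HybridFusionSketch:
    """Hypothetical, untrained, bias-free forward pass of a feature-fusion model.

    Branch A: one dense ReLU layer (BP-style MLP) over the manual features.
    Branch B: an LSTM whose final hidden state summarises the sequence.
    Branch C: one 1-D convolution + ReLU + global average pooling (FCN-style).
    Branch outputs are concatenated and fed to a single sigmoid unit.
    """

    def __init__(self, n_manual, n_step_feats, hidden=4, n_filters=4, kernel=3, seed=0):
        rng = random.Random(seed)
        self.hidden, self.kernel = hidden, kernel
        self.W_mlp = rand_matrix(rng, hidden, n_manual)
        # One (W, U) pair per LSTM gate: input, forget, candidate, output.
        self.W_gates = [rand_matrix(rng, hidden, n_step_feats) for _ in range(4)]
        self.U_gates = [rand_matrix(rng, hidden, hidden) for _ in range(4)]
        # Each filter spans `kernel` time steps and all per-step features.
        self.filters = [rand_matrix(rng, kernel, n_step_feats) for _ in range(n_filters)]
        self.w_out = rand_matrix(rng, 1, 2 * hidden + n_filters)[0]

    def mlp_branch(self, manual_feats):
        return [max(0.0, a) for a in matvec(self.W_mlp, manual_feats)]

    def lstm_branch(self, seq):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in seq:
            pre = [[a + b for a, b in zip(matvec(W, x), matvec(U, h))]
                   for W, U in zip(self.W_gates, self.U_gates)]
            i = [sigmoid(a) for a in pre[0]]
            f = [sigmoid(a) for a in pre[1]]
            g = [math.tanh(a) for a in pre[2]]
            o = [sigmoid(a) for a in pre[3]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def fcn_branch(self, seq):
        pooled = []
        for filt in self.filters:
            # 'Valid' convolution over time, ReLU, then average over positions.
            acts = [max(0.0, sum(w * v for k in range(self.kernel)
                                 for w, v in zip(filt[k], seq[t + k])))
                    for t in range(len(seq) - self.kernel + 1)]
            pooled.append(sum(acts) / len(acts) if acts else 0.0)
        return pooled

    def predict_proba(self, manual_feats, seq):
        fused = self.mlp_branch(manual_feats) + self.lstm_branch(seq) + self.fcn_branch(seq)
        return sigmoid(sum(w * z for w, z in zip(self.w_out, fused)))


model = HybridFusionSketch(n_manual=3, n_step_feats=2)
seq = [[0.1, 1.0], [0.3, 0.0], [0.2, 1.0], [0.0, 0.0]]
print(model.predict_proba([0.2, 1.0, 0.5], seq))  # a value in (0, 1); meaningless until trained
```

Concatenating branch outputs (feature-level fusion) lets one output layer weigh account-state features against temporal patterns; a real implementation would use a deep learning framework and train all branches jointly.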

Compared with existing studies, the main contributions of this paper are summarized as follows:

  • To detect phishing scam accounts on Ethereum, we propose LBPS, a model based on a hybrid deep neural network that integrates manual feature engineering and transaction record analysis. The experimental results demonstrate that LBPS achieves a high F1-score in detecting phishing scam accounts on Ethereum, outperforming existing methods and baseline models.

  • This paper provides a new approach to extracting features from transaction records. Compared with existing detection models based on manual feature engineering or transaction record analysis, LBPS, which adopts LSTM-FCN and a BP neural network, extracts more effective features. As an improvement on manual feature engineering, the BP neural network learns the implicit relationships among the extracted features, improving the performance of LBPS. In addition, by treating transaction records as time series data and applying LSTM-FCN, LBPS extracts effective temporal features from the transaction records of phishing scam accounts. The experiments demonstrate the effectiveness of this approach.
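
Treating transaction records as time series requires turning each account's records into a fixed-length sequence. The per-step encoding and sequence length are not given in this section, so the sketch below makes hypothetical choices: each step holds the transferred value, an incoming-transfer flag, and the seconds elapsed since the previous transaction, and the record keys "timestamp", "value", and "direction" are assumed names.

```python
STEP_FEATURES = 3  # [value, is_incoming, seconds since previous transaction]


def to_sequence(records, max_len):
    """Encode one account's transaction records as a fixed-length time series.

    `records` is a list of dicts with the hypothetical keys "timestamp",
    "value" and "direction" ("in" or "out").
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    ordered = sorted(records, key=lambda r: r["timestamp"])
    steps, prev_ts = [], None
    for r in ordered:
        gap = 0.0 if prev_ts is None else float(r["timestamp"] - prev_ts)
        steps.append([float(r["value"]), 1.0 if r["direction"] == "in" else 0.0, gap])
        prev_ts = r["timestamp"]
    # Keep the most recent max_len steps and left-pad shorter histories with zeros.
    steps = steps[-max_len:]
    padding = [[0.0] * STEP_FEATURES for _ in range(max_len - len(steps))]
    return padding + steps


records = [
    {"timestamp": 20, "value": 2, "direction": "out"},
    {"timestamp": 10, "value": 1, "direction": "in"},
]
print(to_sequence(records, max_len=3))
# [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 10.0]]
```

Left-padding is one common choice for recurrent models: it keeps the most recent transactions at the end of the sequence, where a left-to-right LSTM reads them last.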

The rest of this paper is organized as follows. Section 2 introduces related work and achievements in this field. Section 3 elaborates the proposed LBPS model. Section 4 describes the experimental design and results. Finally, Section 6 concludes our work and outlines plans for further research.


Related work

In this section, we summarize recent studies on phishing scam account detection. All of them use the publicly available transaction records on the blockchain. By constructing a transaction graph in which nodes denote accounts and edges denote transactions, they transform phishing scam account detection into a node classification problem. Depending on how features are extracted, existing methods can be divided into two types:
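
The graph construction described above can be sketched with the standard library. This assumes, hypothetically, that transactions arrive as (sender, receiver, value) tuples and that parallel transfers are aggregated into one weighted directed edge; the cited studies may store edges differently (for example, keeping every transaction as a separate edge with its timestamp). Node classification then amounts to predicting a phishing or benign label for each address key.

```python
def build_transaction_graph(transactions):
    """Directed, weighted account graph from (sender, receiver, value) tuples.

    Nodes are account addresses; the edge u -> v records how many transfers
    went from u to v and their total value. The tuple layout is hypothetical.
    """
    graph = {}
    for sender, receiver, value in transactions:
        graph.setdefault(receiver, {})  # receivers are nodes even without outgoing edges
        edge = graph.setdefault(sender, {}).setdefault(receiver, {"count": 0, "total": 0.0})
        edge["count"] += 1
        edge["total"] += value
    return graph


def in_degree(graph, node):
    # Number of distinct accounts that sent at least one transfer to `node`.
    return sum(1 for targets in graph.values() if node in targets)


g = build_transaction_graph([("0xa", "0xb", 1.0), ("0xa", "0xb", 2.0), ("0xc", "0xb", 0.5)])
print(g["0xa"]["0xb"], in_degree(g, "0xb"))  # {'count': 2, 'total': 3.0} 2
```

The short addresses are placeholders; real Ethereum addresses are 20-byte hexadecimal strings.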

Proposed method

In this section, we describe the proposed method in detail. As shown in Fig. 1, it consists of three main modules: the Data Collection and Preprocessing Module, the Feature Extraction Module, and the Detection Module.

  1. Data Collection and Preprocessing Module: This module collects the data used in later stages. We collected phishing and benign addresses from etherscan.io, Ethereum transaction records from an Ethereum client, and address statistics computed with BigQuery.8
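
One preprocessing step this module implies is merging the phishing and benign address lists into labels. The policy below is a hypothetical sketch, not the paper's procedure: addresses are lower-cased (EIP-55 mixed-case checksums do not change the underlying address), malformed entries are skipped, and addresses appearing in both lists are dropped as ambiguous.

```python
import re

# 0x followed by 40 lowercase hex digits (applied after lower-casing).
ADDRESS_RE = re.compile(r"0x[0-9a-f]{40}")


def build_labels(phishing, benign):
    """Merge address lists into {address: label}, 1 = phishing, 0 = benign."""

    def clean(addresses):
        normalised = {a.strip().lower() for a in addresses}
        return {a for a in normalised if ADDRESS_RE.fullmatch(a)}

    phish, ben = clean(phishing), clean(benign)
    conflicts = phish & ben  # hypothetical policy: drop ambiguous addresses
    labels = {a: 1 for a in phish - conflicts}
    labels.update({a: 0 for a in ben - conflicts})
    return labels


phishing = ["0x" + "AB" * 20, "0x" + "cd" * 20, "0x123"]
benign = ["0x" + "cd" * 20, "0x" + "ef" * 20]
print(build_labels(phishing, benign))  # the "cd" address conflicts, "0x123" is malformed
```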

Experiments

In this section, we designed three experiments to evaluate the effectiveness of the proposed model for detecting phishing scam accounts on Ethereum. All experiments were conducted on a workstation equipped with an Intel(R) Xeon(R) CPU E5-1650 v4 and an NVIDIA RTX 2080 GPU with 8 GB of memory. The EPS-Dataset built in this paper is used in all experiments; its details are given in Section 3. In the experiments, 90% of the dataset was used as the training set and the remaining 10% for testing. The
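
The evaluation protocol (a 90/10 split, with results reported as F1-score) can be sketched as follows. The source does not say whether the split was stratified or how randomness was seeded, so this sketch uses a plain shuffled split with an arbitrary, hypothetical seed; F1 is computed for the phishing class as 2PR / (P + R).

```python
import random


def train_test_split(items, train_fraction=0.9, seed=42):
    """Shuffle a copy of `items` and split it; the seed value is arbitrary."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


train, test = train_test_split(range(100))
print(len(train), len(test))  # 90 10
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # about 0.667
```

With class imbalance between phishing and benign accounts, a stratified split would keep the class ratio equal in both parts; whether the paper did so is not stated here.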

Theoretical implications

In our work, we propose a novel model for detecting phishing scam accounts on Ethereum, which provides new ideas and methods for transaction analysis in future research.

Firstly, we propose a novel detection model, LBPS, which integrates manual feature engineering and transaction record analysis in a hybrid deep neural network. The unique structure of the model makes it highly effective, and it outperforms existing methods. We believe that this point is

Conclusion

With the development of blockchain technology and the prosperity of cryptocurrency, phishing scams on blockchain have become increasingly rampant and are now the main threat to blockchain security. However, existing methods for detecting phishing scam accounts still need improvement, both in the effectiveness and comprehensiveness of feature extraction and in overall detection performance.

In this paper, we propose a hybrid deep neural network-based phishing scam accounts detection model on

CRediT authorship contribution statement

Tingke Wen: Methodology, Software, Validation, Writing – original draft. Yuanxing Xiao: Methodology, Software, Writing – original draft. Anqi Wang: Methodology, Software. Haizhou Wang: Conceptualization, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under grant nos. 61802271, 61802270, 81602935, and 81773548. In addition, this work is also partially supported by Joint Research Fund of China Ministry of Education and China Mobile Company (No. CM20200409), the Key Research and Development Program of Science and Technology Department of Sichuan Province, PR China (No. 2020YFS0575), and the Sichuan University and Yibin Municipal People’s Government University and

References (32)

  • Rao, R. S., et al. (2019). Jail-Phish: An improved search engine based phishing detection system. Computers & Security.

  • Tuncel, K. S., et al. (2018). Autoregressive forests for multivariate time series modeling. Pattern Recognition.

  • Zou, X., et al. (2019). Integration of residual network and convolutional neural network along with various activation functions and global pooling for time series classification. Neurocomputing.

  • Almomani, A., et al. (2022). Phishing website detection with semantic features based on machine learning classifiers: A comparative study. International Journal on Semantic Web and Information Systems.

  • Chen, W., Guo, X., Chen, Z., Zheng, Z., & Lu, Y. (2020). Phishing Scam Detection on Ethereum: Towards Financial...

  • Chen, L., et al. (2020). Phishing scams detection in ethereum transaction network. ACM Transactions on Internet Technology.