
1 Introduction

The rapid development of science and technology has pushed the financial industry from the real economy toward Internet finance. In particular, with the emergence of blockchain technology, virtual digital currencies represented by Bitcoin have poured into the financial market. Virtual currency enables low-cost, peer-to-peer cross-border transactions, and implementing transactions on a blockchain has lowered the entry threshold of the financial industry to a certain extent. However, because user participation, trading, and the consensus mechanism of a blockchain are open, the user scale is dynamic, and participant identities are anonymous, money laundering becomes harder to detect and its methods grow more complex and sophisticated, posing new problems and challenges for the traditional anti-money laundering supervision system.

Although it is difficult to supervise blockchain transactions by traditional means, the complete transaction data on a blockchain is open and transparent. Mining the on-chain transaction data, establishing a multi-dimensional data model, and applying technologies such as big data and artificial intelligence to build a data-driven intelligent supervision scheme has therefore become a new direction. The primary goal of such a scheme is to accurately identify abnormal activity on the blockchain: suspicious users (such as members of money laundering organizations) or suspicious transactions (such as credit card fraud).

Representation learning on graph-structured data has become an important machine learning task, applicable across structures such as social networks, collaboration networks, and protein networks. Transactions on a blockchain can likewise be mapped to a graph. However, most existing blockchain anomaly detection models based on graph representation learning are designed for static graphs, whereas blockchain transactions change dynamically over time: as users continue to trade, the graph structure keeps evolving. We therefore need to attend not only to the information in the graph at the current moment but also to information from historical moments.

This paper therefore combines the strengths of graph convolutional networks (GCN) in extracting graph structure with the strengths of the gated recurrent unit (GRU) in learning time series, and designs an anomaly detection model, DynAEGCN, that learns both structural and temporal feature information. The model mines the information in blockchain transaction data to capture transactions more faithfully, with the aim of improving abnormal transaction detection performance.

The DynAEGCN model takes the autoencoder as its framework. The encoder uses a GCN to learn the structural characteristics of the network and aggregate the neighbor information of each node, while a GRU adaptively updates the parameters to capture the temporal dynamics of the network; the loss is then constructed by comparing the network reconstructed by the decoder with the real network. We evaluate the model on an edge classification task against other algorithms, and the experiments show that DynAEGCN outperforms the comparison algorithms on this task.

2 Related Work

The rapid development of blockchain technology has driven the transformation from the Internet of Information to the Internet of Value, with a wide range of application scenarios. However, the lack of regulatory mechanisms has given rise to many risks and violations of laws and regulations, such as money laundering, tax evasion, and illegal ICO financing. Countries have begun to bring blockchain technology into their regulatory systems, and abnormal transaction detection plays a positive role in promoting the healthy development of the blockchain industry. The conventional solution for abnormal transaction detection is an alert system based on fixed threshold rules that detects and flags suspicious transactions, after which humans decide on or judge the suspicious behavior. Such regulatory schemes face two reported challenges: 1) how to construct effective rules from massive, heterogeneous transaction data and keep those rules up to date and relevant; and 2) how to set alarm thresholds for flagging suspicious transaction behavior.

The emergence of Internet finance, such as virtual “digital currencies”, has created enormous challenges for rule-based regulatory solutions. Breaking with traditional supervision thinking and building intelligent, data-driven supervision schemes based on technologies such as artificial intelligence and big data analysis has become a trend. Research on abnormal transaction detection for intelligent supervision mainly follows supervised and unsupervised learning. Supervised learning uses labeled data (a training set) to learn a binary (e.g., legal versus illegal transactions) or multi-class machine learning detection model that predicts the class of unknown samples (a test set). Unsupervised learning explores the structure and characteristics of unlabeled data, finds an optimal division into clusters or classes, and treats samples far from the others as outliers, that is, abnormal data. For example, Jullum et al. [1] used information such as sender/receiver background, early transaction behavior, and transaction history to train an XGBoost supervised prediction model that identifies potential money laundering in financial transactions, and applied it in banks. Paula et al. [2] extracted 18 important features from categories such as registration information, financial transactions, and electronic invoices, and trained an unsupervised deep learning model based on an auto-encoder (AE) to detect money-laundering-related export fraud.

Representation learning on graph-structured data has become an important machine learning task. Its basic idea is to learn a low-dimensional vector representation of each node that retains, as far as possible, the node's structural and attribute information in the graph; it is broadly applicable to structures such as social networks, collaboration networks, and protein networks. Transactions on a blockchain can likewise be mapped to a financial network with users as nodes and transactions between users as edges. For example, Weber et al. [3] mapped Bitcoin transactions into a huge, complex graph structure, extracted features such as transaction counts and amounts, and then used a graph convolutional network (GCN) to distinguish illegal from legal transactions.

Most static graph representation learning methods can effectively learn node vector representations, but a large amount of real-world data exhibits complex temporal characteristics, and blockchain transactions are likewise dynamic. The graph structure and its attributes evolve over time: nodes and edges are inserted and deleted, and node and edge attributes change. We therefore need to attend not only to the current state of the graph but also to its history. In this context, dynamic network modeling is important for accurately predicting node attributes and future links.

As graph convolutional neural networks show great advantages in capturing graph structure, a new line of dynamic graph embedding combines GCN with RNN: the GCN extracts information from the graph structure, while the RNN models dynamic changes along the time dimension. Seo et al. [4] proposed two GCRN architectures, both of which use a GCN to learn node vector representations and then feed the sequence of learned vectors into an LSTM to model the dynamics; the only difference is that one of the models replaces the Euclidean 2D convolution of a conventional LSTM with a graph convolution. Similarly, Manessi et al. [5] proposed WD-GCN/CD-GCN, which combine LSTM variants with extended graph convolution operations to model the graph structure and its long- and short-term dependencies; the difference is that WD-GCN takes a sequence of graphs as input, while CD-GCN takes the ordered sequence of corresponding node features. The EvolveGCN model departs from learning the dynamics of node representations over time and instead learns the dynamics of the GCN parameters over time.

In view of the current state and trends of intelligent blockchain transaction supervision, this paper takes full advantage of graph convolutional neural networks and recurrent neural networks and designs a dynamic graph representation learning model that mines the information in blockchain transaction data, with the aim of improving abnormal transaction detection performance.

3 The Structure of DynAEGCN

3.1 Problem Definition

We define dynamic graphs and representation learning for dynamic graphs as follows: a dynamic graph is represented as a sequence of multiple static graphs:

$$ \begin{array}{*{20}c} {G = \left\{ {G_1 ,\;G_2 , \ldots ,G_T } \right\},} \\ \end{array} $$
(1)

where \(G_t = \left( {V_t ,E_t } \right)\) denotes the snapshot at time \(t\), \(t \in \left\{ {1,2, \ldots ,T} \right\}\), and the adjacency matrix of \(G_t\) is \(A_t \in R^{N \times N}\). Node representation learning on a dynamic graph is then a sequence of mappings:

$$ \begin{array}{*{20}c} {F = \left\{ {f_1 ,f_2 , \ldots ,f_T } \right\},\forall t \in \left\{ {1,2, \ldots ,T} \right\},} \\ \end{array} $$
(2)

Each mapping \(f_t\) sends a node to a low-dimensional vector \((y_t )_v = f_t \left( v \right)\), such that the mapped vector retains the node's original information; that is, the more similar two nodes are in the original graph, the closer their mapped vectors should be.
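As a concrete illustration, the following minimal Python sketch shows one way to store such a snapshot sequence; the variable names are ours, not from any reference implementation.

```python
# A dynamic graph as a sequence of T snapshots G_t = (V_t, E_t),
# each held as an adjacency matrix A_t plus node features X_t.
import numpy as np

T, N, K = 5, 100, 16  # time steps, nodes, feature dimension

snapshots = []
for t in range(T):
    A_t = np.zeros((N, N))        # adjacency matrix of snapshot G_t
    X_t = np.random.randn(N, K)   # K-dimensional node features at time t
    snapshots.append((A_t, X_t))

# Representation learning seeks one mapping f_t per snapshot that sends
# each node v to a low-dimensional vector (y_t)_v = f_t(v).
```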

We consider a multi-layer Graph Convolutional Network (GCN) with the following layer-wise propagation rule:

$$ \begin{array}{*{20}c} {H^{\left( {l + 1} \right)} = \sigma \left( {\hat{D}^{ - \frac{1}{2}} \hat{A}\hat{D}^{ - \frac{1}{2}} H^{\left( l \right)} W^{\left( l \right)} } \right),} \\ \end{array} $$
(3)

Here \(\hat{A} = A + I\), where \(A\) is the adjacency matrix of the undirected graph \(G\) and \(I\) is the identity matrix, so \(\hat{A}\) is the adjacency matrix with self-connections added; \(\hat{D}\) is the degree matrix of \(\hat{A}\), with \(\hat{D}_{ii} = \sum_j \hat{A}_{ij}\); the operation \(\hat{D}^{ - \frac{1}{2}} \hat{A}\hat{D}^{ - \frac{1}{2}}\) is the symmetric normalization of the adjacency matrix, acting as an approximate graph convolution filter; \(W^{\left( l \right)}\) is the weight matrix of the \(l\)th layer; and \(\sigma \left( \cdot \right)\) denotes an activation function. The input \(H^{\left( 0 \right)}\) of the first layer is the node feature matrix, each row of which is the K-dimensional feature vector of a node.
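For concreteness, a minimal NumPy sketch of this propagation rule might look as follows; the function and variable names are illustrative assumptions, not part of the paper's implementation.

```python
# One GCN layer: H^{l+1} = sigma(D^{-1/2} (A + I) D^{-1/2} H^{l} W^{l}).
import numpy as np

def gcn_layer(H, A, W, activation=np.tanh):
    A_hat = A + np.eye(A.shape[0])            # add self-connections
    d = A_hat.sum(axis=1)                     # degrees of A_hat (all >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return activation(A_norm @ H @ W)
```

Stacking L such layers aggregates information from each node's L-hop neighborhood.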

3.2 The Architecture of DynAEGCN

This paper aims to address the complexity and dynamics of the blockchain's dynamic transaction network. The proposed model adopts the classic unsupervised autoencoder framework: the encoder encodes the input graph \(A\) into a latent feature \(X\), and the decoder reconstructs a graph \(A^{\prime}\) from that feature. By minimizing the distance between \(A\) and \(A^{\prime}\), the decoder learns to predict the graph while the encoder learns to map the input graph into a vector space (Fig. 1).

Fig. 1. The architecture of DynAEGCN

1) In the encoder, the graph convolutional network (GCN) model is applied along the time dimension, and the structural and temporal feature information of the graph is learned by using an RNN to evolve the GCN parameters; the GCN itself uses a two-layer neural network. This approach tunes the model itself rather than the node embeddings, so there is no restriction on node changes; moreover, for future graphs containing new nodes with no historical information, the evolved GCN remains applicable.

2) The decoder reconstructs the original graph and compares the reconstruction with the original to construct the loss, so that the link relationships between nodes can be learned more accurately.

3.3 Encoder

In the encoder, we exploit the structure extraction capability of GCN to learn the structural information within each time slice. GCN aggregates neighbor information through a defined spectral graph convolution, thereby extending the idea of convolution to graphs. Formally, for \(G_t = \left( {V_t ,E_t } \right)\) at time \(t\), the input to the \(l\)th GCN layer is the node vector \(X_t^l\) output by the \((l-1)\)th layer together with the adjacency matrix \(A_t\), and the output is the updated node vector \(X_t^{l + 1}\). The operation at layer \(l\) is expressed as:

$$ \begin{array}{*{20}c} {X_t^{l + 1} = F\left( {X_t^l ,A_t ,W_t^l } \right) = \sigma \left( {\hat{D}_t^{ - \frac{1}{2}} \hat{A}_t \hat{D}_t^{ - \frac{1}{2}} X_t^l W_t^l } \right),} \\ \end{array} $$
(4)

Here the superscript \(l\) denotes the \(l\)th convolutional layer and the subscript \(t\) the \(t\)th time step; \(\hat{A}_t = A_t + I\), where \(I\) is the identity matrix; \(\hat{D}_t\) is the degree matrix of \(\hat{A}_t\), with \((\hat{D}_t)_{ii} = \sum_j (\hat{A}_t)_{ij}\); the operation \(\hat{D}_t^{ - \frac{1}{2}} \hat{A}_t \hat{D}_t^{ - \frac{1}{2}}\) is the symmetric normalization of the adjacency matrix, acting as an approximate graph convolution filter; \(W_t^l\) is the weight matrix of the \(l\)th layer at time \(t\); and \(\sigma\) is a nonlinear activation function. The input \(X_t^0\) of the first layer is the node feature matrix at time \(t\), each row of which is the K-dimensional feature vector of a node. After \(L\) layers of graph convolution, the output vectors at each time slice aggregate the nodes' neighbor information.

Considering the dynamic nature of the graph, the dynamic convolution layer adds an update mechanism to the static GCN architecture: when the graph structure changes, the weight parameters of the convolution operation should also be updated to adapt to the new structure. A recurrent neural network (RNN) takes sequence data as input, recurses along the direction of the sequence, and chains all its recurrent units together. RNNs have memory, parameter sharing, and Turing completeness, which gives them clear advantages in learning the nonlinear characteristics of sequences.

In this paper, the RNN component updates the weight parameters of the GCN model: for each \(t \in \left\{ {1,2, \ldots ,T} \right\}\) and \(l \in \left\{ {1,2, \ldots ,L} \right\}\), it takes the previous weights \(W_{t-1}^l\) and the current node representations \(X_t^l\) as input, and outputs the updated \(W_t^l\). The gated recurrent unit (GRU) is a variant of the RNN: because plain RNNs suffer from vanishing and exploding gradients and often fall short of expectations, the GRU was proposed. Both RNN and GRU model the previous hidden state together with the current input; the difference is that the GRU uses reset and update gates in its internal structure. Since the GRU can introduce richer graph structure information into the weight parameter update, our architecture adopts the GRU implementation. The weight update of the \(l\)th layer at time \(t\) is as follows:

$$ \begin{array}{*{20}c} {W_t^l = G\left( {X_t^l ,W_{t - 1}^l } \right) = \left( {1 - Z_t^l } \right) \circ W_{t - 1}^l + Z_t^l \circ \hat{W}_t^l ,} \\ \end{array} $$
(5)
$$ \begin{array}{*{20}c} {Z_t^l = \sigma \left( {U_Z^l X_t^l + V_Z^l W_{t - 1}^l + B_Z^l } \right),} \\ \end{array} $$
(6)
$$ \begin{array}{*{20}c} {R_t^l = \sigma \left( {U_R^l X_t^l + V_R^l W_{t - 1}^l + B_R^l } \right),} \\ \end{array} $$
(7)
$$ \begin{array}{*{20}c} {\hat{W}_t^l = tanh\left( {U_W^l X_t^l + V_W^l \left( {R_t^l \circ W_{t - 1}^l } \right) + B_W^l } \right),} \\ \end{array} $$
(8)

where \(Z_t^l\), \(R_t^l\), and \(\hat{W}_t^l\) are the update gate output, the reset gate output, and the candidate (pre-output) weight matrix, respectively.

The update of the weight matrix can be viewed as applying the standard GRU operation to each column of the matrix: the standard GRU operates on vectors, whereas updating the GCN weight matrix operates on matrices. The weight matrix \(W_{t-1}^l\) serves as the hidden state of the GRU; the node representation matrix \(X_t^l\) of the \(l\)th layer at time \(t\) serves as the GRU input, introducing current-time information; and the GRU unit outputs the updated \(W_t^l\), which is used as the weight matrix at time \(t\). The computation of \(W_t^l\) thus incorporates information from both the historical and the current moment. Since the weight matrix \(W_{t-1}^l\) and the node representation matrix \(X_t^l\) have different column dimensions, a sampling of \(X_t^l\) is added to this layer's operation so that its number of columns matches that of \(W_{t-1}^l\).
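The following PyTorch sketch illustrates Eqs. (5)–(8) as a GRU whose hidden state is the weight matrix itself. The summarize() step that shrinks \(X_t^l\) to the column width of \(W_{t-1}^l\) is our own guess at the sampling described above, not the paper's exact procedure.

```python
# A matrix-valued GRU in the spirit of Eqs. (5)-(8): the GCN weight
# matrix W (rows x cols) plays the role of the GRU hidden state.
import torch
import torch.nn as nn

class MatrixGRUCell(nn.Module):
    def __init__(self, rows, cols):
        super().__init__()
        def p():  # one (rows x rows) gate parameter, applied on the left
            return nn.Parameter(torch.randn(rows, rows) * 0.01)
        self.U_z, self.V_z, self.B_z = p(), p(), nn.Parameter(torch.zeros(rows, cols))
        self.U_r, self.V_r, self.B_r = p(), p(), nn.Parameter(torch.zeros(rows, cols))
        self.U_w, self.V_w, self.B_w = p(), p(), nn.Parameter(torch.zeros(rows, cols))

    def forward(self, X_tilde, W_prev):
        # Eq. (6) update gate and Eq. (7) reset gate.
        Z = torch.sigmoid(self.U_z @ X_tilde + self.V_z @ W_prev + self.B_z)
        R = torch.sigmoid(self.U_r @ X_tilde + self.V_r @ W_prev + self.B_r)
        # Eq. (8): candidate ("pre-output") weight matrix.
        W_cand = torch.tanh(self.U_w @ X_tilde + self.V_w @ (R * W_prev) + self.B_w)
        # Eq. (5): convex combination of old and candidate weights.
        return (1 - Z) * W_prev + Z * W_cand

def summarize(X, k):
    # Hypothetical sampling step: keep the k highest-norm rows of the
    # (N x d) matrix X and transpose to (d x k) so the columns match W.
    idx = X.norm(dim=1).topk(k).indices
    return X[idx].t()

# Example: evolve a (16 x 8) weight matrix from 100-node features.
cell = MatrixGRUCell(rows=16, cols=8)
W = torch.randn(16, 8)            # W_{t-1}^l
X = torch.randn(100, 16)          # X_t^l
W = cell(summarize(X, 8), W)      # W_t^l per Eq. (5)
```

Composing this update with the graph convolution of Eq. (4) yields the encoder of Eq. (9) below.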

The GCN module aggregates the neighbor information of nodes, while the GRU updates the weight parameters along the time dimension. The encoder can thus be expressed as:

$$ \begin{array}{*{20}c} {X_t^{l + 1} = F\left( {X_t^l ,A_t ,W_t^l } \right) = F\left( {X_t^l ,A_t ,G\left( {X_t^l ,W_{t - 1}^l } \right)} \right),} \\ \end{array} $$
(9)

3.4 Decoder

The decoder reconstructs the adjacency matrix from the information of the first \(t\) time steps learned by the encoder, that is, it predicts the topology at time \(t + 1\). The decoder uses the dot product to reconstruct the original graph, and the decoding process is expressed as:

$$ \begin{array}{*{20}c} {\hat{A} = \sigma \left( {ZZ^T } \right),} \\ \end{array} $$
(10)

where \(\hat{A}\) denotes the reconstructed adjacency matrix (not to be confused with the self-connection matrix \(\hat{A}_t\) of Eq. (4)).

The adjacency matrix directly determines the topology of the graph, so the goal of the model is to make the reconstructed adjacency matrix as similar as possible to the original one: the two are compared to construct a loss, and the parameters are updated by backpropagation to learn the hidden-layer node representations.
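A minimal sketch of this inner-product decoder, assuming node embeddings Z produced by the encoder:

```python
# Eq. (10): reconstruct the adjacency matrix as sigma(Z Z^T), where
# Z is the (N x d) node-embedding matrix from the encoder.
import torch

def decode(Z):
    return torch.sigmoid(Z @ Z.t())  # (N x N) edge-probability matrix

# Training compares decode(Z) with the observed adjacency matrix and
# backpropagates the reconstruction error through the encoder.
```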

3.5 Loss Function

To test the representational ability of the model, we train it on an edge classification task. Edge classification has strong practical significance in many real-world scenarios; for example, identifying crime in a financial network requires classifying the connection between two accounts. Edge classification on a dynamic graph aims to predict the label of an edge (u, v) at time t, which requires the vector representations of the edge's two endpoints. Given that the representations of two nodes u and v connected by an edge at time t are \(X_t^u\) and \(X_t^v\) respectively, a parameter matrix P is used to predict the label of the edge (u, v):

$$ \begin{array}{*{20}c} {y_t^{uv} = softmax\left( {P\left[ {X_t^u ,X_t^v } \right]} \right),} \\ \end{array} $$
(11)

The cross-entropy loss function of the model is:

$$ \begin{array}{*{20}c} {L = - \mathop \sum \limits_{t = 1}^T \mathop \sum \limits_{\left( {u,v} \right)} \alpha_{uv} \mathop \sum \limits_i \left( {Z_t^{uv} } \right)_i {\text{log}}\left( {y_t^{uv} } \right)_i ,} \\ \end{array} $$
(12)

where the inner sum runs over the label classes; \(Z_t^{uv}\) is the true (one-hot) label of the edge, and the weight \(\alpha_{uv}\) is a hyperparameter balancing the class distribution. Both experimental datasets suffer from severe class imbalance, and the class proportions are balanced by adjusting \(\alpha_{uv}\).
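A sketch of the classifier of Eq. (11) and the weighted cross-entropy of Eq. (12) in PyTorch; the class name and the use of a linear layer for the parameter matrix P are illustrative assumptions.

```python
# Edge classification from endpoint embeddings, Eqs. (11)-(12).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeClassifier(nn.Module):
    def __init__(self, d, num_classes=2):
        super().__init__()
        self.P = nn.Linear(2 * d, num_classes, bias=False)  # matrix P

    def forward(self, X, edges):
        # X: (N x d) node embeddings; edges: (E x 2) endpoint indices.
        u, v = edges[:, 0], edges[:, 1]
        return self.P(torch.cat([X[u], X[v]], dim=1))  # (E x C) logits

# F.cross_entropy applies log-softmax internally, matching Eq. (11);
# its per-class weights play the role of alpha in Eq. (12), up-weighting
# the rare negative class to counter the severe label imbalance.
clf = EdgeClassifier(d=8)
X = torch.randn(100, 8)
edges = torch.randint(0, 100, (50, 2))
labels = torch.randint(0, 2, (50,))
alpha = torch.tensor([0.1, 0.9])
loss = F.cross_entropy(clf(X, edges), labels, weight=alpha)
```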

4 Simulation Analysis

4.1 Datasets

Model validation is performed on two datasets from the blockchain finance domain, drawn from trust-scoring networks between users of two different Bitcoin trading websites. The Bitcoin OTC dataset is a network of trust scores among users extracted from a Bitcoin trading website. Users rate other users from –10 (complete distrust) to +10 (complete trust), and each rating carries a timestamp recording when it was given. The dataset spans about 5 years; with a time interval of 13.8 days, it yields 138 time steps, which are split into training, validation, and test sets. The class distribution of the Bitcoin OTC dataset is extremely uneven: 89% of the data are positive examples, and negative examples account for only a very small share.

The Bitcoin Alpha dataset is also a trust network among Bitcoin users, but its user and rating data come from another platform, BTC-Alpha. The ratings run from November 8, 2010 to January 22, 2016; with a time interval of 13.6 days, the dataset is divided into 140 time steps. Scores again range from –10 (complete distrust) to +10 (complete trust), and Bitcoin Alpha has an even higher positive ratio (93%) than Bitcoin OTC.
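For illustration, the rating timeline can be binned into fixed-width snapshots roughly as follows; this is a sketch under the assumption that ratings carry UNIX timestamps, not the authors' preprocessing code.

```python
# Split a rating timeline into num_steps equal-width time steps.
import numpy as np

def assign_time_steps(timestamps, num_steps):
    t0, t1 = timestamps.min(), timestamps.max()
    width = (t1 - t0) / num_steps                 # ~13.8 days for 138 steps
    steps = ((timestamps - t0) / width).astype(int)
    return np.clip(steps, 0, num_steps - 1)       # last rating -> final bin

# Ratings whose step index is t form snapshot G_t of the dynamic graph.
```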

4.2 Contrast Models

We compare the DynAEGCN model with some existing static and dynamic methods.

GCN is a static graph convolution model and a classical method for graph representation learning; it uses spectral convolution to aggregate the neighbor information of nodes and learn node embedding vectors. Because each time step of the dynamic graph yields a snapshot, we apply the same GCN model to every time step; that is, ignoring the dynamics of the graph, the GCN is trained on the graph at each time step.

GCN-GRU combines GCN with sequence modeling: the representation vectors of nodes under each time snapshot are first learned by the GCN architecture and then fed into a GRU unit to learn the dynamics of the node representations. Its dynamic modeling operates on the node representation vectors, making it a node-oriented method.

EvolveGCN is a model-oriented approach. It also combines GCN with RNN, but unlike GCN-GRU, the RNN is used to evolve the GCN, modeling the update of the GCN parameters. Training proceeds bottom-up through the convolution layers and forward along the time dimension, with the dynamics modeled in the RNN's hidden vector. For each time step, the updated GCN weight parameters are learned and the graph convolution is carried out to obtain updated node representations.

4.3 Experimental Results

Tables 1 and 2 show the classification performance of the DynAEGCN model against the comparison models. The imbalanced classes of the two datasets pose a great challenge to classification, but the DynAEGCN model achieves the best classification ability, with clear advantages over the other models. For overall classification ability, we compare accuracy and the weighted F1 score; since the minority class has stronger practical significance for anti-financial-fraud, we also compare the minority-class F1 score and the corresponding precision and recall (Table 1).

Table 1. Experimental results for edge classification tasks on the Bitcoin OTC dataset

As Tables 1 and 2 show, DynAEGCN attains the highest accuracy and weighted F1 score, indicating good overall classification performance. For the minority class, DynAEGCN also achieves the best F1 score and precision. Although its recall is slightly lower than that of GCN and GCN-GRU, DynAEGCN is still superior, because those methods pair higher recall with lower precision; as the harmonic mean of precision and recall, F1 is the more informative measure of classification performance (Figs. 2 and 3).

Table 2. Experimental results for edge classification tasks on the Bitcoin Alpha dataset
Fig. 2. Performance of edge classification

Fig. 3. F1 score over time

Fig. 4. Accuracy score over time

Furthermore, we plot the F1 score and classification accuracy over time on the test set. The static GCN method is clearly separated from the dynamic methods, and DynAEGCN's advantage is evident at each time step. The GCN method's accuracy is also lower: GCN is designed for static graphs and ignores the dynamics of the graph, and its disadvantage on dynamic graphs reflects the necessity and benefit of dynamic modeling. As Fig. 4 shows, DynAEGCN's advantage holds across the entire time axis. In particular, at time step 15, the other methods classify very poorly, while DynAEGCN retains a clear lead in F1 score; this is due to its joint modeling of spatio-temporal information, which yields relatively stable performance under abrupt changes in the time series. Compared with the two EvolveGCN variants, DynAEGCN classifies better because EvolveGCN focuses only on the dynamics of the model weight parameters and ignores changes in the graph structure. GCN-GRU also classifies relatively worse: although it considers each node's history, DynAEGCN's spatio-temporal convolution and model update mechanism give it the edge in higher-level representation learning.

5 Conclusion

In this paper, a dynamic graph representation learning model, DynAEGCN, is proposed to mine the implicit relationships among blockchain transaction features. By combining the advantage of graph convolution in extracting graph structure with the advantage of the recurrent unit in learning time series, the model aggregates information from both the temporal and spatial dimensions and learns more effective node representations. The edge classification task is then performed on a dynamic financial network with extremely imbalanced classes, and the results show that DynAEGCN outperforms all comparison models. The research on dynamic graph representation learning in this paper has strong practical significance and opens a variety of possibilities for future work: the scalability of the model can be further improved, and its graph representation learning can be extended to broader tasks such as node classification, link prediction, and clustering, along with the learning and analysis of datasets from other fields.