1 Introduction

In the era of big data, analyzing contemporary networks has become an urgent problem. Research on complex networks helps us deal with applications such as node classification, link prediction, and community discovery. With the development of machine learning, network embedding, also called network representation learning, serves as a bridge connecting networked data analysis and traditional machine learning. It maps the nodes of a network to a low-dimensional space, producing dense vectors that can serve as the input of traditional machine learning models for downstream tasks.

Most existing network embedding methods model static networks without dynamic attributes; that is, they assume that the nodes and edges of a network do not change over time. Early static network embedding methods were mainly based on matrix decomposition [2, 18], which has high computational complexity and cannot scale to ever-growing large networks. With the development of artificial intelligence, Perozzi et al. proposed the classic Deepwalk algorithm [16], which applies neural networks to network embedding and builds on the Word2vec model [12] from natural language processing. Grover et al. proposed the Node2vec algorithm [8], which modifies the random walk process of Deepwalk and preserves both the homophily and the structural equivalence of networks by combining breadth-first and depth-first search. Both Deepwalk and Node2vec rest on the Word2vec framework, whose core is a three-layer shallow neural network. With the development of deep learning, Wang et al. proposed the SDNE algorithm [21], which applies deep neural networks to network embedding. This semi-supervised deep model preserves both the local and the global information of a network through its first-order and second-order proximity modules. The GraphGAN model [22] proposed by Wang et al. and the ANE model [4] proposed by Dai et al. adopt generative adversarial nets [6] for network embedding, which greatly improves the robustness of static network embeddings.

In recent years, representation learning for static networks has gradually matured. However, as a network evolves over time, new edges may appear and expired edges may disappear. The evolving interactions among nodes give networks dynamic properties. The addition of time information makes networks more complex, and dynamic networks often have large scales. Research on dynamic networks is of great significance for practical applications such as community detection [1, 17] and link prediction [11]. Traditional dynamic network representation methods take snapshots of the dynamic network at specific times for processing, which is equivalent to splitting the dynamic network into a sequence of static networks. Thus, static network embedding models can be extended to handle dynamic networks, and most existing dynamic network embedding methods are derived from static ones. Inspired by the static embedding methods based on matrix eigenvalue decomposition, Li et al. proposed the DANE algorithm [10] to capture the evolving patterns of network structure and attribute information; DANE updates the current node embeddings based on the node embeddings obtained at the previous time step. The DHPE algorithm [24] preserves high-order proximity based on generalized singular value decomposition and updates the node embeddings dynamically based on matrix perturbation theory. Besides the methods based on matrix decomposition, some dynamic network embedding methods extend classic static ones, for instance the DNE algorithm [5], which extends the LINE model [19] to dynamic network embedding, and the DynGEM algorithm [7], which is based on the SDNE model. These snapshot-based embedding methods often ignore the network evolution patterns; moreover, methods that only process dynamic information on snapshots are relatively coarse-grained. The HTNE algorithm [25] proposed by Zuo et al. leverages the Hawkes process to model the formation of neighbor sequences, which provides a novel idea for dynamic network embedding. However, HTNE only takes into account the influence of historical neighbor nodes on current node embeddings, while ignoring the impact of network evolution properties.

Therefore, to overcome the drawbacks of existing methods, this paper proposes the MHDNE (Multivariate Hawkes process Dynamic Network Embedding) model for dynamic network embedding. MHDNE integrates the historical edge information as well as the network evolution properties to model the formation of edges based on the Hawkes process. The main contributions of this paper are summarized as follows:

  1. We propose a novel approach for dynamic network embedding which preserves the impact of historical information on the current network based on the Hawkes process.

  2. The MHDNE model proposed in this paper considers the historical edge information as well as the network evolution properties, thereby capturing comprehensively the influence of historical information on the formation of current edges.

  3. Experimental results on real-world networks demonstrate that the embedding vectors learned by the proposed MHDNE model achieve better performance than state-of-the-art methods in node classification and network visualization.

2 Problem Definition

In this section, we formulate the problem of dynamic network embedding and give the necessary definitions used throughout this paper:

Definition 1

(Dynamic network). A dynamic network within time T can be defined as a collection \(G=\{G^1,G^2,...,G^T\}\) containing a series of network snapshots. The snapshot at time \(t\,(1\le t\le T)\) can be denoted as \(G^t=(V_t,E_t)\), where \(V_t\) and \(E_t\) denote the sets of nodes and edges at time t, respectively.

Definition 2

(Dynamic network embedding). Given a dynamic network \(G=\{G^1,G^2,...,G^T\}\), we map the nodes in the snapshots to a low-dimensional space so that each node is represented as a vector, with the temporal and structural information of the network preserved in the low-dimensional vector space.

Definition 3

(Ternary closure). Ternary closure refers to the tendency of two people who have common friends in a social circle to become friends in the future. That is, if nodes a and b both connect to the same node c in a network, an edge between a and b is likely to form. The ternary closure process affects the formation of networks and is an important characteristic of the network evolution mechanism.

Definition 4

(Hawkes process). The Hawkes process, a special linear self-exciting point process, is widely used in economic analysis, social analysis, and geographic prediction. In the Hawkes process, the occurrence of a new event is affected not only by the internal properties of the event but also by the historical events occurring at previous moments. The generation intensity function of a new event can be defined as follows:

$$\begin{aligned} \lambda (t)=\gamma _t+\sum _{t_s<t}\varphi (t-t_s) \end{aligned}$$
(1)

where \(\gamma _t\) indicates the base intensity of a new event, i.e., the spontaneous event occurrence intensity at time t, and \(\varphi (t-t_s)\) indicates the influence of historical events on the occurrence of the new event, which decays continuously over time. \(\varphi (t-t_s)\) can be expressed as \(\varphi (t-t_s)=\alpha \delta (t,t_s)\), where \(\alpha \) denotes the excitation intensity of a historical event on the current event and \(\delta (t,t_s)\) denotes the time decay coefficient of the historical event.
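To make Eq. 1 concrete, the following Python sketch evaluates the intensity with the exponential decay kernel \(\delta (t,t_s)=\exp (-\theta (t-t_s))\) used later in this paper; the function and parameter names are illustrative, not part of the original formulation.

```python
import math

def hawkes_intensity(t, history, base_intensity, alpha, theta):
    """Evaluate the generation intensity of Eq. 1 with an exponential kernel.

    t              -- current time
    history        -- timestamps t_s (< t) of historical events
    base_intensity -- gamma_t, the spontaneous occurrence intensity
    alpha          -- excitation intensity of a historical event
    theta          -- decay coefficient in delta(t, t_s) = exp(-theta * (t - t_s))
    """
    excitation = sum(alpha * math.exp(-theta * (t - t_s))
                     for t_s in history if t_s < t)
    return base_intensity + excitation

# Example: three historical events excite the intensity at t = 10
print(hawkes_intensity(10.0, [2.0, 6.0, 9.5], 0.1, 0.8, 0.5))
```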

3 The Proposed Framework: MHDNE

In this section, we first give an overview of the MHDNE framework, then describe its core components in detail, and finally introduce the model optimization.

3.1 MHDNE Framework

The framework of the MHDNE model proposed in this paper is shown in Fig. 1. First, we model the edge formation process in the dynamic network as two temporal sequences, \(L_1\) and \(L_2\), which contain the historical edge information and the network evolution information, respectively. Then, based on these temporal sequences, we apply the Hawkes process to model the formation of new edges, integrating the historical information of the dynamic network as well as its evolution properties into the node embeddings.

Fig. 1. The framework of MHDNE.

3.2 Component Description

The formation of a dynamic network is a process in which edges continuously emerge and disappear, so edge formation can be regarded as a temporal point process in which the generation of each edge is an event. Edges form in two ways: either an edge that existed at a historical moment is preserved at the current moment, or an edge that never appeared in history forms at the current moment during the evolution process. These two kinds of edge generation correspond to the following two edge formation sequences.

  • Historical edge sequence: in a dynamic network, if there is an edge between nodes m and n at time \(t_i\), the edge can be denoted as a time-stamped tuple \((e_m^n,t_i)\). The edges between nodes m and n within time T can then be modeled as a temporal sequence \(L_1=\{(e_m^n,t_1){\longrightarrow }(e_m^n,t_2){\longrightarrow }...{\longrightarrow }(e_m^n,t_T)\}\). Intuitively, nodes that have interacted more often in history tend to form an edge at the current moment. Therefore, the generation intensity of the edge between nodes m and n at the current time is affected by the historical edges \((e_m^n,t_i)\).

  • Open triangle sequence: if nodes m and n have a common neighbor k at time t, the resulting open triangle can be denoted as a triple \((e_m^k,e_k^n,t)\). All open triangles composed of nodes m, n, and their set of common neighbors S at time t can be represented as a set \((T_{m,n}^{S},t)\). An open triangle sequence within time T can then be modeled as a temporal sequence \(L_2=\{(T_{m,n}^{S_1},t_1){\longrightarrow }(T_{m,n}^{S_2},t_2){\longrightarrow }...{\longrightarrow }(T_{m,n}^{S_T},t_T)\}\). The ternary closure theory tells us that even if there has never been an edge between nodes m and n, the two nodes tend to connect during network evolution if they have common neighbors. (A sketch of constructing both sequences follows this list.)
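As a hedged illustration of how the two sequences can be extracted from time-stamped edge data, the following Python sketch builds \(L_1\) and \(L_2\) for a node pair; the function name `build_sequences` and the input format `(u, v, t)` are our own assumptions, not part of the original model.

```python
from collections import defaultdict

def build_sequences(edges, m, n):
    """Build the historical edge sequence L1 and open triangle sequence L2
    for the node pair (m, n) from time-stamped undirected edges (u, v, t)."""
    neighbors = defaultdict(lambda: defaultdict(set))  # time -> node -> nbrs
    for u, v, t in edges:
        neighbors[t][u].add(v)
        neighbors[t][v].add(u)

    L1, L2 = [], []
    for t in sorted(neighbors):
        if n in neighbors[t][m]:                       # edge (m, n) seen at t
            L1.append((m, n, t))
        common = (neighbors[t][m] & neighbors[t][n]) - {m, n}
        if common:                                     # open triangles m-k-n at t
            L2.append((frozenset(common), t))
    return L1, L2

edges = [("m", "n", 1), ("m", "k", 2), ("k", "n", 2), ("m", "n", 3)]
print(build_sequences(edges, "m", "n"))
```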

Since the Hawkes process [9] captures well the exciting effect of historical information on current events, we adapt it to model the edge formation process of nodes m and n based on the historical edge sequence \(L_1\). The generation intensity function for the arriving event \((e_m^n,t)\) can be formulated as:

$$\begin{aligned} \lambda _{e_m^n}^1(t)=\gamma _{m,n}+\sum _{t_s<t}\alpha _{e_x^y}\delta (t,t_s) \end{aligned}$$
(2)

where \(\lambda _{e_m^n}^1(t)\) indicates the intensity of forming an edge between nodes m and n at time t, and \(\gamma _{m,n}\) indicates the base intensity of forming an edge between nodes m and n. The base intensity \(\gamma _{m,n}\) reflects the essential relationship between nodes m and n, which can be denoted as the negative Euclidean distance between their embeddings, i.e., \(\gamma _{m,n}=-\left\| v_m-v_n \right\| \), where \(v_m\) and \(v_n\) are the embeddings of nodes m and n, respectively. x and y indicate the historical counterparts of nodes m and n, respectively. \(\alpha _{e_x^y}\) denotes the influence of the historical edge \((e_x^y,t_s)\) on the new edge \((e_m^n,t)\), and \(\delta (t,t_s)\) is a time decay function, usually expressed in the exponential form \(\delta (t,t_s)=\exp (-\theta (t-t_s))\).

The more stable the local structure containing historical nodes x and y, the more likely the current nodes m and n are to be connected at the current time. We leverage the clustering coefficients of x and y to characterize the stability of the local structure of the network as it varies over time. The local clustering coefficient is the ratio between the number of closed triangles containing node r and the number of triples containing node r in the network, which can be formulated as:

$$\begin{aligned} C_r=\frac{2E_r}{m_r(m_r-1)} \end{aligned}$$
(3)

where \(m_r\) denotes the number of edges associated with node r, and \(E_r\) denotes the number of edges between the nodes connecting to r. The larger the clustering coefficients of historical nodes x and y, the higher the probability of an edge forming between the current nodes m and n. Thus, we can denote \(\alpha _{e_x^y}\) as \(\alpha _{e_x^y}={C_x}{C_y}g(v_x,v_y)\), where \(g(v_x,v_y)=-\left\| v_x-v_y \right\| \) is the negative Euclidean distance between the corresponding embeddings, and Eq. 2 can be updated as follows.

$$\begin{aligned} \lambda _{e_m^n}^1(t)=g(v_m,v_n)+\sum _{t_s<t}{C_x}{C_y}g(v_x,v_y)\delta (t,t_s) \end{aligned}$$
(4)
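For reference, the local clustering coefficient of Eq. 3 that weights the excitation terms in Eq. 4 can be computed directly from adjacency sets. The following is a minimal sketch of our own (assuming an undirected graph stored as a dict of neighbor sets), not the authors' implementation.

```python
def local_clustering(adj, r):
    """Local clustering coefficient C_r of Eq. 3.

    adj -- undirected graph as a dict mapping each node to its neighbor set
    """
    nbrs = adj[r]
    m_r = len(nbrs)                     # number of edges associated with r
    if m_r < 2:
        return 0.0
    # Edges among r's neighbors; each is counted from both endpoints
    e_r = sum(len(adj[u] & nbrs) for u in nbrs) // 2
    return 2.0 * e_r / (m_r * (m_r - 1))

adj = {"r": {"a", "b", "c"}, "a": {"r", "b"}, "b": {"r", "a"}, "c": {"r"}}
print(local_clustering(adj, "r"))       # one closed triangle -> 1/3
```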

In addition to the fact that edges which appeared in history reappear at the current time with a certain probability, the ternary closure theory [23] tells us that, during network evolution, nodes with common neighbors in history tend to be connected at the current time. That is, when a historical event \((e_m^k,e_k^n,t_s)\) exists, the probability of the new event \((e_m^n,t)\) occurring increases. Similarly, we can use the local clustering coefficient of a node to measure the intensity of this influence: the larger the clustering coefficient of a node, the stronger the ternary closure process around it, and the greater the probability of generating the new event \((e_m^n,t)\). Meanwhile, the more common neighbors two nodes have, the more likely they are to connect; and the closer the historical event \((e_m^k,e_k^n,t_s)\) is to the current time, the greater the probability of generating the edge \((e_m^n,t)\). Therefore, Eq. 4 can be updated to:

$$\begin{aligned} \lambda _{e_m^n}(t)=g(v_m,v_n)+\sum _{t_s<t}({C_x}{C_y}g(v_x,v_y)\delta (t,t_s)+\sum _{(e_m^k,e_k^n,t_s)\in L_2}{C_k}g(v_{k}',v_k)\delta (t,t_s)) \end{aligned}$$
(5)

where \(g(v_{k}',v_k)\) is the negative Euclidean distance between the embeddings of the current common neighbor \({k}'\) and the historical common neighbor k in the mapping space. As can be seen from Eq. 5, \(\lambda _{e_m^n}(t)\) may be negative, while the intensity of generating new edges should be positive. Therefore, we take the exponential of \(\lambda _{e_m^n}(t)\) as the final intensity, namely \(\hat{\lambda }_{e_m^n}(t)=\exp (\lambda _{e_m^n}(t))\).
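Putting Eqs. 3–5 together, a hedged Python sketch of the edge intensity and its exponential transform might look as follows; the event-tuple formats and the helper `g` (the negative Euclidean distance used throughout this section) are illustrative reconstructions, not the authors' code.

```python
import math

def g(v_a, v_b):
    """Negative Euclidean distance between two embedding vectors."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(v_a, v_b)))

def edge_intensity(t, v_m, v_n, edge_events, triangle_events, theta):
    """Sketch of the intensity of Eq. 5 and its exponential transform.

    edge_events     -- tuples (C_x, C_y, v_x, v_y, t_s) derived from L1
    triangle_events -- tuples (C_k, v_k_now, v_k_hist, t_s) derived from L2
    """
    lam = g(v_m, v_n)                                  # base intensity
    for C_x, C_y, v_x, v_y, t_s in edge_events:        # historical edges
        lam += C_x * C_y * g(v_x, v_y) * math.exp(-theta * (t - t_s))
    for C_k, v_now, v_hist, t_s in triangle_events:    # open triangles
        lam += C_k * g(v_now, v_hist) * math.exp(-theta * (t - t_s))
    return math.exp(lam)                               # hat{lambda} > 0
```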

Based on the Hawkes process, given the relevant historical edge sequence \(L_1\) and open triangle sequence \(L_2\), we can obtain the probability of generating the edge \((e_m^n,t)\) between nodes m and n at time t as follows.

$$\begin{aligned} P(n|m,H(L_1,L_2))=\frac{\hat{\lambda }_{e_m^n}(t)}{\sum _{n^* \in V,\,e_m^{n^*} \in E}{\hat{\lambda }_{e_m^{n^*}}(t)}} \end{aligned}$$
(6)

3.3 Model Optimization

For all nodes in the network, the log-likelihood of the model can be written as follows.

$$\begin{aligned} \log l =\sum _{m \in V}\sum _{n \in V,\,e_m^n \in E}\log {P(n|m,H(L_1,L_2))} \end{aligned}$$
(7)

In order to reduce the computational complexity of the algorithm, we adopt the negative sampling method [13]. The probability that a node is selected as a negative sample is related to how frequently it appears in the sequences, so we draw samples according to the degree distribution of the nodes. Following [13], the sampling probability of node \(v_i\) can be denoted as follows.

$$\begin{aligned} P(v_i)=\frac{{f(v_i)}^{\frac{3}{4}}}{\sum _{j=1}^{K}{f(v_j)}^{\frac{3}{4}}} \end{aligned}$$
(8)
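A minimal sketch of this noise distribution, assuming the frequency \(f(v_i)\) is approximated by node degree as described above; the function name is illustrative.

```python
import random

def draw_negative_samples(degrees, k):
    """Draw k negative samples with probability proportional to
    degree^(3/4), the noise distribution of [13]."""
    nodes = list(degrees)
    weights = [degrees[v] ** 0.75 for v in nodes]
    return random.choices(nodes, weights=weights, k=k)

print(draw_negative_samples({"a": 10, "b": 3, "c": 1}, k=5))
```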
Algorithm 1. The learning algorithm of MHDNE.

Based on the historical edge information and the ternary closure property, the objective function for generating the new edge \((e_m^n,t)\) can be denoted as follows.

$$\begin{aligned} O(X)=\log \sigma (\hat{\lambda }_{e_m^n}(t))+{\sum _{i=1}^{K}{E_{{v_i}\sim P(v)}}\left[ -\log \sigma (\hat{\lambda }_{e_m^{v_i}}(t))\right] } \end{aligned}$$
(9)

where K is the number of negative samples and \(\sigma (x)\) is the sigmoid function.
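For illustration, the per-edge objective of Eq. 9 can be evaluated as below, implementing the formula exactly as stated; the inputs are the transformed intensities \(\hat{\lambda }\) of the observed edge and of the K negative samples.

```python
import math

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def edge_objective(lam_pos, lam_negs):
    """Per-edge negative-sampling objective of Eq. 9.

    lam_pos  -- hat{lambda} of the observed edge (m, n)
    lam_negs -- hat{lambda} values of the K sampled negative edges (m, v_i)
    """
    return math.log(sigmoid(lam_pos)) + sum(
        -math.log(sigmoid(lam)) for lam in lam_negs)

# Example with one positive intensity and K = 3 negative intensities
print(edge_objective(1.8, [0.2, 0.5, 0.1]))
```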

Commonly, we adopt the Stochastic Gradient Descent method [3] to optimize the objective function in Eq. 9. Algorithm 1 shows the core of our method.

4 Experiments

In this section, we validate the effectiveness of our model on two real-world datasets. First, we introduce the datasets and baseline methods used in our experiments in detail; then we conduct the downstream tasks of node classification and visualization; finally, we analyze the sensitivity of the parameter \(\theta \).

4.1 Datasets

  • DBLP [14]: the DBLP dataset contains a large amount of information about computer science publications. We built the dynamic network in our experiment from the co-author relationships of 28,085 authors in ten research fields over ten years. Each author is labeled with the research field in which he or she published the most papers.

  • Epinions [20]: the Epinions dataset consists of review content, user IDs, product IDs, and timestamps. In our experiment, 21,575 users belonging to five categories over ten years were extracted from a subset of the Epinions dataset, and edges were built between users who commented on the same product. Each user is labeled with the category of the products he or she commented on most.

The detailed information of the two datasets is shown in Table 1.

Table 1. Datasets in experiments.

4.2 Baseline Methods

In this paper, the MHDNE algorithm is compared with the following three baseline algorithms:

  • Avg Deepwalk algorithm: we run the Deepwalk algorithm on the individual snapshots to obtain node embeddings at different times.

  • STWalk algorithm [15]: STWalk performs space-walks and time-walks on the constructed graphs, which captures the spatio-temporal behavior of nodes.

  • HTNE algorithm [25]: HTNE performs dynamic network embedding based on the Hawkes process, capturing both historical and current information from the perspective of neighbor node sequences.

4.3 Downstream Tasks

In this section, we carry out the downstream tasks of node classification and visualization to verify the feasibility and effectiveness of the proposed dynamic network representation method MHDNE. The default experimental parameters are set as follows: the embedding dimension is 128, the number of negative samples is 5, and the learning rate of gradient descent is 0.01. For the algorithms using random walks and the skip-gram model, the walk length is 50, the number of walks is 100, and the window size is 10.

A. Classification

We conduct the node classification task on the DBLP and Epinions datasets; the embeddings learned by the different methods are classified with a linear SVM classifier, as sketched below. We repeat the classification experiment ten times and report the average Micro-F1 and Macro-F1 scores as the final results. The experimental results are shown in Tables 2 and 3.
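A sketch of this evaluation protocol using scikit-learn follows; it is our own illustration, and since the paper does not specify SVM hyperparameters, library defaults are assumed.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def evaluate(embeddings, labels, train_ratio, runs=10, seed=0):
    """Linear-SVM node classification averaged over several random splits.

    embeddings -- (num_nodes, dim) array of learned node vectors
    labels     -- array of node class labels
    """
    micro, macro = [], []
    for r in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, random_state=seed + r)
        pred = LinearSVC().fit(X_tr, y_tr).predict(X_te)
        micro.append(f1_score(y_te, pred, average="micro"))
        macro.append(f1_score(y_te, pred, average="macro"))
    return float(np.mean(micro)), float(np.mean(macro))
```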

Table 2. Multi-class node classification results in DBLP dataset.
Table 3. Multi-class node classification results in Epinions dataset.

We vary the training set size from 15% to 90%. As Tables 2 and 3 show, the MHDNE algorithm proposed in this paper outperforms the baseline methods in node classification on both the DBLP and Epinions datasets. On the DBLP dataset, with a 75% training set, MHDNE achieves the highest Macro-F1 and Micro-F1 scores, which are 3.72%–6.41% and 3.70%–5.94% higher than those of the comparison algorithms, respectively. On the Epinions dataset, with a 60% training set, MHDNE achieves the highest Macro-F1 and Micro-F1 scores, which are 3.24%–8.06% and 2.90%–7.13% higher than those of the comparison algorithms. The results show that integrating network historical information into current node embeddings improves embedding quality; in particular, when both the historical edge information and the network evolution properties are integrated, the resulting node embeddings perform better in classification.

B. Network visualization

We leverage the t-SNE algorithm to map the representation vectors of 2,500 authors from four fields (Data Mining, Artificial Intelligence, Information Retrieval, and Computer Vision) in the DBLP dataset into a 2-dimensional space. We use different colors to indicate the different research areas: purple dots represent “Data Mining”, blue dots “Artificial Intelligence”, orange dots “Information Retrieval”, and green dots “Computer Vision”. Figure 2 shows the visualization results of the node embeddings obtained by the different algorithms.
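A minimal sketch of this visualization step with scikit-learn's t-SNE and matplotlib; the parameters and color mapping are illustrative, not the paper's exact settings.

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def visualize(embeddings, labels, colors):
    """Project embeddings to 2-D with t-SNE and color nodes by research area.

    embeddings -- (num_nodes, dim) array of node vectors
    labels     -- research-area label per node
    colors     -- dict mapping each area to a plot color
    """
    coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    for area, color in colors.items():
        idx = [i for i, lab in enumerate(labels) if lab == area]
        plt.scatter(coords[idx, 0], coords[idx, 1], c=color, s=5, label=area)
    plt.legend()
    plt.show()
```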

Fig. 2. Visualization of authors from four research areas.

As can be seen from Fig. 2, the Avg Deepwalk algorithm only maps the authors in the “Artificial Intelligence” field to an independent community, while the authors in the other three fields are mixed together; the STWalk algorithm maps authors in the “Data Mining” and “Computer Vision” fields to relatively scattered locations, failing to preserve the properties of such nodes; the HTNE algorithm maps part of the authors in the “Artificial Intelligence”, “Information Retrieval” and “Computer Vision” fields to distinct communities, but maps some authors of these three fields into the “Data Mining” area; compared with the other algorithms, the proposed MHDNE algorithm maps authors into distinct communities with clear margins among the different areas.

The visualization results indicate that combining historical information with network evolution information can help with community detection, since the formation of a community is often related to historical information. The embeddings generated by MHDNE integrate both kinds of information and therefore preserve community structure better, which is why they outperform the baseline methods in visualization.

Fig. 3. Sensitivity of \(\theta \) on the DBLP and Epinions datasets.

5 Parameter Sensitivity

The time decay function can be expressed in the exponential form \(\delta (t,t_s)=\exp (-\theta (t-t_s))\), where \(\theta \) is the time decay coefficient of the excitation from historical events. To analyze parameter sensitivity, we observe how the classification accuracy changes as \(\theta \) varies from 0.01 to 1. We conduct node classification on DBLP and Epinions with the training set fixed at 75%. From Eq. 2 it can be seen that the larger the value of \(\theta \), the smaller the influence of historical events on current events. The experimental results are shown in Fig. 3; the optimal value of \(\theta \) differs between the two datasets. On the DBLP dataset, the classification accuracy is highest when \(\theta \) is set to 0.2; it remains basically unchanged for \(0.2<\theta <0.4\) and decreases slightly as \(\theta \) increases beyond 0.4. On the Epinions dataset, the classification accuracy is highest when \(\theta \) is set to 0.3, and it decreases sharply as \(\theta \) increases beyond 0.4.

The main reason for the different results on the two datasets is that DBLP is built from authors and their co-author relationships, which change little over a short time. The historical information in DBLP therefore decays slowly, and \(\theta \) should be small. Epinions, by contrast, forms a social network from users and their commenting behaviors, whose historical information decays quickly, so \(\theta \) should be larger.

6 Conclusion

To incorporate the dynamic properties of contemporary networks into network embedding, we proposed MHDNE, a dynamic network embedding method based on the Hawkes process. Since the node embeddings learned by our model capture both the historical structural information and the evolution mechanism, they perform well in downstream tasks such as node classification and network visualization. At present, research on dynamic network properties is still in its infancy, and our method only considers the dynamic properties of homogeneous networks. However, real-world networks may be both dynamic and heterogeneous. How to incorporate rich heterogeneous information into dynamic network embedding is our next research focus.