1 Introduction

The complex network is a widely adopted scheme to represent many real-world complex systems, including social networks, information networks, service networks [8, 20], and so on. Identifying influential nodes in these complex networks is of great significance to many practical applications, such as friend recommendation, viral marketing, and software system analysis [3, 19]. Besides, identifying influential nodes plays an important role in understanding the structures and functions of complex networks. As a consequence, identifying influential nodes has attracted much attention in recent years [4, 18].

The problem of identifying influential nodes is typically formulated as node ranking. Owing to its theoretical and practical significance, various methods have been proposed to rank nodes in complex networks. These methods can be roughly categorized into local methods and global methods. Typical node ranking methods include, but are not limited to, degree centrality [7], local centrality [2], coreness centrality [10], cluster centrality [1], closeness centrality [16], betweenness centrality [6], and Katz centrality [9]. Among them, degree centrality is the simplest and most widely used metric, but it fails to identify influential nodes in some cases because it relies only on a node’s nearest neighbors, which carry very limited information. Local centrality improves on degree centrality by taking the fourth-order neighbors of each node into consideration. It has been pointed out that the location of a node is more significant than the number of its immediate neighbors in evaluating its spreading influence [10]. Therefore, coreness centrality has been proposed to determine the importance of nodes in a network more accurately [10]. Cluster centrality is developed based on the observation that local clustering usually plays a negative role in the spreading process [14]. Unlike degree centrality, cluster centrality considers not only the number of nearest neighbors but also the interactions among them. All these methods are neighborhood-based (i.e., local) methods: they are very time-efficient but lack a performance guarantee. In contrast, closeness centrality, betweenness centrality, Katz centrality, etc. are path-based metrics (i.e., global methods). They take the global information of a network into consideration and thus can give better ranking results. However, these global methods are very time-consuming and consequently cannot be applied to large complex networks. Detailed descriptions of these methods can be found in [12].

In order to overcome the limitations of the existing node ranking methods, we propose a novel adaptive semi-local algorithm ASLA, which aims to seek a tradeoff between the time efficiency and the ranking accuracy. Specifically, to quantify the importance of nodes, ASLA first adaptively determines the potential influence scope for each node, and then calculates the influence value of each node based on such a personalized influence scope. Finally, the ranking list of all nodes is given by their influence values in a descending order. Extensive experimental studies on both synthetic and real-world networks have shown that ASLA can achieve excellent performance in terms of both effectiveness and efficiency.

2 Basic Concepts

Let \(G = (V, E)\) be an undirected and unweighted simple network with \(N = |V|\) nodes and \(M = |E|\) edges, where V and E denote the node set and the edge set respectively. In the following, we provide some basic concepts that form the foundation of the proposed ASLA algorithm.

Definition 1

(d-Shell). The d-shell of node u, denoted by \(S^d_u\), consists of all the nodes v at depth d of node u, that is, the length of the shortest path between nodes u and v is d. The 0-shell of node u is defined as the node itself. The 1-shell of node u is exactly its immediate neighbors.

Consider the toy network shown in Fig. 1. The 1-shell of node \(v_2\) consists of all its immediate neighbors, i.e., nodes \(v_1\), \(v_3\), \(v_4\), \(v_5\), \(v_6\), \(v_7\), \(v_8\), \(v_9\), while the 2-shell of node \(v_2\) contains only node \(v_{10}\).
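As an illustration of Definition 1, the d-shells of a node can be obtained with a breadth-first search. The following is a minimal sketch, not the authors' implementation; the function name `shells`, the adjacency-list representation, and the 5-node graph are assumptions made for this example (the graph is not the network of Fig. 1).

```python
from collections import deque


def shells(adj, u):
    """Group the nodes reachable from u by shortest-path distance.

    Returns a list whose d-th entry is the d-shell of u.
    `adj` maps each node to the list of its neighbors (undirected graph).
    """
    dist = {u: 0}
    layers = [[u]]  # the 0-shell is the node itself
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if dist[w] == len(layers):
                    layers.append([])  # first node seen at this depth
                layers[dist[w]].append(w)
                queue.append(w)
    return layers


# Hypothetical graph: a path 0-1-2-3 plus an extra edge 1-4.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
print(shells(adj, 0))  # → [[0], [1], [2, 4], [3]]
```

Here the 1-shell of node 0 is its only neighbor, node 1, and the 2-shell contains nodes 2 and 4, both at distance two.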

Fig. 1. A toy network used as an example. Node \(v_1\) plays the role of a bridge; thus, its influence value should be high although its degree is low.

Definition 2

(Outer Degree). The outer degree of node v in the d-shell of node u, denoted by \(k^d_u(v)\), is defined as the number of edges that connect node v to nodes in the \((d + 1)\)-shell of node u. The total outer degree of the d-shell of node u, denoted by \(K^d_u\), is defined as the sum of the outer degrees of all the nodes v in the d-shell of node u. Thus, \( K^0_u = k_u, K^d_u = \sum _{v \in S^d_u} k^d_u(v), \) where \(k_u\) denotes the degree of node u, i.e., the number of edges incident to node u.

Note that the total outer degree of the d-shell is not necessarily the number of nodes in the \((d + 1)\)-shell. Rather, it is the total number of edges connecting nodes in the d-shell to nodes in the \((d + 1)\)-shell. Counting the edges that lead outward from a shell to the next one, instead of merely the nodes in the next shell, takes more structural information into account. Consider the network shown in Fig. 1 again. If node \(v_{10}\) is the starting node, the total outer degree of the 1-shell is 8, which is larger than the number of nodes in the 2-shell (i.e., 5).

Definition 3

(Outer Density). The outer density of the d-shell starting from node u, denoted by \(D^d_u\), is defined as the total outer degree of this shell divided by the total number of nodes in this shell. That is, \(D^d_u = K^d_u / |S^d_u|\), where \(|S^d_u|\) denotes the total number of nodes in the d-shell of node u.
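Definitions 2 and 3 can be made concrete with a short sketch that walks the shells of a node and records, for each shell, its size, its total outer degree, and its outer density. The helper name `shell_stats` and the 4-node graph below are hypothetical and chosen only for illustration.

```python
def shell_stats(adj, u):
    """Return a list of (|S^d_u|, K^d_u, D^d_u) for d = 0, 1, 2, ...

    `adj` maps each node to the list of its neighbors (undirected graph).
    """
    dist = {u: 0}
    shell = [u]
    stats = []
    d = 0
    while shell:
        # Discover the (d + 1)-shell first.
        next_shell = []
        for v in shell:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = d + 1
                    next_shell.append(w)
        # K^d_u: number of edges from the d-shell to the (d + 1)-shell.
        total_outer = sum(1 for v in shell for w in adj[v] if dist[w] == d + 1)
        stats.append((len(shell), total_outer, total_outer / len(shell)))
        shell = next_shell
        d += 1
    return stats


# Hypothetical 4-cycle: 0-1, 0-2, 1-3, 2-3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(shell_stats(adj, 0))  # → [(1, 2, 2.0), (2, 2, 1.0), (1, 0, 0.0)]
```

Starting from node 0, the 1-shell {1, 2} has total outer degree 2 although the 2-shell contains a single node, which shows that edges rather than nodes are counted. The 0-shell entry also confirms that \(K^0_u = k_u\).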

Definition 4

(Influence Scope). The influence scope of node u, denoted by C(u), is composed of its nearest neighbors, the next nearest neighbors, and so on. C(u) is formulated as \( C(u) = \bigcup _{0 \le d \le \bar{d}} S^d_u, \) where \(\bar{d}\) satisfies the condition that the outer density of the d-shell is no larger than some threshold \(\alpha \), i.e., \(D^{\bar{d}}_u \le \alpha \). When more than one d-shell satisfies the condition, \(\bar{d}\) is taken as the smallest d.
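The stopping rule of Definition 4 can be sketched as follows: shells are expanded one at a time, and expansion stops at the first shell whose outer density is no larger than \(\alpha \). The function name `influence_scope` and the 6-node graph are assumptions for this example, and the sketch presumes \(\alpha \ge 0\).

```python
def influence_scope(adj, u, alpha):
    """Return (d_bar, C(u)) following Definition 4.

    d_bar is the smallest depth d whose outer density D^d_u is <= alpha;
    C(u) is the list of nodes in shells 0..d_bar. Assumes alpha >= 0.
    """
    if alpha < 0:
        raise ValueError("alpha must be nonnegative")
    seen = {u}
    shell = [u]
    scope = [u]
    d = 0
    while True:
        # Collect the nodes of the next shell.
        next_shell = []
        for v in shell:
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    next_shell.append(w)
        next_set = set(next_shell)
        # Total outer degree: edges from the current shell to the next one.
        total_outer = sum(1 for v in shell for w in adj[v] if w in next_set)
        if total_outer / len(shell) <= alpha:
            return d, scope
        shell = next_shell
        scope.extend(shell)
        d += 1


# Hypothetical graph: hub 0 with leaves 1 and 2, plus a tail 0-3-4-5.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
print(influence_scope(adj, 0, 1.0))  # → (1, [0, 1, 2, 3])
```

The loop always terminates for \(\alpha \ge 0\), because the last nonempty shell has outer density 0.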

Definition 5

(Influence Value). The influence value of node u is defined as

$$\begin{aligned} F(u, \alpha ) = \sum _{0 \le d \le \bar{d}} f(d) (K^d_u + |S^d_u|), \end{aligned}$$
(1)

where f(d) is a nonincreasing function and the nonnegative parameter \(\alpha \) is used to control the scale of node u’s potential influence scope C(u).

Consider two extreme cases: \(\alpha = 0\) and \(\alpha \ge k_{max}\), where \(k_{max}\) is the largest degree of nodes in network G. When \(\alpha = 0\), the d-shells will spread across much of the network, thus the influence scope tends to cover all the nodes in the entire network. When \(\alpha \ge k_{max}\), the d-shell cannot spread outward at all, thus the influence scope contains only the starting node itself. In general, \(F(u, \alpha )\) takes node u’s neighbors from 1-hop up to \(\bar{d}\)-hop into consideration, which makes \(F(u, \alpha )\) a semi-local measure of the importance of nodes in network G.
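Both extreme cases can be checked directly against Definitions 2 to 5. The following short derivation is a sketch added for illustration; it uses only the symbols defined above and the fact that every node of the \((d + 1)\)-shell has at least one neighbor in the d-shell.

$$\begin{aligned} \alpha \ge k_{max}:&\quad D^0_u = \frac{K^0_u}{|S^0_u|} = \frac{k_u}{1} \le k_{max} \le \alpha \;\Rightarrow \; \bar{d} = 0, \quad C(u) = \{u\}, \quad F(u, \alpha ) = f(0)\,(k_u + 1), \\ \alpha = 0:&\quad D^d_u \le 0 \iff K^d_u = 0 \iff S^{d+1}_u = \emptyset \;\Rightarrow \; C(u) \text{ is the connected component of } u. \end{aligned}$$

In particular, for a connected network and \(\alpha = 0\), C(u) equals the whole node set V.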

Now let us shed more light on \(F(u, \alpha )\). Suppose that \(f(d) = 1\) for all values of d. According to Eq. (1), \(F(u, \alpha )\) is composed of two parts: \(\sum _{0 \le d \le \bar{d}} K^d_u\) and \(\sum _{0 \le d \le \bar{d}} |S^d_u|\). The first part is the sum of the total outer degrees of all the d-shells contained in the influence scope C(u), which represents the propagation ability of node u. The second part is the total number of nodes covered by C(u), which characterizes the aggregation ability of node u. Therefore, \(F(u, \alpha )\) captures the propagation ability and the aggregation ability simultaneously. As a result, even if a node has high aggregation ability, its influence value may be very small because of its poor propagation ability. For example, as shown in Fig. 1, when \(\alpha = 1.0\), the influence value of node \(v_{12}\) is 16, while the influence value of node \(v_1\) is 32. As we can see, node \(v_1\) plays the role of a bridge in the example network. Therefore, even though node \(v_{12}\) has a larger degree, it is less influential than node \(v_1\).

Problem Statement. Given an undirected and unweighted network \(G = (V, E)\) and a parameter \(\alpha \ge 0\), the problem is to compute the influence value \(F(u, \alpha )\) for each node \(u \in V\) (Note that the final definition of \(F(u, \alpha )\) is given in Sect. 3).

3 The Adaptive Semi-local Algorithm

In this part, we propose a novel adaptive semi-local algorithm ASLA to address the problem of node ranking in complex networks. ASLA aims to seek a tradeoff between the poor-quality local methods and the time-consuming global methods. Because the additional parameter \(\alpha \) adaptively adjusts the influence scope of each node, ASLA is expected to be not only more efficient than the global methods but also more accurate than the local methods.

The function f(d) must be properly set in order to calculate the exact influence value \(F(u, \alpha )\). To reflect the fact that closer d-shells tend to be more significant than farther ones, we set \(f(d) = e^{-\frac{d}{k_u}}\). In general, nodes with larger degrees tend to have a stronger ability to spread outward; thus, we introduce the factor \(\frac{1}{k_u}\) into the exponent. Besides, to better capture the influence scope of each node, the parameter \(\alpha \) should be personalized. In this work, we define the parameter of node u as \(\alpha _u = \frac{k_u}{k_{max}}\alpha \). One may note that the propagation ability and the aggregation ability defined in Eq. (1) are very likely to fall in distinct ranges, which may be caused by the sparsity of networks. To reduce the skew between these two abilities, we normalize both into the range [0, 1] by dividing \(K^d_u\) by M and \(|S^d_u|\) by N. Based on these discussions, the influence value (i.e., the semi-local centrality) of node u in network G is finally defined as

$$\begin{aligned} F(u, \alpha _u) = \sum _{0 \le d \le \bar{d}} e^{-\frac{d}{k_u}} (\frac{K^d_u}{M} + \frac{\vert S^d_u \vert }{N}). \end{aligned}$$
(2)
[Algorithm 1: pseudocode of ASLA]

ASLA calculates the influence value of each node according to Eq. (2). The corresponding pseudocode is outlined in Algorithm 1, where TotalOuterDegree(u, d) is a procedure that calculates the total outer degree of the d-shell of node u, and \(J[w] = u\) indicates that node w is already contained in the influence scope of node u, i.e., \(w \in C(u)\). The inputs of Algorithm 1 are the network G and the parameter \(\alpha \). The algorithm first calculates the personalized parameter \(\alpha _u\) for each node \(u \in V\) (line 2). It then calculates the influence value of each node separately (line 3). To calculate the influence value of a node \(u \in V\), the key points are to determine all the d-shells contained in C(u) and the total outer degree of each d-shell. To this end, Algorithm 1 resorts to a queue Q to store the nodes in each d-shell (line 4). In the d-th iteration, the nodes contained in Q form exactly the d-shell of node u (line 6). To find the nodes in the next shell (i.e., the \((d + 1)\)-shell), Algorithm 1 searches all the neighbors of each node in the d-shell and adds the nodes of the \((d + 1)\)-shell to Q (lines 11–14). Note that in each iteration, the total outer degree of the d-shell can be obtained by simply summing the outer degrees of all nodes in this shell (line 7); thus we omit the details of the TotalOuterDegree(u, d) procedure. The processing of node u terminates when the outer density of the current shell reaches its threshold \(\alpha _u\), i.e., is no larger than \(\alpha _u\) (lines 9–10).
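The procedure described above can be sketched in Python as follows. This is an illustrative reading of Eq. (2) and of the prose description, not the authors' Algorithm 1: the function name `asla`, the dictionaries `mark` and `depth` (which play the role of J), the treatment of isolated nodes, and the 6-node graph are all assumptions made for this example.

```python
import math


def asla(adj, alpha):
    """Return {u: F(u, alpha_u)} as defined in Eq. (2).

    `adj` maps each node to the list of its neighbors (undirected, simple).
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    k_max = max(len(nbrs) for nbrs in adj.values())
    mark = {}   # mark[w] == u means w was already reached while processing u
    depth = {}  # shell index of w for the node currently being processed
    scores = {}
    for u in adj:
        k_u = len(adj[u])
        if k_u == 0:
            # Assumption: an isolated node contributes only its 0-shell.
            scores[u] = 1.0 / n
            continue
        alpha_u = alpha * k_u / k_max  # personalized threshold
        mark[u], depth[u] = u, 0
        shell, d, score = [u], 0, 0.0
        while shell:
            next_shell, total_outer = [], 0
            for v in shell:
                for w in adj[v]:
                    if mark.get(w) != u:
                        mark[w], depth[w] = u, d + 1
                        next_shell.append(w)
                    if depth[w] == d + 1:
                        total_outer += 1  # edge from shell d to shell d + 1
            score += math.exp(-d / k_u) * (total_outer / m + len(shell) / n)
            if total_outer / len(shell) <= alpha_u:
                break  # outer density reached the threshold
            shell, d = next_shell, d + 1
        scores[u] = score
    return scores


# Hypothetical graph: hub 0 with leaves 1 and 2, plus a tail 0-3-4-5.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
scores = asla(adj, alpha=1.0)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:2])  # → [3, 0]
```

On this toy graph, node 3, which connects the hub to the tail, scores slightly higher than the hub itself even though its degree is lower. Reusing `mark` across starting nodes avoids reinitializing a visited set for every u.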

Let \(N_{\alpha _u}\) denote the number of nodes contained in u’s influence scope C(u), and \(k_{avg}\) denote the average degree of all nodes. Then the processing of node u takes \(\mathcal {O}(N_{\alpha _u} k_{avg})\) time, because Algorithm 1 traverses all the immediate neighbors of each node in C(u). Therefore, Algorithm 1 takes \(\mathcal {O}(N N_{\alpha } k_{avg})\) time to calculate the influence values of all the nodes, where \(N_{\alpha }\) denotes the average size of the influence scopes of all nodes. \(N_{\alpha }\) is closely related to \(\alpha \): when \(\alpha = 0\), we have \(N_{\alpha } \approx N\), whereas when \(\alpha \ge k_{max}\), we have \(N_{\alpha } \approx 1\). Thus, the time complexity of Algorithm 1 can be controlled flexibly with a proper \(\alpha \). Moreover, because each node is processed independently, Algorithm 1 is naturally suitable for parallel processing. It is also advantageous when one only cares about the importance of a subset of the nodes.

To intuitively demonstrate the effectiveness of ASLA, we list the top-10 ranked nodes of different methods on the toy network (see Fig. 1) in Table 1. As can be seen, for betweenness centrality and closeness centrality, the top-3 ranked nodes are \(v_1\), \(v_2\), \(v_{10}\). When \(\alpha = 2.0\), ASLA provides the same result. However, local centrality fails to identify these nodes. For example, it gives a very low rank to the bridge node \(v_1\). Since degree centrality gives low ranks to nodes with small degrees, it also fails to identify the bridge node \(v_1\).

Table 1. Comparison of the top-10 ranked nodes on the toy network as shown in Fig. 1. DC: Degree Centrality; LC: Local Centrality; BC: Betweenness Centrality; CC: Closeness Centrality; ASLA-1: ASLA with \(\alpha = 1.0\); ASLA-2: ASLA with \(\alpha = 2.0\).

4 Experiments

4.1 Experimental Settings

We conduct experiments on both synthetic and real-world networks. The synthetic networks are generated by SNAP [11], a general graph mining library. The generated synthetic networks include the Erdős-Rényi network ER (1K nodes, 5K edges) and the Power-Law network PL (5K nodes, 7K edges). The real-world networks are downloaded from Network Repository [15], including Epinions (27K nodes, 100K edges) and Douban (155K nodes, 327K edges).

The proposed method ASLA is compared with five representative node ranking methods including two global methods: betweenness centrality (BC) [6] and closeness centrality (CC) [16]; and three local methods: local centrality (LC) [2], k-core decomposition based coreness centrality (KC) [10], and hybrid centrality (HC) [17]. Since KC and the influence scope of ASLA are both relevant to the community structure [5] of networks, we also choose KC as a benchmark. HC can be treated as a combination of LC and KC. It considers the coreness and the neighborhood of a node simultaneously. Different from LC, HC just takes into account the information contained in the third-order neighbors of each node.

Fig. 2. Performance of different methods on synthetic networks.

Fig. 3. Effects of parameter \(\alpha \) on our method ASLA.

The recently proposed robustness value metric (denoted by R) [13] is adopted to assess the quality of ranking results. A smaller R indicates a better result. R is defined as \(R = \frac{1}{N} \sum ^N_{I = 1} \delta (I)\), where \(\delta (I)\) denotes the fraction of nodes in the largest connected component after removing the I top-ranked nodes from the original network.
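The robustness value can be computed with a straightforward, unoptimized sketch such as the one below. It assumes that nodes are removed in ranked order, that `ranking` lists every node exactly once, and that \(\delta (I)\) is normalized by the original number of nodes N; the function names and the 5-node path graph are hypothetical.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def robustness(adj, ranking):
    """R = (1/N) * sum over I of delta(I), removing nodes in ranked order."""
    n = len(adj)
    removed = set()
    total = 0.0
    for node in ranking:
        removed.add(node)
        total += largest_component_size(adj, removed) / n
    return total / n


# Hypothetical path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
center_first = robustness(adj, [2, 1, 3, 0, 4])
ends_first = robustness(adj, [0, 4, 1, 3, 2])
print(center_first < ends_first)  # → True
```

Removing the central node first fragments the path quickly and yields the smaller R, which matches the convention that a smaller R indicates a better ranking. Recomputing the components after every removal costs \(\mathcal {O}(N(N + M))\) overall, so this sketch is only suitable for small networks.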

4.2 Results on Synthetic Networks

We first evaluate the performance of different ranking methods on the two synthetic networks. In this experiment, the parameter \(\alpha \) is fixed at 1.0. Recall that Algorithm 1 only involves a single parameter \(\alpha \), as the personalized threshold \(\alpha _u\) of node u is calculated as \(\alpha _u = \frac{k_u}{k_{max}} \alpha \). The results are shown in Fig. 2. As can be seen, BC has the best performance, while the performance of CC, LC, KC and HC is very poor. Even though CC is a global method, its performance is not satisfactory. Our method ASLA obtains performance comparable to that of BC. The results verify our hypothesis that ASLA is capable of achieving better performance than the local methods.

We then evaluate the effects of parameter \(\alpha \) on our method ASLA. For the ER network, we vary \(\alpha \) from 1 to 10. For the PL network, we vary \(\alpha \) from 10 to 100. The results are reported in Fig. 3. From Fig. 3, we can see that the performance of ASLA decreases as \(\alpha \) grows larger. This is because larger \(\alpha \) will lead to a smaller influence scope for each node, thus less information is considered when calculating the influence value. Therefore, when high ranking accuracy is required, \(\alpha \) should be set to a small number.

4.3 Results on Real-World Networks

In this part, we first evaluate the performance of different ranking methods on the real-world networks. The results are illustrated in Fig. 4. As observed on the synthetic networks, ASLA and BC obtain much better performance than the other four ranking methods. Although ASLA is a semi-local method, it characterizes the propagation ability and the aggregation ability of each node simultaneously. Thus, ASLA can obtain comparable performance with the global method BC.

Fig. 4. Performance of different methods on real-world networks.

We further evaluate the efficiency of ASLA. We first report the time cost of different ranking methods in Fig. 5(a). Note that the y-axis of this figure uses a logarithmic scale. As can be seen, the global methods BC and CC run very slowly, whereas the local methods LC, KC and HC run much faster. We can also see that ASLA is much faster than the global methods. For example, BC takes about two days on the Douban network, while ASLA takes less than one hour. In addition, with the aid of the parameter \(\alpha \), the time complexity of ASLA can be controlled flexibly. Next, we test the time overhead of ASLA by varying \(\alpha \) from 10 to 100. The results are shown in Fig. 5(b), which has two y-axes. It is observed that on both Epinions and Douban, the time cost of ASLA decreases rapidly as \(\alpha \) grows larger. These results indicate that ASLA can be applied to large-scale complex networks.

Fig. 5. Comparison of time cost on real-world networks.

5 Conclusion

In this paper, we propose a novel adaptive semi-local method ASLA to identify influential nodes in complex networks. ASLA seeks a tradeoff between the time efficiency and the ranking accuracy. Therefore, ASLA can be applied to large complex networks. The main advantage of ASLA is that it can determine the potential influence scope for each node adaptively. Then the influence value of each node is calculated to capture the propagation ability and the aggregation ability simultaneously. Extensive experiments have been conducted and the results demonstrate the effectiveness and efficiency of ASLA. For future work, it is valuable to develop more advanced algorithms to further speed up ASLA. It is also valuable to design other functions to calculate the influence values of nodes.