Abstract
This paper addresses the efficiency of computing and maintaining the eccentricity distribution on a large, and possibly dynamic, small-world network. The eccentricity distribution evaluates the importance of each node in a graph, providing a node ranking for graph analytics; moreover, it is the key to computing two fundamental graph measurements, the diameter and the radius. Existing eccentricity computation algorithms are not scalable enough to handle real large networks unless approximation is introduced. Such an approximation, however, leads to a prominent relative error on small-world networks, whose diameters are notably short. Our solution optimizes existing eccentricity computation algorithms at their bottlenecks (one-node eccentricity computation and the upper/lower bound updates) based on a line of original insights; it also provides the first algorithm for maintaining the eccentricities of a dynamic graph without recomputing the eccentricity distribution upon each edge update. On real large small-world networks, our extensive evaluation shows that our approach outperforms the state-of-the-art eccentricity computation approach by up to three orders of magnitude, and that our maintenance algorithm outperforms the recomputation baseline (recomputing with our own eccentricity computation approach) by up to two orders of magnitude.


























Notes
The label size may differ during every execution of the \(\mathsf {PLL}\) approach due to the randomness in determining the node order [3].
References
Aingworth, D., Chekuri, C., Indyk, P., Motwani, R.: Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J. Comput. 28(4), 1167–1181 (1999)
Akiba, T., Iwata, Y., Kawata, Y.: An exact algorithm for diameters of large real directed graphs. In: International Symposium on Experimental Algorithms, pp. 56–67. Springer, Berlin (2015)
Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 349–360. ACM, New York (2013)
Almeida, P., Baquero, C., Cunha, A.: Fast distributed computation of distances in networks. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 5215–5220. IEEE, New York (2012)
Bisenius, P., Bergamin, E., Angriman, E., Meyerhenke, H.: Computing top-k closeness centrality in fully-dynamic graphs. In: 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 21–35. SIAM (2018)
Borassi, M., Crescenzi, P., Habib, M., Kosters, W.A., Marino, A., Takes, F.W.: Fast diameter and radius BFS-based computation in (weakly connected) real-world graphs: with an application to the six degrees of separation games. Theoret. Comput. Sci. 586, 59–80 (2015)
Chan, T.M.: All-pairs shortest paths for unweighted undirected graphs in o(mn) time. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, pp. 514–523. Society for Industrial and Applied Mathematics (2006)
Chechik, S., Larkin, D.H., Roditty, L., Schoenebeck, G., Tarjan, R.E., Williams, V.V.: Better approximation algorithms for the graph diameter. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1041–1052. Society for Industrial and Applied Mathematics, Philadelphia (2014)
Demetrescu, C., Italiano, G.F.: Experimental analysis of dynamic all pairs shortest path algorithms. ACM Trans. Algorithm. (TALG) 2(4), 578–601 (2006)
Fujiwara, Y., Onizuka, M., Kitsuregawa, M.: Real-time diameter monitoring for time-evolving graphs. In: International Conference on Database Systems for Advanced Applications, pp. 311–325. Springer, Berlin (2011)
Gaston, M.E., Kraetzl, M., Wallis, W.D.: Using graph diameter for change detection in dynamic networks. Australas. J. Comb. 35, 299–311 (2006)
Guare, J.: Six Degrees of Separation: A Play. Vintage, New York (1990)
Henderson, K.: Opex: Optimized eccentricity computation in graphs. Technical report, Lawrence Livermore National Lab. (LLNL), Livermore, CA (2011)
Johnson, D.B.: Efficient algorithms for shortest paths in sparse networks. J. ACM (JACM) 24(1), 1–13 (1977)
Kas, M., Carley, K.M., Carley, L.R.: Incremental closeness centrality for dynamically changing social networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1250–1258. ACM, New York (2013)
Leskovec, J., Krevl, A.: Snap datasets: Stanford large network dataset collection (2014). http://snap.stanford.edu/data
Li, Z., Sun, D., Xu, F., Li, B.: Social network based anomaly detection of organizational behavior using temporal pattern mining. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 1112–1119. ACM, New York (2017)
Lü, L., Chen, D., Ren, X.-L., Zhang, Q.-M., Zhang, Y.-C., Zhou, T.: Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016)
Nathan, E., Zakrzewska, A., Riedy, J., Bader, D.: Local community detection in dynamic graphs using personalized centrality. Algorithms 10(3), 102 (2017)
Newman, M.E.J.: A measure of betweenness centrality based on random walks. Social Netw. 27(1), 39–54 (2005)
Okamoto, K., Chen, W., Li, X.-Y.: Ranking of closeness centrality for large-scale social networks. In: International Workshop on Frontiers in Algorithmics, pp. 186–195. Springer, Berlin (2008)
Riondato, M., Upfal, E.: Abra: Approximating betweenness centrality in static and dynamic graphs with Rademacher averages. ACM Trans. Knowl. Discov. Data (TKDD) 12(5), 61 (2018)
Roditty, L., Vassilevska Williams, V.: Fast approximation algorithms for the diameter and radius of sparse graphs. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 515–524. ACM, New York (2013)
Sagharichian, M., Langouri, M.A., Naderi, H.: A fast method to exactly calculate the diameter of incremental disconnected graphs. World Wide Web 20(2), 399–416 (2017)
Sariyüce, A.E., Kaya, K., Saule, E., Çatalyürek, Ü.V.: Incremental algorithms for closeness centrality. In: 2013 IEEE International Conference on Big Data, pp. 487–492. IEEE, New York (2013)
Shun, J.: An evaluation of parallel eccentricity estimation algorithms on undirected real-world graphs. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1095–1104. ACM, New York (2015)
Takes, F., Kosters, W.: Computing the eccentricity distribution of large graphs. Algorithms 6(1), 100–118 (2013)
Takes, F.W., Kosters, W.A.: Determining the diameter of small world networks. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1191–1196. ACM, New York (2011)
Then, M., Kaufmann, M., Chirigati, F., Hoang-Vu, T.-A., Pham, K., Kemper, A., Neumann, T., Huy, T.V.: The more the merrier: efficient multi-source graph traversal. Proc. VLDB Endow. 8(4), 449–460 (2014)
Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440 (1998)
West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River, NJ (1996)
Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pp. 664–673. ACM, New York (2014)
Yen, C.-C., Yeh, M.-Y., Chen, M.-S.: An efficient approach to updating closeness centrality and average path length in dynamic networks. In: 2013 IEEE 13th International Conference on Data Mining, pp. 867–876. IEEE, New York (2013)
Acknowledgements
Miao Qiao is supported by Marsden Fund UOA1732, Royal Society of New Zealand. Lu Qin is supported by ARC DP160101513. Ying Zhang is supported by ARC FT170100128 and DP180103096. Lijun Chang is supported by ARC DP160101513 and FT180100256. Xuemin Lin is supported by NSFC 61672235, DP170101628 and DP180103096.
Appendices
The proof of Lemma 4
Based on Definition 3, \(pecc(x| V_{\le \lambda }^z) = \max _{u\in V_{\le \lambda }^z} dist(x,u)\). By the triangle inequality, \(dist(x, u) \le dist(x,z) + dist(z,u) \le dist(x,z) + \lambda \) for every \(u \in V_{\le \lambda }^z\). Therefore, \( pecc(x| V_{\le \lambda }^z) = \max _{u\in V_{\le \lambda }^z} dist(x,u) \le \max _{u\in V_{\le \lambda }^z} (dist(x, z) + \lambda ) = dist(x, z) + \lambda . \)
The proof of Lemma 5
According to the definition of a \(\lambda \)-partial set, \(V' \cup V_{\le \lambda }^z = V\). Combined with the definition of the eccentricity, \(ecc(x) = \max _{u\in V} dist(u,x)\), this gives \(ecc(x) = \max \{pecc(x|V'), pecc(x| V_{\le \lambda }^z)\}\).
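As a small numerical check of Lemmas 4 and 5, the following Python sketch computes partial eccentricities by BFS on a toy path graph. The graph, the choice of z and \(\lambda \), and the helper names `bfs_dist` and `pecc` are illustrative assumptions and not part of the paper's implementation; \(V'\) is taken here simply as the complement of \(V_{\le \lambda }^z\).

```python
from collections import deque

def bfs_dist(adj, src):
    """Unweighted single-source distances computed by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def pecc(adj, x, targets):
    """Partial eccentricity of x restricted to the node set `targets`."""
    d = bfs_dist(adj, x)
    return max(d[u] for u in targets)

# Toy path graph 0-1-2-3-4 (illustrative assumption).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
z, lam = 2, 1
dist_z = bfs_dist(adj, z)
near = {u for u in adj if dist_z[u] <= lam}  # V_{<= lam}^z
rest = set(adj) - near                       # V', so that V' | near == V

for x in adj:
    dist_x = bfs_dist(adj, x)
    ecc_x = max(dist_x.values())
    # Lemma 4: pecc(x | V_{<= lam}^z) <= dist(x, z) + lam
    assert pecc(adj, x, near) <= dist_x[z] + lam
    # Lemma 5: ecc(x) = max{pecc(x | V'), pecc(x | V_{<= lam}^z)}
    assert ecc_x == max(pecc(adj, x, rest), pecc(adj, x, near))
```

On this toy input, only the nodes in `rest` need an exact distance computation from x; the contribution of `near` is bounded through the single value dist(x, z).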
The proof of Lemma 9
Let a shortest path from u to x be \(\langle v_0, v_1, v_2, \ldots , v_k \rangle \) with \(v_0=u\), \(v_k=x\) and \(k=dist(x,u)\). Since the new snapshot is taken on a stable state, applying the stability condition to the edges \((v_0, v_1)\), \((v_1, v_2), \ldots , (v_{k-1}, v_k)\) gives \(\overline{ecc}_{new}(u) = \overline{ecc}_{new}(v_0) \le \overline{ecc}_{new}(v_1) + 1 \le \overline{ecc}_{new}(v_2) + 2 \le \ldots \le \overline{ecc}_{new}(v_k) + k = \overline{ecc}_{new}(x) + k. \) Therefore, \(\overline{ecc}_{new}(u) \le \min \{\overline{ecc}_{old}(u), \overline{ecc}_{new}(x) + dist (x,u)\}.\) A similar argument shows that \(\underline{ecc}_{new}(v_0) \ge \underline{ecc}_{new}(v_{1}) - 1 \ge \ldots \ge \underline{ecc}_{new}(v_k) - k.\) Plugging \(v_0 = u\), \(v_k = x\) and \(k = dist(x,u)\) into the above inequality completes the proof.
The proof of Lemma 10
Observe that in Step 2) of the iterative update, \(\overline{ecc}(l)\) of node l will be updated only if \(\overline{ecc}(r)\) of its neighbor r is small enough such that \(\overline{ecc}(l) > \overline{ecc}(r) + 1\). Once the update takes place, we conceptually associate with \(\overline{ecc}(l)\) a source \(\overline{ecc}(l).s \leftarrow r\) to record the source of the bound. Note that this source field may be overwritten upon a subsequent update; however, it will not be removed once created.
\(\overline{ecc}_{new}(y).s\) exists since \(\overline{ecc}(y)\) must have been updated to let \(\overline{ecc}_{new}(y) <\overline{ecc}_{old}(y)\). Now, we trace from y via the source link of \(\overline{ecc}_{new}(\cdot ).s\), generating a path \(\langle v'_0, v'_1, \ldots , v_{k'}'\rangle \) with \(v'_0 = y\), \(v'_i = \overline{ecc}_{new}(v'_{i-1}).s\), for each \(i\in [1, k']\) while \(\overline{ecc}_{new}(v_{k'}')\) does not have a source. Note that in this sequence, we have \(\overline{ecc}_{new}(v'_{i-1}) = \overline{ecc}_{new}(v'_i) + 1\) for all \(i\in [1,k']\); thus, the sequence cannot contain a loop and thus \(k'\le n\). We have \(\overline{ecc}_{new}(y) = \overline{ecc}_{new}(v_0') = \overline{ecc}_{new}(v_{k'}') + k'.\)
We prove that \(v'_{k'}\) must be the trigger node x. Suppose otherwise. Since \(\overline{ecc}_{new}(v_{k'}')\) has no source, \(\overline{ecc}_{new}(v_{k'}') = \overline{ecc}_{old}(v_{k'}')\). Therefore, \(\overline{ecc}_{old}(y) > \overline{ecc}_{new}(y) = \overline{ecc}_{old}(v_{k'}') + k'\). By the pigeonhole principle, there must exist \(j\in [1,k']\) such that \(\overline{ecc}_{old}(v'_{j-1}) > \overline{ecc}_{old}(v'_j) + 1\), which violates the assumption that the old snapshot is stable.
The fact that \(v'_{k'} = x\) implies three important results:
1. For \(v_j'\) with \(j \in [0,k')\), \(\overline{ecc}_{new}(v_j') < \overline{ecc}_{old}(v_j')\): otherwise, the path would have stopped at j instead of \(k'\).
2. \(\overline{ecc}_{new}(x) = ub_x < \overline{ecc}_{old}(x)\). Indeed, if \(\overline{ecc}_{new}(x) = \overline{ecc}_{old}(x)\), the assumption that the old snapshot is stable would be violated. Besides, \(\overline{ecc}_{new}(x)\) has no source; thus, it has not been updated in Step 2) of the iterative update. Therefore, \(\overline{ecc}_{new}(x) = \min \{\overline{ecc}_{old}(x), ub_x\} = ub_x < \overline{ecc}_{old}(x).\)
3. \(\langle v'_0, v'_1, \ldots , v_{k'}'\rangle \) is a shortest path from y to x. Since \(k'\) is the length of a path from x to y, \(k' \ge dist(x,y)\). By Lemma 9, \(\overline{ecc}_{new}(y) \le \overline{ecc}_{new}(x) + dist(x,y)\); since \(\overline{ecc}_{new}(y) = \overline{ecc}_{new}(x) + k'\), it follows that \(k' = dist(x,y)\).
From the above three results, we complete the proof.
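The update rule used in this proof can be illustrated with a short Python sketch. It assumes only what the proof states: the trigger node x receives a tighter bound \(ub_x\), and a node l is updated whenever a neighbor r satisfies \(\overline{ecc}(l) > \overline{ecc}(r) + 1\). The function name `propagate_upper_bound`, the worklist order, and the toy input are assumptions made for illustration; the paper's full iterative update is not reproduced here.

```python
from collections import deque

def propagate_upper_bound(adj, ub, x, ub_x):
    """Tighten ub[x] to ub_x, then restore stability on every edge.

    Stability here means ub[l] <= ub[r] + 1 for every edge (l, r).
    Returns a new dict; the input `ub` is left untouched.
    """
    new_ub = dict(ub)
    if ub_x >= new_ub[x]:
        return new_ub          # nothing to tighten
    new_ub[x] = ub_x           # trigger node: min{old bound, ub_x}
    queue = deque([x])
    while queue:
        r = queue.popleft()
        for l in adj[r]:
            # Step 2) of the iterative update as described in the proof
            if new_ub[l] > new_ub[r] + 1:
                new_ub[l] = new_ub[r] + 1
                queue.append(l)
    return new_ub

# Toy path graph 0-1-2-3-4 with a stable old snapshot (illustrative).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
old_ub = {v: 4 for v in adj}
new_ub = propagate_upper_bound(adj, old_ub, x=0, ub_x=1)
changed = {v for v in adj if new_ub[v] < old_ub[v]}
```

In this example the nodes whose bound decreases are exactly those at distance 0, 1 and 2 from the trigger node, and each ends with the bound \(ub_x + dist(x, y)\), which matches the shortest-path structure established in the three results above.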
The proof of Lemma 13
We only prove the lemma for the case of edge insertion, since the case of edge deletion can be proved symmetrically. Let p (and \(p'\), resp.) be a shortest path from v to w in G (and in \(G'\), resp.) with length L(p) (and \(L(p')\), resp.). If \(p'\) does not include edge e, then both p and \(p'\) are paths between v and w in both G and \(G'\). Since p and \(p'\) are shortest paths of G and \(G'\), respectively, \(dist(v,w) = L(p) = L(p') = dist'(v,w)\), a contradiction. Therefore, \(p'\) must include e.
The proof of Lemma 14
We only consider the case of inserting the edge e(a, b) into G, since the case of edge deletion can be proved symmetrically. For two nodes \(u,v \in V\), if \(dist(u,v) \ne dist'(u,v)\), then all of the shortest paths from u to v on graph \(G'\) must pass through e (Lemma 13), i.e., either \(dist'(u,a) = dist'(u, b) - 1\) or \(dist'(u,b) = dist'(u, a) - 1\). When \(dist'(u,a) = dist'(u, b) - 1\), since \(dist'(b,v) = dist(b,v)\) and \(dist'(u,a) = dist(u,a)\) (the shortest paths from u to a and from b to v on \(G'\) do not include e), and \(dist(u,v) \ne dist'(u,v)\), we have \(dist(u,b) \ne dist'(u,b)\) and \(dist(v,a) \ne dist'(v,a)\), so \(u \in C^b\) and \(v \in C^a\). Similarly, when \(dist'(u,b) = dist'(u, a) - 1\), \(u \in C^a\) and \(v \in C^b\).
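The statement of Lemma 14 can be checked by brute force on a small graph. The Python sketch below assumes, based on how the sets are used in this proof, that \(C^a\) (and \(C^b\), resp.) denotes the nodes whose distance to a (and to b, resp.) changes after the insertion; the toy graph and all helper names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Unweighted single-source distances computed by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def with_edge(adj, a, b):
    """Return a copy of the undirected graph with edge (a, b) inserted."""
    new_adj = {u: list(nbrs) for u, nbrs in adj.items()}
    new_adj[a].append(b)
    new_adj[b].append(a)
    return new_adj

# Toy path graph 0-1-2-3-4-5; inserting (0, 5) turns it into a 6-cycle.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
a, b = 0, 5
new_adj = with_edge(adj, a, b)

old = {u: bfs_dist(adj, u) for u in adj}      # dist
new = {u: bfs_dist(new_adj, u) for u in adj}  # dist'

# Assumed meaning of C^a and C^b: nodes whose distance to a (to b) changed.
C_a = {u for u in adj if old[u][a] != new[u][a]}
C_b = {u for u in adj if old[u][b] != new[u][b]}

changed_pairs = {(u, v) for u, v in combinations(sorted(adj), 2)
                 if old[u][v] != new[u][v]}
# Lemma 14: each changed pair has one endpoint in C^a and the other in C^b.
lemma14_holds = all((u in C_a and v in C_b) or (u in C_b and v in C_a)
                    for u, v in changed_pairs)
```

Only pairs drawn from \(C^a \times C^b\) can change their distance, so a maintenance procedure may restrict its attention to these two sets instead of all node pairs.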
Comparison with approximate methods
We show that computing exact eccentricities is necessary by evaluating the approximate algorithms of \({\mathsf {HybridEcc}} \) [27] and \({\mathsf {kBFSEcc}} \) [26] on four graphs: Twitter, Youtube, Lastfm, and Indochina. The algorithms of \({\mathsf {HybridEcc}} \) and \({\mathsf {kBFSEcc}} \) have been introduced in Sect. 5; the descriptions of the four real graphs and the computation time of \({\mathsf {ECC}} \) are given in Table 5. All algorithms were run with a single thread.
We measure an approximate algorithm with its accuracy—the percentage of nodes whose eccentricities are correctly estimated—and its running time.
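The accuracy measure defined above can be written down directly. The following Python sketch uses the hypothetical function name `accuracy` and made-up eccentricity values purely for illustration.

```python
def accuracy(exact, estimate):
    """Percentage of nodes whose estimated eccentricity equals the exact one."""
    hits = sum(1 for v in exact if estimate.get(v) == exact[v])
    return 100.0 * hits / len(exact)

# Made-up values for a three-node example (not taken from the experiments).
exact = {"a": 2, "b": 1, "c": 2}
estimate = {"a": 2, "b": 2, "c": 2}
acc = accuracy(exact, estimate)  # two of three nodes are correct
```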
Table 5 shows the running time and accuracy of \(\mathsf {HybridEcc}\). In comparison with \(\mathsf {ECC}\), \(\mathsf {HybridEcc}\) shows a trade-off between accuracy and efficiency. On Twitter, \(\mathsf {HybridEcc}\) needs a longer running time to achieve a high accuracy (higher than 96%). On Indochina, \(\mathsf {HybridEcc}\) performs well both in terms of accuracy and running time. On Youtube and Lastfm, the running time of \(\mathsf {HybridEcc}\) is significantly lower than that of \(\mathsf {ECC}\) while the accuracy is below 66%. The accuracy of \(\mathsf {HybridEcc}\) varies dramatically across graphs, and thus it fails to provide stable and reliable estimations.
The performance of \({\mathsf {kBFSEcc}} \) is highly dependent on the key parameter k. We show the accuracy of \({\mathsf {kBFSEcc}} \) in Fig. 27 and its running time in Fig. 28 with k varying from 1 to \(2^{14} = 16{,}384\). Note that, in order to eliminate the impact of randomness, we used the same random seed for different values of k. This means that the sampled node set S of Phase 1 (see Sect. 5) for a smaller k is a subset of that for a larger k.
Figure 27 shows the accuracy of \({\mathsf {kBFSEcc}} \). On 2 out of 4 graphs, \({\mathsf {kBFSEcc}} \) achieves 100% accuracy with a small k: \(k = 2^4\) on Youtube and \(k = 2^8\) on Indochina. In contrast, on Twitter, it requires a large \(k = 2^{14}\) to reach 100% accuracy, while it never achieves 100% accuracy on Lastfm. We also observe an irregular fluctuation in the accuracy of \({\mathsf {kBFSEcc}} \). Note that this phenomenon is independent of the random seed: similar behavior can be observed under other random seeds. On all four graphs, an increase of k can dramatically reduce the accuracy. For example, the accuracy drops from \(95.59\%\) to \(39.01\%\) on Lastfm when k is increased from \(2^3\) to \(2^4\). In this sense, \({\mathsf {kBFSEcc}} \) still needs an analysis of the relationship between k and the accuracy to establish the reliability of its estimation.
Figure 28 shows that the running time of \({\mathsf {kBFSEcc}} \) increases linearly with k. In general, \(\mathsf {kBFSEcc}\) reaches a high accuracy within \(10\%\) of the running time of \(\mathsf {ECC}\). In particular, on Indochina, \(\mathsf {kBFSEcc}\) reaches 100% accuracy \(17.5\times \) faster than \(\mathsf {ECC}\). However, this superiority is not guaranteed: by the time \(\mathsf {ECC}\) completes its computation on Twitter (\(k = 2^8\)), \({\mathsf {kBFSEcc}} \) achieves an accuracy of only \(93.10\%\). In conclusion, in comparison with \(\mathsf {ECC}\), \(\mathsf {kBFSEcc}\) provides a faster yet less reliable estimation of the eccentricity distribution.
Corner case on road networks
We examine the applicability of our algorithm by conducting experiments on road networks, whose diameters are large. Three road networks, Luxembourg, Usroads, and RoadNetPA, were used (downloaded from Network Repository). Tables 4 and 5 show the comparison of \({\mathsf {ECC}} \) and \({\mathsf {BoundEcc}} \) on road networks. \({\mathsf {ECC}} \) is slower than \({\mathsf {BoundEcc}} \) on all three graphs; in particular, \({\mathsf {ECC}} \) cannot finish the computation within one day on RoadNetPA. Since our algorithm is highly dependent on the features of small-world networks, it is not suited to graphs with large diameters.
Detailed explanation of \(\mathsf {PLL}\)
Algorithm 7 illustrates the main steps of the \({\mathsf {PLL}} \) approach. In iteration i, a pruned BFS is performed from node \(v_i\) (Lines 1–4). When the BFS from \(v_i\) visits a node u (Line 5), we first check whether the current labels S can already answer the distance between \(v_i\) and u. If so, u is pruned and we stop the traversal from u (Lines 6–7). Otherwise, \((v_i,dist(v_i,u))\) is inserted into S(u) (Line 8), and the BFS continues from u by adding the neighbors of u into the queue Q (Lines 9–12). Finally, the total label set \(S = \{S(v)|v \in V\}\) is returned as the output of \({\mathsf {PLL}} \) (Line 13).
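The steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not a transcription of Algorithm 7: the names `pruned_landmark_labeling` and `query`, the dictionary-based label layout, the degree-descending node order, and the toy graph are all hypothetical choices made for the example.

```python
from collections import deque

def bfs_dist(adj, src):
    """Plain BFS distances, used here only to validate the labels."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def query(labels, u, v):
    """Distance answered from the labels: min over common hubs of d(u,h) + d(h,v)."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    return min((d + lv[h] for h, d in lu.items() if h in lv),
               default=float("inf"))

def pruned_landmark_labeling(adj, order):
    """Build labels S(v) = {hub: distance} with one pruned BFS per node in `order`."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: the current labels already answer dist(root, u).
            if query(labels, root, u) <= d:
                continue
            labels[u][root] = d          # insert (root, dist(root, u)) into S(u)
            for w in adj[u]:             # continue the BFS from u
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels

# Toy graph and a degree-descending order (both are illustrative assumptions).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
order = sorted(adj, key=lambda v: -len(adj[v]))
labels = pruned_landmark_labeling(adj, order)
all_correct = all(query(labels, u, v) == bfs_dist(adj, u)[v]
                  for u in adj for v in adj)
```

A distance query then scans only the two labels involved, so its cost depends on the label sizes rather than on the size of the graph.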

Additional experimental results
This section reports the results on the graphs other than Slashdot, Twitter, DBLP, and Wiki-talk for the following four experiments.
- Exp-2: Testing \({\mathsf {ECC}} \). Figure 29 shows the performance of \({\mathsf {ECC}} \) under a varying number k of reference nodes.
- Exp-3: Testing \({\mathsf {ECC}}\text {-}{\mathsf {LS}} \). Figure 30 shows the processing time when varying the number k of reference nodes. Figure 31 shows the number of \(\mathsf {PLL}\) queries for \({\mathsf {ECC}} \) and \({\mathsf {ECC}}\text {-}{\mathsf {LS}} \).
- Exp-5: Testing distance to reference nodes. Figure 32 shows the distribution of the distance \(\lambda _0(u)\) from a node u to its reference node z. Figure 33 illustrates the average distance of a node to its nearest reference node.
- Exp-8: Rule-effectiveness for edge insertion. Figure 34 shows the number of nodes pruned by each of Lemmas 19–21 and the number of nodes in \(C^{b'}\).
Cite this article
Li, W., Qiao, M., Qin, L. et al. Eccentricities on small-world networks. The VLDB Journal 28, 765–792 (2019). https://doi.org/10.1007/s00778-019-00566-9
DOI: https://doi.org/10.1007/s00778-019-00566-9