
1 Introduction

A large class of data routinely produced and collected by large corporations can be modeled as graphs, such as web pages crawled by Google (e.g., the web graph) and tweets collected by Twitter (e.g., the mention graph for users). Since graphs can capture complex dependencies and interactions, graph algorithms have become an essential component in many real-world applications [2, 8, 24], including business intelligence, social sciences, and data mining.

An essential property of graphs is that they are often dynamic. As new data and/or updates are collected (or produced), the graph evolves. For example, search engines periodically crawl the web, and the web graph evolves as web pages and hyperlinks are created and/or deleted. Many applications must utilize the up-to-date graph in order to produce results that reflect the current state. However, rerunning the computation over the entire graph is not efficient (considering the huge size of the graph), since it discards the work done in earlier runs no matter how small the changes are.

The dynamic nature of graphs implies that performing incremental computation can improve efficiency dramatically. Incremental computation exploits the fact that only a small portion of the graph has changed. It reuses the result of the prior computation and performs computation only on the part of the graph that is affected by the change. Although a number of distributed frameworks have been proposed to support incremental computation on massive graphs [3, 6, 15–17, 23], most of them apply synchronous updates, which require expensive synchronization barriers. In order to avoid the high synchronization cost, asynchronous updates have been proposed. In the asynchronous update model, a vertex performs the update using the most recent values instead of the values from the previous iteration (and there is no waiting time). Intuitively, we can expect asynchronous updates to outperform synchronous updates, since more up-to-date values are used and the synchronization barriers are bypassed. However, asynchronous updates might require more communication and perform useless computations (e.g., when no new value is available to a vertex), and thus result in limited performance gain over synchronous updates.

In this paper, we provide an approach to efficiently apply asynchronous updates to incremental computation. We first describe a broad class of graph algorithms targeted by this paper. We then present our incremental computation approach by illustrating how to apply asynchronous updates to incremental computation. In order to address the challenge that asynchronous updates might require more communication and computation, we present a scheduling scheme to coordinate updates. Furthermore, we develop a distributed system to support our proposed asynchronous incremental computation approach. We evaluate our approach on a local cluster of machines as well as the Amazon EC2 cloud. More specifically, our main contributions are as follows:

  • We propose an approach to efficiently apply asynchronous updates to incremental computation on evolving graphs for a broad class of graph algorithms. In order to improve efficiency, a scheduling scheme is presented to coordinate asynchronous updates. The convergence of our proposed asynchronous incremental computation approach is proved.

  • We develop an asynchronous distributed framework, GraphIn, to support incremental computation. GraphIn eases the process of implementing graph algorithms with incremental computation in a distributed environment and does not require users to have distributed programming experience.

  • We extensively evaluate our asynchronous incremental computation approach with several real-world graphs. The evaluation results show that our approach can accelerate the convergence speed by as much as 14x when compared to recomputation from scratch. Moreover, a scalability test on a 50-machine cluster demonstrates that our approach works with massive graphs having tens of millions of vertices and a billion edges.

2 Problem Setting

In this section, we first define the problem of performing algorithms on evolving graphs. We then describe a broad class of graph algorithms which we target.

2.1 Problem Formulation

Many graph algorithms leverage iterative updates to compute states (e.g., scores of importance, closeness to a specified vertex) of the vertices until a convergence point is reached. For example, PageRank iteratively refines the rank scores of the vertices (e.g., web pages) of a graph. Such a graph algorithm typically starts with some initial state and then iteratively refines it until convergence. We refer to this kind of graph algorithm as an iterative graph algorithm.

We are interested in how to efficiently perform iterative graph algorithms on evolving graphs. More formally, if we use G to denote the original graph and \(G'\) to represent the new graph, the question we ask is: for an iterative graph algorithm, given \(G'\) and the convergence point on G, how can we efficiently reach the convergence point on \(G'\)?

2.2 Iterative Graph Algorithms

We here describe the iterative graph algorithms targeted by this paper. Typically, the update function of an iterative graph algorithm has the following form:

$$\begin{aligned} x^{(k)} = f(x^{(k-1)}), \end{aligned}$$
(1)

where the n-dimensional vector \(x^{(k)}\) represents the state of the graph at iteration k, each of its elements is the state of one vertex (e.g., \(x^{(k)}[i]\) for vertex i), and \(x^{(0)}\) is the initial state. A convergence point is a fixed point of the update function. That is, if \(x^{(*)}\) is a convergence point, we have \(x^{(*)} = f(x^{(*)})\).

The update function usually can be decomposed into a series of individual functions. In other words, we can update a vertex’s state (e.g., \(x_{j}\)) as follows:

$$\begin{aligned} x_{j}^{(k)}= c_j \star \sum _{i=1}^{n} \star f_{\{i,j\}}(x_{i}^{(k-1)}), \end{aligned}$$
(2)

where ‘\(\star \)’ is an abstract operator (\(\sum _{i=1}^{n}\star \) represents an operation sequence of length n by ‘\(\star \)’), \(c_j\) is a constant, and \(f_{\{i,j\}}(x_{i}^{(k-1)})\) is an individual function denoting the impact from vertex i to vertex j in the \(k^{\text {th}}\) iteration. The operator ‘\(\star \)’ typically has three candidates, ‘\(+\)’, ‘\(\min \)’, and ‘\(\max \)’. In this paper, we target the iterative graph algorithms whose states can be computed in the form of Eq. (2).
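To make the abstraction concrete, the following is a minimal sketch of one synchronous iteration of Eq. (2); the graph representation (`in_neighbors`), the per-edge function `f`, the operator `star`, its identity, and the constants `c` are all plug-in points, and the names are illustrative rather than part of any actual implementation.

```python
# A minimal sketch of one synchronous step of Eq. (2). `x` maps each vertex
# to its state, `in_neighbors[j]` lists the vertices with an edge into j,
# `f(i, j, xi)` is the individual function, `star` the abstract operator,
# `identity` its identity value, and `c[j]` the per-vertex constant.
def synchronous_step(x, in_neighbors, f, star, identity, c):
    new_x = {}
    for j in x:
        acc = identity
        for i in in_neighbors[j]:
            acc = star(acc, f(i, j, x[i]))  # fold the incoming impacts
        new_x[j] = star(c[j], acc)          # x_j^(k) = c_j * (folded impacts)
    return new_x
```

Iterating `synchronous_step` until the state stops changing reaches a fixed point of the update function.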

2.3 Example Graph Algorithms

We next illustrate a series of well-known iterative graph algorithms, the update functions of which can be converted into the form of Eq. (2).

PageRank and Variants: PageRank is a well-known algorithm, which ranks vertices in a graph based on the stationary distribution of a random walk on the graph. Each element (e.g., \(r_j\)) of the score vector r can be computed iteratively as follows: \(r^{(k)}_j = \sum _{\{i|\{i\rightarrow j\}\in E\}} \frac{d r^{(k-1)}_i}{|N(i)|} + (1-d)e_j\), where d (\(0<d<1\)) is the damping factor, \(\{i\rightarrow j\}\) represents the edge from vertex i to vertex j, E is the set of edges, |N(i)| is the number of outgoing edges of vertex i, and e is a size-n vector with each entry being \(\frac{1}{n}\). We can convert the update function of PageRank into the form of Eq. (2). If \(\{i\rightarrow j\}\in E\), \(f_{\{i,j\}}(x_{i}^{(k-1)}) = d x^{(k-1)}_i/|N(i)|\), otherwise \(f_{\{i,j\}}(x_{i}^{(k-1)}) = 0\), \(c_j = (1-d)e_j\), and ‘\(\star \)’ is ‘+’.
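Under the sketch above, PageRank amounts to the following choice of the plug-in points; `vertices`, `n`, and `out_degree` are assumed to come from the loaded graph.

```python
# PageRank as an instance of Eq. (2): star is '+', c_j = (1 - d) / n, and the
# per-edge function scales the rank by d and splits it over outgoing edges.
d = 0.8                                        # damping factor
star, identity = (lambda a, b: a + b), 0.0
c = {j: (1 - d) / n for j in vertices}         # (1 - d) * e_j with e_j = 1/n
f = lambda i, j, xi: d * xi / out_degree[i]    # impact along edge i -> j
```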

The update function of Personalized PageRank [9] differs from that of PageRank only at vector e. Vector e of Personalized PageRank assigns non-zero values only to the entries indicating the personally preferred pages. Rooted PageRank [19] is a special case of Personalized PageRank. It captures the probability for two vertices to run into each other and uses this probability as the similarity score of those two vertices.

Shortest Paths: The shortest paths algorithm is a simple yet common graph algorithm which computes the shortest distances from a source vertex to all other vertices. Consider a weighted graph \(G = (V, E, W)\), where V is the set of vertices, E is the set of edges, and W is the weight matrix of the graph (if there is no edge between i and j, \(W[i,j] = \infty \)). The shortest distance (i.e., \(d_j\)) from the source vertex s to a vertex j can then be calculated by performing the iterative updates: \(d^{(k)}_j = \min \{d^{(0)}_j, \min _i(d^{(k-1)}_i + W[i,j])\}\). For the initial state, we usually set \(d^{(0)}_s = 0\) and \(d^{(0)}_j = \infty \) for any vertex j other than s. We can map the update function of the shortest paths algorithm into the form of Eq. (2). If there is an edge from vertex i to vertex j, \(f_{\{i,j\}}(x_{i}^{(k-1)}) = x_{i}^{(k-1)} + W[i,j]\), otherwise \(f_{\{i,j\}}(x_{i}^{(k-1)}) = \infty \), \(c_j = d^{(0)}_j\), and ‘\(\star \)’ is ‘\(\min \)’.

Connected Components: The connected components algorithm is an important algorithm for understanding graphs. It aims to find the connected components in a graph. The main idea of the algorithm is to label each vertex with the maximum vertex id across all vertices in the component to which it belongs. Initially, a vertex j sets its component id \(p^{(0)}_j\) to its own vertex id, i.e., \(p^{(0)}_j = j\). Then the component id of vertex j can be iteratively updated by \(p^{(k)}_j = \max \{p^{(0)}_j, \max _{i\in N(j)}(p^{(k-1)}_i)\}\), where N(j) denotes vertex j’s neighbors. When no vertex in the graph changes its component id, the algorithm converges. As a result, the vertices having the same component id belong to the same component. We can map the update function of the connected components algorithm into the form of Eq. (2). If there is an edge from vertex i to vertex j, \(f_{\{i,j\}}(x_{i}^{(k-1)}) = x_{i}^{(k-1)}\), otherwise \(f_{\{i,j\}}(x_{i}^{(k-1)}) = -\infty \), \(c_j = j\), and ‘\(\star \)’ is ‘\(\max \)’.
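For comparison with the PageRank configuration above, the two mappings just described can be written as the following plug-in choices; `vertices`, the source `s`, and the weight matrix `W` are assumed inputs.

```python
import math

# Shortest paths from source s: star is 'min', the identity is infinity,
# c_j is the initial distance, and f adds the edge weight along i -> j.
sp_star, sp_identity = min, math.inf
sp_c = {j: (0.0 if j == s else math.inf) for j in vertices}
sp_f = lambda i, j, xi: xi + W[i][j]

# Connected components: star is 'max', the identity is -infinity, c_j is the
# vertex id itself, and f forwards the neighbor's current label unchanged.
cc_star, cc_identity = max, -math.inf
cc_c = {j: j for j in vertices}
cc_f = lambda i, j, xi: xi
```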

Other Algorithms: There are many more iterative graph algorithms whose update functions can be mapped into the form of Eq. (2). We name several here. Hitting time is a measure based on a random walk on the graph. Penalized hitting probability [8] and discounted hitting time [18] are variants of hitting time. The adsorption algorithm [2] is a graph-based label propagation algorithm proposed for personalized recommendation. HITS [10] utilizes a two-phase iterative update approach to rank web pages of a web graph. SALSA [13] is another link-based ranking algorithm for web graphs. Effective Importance [4] is a proximity measure to capture the local community structure of a vertex.

3 Asynchronous Incremental Computation

As the underlying graph evolves, the states of the vertices also change. Obviously, rerunning the computation from scratch over the new graph is not efficient, since it discards the work done in earlier runs. Intuitively, performing computations incrementally can improve efficiency. In this section, we present our asynchronous incremental computation approach. The convergence of our approach is proved.

3.1 Asynchronous Updates

In order to describe our asynchronous incremental computation approach, we define a time sequence \(\{t_0,t_1,\ldots ,t_{\infty }\}\). Let \(\hat{x}^{(k)}\) denote the state vector at time \(t_k\). Also, we introduce the delta state vector \(\varDelta \hat{x}^{(k)}\) to represent the difference between \(\hat{x}^{(k+1)}\) and \(\hat{x}^{(k)}\) in the operator ‘\(\star \)’ manner, i.e., \(\hat{x}^{(k+1)} = \hat{x}^{(k)} \star \varDelta \hat{x}^{(k)}\). The goal of introducing \(\varDelta \hat{x}^{(k)}\) is to perform accumulative computations. When the operator ‘\(\star \)’ is commutative and associative and the function \(f_{\{i,j\}}(x_i)\) distributes over ‘\(\star \)’, the computation can be performed accumulatively. All the graph algorithms discussed in Sect. 2.3 satisfy these properties. It is straightforward to verify that accumulative computations are equivalent to normal computations. The benefit of performing accumulative computations is that only changes of the states (i.e., delta states) are used to compute new changes. If there is no change to the state of a vertex, no communication or computation is necessary. The general idea of separating fixed parts from changes and leveraging changes to compute new changes has also proven effective in many other algorithms, such as Nonnegative Matrix Factorization [21] and Expectation-Maximization [22].
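The distributivity requirement is easy to check for PageRank-style updates, where the per-edge function is linear; a quick numeric check under illustrative values:

```python
# For PageRank, f is linear, so it distributes over '+':
# f(x + dx) = f(x) + f(dx). Propagating only the delta f(dx) is therefore
# equivalent to recomputing from the full value. Values here are illustrative.
d, deg = 0.8, 4
f = lambda v: d * v / deg
x, dx = 1.0, 0.25
assert abs(f(x + dx) - (f(x) + f(dx))) < 1e-12
```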

In our asynchronous incremental computation approach, each vertex i updates its \(\varDelta \hat{x}_i^{(k)}\) and \(\hat{x}_i^{(k)}\) independently and asynchronously, starting from \(\varDelta \hat{x}_i^{(0)}\) and \(\hat{x}_i^{(0)}\) (we will illustrate how to construct them soon). In other words, there are two separate operations for vertex j:

  • Accumulate operation: whenever receiving a value (e.g., \(f_{\{i,j\}}(\varDelta \hat{x}_i)\)) from a neighbor (e.g., i), perform \(\varDelta \hat{x}_j = \varDelta \hat{x}_j \star f_{\{i,j\}}(\varDelta \hat{x}_i)\);

  • Update operation: perform \(\hat{x}_j = \hat{x}_j \star \varDelta \hat{x}_j\); for any neighbor l, if \(f_{\{j,l\}}(\varDelta \hat{x}_j) \ne o\), send \(f_{\{j,l\}}(\varDelta \hat{x}_j)\) to l; and then reset \(\varDelta \hat{x}_j\) to o;

where o is the identity value of the operator ‘\(\star \)’. That is, \(z = z \star o\) for any \(z \in R\) (if ‘\(\star \)’ is ‘\(+\)’, \(o = 0\); if ‘\(\star \)’ is ‘\(\min \)’, \(o = \infty \); if ‘\(\star \)’ is ‘\(\max \)’, \(o = -\infty \)). Basically, the accumulate operation accumulates received values between two consecutive updates on \(\hat{x}_j\). The update operation adjusts \(\hat{x}_j\) by absorbing \(\varDelta \hat{x}_j\), sends useful values to other vertices, and resets \(\varDelta \hat{x}_j\).
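A minimal per-vertex sketch of these two operations for the ‘\(+\)’ case follows; `send` stands in for the framework's messaging primitive and, like the other names, is an assumption of this sketch.

```python
class Vertex:
    """One vertex j under asynchronous accumulative computation, star = '+'."""
    def __init__(self, x0, dx0, out_edges, f):
        self.x, self.dx = x0, dx0
        self.out_edges = out_edges        # neighbors l with an edge j -> l
        self.f = f                        # f(l, delta): per-edge function

    def accumulate(self, value):
        # invoked whenever a value f(i, j, dx_i) arrives from a neighbor i
        self.dx += value

    def update(self, send):
        # absorb the accumulated delta, propagate only non-identity values,
        # then reset the delta to the identity of '+' (i.e., 0)
        self.x += self.dx
        for l in self.out_edges:
            msg = self.f(l, self.dx)
            if msg != 0:
                send(l, msg)
        self.dx = 0
```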

We now illustrate how to construct \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\) by leveraging the computation result on the previous graph, G. We need to make sure that the constructed \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\) can guarantee the correctness of the result on the new graph. Let \(\bar{x}^{(*)}\) denote the convergence point on G. We next show how to construct \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\) when the operator ‘\(\star \)’ is ‘\(+\)’ (for all the graph algorithms discussed in Sect. 2.3 except shortest paths and connected components) and when ‘\(\star \)’ is ‘\(\min \)/\(\max \)’ (shortest paths and connected components), respectively.

For an iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(+\)’, we first leverage \(\bar{x}^{(*)}\) to construct \(\hat{x}^{(0)}\) in the following way: for a kept vertex (e.g., i), we set \(\hat{x}^{(0)}_i = \bar{x}^{(*)}_i\); for a newly added vertex (e.g., j), we set \(\hat{x}^{(0)}_j = 0\). In contrast, recomputation from scratch typically utilizes 0 as \(\hat{x}^{(0)}\) (where 0 is a vector with all its elements being zero). In order to construct \(\varDelta \hat{x}^{(0)}\), we compute \(\hat{x}^{(1)}\) using \(\hat{x}^{(1)} = f(\hat{x}^{(0)})\) and then construct \(\varDelta \hat{x}^{(0)}\) by requiring that it satisfies \(\hat{x}^{(1)} = \hat{x}^{(0)} \star \varDelta \hat{x}^{(0)}\). Since ‘\(\star \)’ is ‘\(+\)’, we can calculate \(\varDelta \hat{x}^{(0)}\) by \(\varDelta \hat{x}^{(0)} = \hat{x}^{(1)} - \hat{x}^{(0)}\). It is important to note that here the deleted vertices and/or edges do not affect the way we construct \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\). In other words, no matter whether there are deleted vertices and/or edges, the way we construct \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\) guarantees the correctness of the result on the new graph.
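A sketch of this construction, assuming `converged_on_G` holds the fixed point \(\bar{x}^{(*)}\) on G, `vertices_of_G_prime` is the vertex set of \(G'\), and `f_step` performs one full synchronous pass \(x \mapsto f(x)\) on \(G'\) (all three names are illustrative):

```python
# Starting point for '+' algorithms: kept vertices reuse the converged value
# on G, newly added vertices start at 0, and the initial delta is one
# synchronous step on G' minus the starting point.
x0 = {j: converged_on_G.get(j, 0.0) for j in vertices_of_G_prime}
x1 = f_step(x0)                                  # x^(1) = f(x^(0)) on G'
dx0 = {j: x1[j] - x0[j] for j in vertices_of_G_prime}
```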

For an iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(\min \)/\(\max \)’, we construct \(\hat{x}_i^{(0)}\) and \(\varDelta \hat{x}_i^{(0)}\) as follows. When the operator ‘\(\star \)’ is ‘\(\min \)’ (e.g., shortest paths), if no vertex’s initial state is smaller than its final converged state, the algorithm will converge. This is because of the following reason. When the algorithm has not converged, in each iteration there must be at least one vertex whose state is becoming smaller, and thus the overall state vector is becoming closer to the final converged state vector. When no vertex changes its state, the algorithm has converged. Generally, it is hard to know the final converged state vector. Therefore, for the shortest paths algorithm, recomputation from scratch usually sets the initial state of a vertex (other than the source vertex) to \(\infty \) to guarantee that it is not smaller than the final converged state. Fortunately, when the graph grows (vertices and/or edges are added and no vertices or edges are deleted), the previous converged state of a kept vertex must be no smaller than its converged state on the new graph. Therefore, for the graph growing scenario, we construct \(\hat{x}_i^{(0)}\) in the following way: for a kept vertex (e.g., i), we set \(\hat{x}^{(0)}_i = \bar{x}^{(*)}_i\); for a newly added vertex (e.g., j), we set \(\hat{x}^{(0)}_j = \infty \). Similarly, for the connected components algorithm, whose operator ‘\(\star \)’ is ‘\(\max \)’, we can construct \(\hat{x}_i^{(0)}\) (for the graph growing scenario) as follows: for a kept vertex (e.g., i), we set \(\hat{x}^{(0)}_i = \bar{x}^{(*)}_i\); for a newly added vertex (e.g., j), we set \(\hat{x}^{(0)}_j = j\). To construct \(\varDelta \hat{x}^{(0)}\), we also compute \(\hat{x}^{(1)}\) using \(\hat{x}^{(1)} = f(\hat{x}^{(0)})\) and then simply set \(\varDelta \hat{x}_j^{(0)} = \hat{x}_j^{(1)}\). This satisfies \(\hat{x}^{(1)} = \hat{x}^{(0)} \star \varDelta \hat{x}^{(0)}\), no matter whether ‘\(\star \)’ is ‘\(\min \)’ or ‘\(\max \)’.
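The analogous construction for the ‘\(\min \)’ case (shortest paths) on a grown graph, under the same assumed names as the previous sketch:

```python
import math

# Kept vertices reuse their converged distance on G; added vertices start at
# infinity. One monotone step gives x^(1) <= x^(0) elementwise, so setting
# dx^(0) = x^(1) indeed satisfies x^(1) = min(x^(0), dx^(0)).
x0 = {j: converged_on_G.get(j, math.inf) for j in vertices_of_G_prime}
x1 = f_step(x0)
dx0 = dict(x1)
```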

3.2 Selective Execution

One potential problem of basic asynchronous updates is that they might require more computation and communication when compared to their synchronous counterparts. This is because vertices are updated in a round-robin manner no matter how many new values are available to a vertex. To solve this problem, instead of updating vertices in a round-robin manner, we update vertices selectively according to their importance, i.e., their contribution to the convergence. The motivation is that not all vertices contribute equally to the convergence. We refer to this scheduling scheme as selective execution.

Our selective execution scheduling scheme selects a block of m vertices (instead of one) to update in each round. The reason is that if only one vertex is chosen to update at a time, the scheduling overhead (e.g., maintaining a priority queue to always choose the vertex with the highest importance) is high. Once the block of selected vertices has been updated, the scheme selects another block to update. Each time, our scheme selects the top-m vertices in terms of importance. The size of the block (i.e., m) balances the tradeoff between the gain from selective execution and the cost of selecting vertices. Setting m too small may incur considerable overhead, while setting m too large may degrade the effect of selective execution; e.g., setting m to the total number of vertices degrades to round-robin scheduling. We will discuss how to determine m in Sect. 4.1.

We now illustrate how to quantify a vertex’s importance when ‘\(\star \)’ is ‘\(\min \)/\(\max \)’ and when it is ‘\(+\)’, respectively. Ideally, the vertex whose update decreases the distance to the fixed point (i.e., \(||x^{(*)} - \hat{x}^{(k)}||_1\)) the most should have the highest importance. For an iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(\min \)/\(\max \)’, the iterative updates either monotonically decrease (e.g., shortest paths) or monotonically increase (e.g., connected components) every element of \(\hat{x}^{(k)}\). For ease of exposition, we assume the monotonically decreasing case. In this case, \(x^{(*)}_j \le \hat{x}^{(k)}_j\) for any j, and thus we have \(||x^{(*)} - \hat{x}^{(k)}||_1 = ||\hat{x}^{(k)}||_1 - ||x^{(*)}||_1\). An update on vertex j decreases \(||\hat{x}^{(k)}||_1\) by \(|\hat{x}^{(k)}_j \star \varDelta \hat{x}^{(k)}_j - \hat{x}^{(k)}_j|\). Therefore, we use \(|\hat{x}^{(k)}_j \star \varDelta \hat{x}^{(k)}_j - \hat{x}^{(k)}_j|\) as the importance of vertex j (denoted as \(\eta _j\)), i.e., \(\eta _j = |\hat{x}^{(k)}_j \star \varDelta \hat{x}^{(k)}_j - \hat{x}^{(k)}_j|\).

For an iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(+\)’, it is difficult to directly measure how the distance to the fixed point decreases. Updating a single vertex may even increase the distance to the fixed point. Fortunately, the update function f() of such an algorithm can typically be seen as a \(||\cdot ||\)-contraction mapping. That is, there exists an \(\alpha \) (\(0 \le \alpha < 1\)) such that \(||f(x) - f(y)|| \le \alpha ||x - y||, \forall x, y \in R^n\). Therefore, we can provide an upper bound on the distance, as stated in Theorem 1. The proof is omitted due to the space limitation. We then analyze how the upper bound decreases.

Theorem 1

\(||x^{(*)} - \hat{x}^{(k+1)}||_1 \le \frac{||\varDelta \hat{x}^{(k+1)}||_1}{1-\alpha }\).

Without loss of generality, assume that the current time is \(t_{k}\) and that during the interval \([t_{k}, t_{k+1}]\) we only update vertex j. When updating vertex j, we accumulate \(\varDelta \hat{x}_j^{(k)}\) into \(\hat{x}_j\), send \(f_{\{j,l\}}(\varDelta \hat{x}_j^{(k)})\) to each neighbor l (the total value sent out is no larger than \(\alpha |\varDelta \hat{x}_j^{(k)}|\)), and reset \(\varDelta \hat{x}_j^{(k)}\) to 0. Therefore, we have the following theorem.

Theorem 2

\(||\varDelta \hat{x}^{(k+1)}||_1 \le ||\varDelta \hat{x}^{(k)}||_1 - (1-\alpha )|\varDelta \hat{x}_j^{(k)}|\).

Theorem 2 implies that the upper bound monotonically decreases. When updating vertex j, we have \(\frac{||\varDelta \hat{x}^{(k+1)}||_1}{1-\alpha } \le \frac{||\varDelta \hat{x}^{(k)}||_1}{1-\alpha } - |\varDelta \hat{x}_j^{(k)}|\). It shows that the reduction in the upper bound is at least \(|\varDelta \hat{x}_j^{(k)}|\). Given a graph, \(\alpha \) is a constant. Hence, we define the importance of vertex j to be \(|\varDelta \hat{x}_j^{(k)}|\), i.e., \(\eta _j = |\varDelta \hat{x}_j^{(k)}|\).
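Putting the two cases together, a vertex's importance can be computed locally from its current state and pending delta; a small sketch under illustrative names:

```python
def importance(x_j, dx_j, star):
    """Importance eta_j used by selective execution (Sect. 3.2)."""
    if star == '+':
        return abs(dx_j)                 # reduction of the distance bound
    op = min if star == 'min' else max
    return abs(op(x_j, dx_j) - x_j)      # change of ||x||_1 if j is updated
```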

3.3 Convergence

Our asynchronous incremental computation approach yields the same result as recomputation from scratch. To prove it, we first show that if synchronous updates (i.e., \(x^{(k)} = f(x^{(k-1)})\)) converge (and synchronous updates converge for all the graph algorithms discussed in Sect. 2.3), any asynchronous update scheme that guarantees every vertex is updated infinitely often (until its state is fixed) will yield the same result as synchronous updates, as stated in Lemma 1.

Lemma 1

If updates \(x^{(k)} = f(x^{(k-1)})\) converge to \(x^{(*)}\), any asynchronous update scheme that guarantees every vertex is updated infinitely often will converge to \(x^{(*)}\) as well, i.e., \(\hat{x}^{(\infty )} = x^{(*)}\).

We then show that our asynchronous incremental computation approach fulfills this requirement, as stated in Lemma 2. The proofs of both Lemmas 1 and 2 are omitted.

Lemma 2

Our asynchronous incremental computation approach can guarantee that every vertex is updated infinitely often (until its state is fixed).

We can also prove that recomputation from scratch converges to \(x^{(*)}\) (no matter what type of updates it uses). As a result, we have the following theorem.

Theorem 3

Our asynchronous incremental computation approach converges and yields the same result as recomputation from scratch.

4 Distributed Framework

Oftentimes, iterative graph algorithms in real-world applications need to process massive graphs. Hence, it is desirable to leverage the parallelism of a cluster of machines to run these algorithms. Furthermore, it is troublesome to implement asynchronous incremental computation for each individual algorithm. Therefore, we propose GraphIn, an in-memory asynchronous distributed framework, for supporting iterative graph algorithms with incremental computation. GraphIn provides several high-level APIs to users for implementing asynchronous incremental computation and meanwhile hides the complexity of distributed computation. It leverages the proposed selective execution to accelerate convergence.

GraphIn consists of a number of workers and one master. Workers perform vertex updates, and the master controls the flow of computation. The new graph and the previously computed result are taken as the input of GraphIn. The input graph is split into partitions, and each worker is responsible for one partition. Each worker leverages an in-memory table to store the vertices assigned to it. A worker has two main operations for its stored vertices: the accumulate operation and the update operation, as illustrated in Sect. 3.1. The accumulate operation utilizes a user-defined function to aggregate incoming messages for a vertex and also triggers another user-defined function to calculate the vertex’s importance. The update operation uses a user-defined function to update the states of scheduled vertices and compute outgoing messages.
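The text does not spell out GraphIn's exact API, but the three user-defined functions it describes could plausibly look like the following sketch for PageRank; all names and signatures here are hypothetical.

```python
class PageRankKernel:
    """Hypothetical user-defined functions for PageRank on GraphIn."""
    def accumulate(self, dx_j, message):
        return dx_j + message            # aggregate an incoming message

    def priority(self, x_j, dx_j):
        return abs(dx_j)                 # vertex importance for scheduling

    def update(self, x_j, dx_j, out_degree, d=0.8):
        x_j += dx_j                      # absorb the accumulated delta
        out_msg = d * dx_j / out_degree  # value sent along each outgoing edge
        return x_j, out_msg              # the framework then resets dx_j to 0
```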

The prototype of GraphIn is built upon Maiter [26]. Maiter is designed for processing static graphs and thus has inherent impediments to the execution of graph algorithms with incremental computation. First, it relies on a specific initial state to guarantee the convergence of a graph algorithm. However, incremental computation leverages the previous result as the initial state, which can be arbitrary. Second, although Maiter supports prioritized updates, its scheduling scheme assumes that \(\varDelta x_i\) is always positive for any vertex i, which may not hold under incremental computation. Last, the termination check mechanism of Maiter assumes that \(||x||_1\) varies monotonically, which may not hold either under incremental computation. GraphIn removes all these impediments to efficiently support incremental computation.

4.1 Distributed Selective Execution

GraphIn leverages the proposed selective execution scheduling as its default scheduling scheme. Since a centralized approach to finding the top-m elements is inefficient in a distributed environment, GraphIn allows each worker to perform its own selective execution scheduling. Round by round (except the first round, in which all vertices are selected to derive \(\hat{x}^{(0)}\) and \(\varDelta \hat{x}^{(0)}\)), each worker selects its local top-m vertices in terms of importance. The number m is crucial to the effect of selective execution.

For the iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(+\)’, GraphIn learns m online. We use \(\mu \cdot n\) to quantify the overhead of selecting such m vertices (where \(\mu \) represents the amortized overhead), which is proportional to the total number (n) of vertices with an efficient selection algorithm (e.g., quick-select). Also, we assume that the average cost of updating one vertex is \(\nu \), and then the cost of updating those m vertices is \(\nu \cdot m\). Let c(m) be the total cost of updating those m vertices (including both selection and update); then \(c(m) = \mu \cdot n + \nu \cdot m\). Let \(g(m) = \sum _{j\in S}|\varDelta \hat{x}_j|\) (recall that \(|\varDelta \hat{x}_j|\) represents the importance of vertex j), where S denotes the set of the m selected vertices. For each round, we aim to find the m that achieves the largest efficiency, i.e., \(m = \arg \max _m \frac{g(m)}{c(m)}\). It is computationally impractical to try every value (from 1 to n) to figure out the best m. Therefore, our practical approach chooses several values (0.05n, 0.1n, 0.25n, 0.5n, n), which cover the entire range of possible m, as the candidates. For each candidate m, we leverage quick-select to find the m-th largest \(|\varDelta \hat{x}_j|\), which is used as a threshold, and all \(|\varDelta \hat{x}_i|\) no less than the threshold are counted into g(m). By testing each candidate (we set \(\nu /\mu \) to 4 by default), we can figure out the best m and the set S. The practical approach leverages quick-select to avoid time-consuming sorting, and thus takes O(n) time to extract the top-m vertices instead of \(O(n \log n)\) time. For the iterative graph algorithm with the operator ‘\(\star \)’ as ‘\(\min \)/\(\max \)’, the importance of a vertex might be close to \(\infty \). If we still used the above idea, g(m) might easily overflow. Therefore, in this case, we simply set m to 0.1n, which shows good performance in experiments. Note that if there are only \(m'\) (\(m' < m\)) vertices with importance larger than 0, we only select these \(m'\) vertices to update.
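A sketch of this online choice of m for the ‘\(+\)’ case follows; for brevity it substitutes `heapq.nlargest` (\(O(n \log m)\)) for the \(O(n)\) quick-select the text describes, and the function and parameter names are illustrative.

```python
import heapq

def choose_m(dx, nu_over_mu=4.0, fractions=(0.05, 0.1, 0.25, 0.5, 1.0)):
    """Pick the candidate m maximizing g(m) / c(m), with c(m) = n + (nu/mu)*m."""
    n = len(dx)
    if n == 0:
        return 0
    mags = [abs(v) for v in dx.values()]
    best_m, best_ratio = n, -1.0
    for frac in fractions:
        m = max(1, int(frac * n))
        threshold = heapq.nlargest(m, mags)[-1]      # the m-th largest |dx_j|
        g = sum(v for v in mags if v >= threshold)   # gain of the top-m block
        c = n + nu_over_mu * m                       # cost, normalized by mu
        if g / c > best_ratio:
            best_ratio, best_m = g / c, m
    return best_m
```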

4.2 Distributed Termination Check

We design termination check mechanisms for the iterative graph algorithms with the operator ‘\(\star \)’ as ‘\(\min \)/\(\max \)’ and with the operator ‘\(\star \)’ as ‘\(+\)’, respectively. When ‘\(\star \)’ is ‘\(\min \)/\(\max \)’, \(||\hat{x}^{(k)}||_1\) monotonically decreases or increases. Therefore, we can utilize \(||\hat{x}^{(k)}||_1\) to perform the termination check. If \(||\hat{x}^{(k)}||_1 - ||\hat{x}^{(k-1)}||_1 = 0\), the algorithm has converged, and thus the computation can be terminated. When ‘\(\star \)’ is ‘\(+\)’, \(||x^{(*)} - \hat{x}^{(k)}||_1\) is the natural choice for measuring convergence. However, it is difficult to quantify \(||x^{(*)} - \hat{x}^{(k)}||_1\) directly, since the fixed point \(x^{(*)}\) is unknown during the computation. Fortunately, we know \(||x^{(*)} - \hat{x}^{(k)}||_1 \le ||\varDelta \hat{x}^{(k)}||_1/(1-\alpha )\) from Theorem 1, and thus can leverage \(||\varDelta \hat{x}^{(k)}||_1\) to measure convergence. We use the convergence criterion \(||\varDelta \hat{x}^{(k)}||_1 \le \epsilon \), where the convergence tolerance \(\epsilon \) is a pre-defined constant.

GraphIn adopts a passive monitoring model to perform the termination check, which works by periodically (the period is configurable) measuring \(||\hat{x}^{(k)}||_1\) if the operator ‘\(\star \)’ is ‘\(\min \)/\(\max \)’ (or \(||\varDelta \hat{x}^{(k)}||_1\) if ‘\(\star \)’ is ‘\(+\)’). To compute this measure, each worker computes the sum of \(|\hat{x}_j^{(k)}|\) (or \(|\varDelta \hat{x}_j^{(k)}|\)) over its local vertices and sends the local sum to the master. The master aggregates the local sums into a global sum. The challenge of performing such a distributed termination check is to make sure that the local sum at each worker is calculated from a snapshot of the values taken at the same time (especially for \(|\varDelta \hat{x}_j^{(k)}|\)). To address this challenge, GraphIn asks all the workers to pause vertex updates before starting to calculate the local sums. The procedure of the distributed termination check is as follows.

  1. When it is time to perform the termination check, the master broadcasts a \(chk_{pre}\) message to all the workers.

  2. Upon receiving the \(chk_{pre}\) message, every worker pauses vertex updates and then replies with a \(chk_{ready}\) message to the master.

  3. The master gathers the \(chk_{ready}\) messages from all the workers and then broadcasts a \(chk_{begin}\) message to them.

  4. Upon receiving the \(chk_{begin}\) message, every worker calculates the local sum, \(\sum _{j}|\hat{x}_j^{(k)}|\) (or \(\sum _{j}|\varDelta \hat{x}_j^{(k)}|\)), and reports it to the master.

  5. The master aggregates the local sums into the global sum \(||\hat{x}^{(k)}||_1\) (or \(||\varDelta \hat{x}^{(k)}||_1\)). If \(||\hat{x}^{(k)}||_1 - ||\hat{x}^{(k-1)}||_1 \ne 0\) (or \(||\varDelta \hat{x}^{(k)}||_1 > \epsilon \)), the master broadcasts a \(chk_{fin}\) message to all the workers. Otherwise, it broadcasts a term message.

  6. When a worker receives the \(chk_{fin}\) message, it resumes vertex updates. When a worker receives the term message, it dumps the result to a local disk and then terminates the computation.

It is important to note that since calculating the local sums is inexpensive and is done only periodically, the overhead of the termination check is negligible.
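From the master's side, the handshake above can be sketched as follows; `broadcast` and `gather` stand in for the framework's messaging primitives and are assumptions of this sketch.

```python
def termination_check(workers, epsilon, broadcast, gather):
    """One round of the check for the '+' case; returns True on termination."""
    broadcast(workers, "chk_pre")               # 1. ask workers to pause
    gather(workers, "chk_ready")                # 2. wait until all have paused
    broadcast(workers, "chk_begin")             # 3. request the local sums
    local_sums = gather(workers, "local_sum")   # 4. sum_j |dx_j| per worker
    if sum(local_sums) > epsilon:               # 5. not converged yet
        broadcast(workers, "chk_fin")           # 6. workers resume updates
        return False
    broadcast(workers, "term")                  # converged: dump and terminate
    return True
```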

5 Evaluation

In this section, we evaluate the performance of our asynchronous incremental computation approach. We compare it with re-computation from scratch. Both approaches are supported by GraphIn. To show the performance of the selective execution scheduling, we compare it with the round-robin scheduling. The performance of other distributed frameworks that can support synchronous incremental computation is also evaluated.

5.1 Experiment Setup

The experiments are performed on both a local cluster and a large-scale cluster on Amazon EC2. The local cluster consists of 4 machines. The large-scale cluster consists of 50 EC2 medium instances.

Table 1. Graph Dataset Summary

Two graph algorithms are implemented on GraphIn: PageRank and the shortest paths algorithm. For PageRank, the damping factor is set to 0.8, and if not stated otherwise, the convergence tolerance \(\epsilon \) (discussed in Sect. 4.2) is set to \(10^{-2}/n\) (n is the number of vertices of the corresponding graph). The shortest paths algorithm stops running only when the convergence point is reached (i.e., all the vertices have reached their shortest paths to the source vertex). The measurement of each experiment is averaged over 10 runs. Real-world graphs of various sizes are used in the experiments and are summarized in Table 1.

5.2 Overall Performance

We first show the convergence time of PageRank on the local cluster. The convergence time is measured as the wall-clock time that PageRank takes to reach the convergence criterion (i.e., \(||\varDelta \hat{x}^{(k)}||_1 \le \epsilon \)). We consider both the edge change case and the vertex change case. Under the edge change case, we randomly pick a number of vertices to change their edges. In the graph evolving process, there are usually more added edges than deleted edges. Therefore, for \(80\,\%\) of the picked vertices, we add one outgoing edge to a randomly picked neighbor. For the remaining \(20\,\%\) of the picked vertices, we remove one randomly picked edge. Under the vertex change case, we pick a number (e.g., p, some percentage of the total number of vertices) for each experiment. We add 0.8p new vertices to the graph and delete 0.2p vertices. For each added vertex, we add two edges (one incoming edge and one outgoing edge) to randomly picked neighbors. For each deleted vertex, we also delete all its edges.
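For concreteness, the edge-change workload can be generated as in the following sketch, where `graph` maps each vertex to its list of outgoing neighbors and `percent` is given as a fraction (names are illustrative):

```python
import random

def apply_edge_changes(graph, percent):
    """80% of the picked vertices gain a random outgoing edge; 20% lose one."""
    picked = random.sample(list(graph), int(percent * len(graph)))
    cutoff = int(0.8 * len(picked))
    for k, v in enumerate(picked):
        if k < cutoff:
            graph[v].append(random.choice(list(graph)))   # add an edge
        elif graph[v]:
            graph[v].remove(random.choice(graph[v]))      # delete an edge
    return graph
```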

Fig. 1. PageRank on Amz graph (edge change).

Fig. 2. Shortest paths on weighted Amz graph.

Figure 1 shows the performance on the Amz graph under the edge change case. We can see that incremental computation (denoted as “Incr”) is much faster than re-computation from scratch (denoted as “Re”) for different percentages of vertices with edge change. The selective execution scheduling (denoted as “Sel”) is faster than the round-robin scheduling (denoted as “R-R”) with either approach. The efficiency of incremental computation is more prominent when the change is smaller. For example, when the percentage of vertices with edge change is \(0.01\,\%\), incremental computation with the selective execution scheduling is about 10x faster than recomputation from scratch with the round-robin scheduling and 7x faster than recomputation from scratch with the selective execution scheduling. Not surprisingly, incremental computation takes longer as the percentage of vertices with edge change becomes larger, while the convergence time of the re-computation stays almost the same since the change to the graph is relatively small. Similar trends are observed for the vertex change case.

We then present the result of the shortest paths algorithm, which runs on weighted graphs. Here the convergence time is measured as the wall-clock time that the shortest paths algorithm takes to reach the convergence point. All the graphs summarized in Table 1 are unweighted. We generate a weighted graph by assigning weights to the Amz graph. The weight of each edge is an integer randomly drawn from the range [1, 100]. Figure 2 plots the performance comparison under the vertex adding case. The percentage means the ratio of the number of added vertices to the number of original vertices. For each added vertex, we add two weighted edges (one incoming edge and one outgoing edge) to randomly picked neighbors. From the figure, we can see that incremental computation with the selective execution scheduling is about 14x faster than recomputation from scratch with the round-robin scheduling when the percentage of added vertices is \(0.01\,\%\), and still 9x faster even when the percentage is \(10\,\%\). Similar results are observed for the edge adding case as well.

5.3 Comparison with Synchronous Incremental Computation

It is also possible to build a framework to support incremental computation upon other systems, such as Hadoop and Spark. To demonstrate the efficiency of GraphIn, we compare it with both Hadoop and Spark for the scenario where \(1\,\%\) of vertices have edge changes. We restrict our performance comparison to PageRank, since it is a representative graph algorithm. For fair comparison, we instruct both systems to use the prior result as the starting point. For Hadoop, if there is no change in the input of some Map/Reduce tasks, we proportionally discount the running time. In this way, we can simulate task-level reuse, which is the key to MapReduce-based incremental processing frameworks. For Spark, we choose its Graphx [7] component to implement PageRank.

Fig. 3. PageRank on different frameworks.

Figure 3 shows that GraphIn (especially with selective execution) is much faster than Hadoop and Spark. Hadoop is a disk-based system and uses synchronous updates. Even though Spark is a memory-based system, it also utilizes synchronous updates and is therefore still slower than GraphIn.

5.4 Scaling Performance

We further evaluate incremental computation on the large-scale Amazon cluster to test its scalability. We consider the scenario where \(1\,\%\) of vertices have edge changes, and concentrate on PageRank (with the convergence tolerance \(\epsilon \) set to \(10^{-4}\)). We first use the three large real-world graphs, LJ, UK, and IT (both UK and IT have tens of millions of vertices and a billion edges), as input graphs when all 50 instances are used. As shown in Fig. 4a (note that the y-axis is in log scale), on the large-scale cluster incremental computation is still much faster than re-computation from scratch, and both approaches benefit from the selective execution scheduling.

Fig. 4. Performance on Amazon cluster.

We then show the performance of incremental computation when different numbers of instances are used. Figure 4b shows the convergence time on LJ as we increase the number of instances from 10 to 50. It can be seen that by increasing the number of instances, the convergence time is reduced, and that the selective execution scheduling is always faster than the round-robin scheduling.

6 Related Work

Due to the dynamic nature of graphs in real-world applications, incremental computation has been studied extensively. In terms of iterative graph algorithms, most of the studies [1, 11, 12] focus on PageRank. The basic idea behind the approaches in [11, 12] is that when a change happens in the graph, the effect of the change on the PageRank scores is mostly local. These approaches start with the exact PageRank scores of the original graph but provide approximate scores for the graph after the change, and the estimates may drift away from the exact scores. In contrast, our approach provides exact scores. The work in [1] utilizes the Monte Carlo method to approximate PageRank scores on evolving graphs. It precomputes a number of random walk segments for each vertex and stores them in distributed shared memory. Besides producing approximate results, it also incurs high memory overhead.

In recent years, the growing scale and importance of graph data have driven the development of a number of distributed graph systems. Graphx [7] is a graph system built on top of Spark. It stores graphs as tabular data and implements graph operations using distributed joins. PrIter [25], Maiter [26], and Prom [20] introduce prioritized updates to accelerate convergence. PrIter is a MapReduce-based framework, which requires synchronous iterations. Maiter and Prom utilize asynchronous iterative computation. All these graph systems aim at supporting graph computation on static graph structures.

There are several systems for supporting incremental parallel processing on massive datasets. Incoop [3] extends the MapReduce programming model to support incremental processing. It saves and reuses states at the granularity of individual Map or Reduce tasks. Continuous bulk processing (CBP) [15] provides a groupwise processing operator to reuse prior state for incremental analysis. Similarly, other systems like DryadInc [17] support incremental processing by allowing their applications to reuse prior computation results. However, most of these studies focus on one-pass applications rather than iterative applications. Several recent studies address the need for incremental processing in iterative applications. Kineograph [6] constructs incremental snapshots of the evolving graph and supports reusing prior states in processing later snapshots. Naiad [16] presents a timely dataflow computation model, which allows stateful computation and nested iterations. Spark Streaming [23] extends the cyclic batch dataflow of the original Spark to allow dynamic modification of the dataflow and thus supports iteration and incremental processing. However, most of these systems apply synchronous updates to incremental computation. Our work illustrates how to efficiently apply asynchronous updates to incremental computation.

7 Conclusion

In this paper, we propose an approach to efficiently apply asynchronous updates to incremental computation on evolving graphs. Our approach works for a family of iterative graph algorithms. We also present a scheduling scheme, selective execution, to coordinate asynchronous updates so as to accelerate convergence. Furthermore, to facilitate the implementation of iterative graph algorithms with incremental computation in a distributed environment, we design and implement an asynchronous distributed framework, GraphIn. Our evaluation results show that our asynchronous incremental computation approach can significantly boost performance.