Elsevier

Information Systems

Volume 70, October 2017, Pages 18-31
Efficient processing of shortest path queries in evolving graph sequences

https://doi.org/10.1016/j.is.2017.05.004

Abstract

In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient shortest path query processing on an EGS and put forward a solution framework called FVF. Two algorithms, namely, FVF-F and FVF-H, are proposed. While the FVF-F algorithm works on a sequence of flat graph clusters, the FVF-H algorithm works on a hierarchy of such clusters. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in shortest path query processing on EGSs. Comparing FVF-F and FVF-H, the latter gives a larger speedup, is more flexible in terms of memory requirements, and is far less sensitive to parameter values.

Introduction

Graphs are a pervasive structure that is used to model the state of the world in many real-life applications. For example, users and their relationships in a social network (such as Facebook and Flickr) can be modeled as a graph, with vertices representing users and edges representing friendships among users. In a dynamic world, such relationships are continuously evolving. For example, users join Facebook and friendships are established. A graph that models the world can only capture the world’s state at a particular instant, or just a “snapshot” of the world. To fully capture the dynamics of the world, we propose that a sizable collection of snapshots should be used. For example, snapshots of the Facebook graph should be taken periodically, forming a sequence of snapshot graphs. We call such a sequence an Evolving Graph Sequence or EGS for short. One can interrogate an EGS with many interesting graph-based queries that characterize the world the snapshot graphs depict. For example, given two vertices u and v, “What is the most popular shortest path that connects u to v among all the snapshots in an EGS?” “How does the centrality of a node evolve in an EGS?” Queries of this sort, which are hardly meaningful when applied to a single graph, provide valuable insights into the dynamic world.

Many traditional studies on graph queries and algorithms, such as shortest-path and reachability, focus on answering queries efficiently on a single graph. Different from those previous works, our goal is to efficiently answer queries on a large sequence of related graphs given in an EGS. We are particularly interested in those applications in which the snapshot graphs are large, numerous, and gradually evolving. The first two properties call for highly efficient algorithms to deal with the large amount of graph data. The third property implies that successive snapshots in a graph sequence are likely similar to each other. This property allows techniques that exploit redundancies among similar snapshots to be developed to achieve high efficiency.

To further illustrate how real-world information can be modeled as an EGS, consider social networks, such as Facebook, YouTube and Flickr. In a social network, people connect to and interact with others to share their interests and experience. Social network analysis (SNA) [1] is a research area whose aims are to capture the various interactions among users and to understand users’ behavior. In most studies, the interactions among users are modeled as graphs. For example, an edge between two (user) vertices could model a simple and somewhat static friendship relationship; or an edge can be used to model a more dynamic interaction, such as whether two users have communicated within a certain period of time, or whether they have written about the same topic. Numerous statistical measures have been defined and studied on social network graphs. These include global (graph-based) measures such as the diameters, radii, and degree distributions of the graphs; and local (vertex-based) measures such as centrality. To understand the dynamics of social networks, daily snapshots of various social networks have been collected [2], [3]. For example, an EGS of a few hundred daily snapshots of Facebook’s New Orleans regional network is publicly available [3]. The graphs are big (e.g., the last snapshot contains about 60,000 vertices and about 900,000 edges), numerous (hundreds and many more if the collection were continued), and gradually evolving (successive snapshots are very similar, sharing more than 99% of their edges). These snapshot graphs facilitate many interesting studies, particularly trend analysis, which discovers patterns of change over time. For example, Fig. 1 plots the shortest-path distances between two Facebook users over a one-year period (365 snapshots). This plot gives interesting insights into how “friendships” are established in the social network: we see that the users were disconnected until snapshot #178. After that, they got closer to each other until they finally became friends at snapshot #365. This plot reveals a few key moments in their friendship development (at snapshots #186, #304, #365). By analyzing the changes in their shortest paths at those key moments and the snapshot graphs surrounding those moments, we can answer some interesting questions: Did the users get closer because a completely new (and shorter) path appeared, which was “disjoint” from the previously shortest path (see Fig. 2(a))? Or was it because a “short-circuiting bridge” was established (Fig. 2(b))? Or was it because a new user had arrived who acted as a “common friend” of some users along the previously shortest path (Fig. 2(c))? How “important” was this common friend in the network and how does its importance evolve over time? (E.g., how does its centrality evolve across snapshots?) Furthermore, how does the diameter of the friendship network evolve over time? All these queries may provide interesting insights to social network analysts.

In the preliminary version of this paper [4], we presented a solution framework, FVF, for evaluating queries of this kind. This paper is an extension of [4]. In this version, we focus on using the FVF solution framework to efficiently answer shortest path queries on an EGS. We show how an EGS can be effectively partitioned into graph clusters, and how these clusters can be arranged in a hierarchical structure. We extend the FVF-F algorithm, which works with a sequence of flat graph clusters, to the FVF-H algorithm, which answers shortest path queries based on the cluster hierarchy. We analyze the FVF-H algorithm and show the various advantages it has over the FVF-F algorithm. In particular, we show that FVF-H offers larger speedups, is much less sensitive to parameter values, and is highly flexible in terms of memory requirements.

The remainder of this paper is organized as follows. In Section 2, we introduce the FVF framework. Sections 3 and 4 respectively present the details of using the FVF framework to pre-process an EGS and to answer shortest path queries on an EGS. In Section 5, we present issues related to storing an EGS. In Section 6, we present the FVF-H algorithm. In Sections 7 and 8, we present a case study and the experimental results, respectively. In Section 9, we discuss related work. In Section 10, we conclude the paper.

Section snippets

Find-Verify-Fix (FVF) Framework

Given an evolving graph sequence EGS = ⟨G1, …, Gn⟩, where each Gi is a directed graph, and a shortest path query Q = (u, v), our objective is to efficiently answer Q (i.e., find a shortest path from u to v) on each snapshot in the EGS. In this paper, we focus on directed graphs. Applying our solutions to undirected graphs is straightforward and therefore we will skip the discussion on undirected graphs.
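To make the problem statement concrete, the following is a minimal sketch of the straightforward way to answer Q on every snapshot: run a shortest path search on each graph independently. It assumes unweighted directed graphs stored as sets of (tail, head) edge pairs and uses breadth-first search; the function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_shortest_path(edges, source, target):
    """Return one shortest path (list of vertices) in a directed,
    unweighted graph given as (tail, head) pairs, or None if unreachable."""
    adjacency = {}
    for tail, head in edges:
        adjacency.setdefault(tail, []).append(head)
    parent = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path = []
            while vertex is not None:  # walk parent pointers back to source
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None


def answer_on_every_snapshot(egs, source, target):
    """Answer Q = (source, target) independently on each snapshot."""
    return [bfs_shortest_path(snapshot, source, target) for snapshot in egs]


# Hypothetical three-snapshot sequence in which u and v get closer over time.
egs = [
    {("u", "a"), ("a", "b")},
    {("u", "a"), ("a", "b"), ("b", "v")},
    {("u", "a"), ("a", "b"), ("b", "v"), ("a", "v")},
]
paths = answer_on_every_snapshot(egs, "u", "v")
distances = [None if p is None else len(p) - 1 for p in paths]
print(distances)  # [None, 3, 2]
```

This per-snapshot approach repeats almost identical work on near-identical graphs, which is the redundancy an EGS-aware method aims to avoid.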

To support efficient EGS shortest path query processing, we propose a Find-Verify-and-Fix (FVF)

Preprocessing

The preprocessing phase of FVF consists of two steps. First, we cluster similar snapshot graphs together. Then, we extract two representative graphs from each cluster. We first discuss the second step assuming that clustering has already been done (Section 3.1). Then, we discuss the graph clustering procedure (Section 3.2).
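The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the two representative graphs are taken to be the intersection and the union of the cluster's edge sets, clusters are formed greedily from consecutive snapshots, and similarity is measured as the ratio of intersection edges to union edges. The names `cluster_representatives`, `greedy_clusters`, and `min_similarity` are hypothetical.

```python
def cluster_representatives(cluster):
    """Return (intersection_edges, union_edges) for a non-empty list of
    snapshots, each given as a set of (tail, head) edge pairs."""
    intersection = set(cluster[0]).intersection(*cluster[1:])
    union = set().union(*cluster)
    return intersection, union


def greedy_clusters(egs, min_similarity):
    """Group consecutive snapshots while |intersection| / |union| of the
    group stays at or above min_similarity (an assumed similarity measure)."""
    clusters = []
    current = []
    for snapshot in egs:
        candidate = current + [snapshot]
        intersection, union = cluster_representatives(candidate)
        similarity = len(intersection) / len(union) if union else 1.0
        if current and similarity < min_similarity:
            clusters.append(current)   # close the cluster before this snapshot
            current = [snapshot]
        else:
            current = candidate
    if current:
        clusters.append(current)
    return clusters


# Hypothetical sequence: three slowly growing snapshots, then an unrelated one.
egs = [
    {(1, 2), (2, 3), (3, 4)},
    {(1, 2), (2, 3), (3, 4), (4, 5)},
    {(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)},
    {(7, 8), (8, 9)},
]
clusters = greedy_clusters(egs, min_similarity=0.5)
sizes = [len(c) for c in clusters]
common_edges, all_edges = cluster_representatives(clusters[0])
print(sizes)  # [3, 1]
```

A stricter threshold yields smaller clusters whose two representatives are closer to each other; a looser one yields fewer, larger clusters.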

Shortest path query processing on EGS

In this section, we describe two baseline algorithms and one algorithm based on FVF to find the shortest path between a pair of vertices u and v for each snapshot Gi in an EGS.
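The sketch below illustrates one way a find-verify-fix style procedure can skip per-snapshot searches; it should not be read as the paper's algorithm or its pruning lemmas. It assumes each cluster is summarized by the intersection and the union of its snapshots' edge sets. Because every snapshot contains all intersection edges and only union edges, the distance in the union graph is a lower bound and the distance in the intersection graph is an upper bound on the distance in any snapshot of the cluster. When the two bounds meet, the path found in the intersection graph is a shortest path in every snapshot. The fallback here simply reruns BFS per snapshot, a simplification; all names are hypothetical.

```python
from collections import deque


def bfs_path(edges, source, target):
    """Shortest path by BFS on a directed, unweighted edge set, or None."""
    adjacency = {}
    for tail, head in edges:
        adjacency.setdefault(tail, []).append(head)
    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None


def answer_cluster(cluster, source, target):
    """Return (paths, per_snapshot_searches) for one cluster of snapshots."""
    intersection = set(cluster[0]).intersection(*cluster[1:])
    union = set().union(*cluster)
    optimistic = bfs_path(union, source, target)         # lower bound
    if optimistic is None:
        # Unreachable even with every edge present: unreachable everywhere.
        return [None] * len(cluster), 0
    candidate = bfs_path(intersection, source, target)   # find
    if candidate is not None and len(candidate) == len(optimistic):  # verify
        return [candidate] * len(cluster), 0
    # Fix (simplified here): search each snapshot separately.
    return [bfs_path(s, source, target) for s in cluster], len(cluster)


# Cluster A: changes do not affect the u-v path, so no snapshot is searched.
cluster_a = [
    {("u", "a"), ("a", "v"), ("x", "y")},
    {("u", "a"), ("a", "v"), ("y", "z")},
]
paths_a, searches_a = answer_cluster(cluster_a, "u", "v")

# Cluster B: a shortcut appears in one snapshot, so verification fails.
cluster_b = [
    {("u", "a"), ("a", "b"), ("b", "v")},
    {("u", "a"), ("a", "b"), ("b", "v"), ("u", "v")},
]
paths_b, searches_b = answer_cluster(cluster_b, "u", "v")
print(searches_a, searches_b)  # 0 2
```

The saving comes from clusters like A, where two searches on the representative graphs replace one search per snapshot.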

Storing EGS

An EGS typically consists of a large number of big graphs, so its storage requirement can be substantial. In this section we discuss a few storage models for compressing EGS data that (1) are space efficient so that the compressed data is likely small enough to be stored in main memory and (2) can efficiently support the application of the pruning lemmas of the FVF algorithm.
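As an illustration of why gradually evolving snapshots compress well, here is a minimal sketch of a delta encoding: store the first snapshot in full and, for each later snapshot, only the edges added and removed relative to its predecessor. This is an assumed example of a space-efficient representation and is not necessarily one of the storage models the paper discusses; the function names are hypothetical and the sequence is assumed to be non-empty.

```python
def encode_deltas(egs):
    """Encode a non-empty snapshot sequence as (base, deltas), where each
    delta is an (added_edges, removed_edges) pair relative to the previous
    snapshot."""
    base = set(egs[0])
    deltas = []
    previous = base
    for snapshot in egs[1:]:
        current = set(snapshot)
        deltas.append((current - previous, previous - current))
        previous = current
    return base, deltas


def decode_snapshots(base, deltas):
    """Rebuild the snapshots one by one from the base graph and the deltas."""
    current = set(base)
    yield set(current)
    for added, removed in deltas:
        current = (current - removed) | added
        yield set(current)


# Hypothetical sequence: one edge is added, then one edge is removed.
egs = [
    {(1, 2), (2, 3)},
    {(1, 2), (2, 3), (3, 4)},
    {(1, 2), (3, 4)},
]
base, deltas = encode_deltas(egs)
restored = list(decode_snapshots(base, deltas))

raw_edge_count = sum(len(snapshot) for snapshot in egs)
stored_edge_count = len(base) + sum(len(a) + len(r) for a, r in deltas)
print(raw_edge_count, stored_edge_count)  # 7 4
```

When successive snapshots share almost all of their edges, the deltas are tiny compared with the snapshots themselves, at the cost of replaying deltas to reach a given snapshot.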

Hierarchical clustering

We note that the clustering parameter α has a significant impact on the performance of FVF. A larger α implies that G∩ and G∪ of a cluster are more similar, which in turn implies a tighter bounding requirement. This results in fewer snapshots per cluster, so more clusters are formed from an EGS. When a cluster has too few graphs, say, only one snapshot, FVF degenerates to the NAIVE algorithm, which executes an SPA on each snapshot. On the other hand, when a cluster has many graphs, say,

Case study

In this section we report a case study that demonstrates how interesting analytical queries can be answered by EGS processing. Fig. 1 shows how the shortest-path distance between two users 7058 and 7871 in the Facebook friendship graph changed over a one-year period before they finally became friends.

Experimental evaluation

We evaluate the FVF framework on both real datasets and synthetic datasets. We refer to the FVF algorithm based on flat clustering as FVF-F and the one based on hierarchical clustering as FVF-H. In the experiments, we use BFS as the SPA, and Frigioni’s algorithm [7] as the DSPA. All algorithms are implemented in C++. All experiments are run on a Linux machine with a 2.83 GHz dual-core Intel(R) processor and 4 GB of memory.

Table 2 shows the characteristics of three real EGS datasets: Internet, Flickr, FBfriend.

Related work

In recent years, a plethora of work has focused on efficient algorithms and data structures for evaluating distance-based queries (e.g., shortest-path queries) and reachability queries [9] on a very large graph. For example, in order to efficiently evaluate shortest-path queries, various shortest-path indices have been developed (e.g., [10], [11], [12]). With those indices, a shortest-path query could be evaluated without accessing vertices that are irrelevant to the results. Other than

Conclusions

In domains like social networks, data evolution could be captured by a sequence of graphs. Graphs of this kind are usually large, numerous, and gradually evolving. We capture these evolving graphs in Evolving Graph Sequences (EGSs). In this paper, we demonstrated that interesting information can be obtained by posing queries on the various EGSs and we discussed how to store and query them in the context of shortest path queries. Our case study shows that interesting information can be unveiled

Acknowledgments

This work is partly supported by the Research Grants Council of Hong Kong (GRF 521012, 15200715, 15204116, 17254016, 17229116 and 17205115) and Research Committee of CUHK.

References (28)

  • D. Frigioni et al. Fully dynamic algorithms for maintaining shortest paths trees. J. Algorithms (2000)
  • S. Wasserman et al. Social Network Analysis: Methods and Applications. (1994)
  • A. Mislove et al. Growth of the Flickr social network. SIGCOMM Workshop on Social Networks (WOSN’08) (2008)
  • B. Viswanath et al. On the evolution of user interaction in Facebook. SIGCOMM Workshop on Social Networks (2009)
  • C. Ren et al. On querying historical evolving graph sequences. PVLDB (2011)
  • E. Chan et al. Shortest path tree computation in dynamic graphs. IEEE Trans. Comput. (2009)
  • D. Eppstein, Z. Galil, G.F. Italiano, Dynamic graph algorithms,...
  • A.L. Barabási et al. Emergence of scaling in random networks. Science (1999)
  • R. Jin et al. 3-hop: a high-compression indexing scheme for reachability query. SIGMOD Conference (2009)
  • F. Wei. TEDI: efficient shortest path query answering on graphs. SIGMOD Conference (2010)
  • Y. Xiao et al. Efficiently indexing shortest paths by exploiting symmetry in graphs. EDBT (2009)
  • P. Zhao et al. On graph query optimization in large networks. PVLDB (2010)
  • A.V. Goldberg et al. Computing the shortest path: A* search meets graph theory. SODA (2005)
  • J. Maue et al. Goal-directed shortest-path queries using precomputed cluster distances. ACM J. Exp. Algorithmics (2009)