skip to main content
10.1145/2960414.2960419acmotherconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Time-evolving graph processing at scale

Published: 24 June 2016 Publication History

Abstract

Time-evolving graph-structured big data arises naturally in many application domains such as social networks and communication networks. However, existing graph processing systems lack support for efficient computations on dynamic graphs.
In this paper, we represent most computations on time evolving graphs into (1) a stream of consistent and resilient graph snapshots, and (2) a small set of operators that manipulate such streams of snapshots. We then introduce GraphTau, a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphTau quickly builds fault-tolerant graph snapshots as each small batch of new data arrives. GraphTau achieves high performance and fault tolerant graph stream processing via a number of optimizations. GraphTau also unifies data streaming and graph streaming processing. Our preliminary evaluations on two representative datasets show promising results. Besides performance benefit, GraphTau API relieves programmers from handling graph snapshot generation, windowing operators and sophisticated differential computation mechanisms.

References

[1]
T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. Millwheel: Fault-tolerant stream processing at internet scale. VLDB, 2013.
[2]
T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernández-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle. The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. VLDB, 2015.
[3]
Apache Flink. https://flink.apache.org.
[4]
Apache Giraph. http://giraph.apache.org.
[5]
P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, 2011.
[6]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004.
[7]
Z. Cai, D. Logothetis, and G. Siganos. Facilitating real-time graph mining. In CloudDB, 2012.
[8]
R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In Eurosys, 2012.
[9]
J. Gonzalez, R. Xin, A. Dave, D. Crankshaw, and I. Franklin, Stoica. Graphx: Graph processing in a distributed dataflow framework. In OSDI, 2014.
[10]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.
[11]
W. Han, Y. Miao, K. Li, M. Wu, F. Yang, L. Zhou, V Prabhakaran, W. Chen, and E. Chen. Chronos: A graph engine for temporal graph analysis. In Eurosys, 2014.
[12]
A. Iyer, L. E. Li, and I. Stoica. Celliq: Real-time cellular network analytics at scale. NSDI, 2015.
[13]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. In SOSP, 2013.
[14]
R. A. Rossi, B. Gallagher, J. Neville, and K. Henderson. Modeling dynamic behavior in large evolving graphs. In WSDM, 2013.
[15]
A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. SIGMOD, 2014.
[16]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
[17]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, 2013.

Cited By

View all
  • (2024)D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural NetworksProceedings of the VLDB Endowment10.14778/3681954.368196117:11(2764-2777)Online publication date: 1-Jul-2024
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)ACM Transactions on Database Systems10.1145/364384649:3(1-31)Online publication date: 16-May-2024
  • Show More Cited By
  1. Time-evolving graph processing at scale

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    GRADES '16: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems
    June 2016
    54 pages
    ISBN:9781450347808
    DOI:10.1145/2960414
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2016

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Research-article

    Conference

    GRADES '16
    GRADES '16: Graph Data Management Experiences and Systems
    June 24, 2016
    California, Redwood Shores

    Acceptance Rates

    Overall Acceptance Rate 29 of 61 submissions, 48%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)50
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural NetworksProceedings of the VLDB Endowment10.14778/3681954.368196117:11(2764-2777)Online publication date: 1-Jul-2024
    • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
    • (2024)GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)ACM Transactions on Database Systems10.1145/364384649:3(1-31)Online publication date: 16-May-2024
    • (2024)GEAR: Graph-Evolving Aware Data Arranger to Enhance the Performance of Traversing Evolving Graphs on SCMIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.344722243:11(3674-3684)Online publication date: Nov-2024
    • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: Apr-2024
    • (2023)A Distributed Algorithm for Identifying Strongly Connected Components on Incremental Graphs2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)10.1109/SBAC-PAD59825.2023.00020(109-118)Online publication date: 17-Oct-2023
    • (2023)MAGMA: Proposing a Massive Historical Graph Management SystemAlgorithmic Aspects of Cloud Computing10.1007/978-3-031-33437-5_3(42-57)Online publication date: 26-May-2023
    • (2022)Anomaly Detection of Hydropower Units Based on Recurrent Neural NetworkFrontiers in Energy Research10.3389/fenrg.2022.85663510Online publication date: 7-Apr-2022
    • (2022)GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph StreamsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526146(325-339)Online publication date: 10-Jun-2022
    • (2022)GraphFly: Efficient Asynchronous Streaming Graphs Processing via Dependency-FlowSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00050(1-14)Online publication date: Nov-2022
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media