DOI: 10.1145/3178487.3178508
research-article

Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation

Published: 10 February 2018

Abstract

Replicas of a vertex play an important role in existing distributed graph processing systems: they allow a single vertex to be processed in parallel by multiple machines, and they let each machine read the data of remote neighbors locally, without any remote access. However, replicas of vertices introduce a data coherency problem. Existing distributed graph systems treat the replicas of a vertex v as a single atomic, indivisible vertex and use an eager data coherency approach to guarantee replica atomicity. In the eager data coherency approach, any change to vertex data must be immediately communicated to all replicas of v, which leads to frequent global synchronization and communication.
In this paper, we propose a lazy data coherency approach, called LazyAsync, which treats the replicas of a vertex as independent vertices and maintains data coherency through computation rather than through communication, as the existing eager approach does. Our approach automatically selects data coherency points from the graph algorithm and ensures that all replicas share the same global view only at those points; between any two adjacent data coherency points, replicas may hold different local views. Based on PowerGraph, we develop a distributed graph processing system, LazyGraph, that implements the LazyAsync approach and exploits graph-aware optimizations. On a 48-node EC2-like cluster, LazyGraph outperforms PowerGraph on four widely used graph algorithms across a variety of real-world graphs, with speedups ranging from 1.25x to 10.69x.
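The contrast between eager and lazy coherency can be illustrated with a toy model. The sketch below is not the LazyGraph implementation: the class `ReplicatedVertex`, the min-based merge (as in a shortest-path style algorithm), the message cost model, and the hand-placed coherency point are all assumptions made for illustration. The abstract does not say how coherency points are selected or how replicas are merged.

```python
class ReplicatedVertex:
    """One logical vertex whose value is copied onto several machines (toy model)."""

    def __init__(self, num_replicas):
        self.values = [float("inf")] * num_replicas  # one local view per replica
        self.messages = 0  # replica-to-replica messages counted so far

    def eager_update(self, replica, candidate):
        # Eager coherency: an improving change is pushed to all other replicas at once.
        if candidate < self.values[replica]:
            self.values = [candidate] * len(self.values)
            self.messages += len(self.values) - 1

    def lazy_update(self, replica, candidate):
        # Lazy coherency: only the local view changes; nothing is sent.
        self.values[replica] = min(self.values[replica], candidate)

    def coherency_point(self):
        # Restore a shared global view by computation: merge local views with min.
        # Assumed cost: gather to one replica, then scatter the merged value back.
        merged = min(self.values)
        self.values = [merged] * len(self.values)
        self.messages += 2 * (len(self.values) - 1)


def compare(updates, num_replicas):
    """Apply the same updates eagerly and lazily; return both vertices."""
    eager = ReplicatedVertex(num_replicas)
    lazy = ReplicatedVertex(num_replicas)
    for replica, candidate in updates:
        eager.eager_update(replica, candidate)
        lazy.lazy_update(replica, candidate)
    # Placed by hand in this sketch; the paper selects such points automatically.
    lazy.coherency_point()
    return eager, lazy


# Hypothetical (replica index, candidate value) updates.
UPDATES = [(0, 9.0), (1, 7.0), (0, 5.0), (2, 6.0), (1, 4.0), (2, 3.0)]

if __name__ == "__main__":
    eager, lazy = compare(UPDATES, num_replicas=3)
    print(eager.values, eager.messages)
    print(lazy.values, lazy.messages)
```

For these six updates on three replicas, both variants end with the value 3.0 on every replica, while the eager variant counts 10 messages and the lazy variant 4 under the cost model assumed here. The saving comes from letting replicas diverge between coherency points, which is only safe when the merge function gives the same result regardless of when it is applied, as min does.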

Published In

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2018
442 pages
ISBN:9781450349826
DOI:10.1145/3178487
  • ACM SIGPLAN Notices, Volume 53, Issue 1
    PPoPP '18
    January 2018
    426 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3200691
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed graph-parallel computation
  2. execution model
  3. lazy data coherency

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China

Conference

PPoPP '18

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Cited By

  • (2024) RAGraph: A Region-Aware Framework for Geo-Distributed Graph Processing. Proceedings of the VLDB Endowment 17:3 (264-277). DOI: 10.14778/3632093.3632094. Online publication date: 20-Jan-2024
  • (2021) DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (371-384). DOI: 10.1109/HPCA51647.2021.00039. Online publication date: Feb-2021
  • (2021) Sharing non-cache-coherent memory with bounded incoherence. Concurrency and Computation: Practice and Experience 34:2. DOI: 10.1002/cpe.6414. Online publication date: Jun-2021
  • (2020) Bounded incoherence. Proceedings of the Eleventh International Workshop on Programming Models and Applications for Multicores and Manycores (1-10). DOI: 10.1145/3380536.3380541. Online publication date: 22-Feb-2020
  • (2020) Optimizing ordered graph algorithms with GraphIt. Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (158-170). DOI: 10.1145/3368826.3377909. Online publication date: 22-Feb-2020
  • (2020) Practical parallel hypergraph algorithms. Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (232-249). DOI: 10.1145/3332466.3374527. Online publication date: 19-Feb-2020
  • (2019) PowerLyra. ACM Transactions on Parallel Computing (TOPC) 5:3 (1-39). DOI: 10.1145/3298989. Online publication date: 22-Jan-2019
  • (2019) DiGraph. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (601-614). DOI: 10.1145/3297858.3304029. Online publication date: 4-Apr-2019
  • (2018) GraPU. Proceedings of the ACM Symposium on Cloud Computing (301-312). DOI: 10.1145/3267809.3267811. Online publication date: 11-Oct-2018
  • (2018) Cymbalo: An Efficient Graph Processing Framework for Machine Learning. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (572-579). DOI: 10.1109/BDCloud.2018.00090. Online publication date: Dec-2018
