DOI: 10.1145/2567948.2580059
research-article

From graphs to tables: the design of scalable systems for graph analytics

Published: 07 April 2014

Abstract

From social networks to language modeling, the growing scale and importance of graph data have driven the development of new graph-parallel systems. In this talk, I will review the graph-parallel abstraction and describe how it can be used to express important machine learning and graph analytics algorithms such as PageRank and latent factor models. I will present how systems like GraphLab and Pregel exploit restrictions in the graph-parallel abstraction, along with advances in distributed graph representation, to execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. Unfortunately, the same restrictions that enable graph-parallel systems to achieve substantial performance gains also limit their ability to express many of the important stages in a typical graph analytics pipeline. As a consequence, existing approaches to graph analytics typically compose multiple systems through brittle and costly file interfaces. To fill the need for a holistic approach to graph analytics, we introduce GraphX, which unifies graph-parallel and data-parallel computation under a single API and system. I will show how a simple set of data-parallel operators can be used to express graph-parallel computation and how, by applying a collection of query optimizations derived from our work on graph-parallel systems, we can execute entire graph analytics pipelines efficiently in a more general data-parallel, distributed, fault-tolerant system, achieving performance comparable to specialized state-of-the-art systems.
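
To make the idea of expressing a graph-parallel algorithm with data-parallel operators more concrete, the following is a minimal single-machine sketch of PageRank written as a repeated join and group-by over an edge table and a vertex table. It is an illustration only and not the GraphX API: the function name `pagerank_tables`, its parameters, the damping factor of 0.85, and the uniform redistribution of rank from vertices without out-edges are all assumptions chosen for this example.

```python
from collections import defaultdict


def pagerank_tables(edges, num_iters=20, damping=0.85):
    """PageRank as repeated join + group-by over two tables (hypothetical helper).

    edges: list of (src, dst) rows, playing the role of the edge table.
    Returns a dict mapping vertex id -> rank, playing the role of the vertex table.
    """
    # Vertex table: every id that appears in the edge table.
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)

    # Group-by on src with a count aggregate: out-degree per vertex.
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(num_iters):
        # Join the edge table with the vertex table on src and emit
        # one (dst, contribution) row per edge.
        messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

        # Group-by on dst with a sum aggregate.
        incoming = defaultdict(float)
        for dst, contribution in messages:
            incoming[dst] += contribution

        # Assumption: rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(ranks[v] for v in vertices if out_degree[v] == 0)

        # Map over the vertex table to produce the next set of ranks.
        ranks = {
            v: (1.0 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return ranks


if __name__ == "__main__":
    for vertex, rank in pagerank_tables([("a", "c"), ("b", "c")]).items():
        print(vertex, round(rank, 4))
```

Each iteration uses only a join, a group-by with a sum, and a map, which is the kind of operator set a general data-parallel system already provides. A distributed system would additionally need to partition both tables and apply optimizations of the sort the abstract mentions, none of which this sketch attempts.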

Cited By

  • (2018) Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms. Advances on Broadband and Wireless Computing, Communication and Applications, pp. 237-249. DOI: 10.1007/978-3-030-02613-4_21. Online publication date: 19-Oct-2018
  • (2016) Big data analytics on Apache Spark. International Journal of Data Science and Analytics 1:3-4, pp. 145-164. DOI: 10.1007/s41060-016-0027-9. Online publication date: 13-Oct-2016

Published In

WWW '14 Companion: Proceedings of the 23rd International Conference on World Wide Web
April 2014
1396 pages
ISBN:9781450327459
DOI:10.1145/2567948

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed computation
  2. graph-parallel

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
