From graphs to tables: the design of scalable systems for graph analytics

Author:

Joseph E. Gonzalez

WWW '14 Companion: Proceedings of the 23rd International Conference on World Wide Web

Pages 1149 - 1150

https://doi.org/10.1145/2567948.2580059

Published: 07 April 2014

Abstract

From social networks to language modeling, the growing scale and importance of graph data have driven the development of new graph-parallel systems. In this talk, I will review the graph-parallel abstraction and describe how it can be used to express important machine learning and graph analytics algorithms such as PageRank and latent factor models. I will present how systems like GraphLab and Pregel exploit restrictions in the graph-parallel abstraction, along with advances in distributed graph representation, to execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. Unfortunately, the same restrictions that enable these performance gains also limit the ability of graph-parallel systems to express many of the important stages in a typical graph analytics pipeline. As a consequence, existing approaches to graph analytics typically compose multiple systems through brittle and costly file interfaces. To fill the need for a holistic approach to graph analytics, we introduce GraphX, which unifies graph-parallel and data-parallel computation under a single API and system. I will show how a simple set of data-parallel operators can be used to express graph-parallel computation. I will also show how, by applying a collection of query optimizations derived from our work on graph-parallel systems, we can execute entire graph analytics pipelines efficiently in a more general data-parallel, distributed, fault-tolerant system, achieving performance comparable to specialized state-of-the-art systems.
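To make the idea of expressing graph-parallel computation with data-parallel operators more concrete, the following is a minimal single-machine sketch in plain Python. It is not GraphX code and does not use the GraphX API. It only illustrates, under stated assumptions, how one PageRank iteration can be phrased as a join of an edge table with a vertex-rank table, a map to per-edge messages, and a group-by-sum. The function name `pagerank_tables`, its parameters, and the uniform handling of vertices without out-edges are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank_tables(edges, num_iters=20, damping=0.85):
    """Illustrative PageRank over an edge table of (src, dst) rows.

    Each iteration uses only table-style steps: a join of edges with
    the rank column, a map to messages, and a group-by-sum.
    Hypothetical helper, not part of any real library.
    """
    # Vertex table: one row per vertex id, derived from the edge table.
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)

    # Out-degree per source vertex (a group-by count on the edge table).
    out_degree = defaultdict(int)
    for src, _dst in edges:
        out_degree[src] += 1

    # Initial rank column, uniform over all vertices.
    ranks = {v: 1.0 / n for v in vertices}

    for _ in range(num_iters):
        # Join edges with ranks on src, then map each joined row
        # to a (dst, contribution) message.
        messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

        # Group messages by destination and sum them (the reduce step).
        incoming = defaultdict(float)
        for dst, contribution in messages:
            incoming[dst] += contribution

        # Assumption: rank held by vertices with no out-edges is
        # redistributed uniformly so that total rank stays at 1.
        dangling = sum(ranks[v] for v in vertices if out_degree[v] == 0)

        # Update the rank column for every vertex.
        ranks = {
            v: (1.0 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return ranks


if __name__ == "__main__":
    example_edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
    for vertex, rank in sorted(pagerank_tables(example_edges).items()):
        print(vertex, round(rank, 4))
```

In a distributed data-parallel system the same three steps would correspond to a join, a map, and a keyed aggregation over partitioned tables. The list and dictionary operations here merely stand in for those operators.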


Cited By

J. Li, S. Lin, Y. Hsu, and Y. Huang (2018). Implementation of an Alternating Least Square Model Based Collaborative Filtering Movie Recommendation System on Hadoop and Spark Platforms. Advances on Broadband and Wireless Computing, Communication and Applications, pages 237-249. Online publication date: 19-Oct-2018. https://doi.org/10.1007/978-3-030-02613-4_21

S. Salloum, R. Dautov, X. Chen, P. Peng, and J. Huang (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4):145-164. Online publication date: 13-Oct-2016. https://doi.org/10.1007/s41060-016-0027-9


Information

Published In

WWW '14 Companion: Proceedings of the 23rd International Conference on World Wide Web

April 2014

1396 pages

ISBN:9781450327459

DOI:10.1145/2567948

General Chair: Chin-Wan Chung (Korea Advanced Institute of Science and Technology, Korea)

Program Chairs: Andrei Broder (Google Inc., USA), Kyuseok Shim (Seoul National University, Korea), Torsten Suel (New York University, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '14

Sponsor:

IW3C2

WWW '14: 23rd International World Wide Web Conference

April 7 - 11, 2014

Seoul, Korea

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

