
GraphFrames: an integrated API for mixing graph and relational queries

Published: 24 June 2016

Abstract

Graph data is prevalent in many domains, but it has usually required specialized engines to analyze. This design is onerous for users and precludes optimization across complete workflows. We present GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them. GraphFrames generalize the ideas in previous graph-on-RDBMS systems, such as GraphX and Vertexica, by letting the system materialize multiple views of the graph (not just the specific triplet views in these systems) and executing both iterative algorithms and pattern matching using joins. To make applications easy to write, GraphFrames provide a concise, declarative API based on the "data frame" concept in R that can be used for both interactive queries and standalone programs. Under this API, GraphFrames use a graph-aware join optimization algorithm across the whole computation that can select from the available views.
We implement GraphFrames over Spark SQL, enabling parallel execution on Spark and integration with custom code. We find that GraphFrames make it easy to express end-to-end workflows and match or exceed the performance of standalone tools, while enabling optimizations across workflow steps that cannot occur in current systems. In addition, we show that GraphFrames' view abstraction makes it easy to further speed up interactive queries by registering the appropriate view, and that the combination of graph and relational data allows for other optimizations, such as attribute-aware partitioning.
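
The idea that both pattern matching and iterative graph algorithms can be executed with joins can be illustrated with a small sketch. This is plain Python over toy in-memory tables, not the GraphFrames or Spark SQL API; the table contents and the names `two_hop_paths` and `connected_components` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables: vertices carry attributes, edges are (src, dst) rows.
vertices = {
    "a": {"age": 34},
    "b": {"age": 19},
    "c": {"age": 52},
    "d": {"age": 41},
    "e": {"age": 27},
}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def two_hop_paths(edge_rows):
    """Match the pattern x -> y -> z as a hash join of the edge table with itself."""
    by_src = defaultdict(list)
    for src, dst in edge_rows:
        by_src[src].append(dst)
    # Join condition: the first edge's dst equals the second edge's src.
    return [(x, y, z) for x, y in edge_rows for z in by_src[y]]


def connected_components(vertex_ids, edge_rows, max_iter=20):
    """Iterative min-label propagation: each round is a join plus a min aggregate."""
    label = {v: v for v in vertex_ids}
    for _ in range(max_iter):
        new_label = dict(label)
        for src, dst in edge_rows:
            # Join each edge with the current labels, then keep the minimum per vertex.
            low = min(label[src], label[dst])
            new_label[src] = min(new_label[src], low)
            new_label[dst] = min(new_label[dst], low)
        if new_label == label:
            break  # fixed point reached
        label = new_label
    return label


paths = two_hop_paths(edges)
# Relational filter applied directly to the pattern-matching result.
adult_paths = [p for p in paths if all(vertices[v]["age"] >= 21 for v in p)]
components = connected_components(vertices, edges)
```

Because the pattern match returns ordinary rows, the attribute filter composes with it like any other relational step, which is the kind of mixing the abstract describes. A real system would presumably run these joins in parallel and choose among materialized views rather than scanning Python lists.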


    Published In

    GRADES '16: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems
    June 2016
    54 pages
    ISBN:9781450347808
    DOI:10.1145/2960414
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    GRADES '16
    GRADES '16: Graph Data Management Experiences and Systems
    June 24, 2016
    Redwood Shores, California

    Acceptance Rates

    Overall Acceptance Rate 29 of 61 submissions, 48%
