DOI: 10.1145/3472883.3486991
research-article

Fast and Accurate Optimizer for Query Processing over Knowledge Graphs

Published: 01 November 2021 Publication History

Abstract

This paper presents Gpl, a fast and accurate optimizer for query processing over knowledge graphs. Gpl is novel in three ways. First, Gpl takes a type-centric approach that markedly improves the accuracy of cardinality estimation by embedding the correlation among multiple query conditions into the existing type system of knowledge graphs. Second, to predict execution time accurately, Gpl constructs a specialized cost model for the graph-exploration scheme and tunes its coefficients for the target hardware platform and graph data. Third, Gpl uses a budget-aware strategy for plan enumeration, combined with a greedy heuristic, to improve overall performance (i.e., optimization time and execution time) across various workloads. Evaluations with representative knowledge graphs and query benchmarks show that Gpl selects the optimal plan for 33 of 39 queries and incurs less than 5% slowdown on average relative to the optimal results. In contrast, the state-of-the-art optimizer and manually tuned results incur 100% and 36% slowdown, respectively.
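
The third point, plan enumeration under a budget with a greedy fallback, can be illustrated with a small sketch. This is not Gpl's algorithm or cost model: the abstract does not specify either, so everything below is an assumption made for illustration. In particular, the budget is taken to be a cap on the number of candidate orders examined, the cost is a toy sum of intermediate result sizes, and the names `card`, `fanout`, `edges`, `step_factor`, `plan_cost`, `greedy_plan`, and `choose_plan` are hypothetical.

```python
from itertools import permutations


def step_factor(p, done, card, fanout, edges):
    """Row multiplier for exploring pattern p next (toy assumption):
    fanout[p] if p touches an already-explored pattern, else card[p]."""
    connected = any((q, p) in edges or (p, q) in edges for q in done)
    return fanout[p] if connected else card[p]


def plan_cost(order, card, fanout, edges):
    """Toy cost: sum of intermediate result sizes along the order."""
    total, rows, done = 0.0, 1.0, []
    for p in order:
        rows *= step_factor(p, done, card, fanout, edges)
        total += rows
        done.append(p)
    return total


def greedy_plan(card, fanout, edges):
    """Greedy heuristic: always take the step with the smallest multiplier."""
    done, remaining = [], set(card)
    while remaining:
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining),
                  key=lambda p: step_factor(p, done, card, fanout, edges))
        done.append(nxt)
        remaining.remove(nxt)
    return tuple(done)


def choose_plan(card, fanout, edges, budget):
    """Start from the greedy plan, then examine at most `budget`
    further orders and keep the cheapest plan seen."""
    best = greedy_plan(card, fanout, edges)
    best_cost = plan_cost(best, card, fanout, edges)
    for i, order in enumerate(permutations(card)):
        if i >= budget:
            break
        cost = plan_cost(order, card, fanout, edges)
        if cost < best_cost:
            best, best_cost = order, cost
    return best, best_cost


# Hypothetical three-pattern query in which greedy is suboptimal.
card = {"a": 5, "b": 6, "c": 1000}
fanout = {"a": 1, "b": 1, "c": 0.5}
edges = {("b", "c")}
print(choose_plan(card, fanout, edges, budget=0))  # greedy plan only
print(choose_plan(card, fanout, edges, budget=6))  # all six orders examined
```

With a budget of zero the function returns the greedy order unchanged. A larger budget lets enumeration find a cheaper order here, at the price of more optimization time, which is the trade-off the abstract refers to as overall performance.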

Supplementary Material

VTT File (Day3_Session10-Order3.vtt)
MP4 File (Day3_Session10-Order3.mp4)
Presentation video


    Published In

    SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
    November 2021
    685 pages
    ISBN:9781450386388
    DOI:10.1145/3472883
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Query optimization
    2. knowledge graphs

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SoCC '21: ACM Symposium on Cloud Computing
    November 1 - 4, 2021
    Seattle, WA, USA

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%
