
Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins

Published: 29 May 2021

Abstract

We study the problem of optimizing one-time and continuous subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We make two main contributions:
1. A cost-based dynamic programming optimizer for one-time queries that (i) picks efficient query vertex orderings for worst-case optimal plans and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal-style multiway intersections. In addition to our optimizer, we describe an adaptive technique that changes the query vertex orderings of the worst-case optimal subplans during query execution for more efficient query evaluation. The plan space of our one-time optimizer contains plans that are not in the tree-decomposition-based plan spaces of prior work.
2. A cost-based greedy optimizer for continuous queries that builds on the delta subgraph query framework. Given a set of continuous queries, our optimizer decomposes these queries into multiple delta subgraph queries, picks a plan for each delta query, and generates a single combined plan that evaluates all of the queries. Our combined plans share computation across operators of the delta-query plans that perform the same intersections. To increase the amount of shared computation, we describe an additional optimization that shares partial intersections across operators.
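The delta subgraph query framework that contribution 2 builds on can be illustrated on a small example. The Python sketch below assumes a directed triangle query a->b, b->c, c->a, an insert-only batch of edges that is disjoint from the existing edge set, and the standard delta-rule ordering: delta query dQ_i reads query edge i from the new batch, the query edges before it from the updated graph, and the query edges after it from the old graph. The function names are hypothetical, and the nested-loop joins inside each delta query only stand in for the worst-case optimal plans an optimizer would pick.

```python
def triangles(edges):
    """Reference answer: every (a, b, c) with a->b, b->c, c->a in `edges`."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) in edges}


def delta_triangles(old, delta):
    """New triangle matches produced by inserting `delta` into `old`.

    Assumes `delta` is disjoint from `old`. Delta query dQ_i reads query edge i
    from `delta`, earlier query edges from the new graph, and later ones from
    the old graph, so each new match is emitted by exactly one dQ_i.
    """
    new = old | delta
    # Edge source for (a->b, b->c, c->a) in dQ_1, dQ_2, dQ_3 respectively.
    delta_queries = [(delta, old, old), (new, delta, old), (new, new, delta)]
    out = []
    for e_ab, e_bc, e_ca in delta_queries:
        for (a, b) in e_ab:
            for (b2, c) in e_bc:
                if b2 == b and (c, a) in e_ca:
                    out.append((a, b, c))
    return out


old = {(1, 2), (2, 3)}
delta = {(3, 1), (3, 4), (4, 2)}
print(sorted(delta_triangles(old, delta)))
# -> [(1, 2, 3), (2, 3, 1), (2, 3, 4), (3, 1, 2), (3, 4, 2), (4, 2, 3)]
```

Comparing the output with `triangles(old | delta) - triangles(old)` is a quick way to confirm that the union of the three delta queries yields every new match exactly once, without recomputing the full query over the updated graph.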
Our optimizers use a new cost metric for worst-case optimal plans called intersection-cost. When generating hybrid plans, our dynamic programming optimizer for one-time queries combines intersection-cost with the cost of binary joins. We demonstrate the effectiveness of our plans, adaptive technique, and partial intersection sharing optimization through extensive experiments. Our optimizers are integrated into GraphflowDB.
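To make vertex-at-a-time evaluation with multiway intersections and an intersection-style cost concrete, the sketch below evaluates a directed triangle query under a given query vertex ordering. It is a minimal Python sketch under stated assumptions: the graph is an in-memory edge set, the first query vertex is matched by scanning all data vertices, each later vertex is matched by intersecting the adjacency lists of its already-matched query neighbours, and `i_cost` is approximated as the total size of the adjacency lists read during those extensions. The names `generic_join` and `build_adjacency` and this exact cost formula are illustrative assumptions, not GraphflowDB's API or the paper's precise definition, and the sketch does not require matched vertices to be distinct.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Forward (out) and backward (in) adjacency sets for a directed edge set."""
    out_adj, in_adj = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_adj[u].add(v)
        in_adj[v].add(u)
    return out_adj, in_adj


def generic_join(query, order, edges):
    """Match `query` (directed query edges (x, y)) one query vertex at a time.

    Returns (matches, i_cost). Each match is a tuple of data vertices listed in
    sorted query-vertex order; i_cost sums the sizes of all adjacency lists read
    while extending partial matches (a simplified intersection-cost).
    """
    out_adj, in_adj = build_adjacency(edges)
    vertices = {v for e in edges for v in e}
    partial = [{}]  # partial matches: query vertex -> data vertex
    i_cost = 0
    for qv in order:
        extended = []
        for m in partial:
            # One adjacency list per query edge between qv and a matched vertex.
            lists = []
            for x, y in query:
                if x == qv and y in m:
                    lists.append(in_adj[m[y]])   # qv -> y: in-neighbours of m[y]
                elif y == qv and x in m:
                    lists.append(out_adj[m[x]])  # x -> qv: out-neighbours of m[x]
            if lists:
                i_cost += sum(len(adj) for adj in lists)
                candidates = set.intersection(*lists)  # multiway intersection
            else:
                candidates = vertices  # no matched neighbour yet: scan all vertices
            for c in candidates:
                extended.append({**m, qv: c})
        partial = extended
    key = sorted(order)
    return [tuple(m[q] for q in key) for m in partial], i_cost


edges = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 2)}
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
matches, cost = generic_join(triangle, ["a", "b", "c"], edges)
print(sorted(matches), cost)
# -> [(1, 2, 3), (2, 3, 1), (2, 3, 4), (3, 1, 2), (3, 4, 2), (4, 2, 3)] 17
```

Calling `generic_join` with a different `order`, such as `["b", "a", "c"]`, returns the same set of matches but may read different adjacency lists, which is why the choice of query vertex ordering matters. A cost-based optimizer would estimate such costs for candidate orderings rather than executing every one of them as this toy example does.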

Supplementary Material

a6-mhedhbi-apndx.pdf (mhedhbi.zip)
Supplemental movie, appendix, image, and software files for Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins



Published In

ACM Transactions on Database Systems, Volume 46, Issue 2
Best of PODS 2019 and Regular Papers
June 2021
182 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3468529

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 May 2021
Accepted: 01 January 2021
Revised: 01 December 2020
Received: 01 January 2020
Published in TODS Volume 46, Issue 2

Author Tags

  1. Subgraph queries
  2. generic join
  3. worst-case optimal joins

Qualifiers

  • Research-article
  • Research
  • Refereed

