research-article

Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins

Published: 29 May 2021
Abstract

We study the problem of optimizing one-time and continuous subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We make two main contributions:

1. A cost-based dynamic programming optimizer for one-time queries that (i) picks efficient query vertex orderings for worst-case optimal plans and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal style multiway intersections. In addition to our optimizer, we describe an adaptive technique that changes the query vertex orderings of the worst-case optimal subplans during query execution for more efficient query evaluation. The plan space of our one-time optimizer contains plans that are not in the plan spaces based on tree decompositions from prior work.

2. A cost-based greedy optimizer for continuous queries that builds on the delta subgraph query framework. Given a set of continuous queries, our optimizer decomposes these queries into multiple delta subgraph queries, picks a plan for each delta query, and generates a single combined plan that evaluates all of the queries. Our combined plans share computations across operators of the plans for the delta queries if the operators perform the same intersections. To increase the amount of computation shared, we describe an additional optimization that shares partial intersections across operators.
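
The decomposition of a continuous query into delta subgraph queries can be illustrated with a small sketch. It assumes one common formulation of the delta framework, which the abstract does not spell out: for a query with n edges and a batch of inserted edges, the i-th delta query reads the first i-1 query edges from the updated graph, the i-th from the inserted batch, and the remaining ones from the old graph. The function names `match` and `delta_matches` are hypothetical, the batch is assumed to be disjoint from the old graph, and the evaluator is a naive edge-at-a-time join rather than the paper's plans.

```python
def match(query_edges, relations):
    """Enumerate matches where the i-th query edge is drawn from relations[i].

    Naive edge-at-a-time evaluation, used only to check the decomposition.
    """
    partial = [{}]
    for (u, v), relation in zip(query_edges, relations):
        extended = []
        for m in partial:
            for src, dst in relation:
                # Keep the edge only if it agrees with the vertices bound so far.
                if m.get(u, src) == src and m.get(v, dst) == dst:
                    extended.append({**m, u: src, v: dst})
        partial = extended
    return {tuple(sorted(m.items())) for m in partial}


def delta_matches(query_edges, old_edges, inserted_edges):
    """Union of the n delta queries for a batch of edge insertions.

    Assumption: inserted_edges is disjoint from old_edges.
    """
    new_edges = old_edges | inserted_edges
    n = len(query_edges)
    found = set()
    for i in range(n):
        # Delta query i: updated graph before position i, the batch at
        # position i, and the old graph after position i.
        relations = [new_edges] * i + [inserted_edges] + [old_edges] * (n - i - 1)
        found |= match(query_edges, relations)
    return found


# Directed triangle query a -> b -> c -> a; inserting (3, 1) closes a triangle.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
old = {(1, 2), (2, 3)}
inserted = {(3, 1)}
new_matches = delta_matches(triangle, old, inserted)
```

Under these assumptions, `new_matches` equals the matches in the updated graph minus the matches in the old graph, so no full re-evaluation is needed. A real plan would start each delta query from the inserted edges rather than process query edges in a fixed order.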

Our optimizers use a new cost metric for worst-case optimal plans called intersection-cost. When generating hybrid plans, our dynamic programming optimizer for one-time queries combines intersection-cost with the cost of binary joins. We demonstrate the effectiveness of our plans, adaptive technique, and partial intersection sharing optimization through extensive experiments. Our optimizers are integrated into GraphflowDB.
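
To make the role of the query vertex ordering concrete, the following sketch matches one query vertex at a time by intersecting adjacency lists, and tallies how much adjacency data each ordering reads. Two points are assumptions rather than statements from the abstract: the tally charges every intersection the total size of the lists it reads, which is only a rough stand-in for intersection-cost, and the plan starts by scanning all data vertices. The names `build_adjacency` and `evaluate` are hypothetical, and Python sets replace the sorted lists a real engine would use.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index a directed edge list into forward and backward adjacency sets."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward


def evaluate(query_edges, ordering, edges):
    """Match query vertices in `ordering`; return (matches, adjacency data read)."""
    forward, backward = build_adjacency(edges)
    data_vertices = sorted(set(forward) | set(backward))
    cost = 0
    # Assumption: the first query vertex is matched by a scan of all vertices.
    partial = [{ordering[0]: v} for v in data_vertices]
    for q in ordering[1:]:
        extended = []
        for m in partial:
            lists = []
            for u, v in query_edges:
                if v == q and u in m:
                    # Edge u -> q: candidates are out-neighbours of m[u].
                    lists.append(forward.get(m[u], set()))
                elif u == q and v in m:
                    # Edge q -> v: candidates are in-neighbours of m[v].
                    lists.append(backward.get(m[v], set()))
            if not lists:
                raise ValueError(f"{q!r} has no matched neighbour in the ordering")
            # Charge the sizes of all lists read by this intersection.
            cost += sum(len(adj) for adj in lists)
            for candidate in set.intersection(*lists):
                extended.append({**m, q: candidate})
        partial = extended
    matches = {tuple(sorted(m.items())) for m in partial}
    return matches, cost


# A hub vertex 0 with five out-edges, plus the edge 1 -> 2.
edges = [(0, k) for k in range(1, 6)] + [(1, 2)]
query = [("a", "b"), ("b", "c"), ("a", "c")]
matches_abc, cost_abc = evaluate(query, ["a", "b", "c"], edges)
matches_cba, cost_cba = evaluate(query, ["c", "b", "a"], edges)
```

On this toy graph both orderings return the same single match, but the ordering that starts from `a` repeatedly reads the hub's large forward list and therefore has the higher tally. This is the kind of difference a cost-based optimizer would use to choose between orderings.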

Published in

ACM Transactions on Database Systems, Volume 46, Issue 2
Best of PODS 2019 and Regular Papers
June 2021
182 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/3468529

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 29 May 2021
• Accepted: 1 January 2021
• Revised: 1 December 2020
• Received: 1 January 2020

Qualifiers

• research-article
• Research
• Refereed
