Abstract
We study the problem of optimizing one-time and continuous subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We make two main contributions:
1. A cost-based dynamic programming optimizer for one-time queries that (i) picks efficient query vertex orderings for worst-case optimal plans and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal style multiway intersections. In addition to our optimizer, we describe an adaptive technique that changes the query vertex orderings of the worst-case optimal subplans during query execution for more efficient query evaluation. The plan space of our one-time optimizer contains plans that are not in the plan spaces based on tree decompositions from prior work.
2. A cost-based greedy optimizer for continuous queries that builds on the delta subgraph query framework. Given a set of continuous queries, our optimizer decomposes these queries into multiple delta subgraph queries, picks a plan for each delta query, and generates a single combined plan that evaluates all of the queries. Our combined plans share computations across operators of the plans for the delta queries if the operators perform the same intersections. To increase the amount of computation shared, we describe an additional optimization that shares partial intersections across operators.
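As a rough illustration of the delta-query decomposition mentioned in the second contribution, the sketch below follows the commonly used formulation in which a query with n query edges is split into n delta queries: delta query i reads the inserted edges for query edge i, the updated graph for earlier query edges, and the old graph for later ones. The function names (`evaluate`, `delta_matches`) and the naive nested-loop evaluation are assumptions made for this example only; they are not the optimizer or the plans described in the paper.

```python
def evaluate(query_edges, relations):
    """Naively join one edge relation per query edge (illustrative only)."""
    partials = [{}]
    for (a, b), rel in zip(query_edges, relations):
        extended = []
        for m in partials:
            for u, v in rel:
                # Keep the data edge only if it agrees with bindings made so far.
                if m.get(a, u) == u and m.get(b, v) == v:
                    extended.append({**m, a: u, b: v})
        partials = extended
    return {tuple(sorted(m.items())) for m in partials}


def delta_matches(query_edges, old_edges, inserted_edges):
    """Matches that appear because of the inserted edges, via n delta queries."""
    old = set(old_edges)
    delta = set(inserted_edges) - old
    new = old | delta
    n = len(query_edges)
    found = set()
    for i in range(n):
        # Delta query i: updated graph before slot i, delta at slot i, old graph after.
        relations = [new] * i + [delta] + [old] * (n - i - 1)
        found |= evaluate(query_edges, relations)
    return found


if __name__ == "__main__":
    triangle = [("a1", "a2"), ("a2", "a3"), ("a1", "a3")]
    old_edges = [(1, 2), (2, 3)]
    inserted = [(1, 3), (3, 4)]
    for match in sorted(delta_matches(triangle, old_edges, inserted)):
        print(dict(match))
```

Each new match is produced by exactly one delta query (the one whose slot is the last query edge bound to an inserted data edge), so the union of the delta queries equals the difference between the query results on the updated and the old graph.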
Our optimizers use a new cost metric for worst-case optimal plans called intersection-cost. When generating hybrid plans, our dynamic programming optimizer for one-time queries combines intersection-cost with the cost of binary joins. We demonstrate the effectiveness of our plans, adaptive technique, and partial intersection sharing optimization through extensive experiments. Our optimizers are integrated into GraphflowDB.
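The vertex-at-a-time evaluation described above can be sketched as follows. This is a minimal sketch, not GraphflowDB code: `build_adjacency`, `match`, and the `work` counter are hypothetical names introduced here. The `work` counter simply sums the sizes of the adjacency lists passed to each intersection; it is an assumed stand-in for showing that the query vertex ordering affects the amount of intersection work, since the abstract does not define intersection-cost. The sketch also does not enforce that distinct query vertices map to distinct data vertices.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Forward and backward adjacency sets of a directed data graph."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
    return fwd, bwd


def match(query_edges, ordering, data_edges):
    """Match query vertices one at a time, in `ordering`, via multiway intersections."""
    fwd, bwd = build_adjacency(data_edges)
    vertices = sorted({x for e in data_edges for x in e})
    results = []
    work = 0  # assumed proxy: total size of adjacency lists fed to intersections

    def extend(partial):
        nonlocal work
        if len(partial) == len(ordering):
            results.append(dict(partial))
            return
        q = ordering[len(partial)]
        # Collect one adjacency list per query edge linking q to a matched vertex.
        lists = []
        for a, b in query_edges:
            if a in partial and b == q:
                lists.append(fwd[partial[a]])
            elif b in partial and a == q:
                lists.append(bwd[partial[b]])
        if lists:
            work += sum(len(adj) for adj in lists)
            candidates = set.intersection(*lists)
        else:
            candidates = vertices  # no matched neighbor yet: scan all vertices
        for v in sorted(candidates):
            partial[q] = v
            extend(partial)
            del partial[q]

    extend({})
    return results, work


if __name__ == "__main__":
    data = [(1, 2), (2, 3), (1, 3), (3, 4)]
    triangle = [("a1", "a2"), ("a2", "a3"), ("a1", "a3")]
    for ordering in (["a1", "a2", "a3"], ["a3", "a2", "a1"]):
        matches, work = match(triangle, ordering, data)
        print(ordering, matches, work)
```

Both orderings return the same matches, but they may touch different amounts of adjacency data, which is the kind of difference a cost-based optimizer would try to estimate before choosing an ordering.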