ABSTRACT
Conjunctive queries with predicates in the form of comparisons that span multiple relations have regained interest recently, due to their relevance in OLAP queries, spatiotemporal databases, and machine learning over relational data. The standard technique, predicate pushdown, has limited efficacy on such comparisons. A technique by Willard can be used to process short comparisons that are adjacent in the join tree in time linear in the input size plus output size. In this paper, we describe a new algorithm for evaluating conjunctive queries with both short and long comparisons, and identify an acyclic condition under which linear time can be achieved. We have also implemented the new algorithm on top of Spark, and our experimental results demonstrate order-of-magnitude speedups over SparkSQL on a variety of graph pattern and analytical queries.
References
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases: The Logical Level (1st ed.). Addison-Wesley Longman Publishing Co., Inc., USA.
- Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). Association for Computing Machinery, New York, NY, USA, 1383--1394. https://doi.org/10.1145/2723372.2742797
- Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. 2007. On Acyclic Conjunctive Queries and Constant Delay Enumeration (CSL'07/EACSL'07). Springer-Verlag, 208--222.
- Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. 2008. Computational Geometry: Algorithms and Applications (3rd ed.). Springer-Verlag TELOS, Santa Clara, CA, USA.
- Nofar Carmeli and Markus Kröll. 2021. On the Enumeration Complexity of Unions of Conjunctive Queries. ACM Transactions on Database Systems, Vol. 46, 2, Article 5 (May 2021), 41 pages. https://doi.org/10.1145/3450263
- Bernard Chazelle. 1988. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., Vol. 17, 3 (1988), 427--462.
- Bernard Chazelle and Leonidas J. Guibas. 1986. Fractional cascading: II. Applications. Algorithmica, Vol. 1 (1986), 163--191.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.
- Georg Gottlob, Martin Grohe, Nysret Musliu, Marko Samer, and Francesco Scarcello. 2005. Hypertree Decompositions: Structure, Algorithms, and Applications. In Proceedings of the 31st International Conference on Graph-Theoretic Concepts in Computer Science. Springer-Verlag, Berlin, Heidelberg, 1--15. https://doi.org/10.1007/11604686_1
- MH Graham. 1980. On the universal relation. Technical Report. University of Toronto. Computer Systems Research Group and Graham, MH.
- Muhammad Idris, Martin Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2020. General dynamic Yannakakis: conjunctive queries with theta joins under updates. The VLDB Journal, Vol. 29 (2020), 619--653.
- Manas R. Joglekar, Rohan Puttagunta, and Christopher Ré. 2016. AJAR: Aggregations and Joins over Annotated Relations. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (San Francisco, California, USA) (PODS '16). Association for Computing Machinery, New York, NY, USA, 91--106. https://doi.org/10.1145/2902251.2902293
- Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu. 2019. Boolean Tensor Decomposition for Conjunctive Queries with Negation. In 22nd International Conference on Database Theory, ICDT 2019, March 26--28, 2019, Lisbon, Portugal (LIPIcs, Vol. 127), Pablo Barceló and Marco Calautti (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 21:1--21:19. https://doi.org/10.4230/LIPIcs.ICDT.2019.21
- Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. 2017. What Do Shannon-Type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Chicago, Illinois, USA) (PODS '17). Association for Computing Machinery, New York, NY, USA, 429--444. https://doi.org/10.1145/3034786.3056105
- Mahmoud Abo Khamis, Ryan R. Curtin, Benjamin Moseley, Hung Q. Ngo, Xuanlong Nguyen, Dan Olteanu, and Maximilian Schleich. 2020. Functional Aggregate Queries with Additive Inequalities. ACM Transactions on Database Systems, Vol. 45, 4, Article 17 (dec 2020), 41 pages. https://doi.org/10.1145/3426865
- Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu. 2017. Answering Conjunctive Queries with Inequalities. Theory of Computing Systems, Vol. 61 (2017), 2--30.
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data
- Dániel Marx. 2013. Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries. J. ACM, Vol. 60, 6, Article 42 (nov 2013), 51 pages. https://doi.org/10.1145/2535926
- Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst-Case Optimal Join Algorithms: [Extended Abstract]. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Scottsdale, Arizona, USA) (PODS '12). Association for Computing Machinery, New York, NY, USA, 37--48. https://doi.org/10.1145/2213556.2213565
- Mihai Patrascu. 2010. Towards Polynomial Lower Bounds for Dynamic Problems. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing (Cambridge, Massachusetts, USA) (STOC '10). Association for Computing Machinery, New York, NY, USA, 603--610. https://doi.org/10.1145/1806689.1806772
- Nikolaos Tziavelis, Wolfgang Gatterbauer, and Mirek Riedewald. 2021. Beyond Equi-Joins: Ranking, Enumeration and Factorization. Proc. International Conference on Very Large Data Bases, Vol. 14, 11 (jul 2021), 2599--2612. https://doi.org/10.14778/3476249.3476306
- Ron van der Meyden. 1997. The complexity of querying indefinite data about linearly ordered domains. J. Comput. System Sci., Vol. 54, 1 (1997), 113--135.
- Dan E Willard. 2002. An Algorithm for Handling Many Relational Calculus Queries Efficiently. J. Comput. System Sci., Vol. 65, 2 (Sept. 2002), 295--331.
- Mihalis Yannakakis. 1981. Algorithms for Acyclic Database Schemes. In Proceedings of the Seventh International Conference on Very Large Data Bases - Volume 7 (Cannes, France) (VLDB '81). VLDB Endowment, 82--94.
- Clement Tak Yu and Meral Z Ozsoyoglu. 1979. An algorithm for tree-query membership of a distributed query. In COMPSAC 79. Proceedings. Computer Software and The IEEE Computer Society's Third International Applications Conference, 1979. IEEE, 306--312. https://doi.org/10.1109/CMPSAC.1979.762509
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for in-Memory Cluster Computing (NSDI'12). USENIX Association, USA, 2.
Index Terms
- Conjunctive Queries with Comparisons