skip to main content
research-article
Open access

Efficiently Computing Join Orders with Heuristic Search

Published: 30 May 2023 Publication History

Abstract

Join order optimization is one of the most fundamental problems in processing queries on relational data. It has been studied extensively for almost four decades now. Still, because of its NP hardness, no generally efficient solution exists and the problem remains an important topic of research. The scope of algorithms to compute join orders ranges from exhaustive enumeration, to combinatorics based on graph properties, to greedy search, to genetic algorithms, to recently investigated machine learning. A few works exist that use heuristic search to compute join orders. However, a theoretical argument why and how heuristic search is applicable to join order optimization is lacking.
In this work, we investigate join order optimization via heuristic search. In particular, we provide a strong theoretical framework, in which we reduce join order optimization to the shortest path problem. We then thoroughly analyze the properties of this problem and the applicability of heuristic search. We devise crucial optimizations to make heuristic search tractable. We implement join ordering via heuristic search in a real DBMS and conduct an extensive empirical study. Our findings show that for star- and clique-shaped queries, heuristic search finds optimal plans an order of magnitude faster than current state of the art. Our suboptimal solutions further extend the cost/time Pareto frontier.

Supplemental Material

MP4 File
Video Presentation of "Efficiently Computing Join Orders with Heuristic Search" for SIGMOD 2023
PDF File
Read me
ZIP File
Source Code

References

[1]
Brian Babcock and Surajit Chaudhuri. 2005. Towards a robust query optimizer: a principled and practical approach. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 119--130.
[2]
Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, and Kyuseok Shim. 1995. Optimizing queries with materialized views. In Proceedings of the Eleventh International Conference on Data Engineering. IEEE, 190--200.
[3]
Sophie Cluet and Guido Moerkotte. 1995. On the complexity of generating optimal left-deep processing trees with cross products. In International Conference on Database Theory. Springer, 54--67.
[4]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2016. Introduction to Algorithms. The MIT Press.
[5]
Rina Dechter and Judea Pearl. 1985. Generalized best-first search strategies and the optimality of A. Journal of the ACM (JACM), Vol. 32, 3 (1985), 505--536.
[6]
David DeHaan and Frank Wm Tompa. 2007. Optimal top-down join enumeration. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data. 785--796.
[7]
Edsger W Dijkstra et al. 1959. A note on two problems in connexion with graphs. Numerische mathematik, Vol. 1, 1 (1959), 269--271.
[8]
Stefan Edelkamp and Stefan Schrodl. 2011. Heuristic search: theory and applications. Elsevier.
[9]
Leonidas Fegaras. 1998. A new heuristic for optimizing large queries. In International Conference on Database and Expert Systems Applications. Springer, 726--735.
[10]
Pit Fender. 2014. Algorithms for Efficient Top-Down Join Enumeration. (2014).
[11]
Pit Fender and Guido Moerkotte. 2011a. A new, highly efficient, and easy to implement top-down join enumeration algorithm. In 2011 IEEE 27th International Conference on Data Engineering. IEEE, 864--875.
[12]
Pit Fender and Guido Moerkotte. 2011b. Reassessing top-down join enumeration. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, 10 (2011), 1803--1818.
[13]
Frank Gray. 1953. Pulse code communication. US Patent 2,632,058.
[14]
Immanuel Haffner, Marcel Maltry, Joris Nix, Jens Dittrich, and Luca Gretscher. 2023. normalfont μtable. https://mutable.uni-saarland.de
[15]
Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE transactions on Systems Science and Cybernetics, Vol. 4, 2 (1968), 100--107.
[16]
Toshihide Ibaraki and Tiko Kameda. 1984. On the optimal nesting order for computing n-relational joins. ACM Transactions on Database Systems (TODS), Vol. 9, 3 (1984), 482--502.
[17]
Navin Kabra and David J DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 106--117.
[18]
Ravi Krishnamurthy, Haran Boral, and Carlo Zaniolo. 1986. Optimization of Nonrecursive Queries. In VLDB, Vol. 86. 128--137.
[19]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment, Vol. 9, 3 (2015), 204--215.
[20]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data. 1275--1288.
[21]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul23. 2019. Neo: A Learned Query Optimizer. Proceedings of the VLDB Endowment, Vol. 12, 11 (2019).
[22]
Guido Moerkotte and Thomas Neumann. 2006. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In Proceedings of the 32nd international conference on Very large data bases. Citeseer, 930--941.
[23]
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, and Alekh Jindal. 2021. Steering query optimizers: A practical take on big data workloads. In Proceedings of the 2021 International Conference on Management of Data. 2557--2569.
[24]
Thomas Neumann. 2009. Query simplification: graceful degradation for join-order optimization. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 403--414.
[25]
Thomas Neumann and Bernhard Radke. 2018. Adaptive optimization of very large join queries. In Proceedings of the 2018 International Conference on Management of Data. 677--692.
[26]
Kenneth W Ng, Zhenghao Wang, Richard R Muntz, and Silvia Nittel. 1999. Dynamic query re-optimization. In Proceedings. Eleventh International Conference on Scientific and Statistical Database Management. IEEE, 264--273.
[27]
Kiyoshi Ono and Guy M. Lohman. 1990. Measuring the Complexity of Join Enumeration in Query Optimization. In VLDB, Vol. 97. 314--325.
[28]
Matthew Perron, Zeyuan Shang, Tim Kraska, and Michael Stonebraker. 2019. How I learned to stop worrying and love re-optimization. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1758--1761.
[29]
Stuart Russell and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach 4 ed.). Prentice Hall.
[30]
Sa"id Salhi. 2017. Heuristic search: The emerging science of problem solving. Springer.
[31]
Alexander Schrijver. 2004. Combinatorial optimization: Polyhedra and efficiency (algorithms and combinatorics). Journal-Operational Research Society, Vol. 55, 9 (2004), 1018--1018.
[32]
P. Griffiths Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1989. Access Path Selection in a Relational Database Management System. In Readings in Artificial Intelligence and Databases. Elsevier, 511--522.
[33]
Timos K Sellis. 1988. Multiple-query optimization. ACM Transactions on Database Systems (TODS), Vol. 13, 1 (1988), 23--52.
[34]
Michael Steinbrunn, Guido Moerkotte, and Alfons Kemper. 1997. Heuristic and randomized optimization for the join ordering problem. The VLDB Journal, Vol. 6, 3 (1997), 191--208.
[35]
Bennet Vance. 1998. Join-order optimization with Cartesian products. Oregon Graduate Institute of Science and Technology.
[36]
Bennet Vance and David Maier. 1996. Rapid bushy join-order optimization with cartesian products. ACM SIGMOD Record, Vol. 25, 2 (1996), 35--46.
[37]
Florian Waas and Arjan Pellenkoft. 2000. Join order selection (good enough is easy). In British National Conference on Databases. Springer, 51--67.
[38]
Wentao Wu, Jeffrey F Naughton, and Harneet Singh. 2016. Sampling-based query re-optimization. In Proceedings of the 2016 International Conference on Management of Data. 1721--1736.

Cited By

View all
  • (2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
  • (2024)Towards a Converged Relational-Graph Optimization FrameworkProceedings of the ACM on Management of Data10.1145/36988282:6(1-27)Online publication date: 20-Dec-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Badges

Author Tags

  1. join ordering
  2. query optimization
  3. query planning

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)379
  • Downloads (Last 6 weeks)44
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQLProceedings of the ACM on Management of Data10.1145/37096593:1(1-28)Online publication date: 11-Feb-2025
  • (2024)POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least ResistanceProceedings of the VLDB Endowment10.14778/3648160.364817517:6(1350-1363)Online publication date: 3-May-2024
  • (2024)Towards a Converged Relational-Graph Optimization FrameworkProceedings of the ACM on Management of Data10.1145/36988282:6(1-27)Online publication date: 20-Dec-2024
  • (2024)DPconv: Super-Polynomially Faster Join OrderingProceedings of the ACM on Management of Data10.1145/36988092:6(1-26)Online publication date: 20-Dec-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media