Efficient generation of query plans containing group-by, join, and groupjoin

  • Special Issue Paper
The VLDB Journal

Abstract

It has been a recognized fact for many years that query execution can benefit from pushing grouping operators down in the operator tree and applying them before a join. This so-called eager aggregation reduces the size(s) of the join argument(s), making join evaluation faster. Lately, the idea enjoyed a revival when it was applied to outer joins for the first time and incorporated in a state-of-the-art plan generator. However, the recent approach is highly dependent on the use of heuristics because of the exponential growth of the search space that goes along with eager aggregation. Finding an optimal solution for larger queries calls for effective optimality-preserving pruning mechanisms to reduce the search space size as far as possible. By a more thorough investigation of functional dependencies and keys, we provide a set of new pruning criteria and extend the idea of eager aggregation further by combining it with the introduction of groupjoins. We evaluate the resulting plan generator with respect to runtime and memory consumption.
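To illustrate the eager aggregation idea described in the abstract, the following is a minimal sketch, not the plan generator from the paper. The relations `CUSTOMERS` and `ORDERS`, their attribute names, and the two functions are hypothetical and chosen only for illustration. The sketch compares a plan that joins first and groups afterwards with one that pre-aggregates one join argument on the join attribute, assuming a sum aggregate and an inner join.

```python
from collections import defaultdict

# Hypothetical toy relations (assumption: not taken from the paper).
CUSTOMERS = [
    {"cid": 1, "region": "north"},
    {"cid": 2, "region": "north"},
    {"cid": 3, "region": "south"},
]
ORDERS = [
    {"cid": 1, "amount": 10},
    {"cid": 1, "amount": 5},
    {"cid": 2, "amount": 7},
    {"cid": 3, "amount": 4},
    {"cid": 3, "amount": 6},
]


def lazy_plan(customers, orders):
    """Join first, then group by region and sum the amounts."""
    totals = defaultdict(int)
    for c in customers:
        for o in orders:
            if c["cid"] == o["cid"]:  # inner join predicate
                totals[c["region"]] += o["amount"]
    return dict(totals)


def eager_plan(customers, orders):
    """Pre-aggregate orders on the join attribute, then join and group."""
    # Eager step: one partial sum per join key shrinks the join argument.
    partial = defaultdict(int)
    for o in orders:
        partial[o["cid"]] += o["amount"]

    # Join the (smaller) pre-aggregated input and finish the grouping.
    totals = defaultdict(int)
    for c in customers:
        if c["cid"] in partial:  # membership test does not insert a key
            totals[c["region"]] += partial[c["cid"]]
    return dict(totals)


print(lazy_plan(CUSTOMERS, ORDERS))
print(eager_plan(CUSTOMERS, ORDERS))
```

In this sketch the join in `eager_plan` sees one row per `cid` instead of one row per order, which is the reduction in join argument size that the abstract refers to. The rewrite is shown only for a sum over an inner join; other aggregates and join types need their own conditions.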

Notes

  1. By closure we mean the set of all dependencies derivable from a given set of dependencies, as the term is commonly understood.
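As a small illustration of the closure mentioned in this note, the sketch below uses the standard attribute closure computation: a functional dependency X → Y belongs to the closure of a set of dependencies exactly when Y is contained in the attribute closure of X. The function names and the example dependencies are hypothetical and are not taken from the article.

```python
def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by `attrs`.

    `fds` is an iterable of (lhs, rhs) pairs of attribute sets.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its left side is fully covered.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """Check whether lhs -> rhs is derivable from `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


# Hypothetical example: a -> b and b -> c.
FDS = [({"a"}, {"b"}), ({"b"}, {"c"})]

print(attribute_closure({"a"}, FDS))
print(implies(FDS, {"a"}, {"c"}))
print(implies(FDS, {"c"}, {"a"}))
```

Testing membership through the attribute closure avoids materializing the full set of derivable dependencies, which can be exponentially large.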

Acknowledgements

We thank Simone Seeger for her help preparing the manuscript and the reviewers for their helpful feedback.

Author information

Corresponding author

Correspondence to Marius Eich.

About this article

Cite this article

Eich, M., Fender, P. & Moerkotte, G. Efficient generation of query plans containing group-by, join, and groupjoin. The VLDB Journal 27, 617–641 (2018). https://doi.org/10.1007/s00778-017-0476-3

  • DOI: https://doi.org/10.1007/s00778-017-0476-3
