On the estimation of join result sizes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 779))

Abstract

Good estimates of join result sizes are critical for query optimization in relational database management systems. We address the problem of incrementally obtaining accurate and consistent estimates of join result sizes. We introduce a new rule for choosing join selectivities when estimating join result sizes. The rule is part of a new unified algorithm called Algorithm ELS (Equivalence and Largest Selectivity). Before any result sizes are computed, the algorithm determines equivalence classes for the join columns. It also takes into account the effect of local predicates on table and column cardinalities. Together, these computations allow the correct selectivity value to be computed for each eligible join predicate. We show that the algorithm is correct and gives better estimates than current estimation algorithms.
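
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Algorithm ELS; it is an illustration under several assumptions that the abstract does not spell out: columns are written as hypothetical `(table, column)` pairs, the table and column cardinalities passed in are taken to be already adjusted for local predicates, each equijoin predicate uses the textbook selectivity 1 / max(distinct values), and "largest selectivity" is read as picking the largest selectivity among the eligible predicates of one equivalence class when a table is added. All function and variable names are made up for this example.

```python
from collections import defaultdict


def equivalence_classes(join_preds):
    """Map each join column to a class representative (union-find)."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in join_preds:
        parent[find(left)] = find(right)
    return {col: find(col) for col in list(parent)}


def join_selectivity(d_left, d_right):
    """Textbook equijoin selectivity: 1 / max(distinct value counts)."""
    return 1.0 / max(d_left, d_right)


def estimate_join_size(order, table_card, col_card, join_preds):
    """Incrementally estimate the result size for one join order.

    Assumption: cardinalities already reflect local predicates.
    """
    rep = equivalence_classes(join_preds)
    members = defaultdict(list)
    for col, r in rep.items():
        members[r].append(col)

    joined = {order[0]}
    size = float(table_card[order[0]])
    for table in order[1:]:
        size *= table_card[table]
        for cols in members.values():
            new_cols = [c for c in cols if c[0] == table]
            old_cols = [c for c in cols if c[0] in joined]
            # Eligible predicates (explicit or implied by the class)
            # between the new table and the tables joined so far.
            sels = [join_selectivity(col_card[n], col_card[o])
                    for n in new_cols for o in old_cols]
            if sels:
                size *= max(sels)  # one selectivity per class: the largest
        joined.add(table)
    return size


# Hypothetical statistics: R.x = S.y and S.y = T.z form one class.
table_card = {"R": 100, "S": 1000, "T": 1000}
col_card = {("R", "x"): 10, ("S", "y"): 100, ("T", "z"): 1000}
preds = [(("R", "x"), ("S", "y")), (("S", "y"), ("T", "z"))]

est_rst = estimate_join_size(["R", "S", "T"], table_card, col_card, preds)
est_rts = estimate_join_size(["R", "T", "S"], table_card, col_card, preds)
```

In this toy input both join orders give the same estimate, which is the kind of consistency across incremental estimates the abstract refers to. Multiplying the selectivities of all eligible predicates in the class instead would shrink the estimate for the second order.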

Editor information

Matthias Jarke, Janis Bubenko, Keith Jeffery

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Swami, A., Schiefer, K.B. (1994). On the estimation of join result sizes. In: Jarke, M., Bubenko, J., Jeffery, K. (eds) Advances in Database Technology — EDBT '94. EDBT 1994. Lecture Notes in Computer Science, vol 779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57818-8_58

  • DOI: https://doi.org/10.1007/3-540-57818-8_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57818-5

  • Online ISBN: 978-3-540-48342-7

  • eBook Packages: Springer Book Archive
