Abstract
Good estimates of join result sizes are critical for query optimization in relational database management systems. We address the problem of incrementally obtaining accurate and consistent estimates of join result sizes. We have invented a new rule for choosing join selectivities for estimating join result sizes. The rule is part of a new unified algorithm called Algorithm ELS (Equivalence and Largest Selectivity). Prior to computing any result sizes, equivalence classes are determined for the join columns. The algorithm also takes into account the effect of local predicates on table and column cardinalities. These computations allow the correct selectivity values for each eligible join predicate to be computed. We show that the algorithm is correct and gives better estimates than current estimation algorithms.
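The abstract names two ingredients, equivalence classes over join columns and a rule for choosing among join selectivities, without giving their details. The following Python sketch illustrates the general idea under stated assumptions and is not the authors' Algorithm ELS. The function names are hypothetical. The selectivity formula 1/max(d1, d2) is the textbook estimate under uniformity. Picking the largest selectivity among eligible predicates is inferred from the algorithm's name. The effect of local predicates on cardinalities is not modeled.

```python
from collections import defaultdict


def equivalence_classes(join_predicates):
    """Group join columns linked by equality predicates, using union-find.

    join_predicates: iterable of (left_column, right_column) pairs.
    Returns a list of sets, one per equivalence class.
    """
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in join_predicates:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    classes = defaultdict(set)
    for col in list(parent):
        classes[find(col)].add(col)
    return list(classes.values())


def join_selectivity(distinct_left, distinct_right):
    # Assumption: textbook uniformity estimate, not taken from the paper.
    return 1.0 / max(distinct_left, distinct_right)


def largest_selectivity(eligible_predicates, column_cardinality):
    """Among eligible predicates from one equivalence class, return the
    largest selectivity (assumed reading of the 'Largest Selectivity' rule)."""
    return max(
        join_selectivity(column_cardinality[left], column_cardinality[right])
        for left, right in eligible_predicates
    )


# Hypothetical example: R.a = S.b and S.b = T.c imply R.a = T.c.
predicates = [("R.a", "S.b"), ("S.b", "T.c"), ("U.x", "V.y")]
cardinality = {"R.a": 10, "S.b": 100, "T.c": 1000}

classes = equivalence_classes(predicates)
# Joining R to the result of (S join T): both R.a = S.b and the implied
# R.a = T.c are eligible, and they belong to the same equivalence class.
chosen = largest_selectivity([("R.a", "S.b"), ("R.a", "T.c")], cardinality)
```

In this example the two eligible predicates have selectivities 1/100 and 1/1000. Multiplying both into the estimate would count the same equality constraint twice, which is the kind of inconsistency that computing equivalence classes up front is meant to expose.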
© 1994 Springer-Verlag Berlin Heidelberg
Cite this paper
Swami, A., Schiefer, K.B. (1994). On the estimation of join result sizes. In: Jarke, M., Bubenko, J., Jeffery, K. (eds) Advances in Database Technology — EDBT '94. EDBT 1994. Lecture Notes in Computer Science, vol 779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57818-8_58
DOI: https://doi.org/10.1007/3-540-57818-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57818-5
Online ISBN: 978-3-540-48342-7
eBook Packages: Springer Book Archive