Depth estimation for ranking query optimization

Schnaitter, Karl; Spiegel, Joshua; Polyzotis, Neoklis

doi:10.1007/s00778-008-0124-z

Special Issue Paper
Published: 15 January 2009

Volume 18, pages 521–542, (2009)
The VLDB Journal

Karl Schnaitter¹,
Joshua Spiegel² &
Neoklis Polyzotis¹

Abstract

A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.
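To make the notion of input depth concrete, the following is a minimal sketch of a simplified rank join over two inputs sorted by descending score, using a sum scoring function and a threshold-based stopping rule. It only counts how many tuples are read from each input before the top-k results are known. It is not the deep estimation methodology described in the abstract, and the function name `rank_join_depths`, the round-robin pulling strategy, and the sample data are assumptions chosen for illustration.

```python
import heapq
from collections import defaultdict


def rank_join_depths(left, right, k):
    """Illustrative rank join (hypothetical helper, not from the article).

    left, right: lists of (join_key, score), sorted by descending score.
    Returns (results, depths), where results holds up to k pairs
    (join_key, combined_score) and depths is (left_depth, right_depth).
    """
    inputs = (left, right)
    depth = [0, 0]  # number of tuples read so far from each input
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores read
    candidates = []  # max-heap of join results, stored with negated scores
    results = []

    def threshold():
        # Upper bound on the score of any join result not yet generated.
        bounds = []
        for side in (0, 1):
            other = 1 - side
            if depth[side] == len(inputs[side]):
                continue  # this input is exhausted: no unseen tuples
            if depth[side] == 0 or depth[other] == 0:
                return float("inf")  # nothing is known yet
            last_seen = inputs[side][depth[side] - 1][1]
            best_other = inputs[other][0][1]
            bounds.append(last_seen + best_other)
        return max(bounds, default=float("-inf"))

    turn = 0
    while len(results) < k:
        # Emit buffered results that no future result can beat.
        t = threshold()
        while candidates and len(results) < k and -candidates[0][0] >= t:
            neg_score, key = heapq.heappop(candidates)
            results.append((key, -neg_score))
        if len(results) == k:
            break
        # Alternate between inputs, skipping an exhausted one.
        if depth[turn] == len(inputs[turn]):
            turn = 1 - turn
        if depth[turn] == len(inputs[turn]):
            break  # both inputs exhausted
        key, score = inputs[turn][depth[turn]]
        depth[turn] += 1
        # Join the new tuple with matching tuples already read from the other input.
        for other_score in seen[1 - turn][key]:
            heapq.heappush(candidates, (-(score + other_score), key))
        seen[turn][key].append(score)
        turn = 1 - turn

    return results, tuple(depth)


# Made-up sample data: (join_key, score), sorted by descending score.
left = [("a", 90), ("b", 80), ("c", 30), ("d", 10)]
right = [("b", 95), ("a", 70), ("d", 20), ("c", 10)]
print(rank_join_depths(left, right, k=1))
print(rank_join_depths(left, right, k=2))
```

On this sample data the operator reads two tuples from each input for k = 1, and three from the left and two from the right for k = 2, rather than all four tuples per input. These per-input counts are the quantities that a depth estimator would need to predict at query compilation time.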

Author information

Authors and Affiliations

UC Santa Cruz, Santa Cruz, CA, USA
Karl Schnaitter & Neoklis Polyzotis
Oracle, St. Louis, MO, USA
Joshua Spiegel

Corresponding author

Correspondence to Karl Schnaitter.

About this article

Cite this article

Schnaitter, K., Spiegel, J. & Polyzotis, N. Depth estimation for ranking query optimization. The VLDB Journal 18, 521–542 (2009). https://doi.org/10.1007/s00778-008-0124-z

Received: 28 February 2008
Revised: 05 December 2008
Accepted: 08 December 2008
Published: 15 January 2009
Issue Date: April 2009
DOI: https://doi.org/10.1007/s00778-008-0124-z
