Depth estimation for ranking query optimization

  • Special Issue Paper
  • The VLDB Journal

Abstract

A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.
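To make the notion of input depth concrete, the following is a minimal sketch of a threshold-based rank join over two inputs, each sorted by descending score. It is an illustration only and is not the deep estimator or any operator defined in the article. The function names `rank_join` and `_threshold`, the `(join_key, score)` tuple layout, the sum scoring function, and the round-robin access pattern are all assumptions made for this example. The operator stops reading as soon as its best buffered join result is at least as good as any result that an unread tuple could still produce, and it reports how many tuples it read from each input.

```python
import heapq
from math import inf


def _threshold(inputs, depth):
    """Upper bound on the score of any join result that uses an unread tuple."""
    top = [inp[0][1] for inp in inputs]  # best score of each input
    last = []
    for inp, d in zip(inputs, depth):
        if d == 0:
            last.append(inf)            # nothing read yet: no bound is known
        elif d == len(inp):
            last.append(-inf)           # input exhausted: no unread tuples
        else:
            last.append(inp[d - 1][1])  # unread scores are <= last read score
    return max(last[0] + top[1], top[0] + last[1])


def rank_join(left, right, k):
    """Return the top-k join results and the input depths (hypothetical sketch).

    left, right: lists of (join_key, score), sorted by descending score.
    The score of a join result is the sum of the two input scores.
    """
    inputs = (left, right)
    seen = ({}, {})     # per input: join_key -> scores read so far
    depth = [0, 0]      # number of tuples read from each input
    candidates = []     # max-heap of join results (scores negated)
    results = []
    if not left or not right:
        return results, tuple(depth)
    turn = 0
    while len(results) < k:
        if depth[turn] == len(inputs[turn]):
            turn = 1 - turn             # skip an exhausted input
        if depth[turn] < len(inputs[turn]):
            key, score = inputs[turn][depth[turn]]
            depth[turn] += 1
            seen[turn].setdefault(key, []).append(score)
            for other in seen[1 - turn].get(key, []):
                heapq.heappush(candidates, (-(score + other), key))
            turn = 1 - turn             # round-robin access
        bound = _threshold(inputs, depth)
        # Emit buffered results that no unread tuple can beat.
        while candidates and len(results) < k and -candidates[0][0] >= bound:
            neg_score, key = heapq.heappop(candidates)
            results.append((key, -neg_score))
        if bound == -inf:
            break                       # both inputs fully read
    return results, tuple(depth)


left = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
right = [("b", 10), ("a", 7), ("d", 2), ("c", 1)]
print(rank_join(left, right, 1))  # top result and the depth reached per input
```

On this toy data the operator reads only part of each input for small `k`, and the depth grows as `k` grows. A depth estimator aims to predict those two numbers at query compilation time, without executing the join.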

Author information

Corresponding author

Correspondence to Karl Schnaitter.

About this article

Cite this article

Schnaitter, K., Spiegel, J. & Polyzotis, N. Depth estimation for ranking query optimization. The VLDB Journal 18, 521–542 (2009). https://doi.org/10.1007/s00778-008-0124-z

  • DOI: https://doi.org/10.1007/s00778-008-0124-z
