Efficient Algorithm to Approximate Values with Non-uniform Spreads Inside a Histogram Bucket

  • Conference paper
Model and Data Engineering (MEDI 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8748)


Abstract

Most histograms maintained by current DBMSs make the uniform frequency assumption and, most commonly, approximate all frequencies in a bucket by their average. Such histograms therefore need to store the average frequency of each bucket. The accuracy of any estimate derived from the histogram also depends heavily on the technique used to approximate the values within each bucket. Several approaches for approximating the set of attribute values within a bucket have been studied in the literature. Some histograms record every distinct value that appears in each bucket, while others make crude assumptions about these values. The most significant of these assumptions are the continuous values assumption, the uniform spread assumption and the point value assumption. Other existing approaches use sampling techniques to approximate the values inside a histogram bucket. The problem is that all of the proposed techniques assume that attribute values have equal spreads. Motivated by the inaccuracy of previous approaches in approximating value sets with non-uniform spreads, and by the significant estimation error that the various assumptions can cause, we compute d distinct values v1, v2, …, vd that lie between the lowest and highest values in the range of each bucket, without making any assumption about the value spreads. To this end, we propose an efficient algorithm that calculates these d values dynamically as new values are inserted into the attribute. The problem reduces to calculating (d-2) quantiles, namely the 1/d-, 2/d-, …, (d-2)/d-quantiles, along with the lowest and highest values in the bucket. For each quantile to be estimated, we maintain a set of five markers that are updated each time a new value is inserted into the attribute.
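The abstract does not spell out how the five markers are updated. Five markers per quantile is the configuration of the P² algorithm of Jain and Chlamtac (1985), which computes quantiles without storing observations. The following Python sketch therefore assumes P²-style updates. It illustrates that assumption and is not the authors' implementation. The class names `P2Quantile` and `BucketValues` are hypothetical. The sketch uses the abstract's quantile levels j/d for j = 1, …, d-2 and tracks the lowest and highest values of the bucket directly.

```python
import math
import random
from typing import List


class P2Quantile:
    """Streaming estimate of one p-quantile with five markers, following the
    P^2 update rules (Jain & Chlamtac, 1985). A sketch, not the paper's code."""

    def __init__(self, p: float) -> None:
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.q: List[float] = []                                # marker heights
        self.n = [1, 2, 3, 4, 5]                                # actual positions (1-based)
        self.nd = [1.0, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5.0]   # desired positions
        self.dn = [0.0, p / 2, p, (1 + p) / 2, 1.0]             # desired-position increments

    def add(self, x: float) -> None:
        q, n = self.q, self.n
        if len(q) < 5:                      # the first five values become the markers
            q.append(x)
            q.sort()
            return
        # 1. Find the cell k that contains x, extending the extremes if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x > q[4]:
            q[4], k = x, 3
        else:
            k = next((i for i in range(4) if x < q[i + 1]), 3)
        # 2. Shift the actual positions above the cell and all desired positions.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.nd[i] += self.dn[i]
        # 3. Move each middle marker by one position if it drifted too far.
        for i in range(1, 4):
            d = self.nd[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                h = self._parabolic(i, s)
                if not q[i - 1] < h < q[i + 1]:   # fall back to linear interpolation
                    h = self._linear(i, s)
                q[i] = h
                n[i] += s

    def _parabolic(self, i: int, s: int) -> float:
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))

    def _linear(self, i: int, s: int) -> float:
        q, n = self.q, self.n
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])

    def value(self) -> float:
        if not self.q:
            raise ValueError("no values inserted yet")
        if len(self.q) < 5:                 # too few values: read the sorted sample directly
            return self.q[int(self.p * (len(self.q) - 1))]
        return self.q[2]                    # the middle marker estimates the p-quantile


class BucketValues:
    """Hypothetical helper: d representative values for one bucket, namely the
    lowest value, the j/d-quantiles for j = 1..d-2, and the highest value."""

    def __init__(self, d: int) -> None:
        if d < 2:
            raise ValueError("d must be at least 2")
        self.lo, self.hi = math.inf, -math.inf
        self.est = [P2Quantile(j / d) for j in range(1, d - 1)]

    def insert(self, x: float) -> None:
        self.lo, self.hi = min(self.lo, x), max(self.hi, x)
        for e in self.est:                  # O(d) work per inserted value
            e.add(x)

    def values(self) -> List[float]:
        return [self.lo] + [e.value() for e in self.est] + [self.hi]


if __name__ == "__main__":
    random.seed(0)
    bucket = BucketValues(d=6)
    for _ in range(10_000):
        bucket.insert(random.expovariate(1.0))   # skewed values crowd near 0
    print(bucket.values())
```

Each insertion costs O(d) time. Each estimator keeps only five heights and five positions, so the memory per bucket does not grow with the number of inserted values. The estimators run independently, so this sketch does not force neighboring quantile estimates to stay in order. A production version may want to check that.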
Experiments comparing the accuracy of the proposed algorithm with that of the uniform spread assumption, using various sets of values and different types of histograms, show the effectiveness of our technique, especially when values have non-equal spreads.
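The following minimal sketch shows where the uniform spread assumption goes wrong for a bucket whose values have non-uniform spreads. It pairs the assumption with the uniform frequency assumption to estimate a range predicate. The bucket, its values and the helper names `uniform_spread` and `estimate_le` are hypothetical illustrations. They are not data or code from the paper's experiments.

```python
from typing import List


def uniform_spread(lo: float, hi: float, d: int) -> List[float]:
    """Uniform spread assumption: d distinct values equally spaced over [lo, hi]."""
    if d == 1:
        return [lo]
    step = (hi - lo) / (d - 1)
    return [lo + k * step for k in range(d)]


def estimate_le(values: List[float], avg_freq: float, c: float) -> float:
    """Result size of 'attr <= c' within one bucket, assuming every value in
    `values` occurs avg_freq times (uniform frequency assumption)."""
    return avg_freq * sum(1 for v in values if v <= c)


# Hypothetical bucket: 6 distinct values with very unequal spreads,
# each occurring 10 times, so the bucket's average frequency is exactly 10.
actual = [1, 2, 3, 4, 5, 100]
avg_freq = 10.0
approx = uniform_spread(min(actual), max(actual), len(actual))

print("exact:", estimate_le(actual, avg_freq, 10))           # exact: 50.0
print("uniform spread:", estimate_le(approx, avg_freq, 10))  # uniform spread: 10.0
```

Under the uniform spread assumption, only one approximated value falls at or below 10, whereas five of the actual values do. The estimate is therefore 10 tuples instead of 50. The abstract proposes placing the representative values at quantiles of the inserted data instead, which is intended to follow uneven spreads like these.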




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Labbadi, W., Akaichi, J. (2014). Efficient Algorithm to Approximate Values with Non-uniform Spreads Inside a Histogram Bucket. In: Ait Ameur, Y., Bellatreche, L., Papadopoulos, G.A. (eds) Model and Data Engineering. MEDI 2014. Lecture Notes in Computer Science, vol 8748. Springer, Cham. https://doi.org/10.1007/978-3-319-11587-0_28


  • DOI: https://doi.org/10.1007/978-3-319-11587-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11586-3

  • Online ISBN: 978-3-319-11587-0

  • eBook Packages: Computer Science, Computer Science (R0)
