Efficient Algorithm to Approximate Values with Non-uniform Spreads Inside a Histogram Bucket

Labbadi, Wissem; Akaichi, Jalel

doi:10.1007/978-3-319-11587-0_28

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8748)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

Most histograms maintained by current DBMSs make the uniform frequency assumption and approximate all frequencies in a bucket by their average, so they need to store only the average frequency for each bucket. The accuracy of any estimate derived from such a histogram therefore depends heavily on the technique used to approximate the attribute values within each bucket. Several approaches for approximating the set of attribute values within a bucket have been studied in the literature. Some histograms record every distinct value that appears in each bucket, while others make crude assumptions about the value set. The most significant of these are the continuous values assumption, the uniform spread assumption and the point value assumption. Other approaches rely on sampling techniques to approximate the values inside a histogram bucket. The problem is that all of these techniques assume that attribute values have equal spreads. Motivated by the inaccuracy of previous approaches on value sets with non-uniform spreads, and by the significant estimation error that the various assumptions can produce, we compute d distinct values v₁, v₂, …, v_d that lie between the lowest and highest values in the range of each bucket, without making any assumption about the value spreads. To this end, we propose an efficient algorithm that calculates these d values dynamically as new values are inserted into the attribute. The problem reduces to calculating (d-2) quantiles, namely the 1/d-, 2/d-, …, (d-2)/d-quantiles, along with the lowest and highest values in the bucket. For each quantile to be estimated, we maintain a set of five markers that are updated after every new value inserted into the attribute.
A set of experiments comparing the accuracy of the proposed algorithm with the uniform spread assumption, using various sets of values and different types of histograms, shows the effectiveness of our technique, especially when values have non-equal spreads.
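
The five-marker idea in the abstract can be illustrated with a minimal Python sketch. The chapter's own update rules are not shown on this page, so the sketch assumes the well-known P² scheme of Jain and Chlamtac (1985), which also keeps five markers per quantile. The names P2Quantile and approximate_bucket_values are hypothetical and chosen only for this illustration.

```python
import random


class P2Quantile:
    """Five-marker streaming estimator for one p-quantile (P^2 style, assumed)."""

    def __init__(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.q = []                                          # marker heights
        self.n = [1, 2, 3, 4, 5]                             # actual marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]            # growth of desired positions

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                # the first five values initialise the markers
            q.append(x)
            q.sort()
            return
        if x < q[0]:                  # new minimum
            q[0], k = x, 0
        elif x >= q[4]:               # new maximum
            q[4], k = x, 3
        else:                         # find cell k with q[k] <= x < q[k + 1]
            k = 0
            while x >= q[k + 1]:
                k += 1
        for i in range(k + 1, 5):     # markers above the cell shift right by one
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        for i in (1, 2, 3):           # nudge the three interior markers if needed
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                new_q = self._parabolic(i, s)
                if not q[i - 1] < new_q < q[i + 1]:
                    new_q = self._linear(i, s)   # fall back to linear interpolation
                q[i] = new_q
                n[i] += s

    def _parabolic(self, i, s):
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, s):
        q, n = self.q, self.n
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])

    def value(self):
        if not self.q:
            raise ValueError("no values observed yet")
        if len(self.q) < 5:           # too few values: read from the sorted sample
            return self.q[int(self.p * (len(self.q) - 1))]
        return self.q[2]              # the middle marker tracks the p-quantile


def approximate_bucket_values(values, d):
    """Return d approximate values: lowest, the j/d-quantiles (j = 1..d-2), highest."""
    estimators = [P2Quantile(j / d) for j in range(1, d - 1)]
    lo = hi = None
    for v in values:                  # one pass, no stored observations
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
        for est in estimators:
            est.add(v)
    return [lo] + [est.value() for est in estimators] + [hi]


if __name__ == "__main__":
    random.seed(0)
    skewed = [random.expovariate(1.0) for _ in range(10_000)]  # non-uniform spreads
    print(approximate_bucket_values(skewed, d=6))
```

Each estimator stores only five heights and five positions, so the memory per bucket grows with d and not with the number of inserted values. The quantile levels follow the abstract's wording (1/d up to (d-2)/d); other level choices would only change the list passed to the estimators.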

Author information

Authors and Affiliations

Computer Science Department, Bouchoucha, BESTMOD Lab, ISG of Tunis, 20 Rue de la Liberté, 2000, Bardo, Tunisia
Wissem Labbadi & Jalel Akaichi

Editor information

Editors and Affiliations

IRIT-ENSEEIHT, 2 rue Charles Camichel, BP 7122, 31071, Toulouse Cedex 7, France
Yamine Ait Ameur
LIAS/ISAE-ENSMA, Téléport 2, 1 avenue Clément Ader, BP 40109, 86961, Futuroscope Chasseneuil Cedex, France
Ladjel Bellatreche
Department of Computer Science, University of Cyprus, 1 University Avenue, Aglantzia, 2109, Nicosia, Cyprus
George A. Papadopoulos

About this paper

Cite this paper

Labbadi, W., Akaichi, J. (2014). Efficient Algorithm to Approximate Values with Non-uniform Spreads Inside a Histogram Bucket. In: Ait Ameur, Y., Bellatreche, L., Papadopoulos, G.A. (eds) Model and Data Engineering. MEDI 2014. Lecture Notes in Computer Science, vol 8748. Springer, Cham. https://doi.org/10.1007/978-3-319-11587-0_28

DOI: https://doi.org/10.1007/978-3-319-11587-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11586-3
Online ISBN: 978-3-319-11587-0
eBook Packages: Computer Science, Computer Science (R0)
