Abstract
In many application contexts, such as statistical databases, transaction recording systems, scientific databases, query optimizers, and OLAP, data are summarized as histograms of aggregate values. Reconstructing the answers to range queries on the original data from such aggregates inevitably introduces an estimation error, because information is lost when the data are compressed. The size of the error depends strongly both on how the histogram partitions the data domain and on how values are estimated inside each bucket. We propose a new type of histogram, based on an unbalanced binary-tree partition, that is suited to answering hierarchical range queries quickly, and we use adaptive tree indexing to better approximate frequencies inside buckets. Our experiments show that the proposed histogram performs considerably better than state-of-the-art histograms, yielding smaller errors on all data sets considered for the same storage space.
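To make the source of the estimation error concrete, the following is a minimal sketch of range-sum estimation over a plain equal-width histogram, assuming each bucket's total is spread uniformly over its cells. It is not the binary-tree histogram or the tree index proposed in the paper; the function names, the bucket layout, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Bucket = Tuple[int, int, int]  # (start index, inclusive end index, sum of values)


def build_histogram(data: List[int], bucket_width: int) -> List[Bucket]:
    """Summarize data as equal-width buckets, keeping only each bucket's sum."""
    buckets = []
    for start in range(0, len(data), bucket_width):
        end = min(start + bucket_width, len(data)) - 1
        buckets.append((start, end, sum(data[start:end + 1])))
    return buckets


def estimate_range_sum(buckets: List[Bucket], lo: int, hi: int) -> float:
    """Estimate sum(data[lo..hi]) assuming a uniform spread inside each bucket."""
    total = 0.0
    for start, end, bucket_sum in buckets:
        # Number of cells of this bucket that fall inside the query range.
        overlap = min(hi, end) - max(lo, start) + 1
        if overlap > 0:
            total += bucket_sum * overlap / (end - start + 1)
    return total


# Hypothetical data: the first bucket is skewed, the second is flat.
data = [0, 0, 8, 0, 1, 1, 1, 1]
hist = build_histogram(data, 4)

exact_skewed = sum(data[0:2])                     # true answer for range [0, 1]
approx_skewed = estimate_range_sum(hist, 0, 1)    # estimate from the histogram
approx_flat = estimate_range_sum(hist, 4, 5)      # flat bucket, range [4, 5]
```

In this sketch the query on the flat bucket is answered exactly, while the query on the skewed bucket is not, which illustrates why both the partition and the estimation inside each bucket affect the error.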
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buccafurri, F., Furfaro, F., Lax, G., Saccá, D. (2002). Binary-Tree Histograms with Tree Indices. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_85
DOI: https://doi.org/10.1007/3-540-46146-9_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44126-7
Online ISBN: 978-3-540-46146-3
eBook Packages: Springer Book Archive