DOI: 10.1145/1066677.1066817
Article

Hierarchical binary histograms for summarizing multi-dimensional data

Published: 13 March 2005

Abstract

The need to compress data into synopses of summarized information arises in many application scenarios where the aim is to retrieve aggregate data efficiently, possibly trading some estimation accuracy for computational efficiency. A widely used approach for summarizing multi-dimensional data is the histogram-based representation scheme, which consists of partitioning the data domain into a number of blocks (called buckets) and then storing summary information for each block. In this paper, a new histogram-based summarization technique that is very effective for multi-dimensional data is proposed. This technique exploits a multi-resolution organization of summary data, on which an efficient physical representation model is defined. The adoption of this representation model (based on a hierarchical organization of the buckets) saves storage space with respect to traditional histograms; the saved space can be invested in finer-grained blocks, thus approximating the data in more detail. Experimental results show that our technique yields higher accuracy in retrieving aggregate information from the histogram than traditional approaches (classical multi-dimensional histograms as well as other types of summarization techniques).
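The abstract describes the general recipe: partition the data domain into buckets, keep a summary per bucket, and organize the buckets hierarchically. As a rough illustration only, the Python sketch below builds a binary hierarchy over a 2-D array by greedily applying, at each step, the single cut (along either axis) that most reduces the squared error, stores the sum of every block, and estimates range-sum queries by assuming values are spread uniformly inside each leaf bucket. The greedy error-reduction criterion, the uniform-spread estimate, and all names (`HierarchicalBinaryHistogram`, `range_sum`, `max_buckets`, and so on) are hypothetical choices made for this example; this is not the construction algorithm or the physical representation model proposed in the paper.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

Grid = List[List[float]]


def prefix_table(grid: Grid, square: bool = False) -> Grid:
    """2-D prefix sums of the cell values (or of their squares), zero-padded."""
    rows, cols = len(grid), len(grid[0])
    p = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            v = grid[i][j] ** 2 if square else grid[i][j]
            p[i + 1][j + 1] = v + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p: Grid, r0: int, r1: int, c0: int, c1: int) -> float:
    """Sum over rows [r0, r1) and columns [c0, c1), read from a prefix table."""
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


@dataclass(eq=False)  # identity comparison, so list.remove() drops the exact node
class Bucket:
    """A block of the domain; internal nodes also keep their two sub-blocks."""
    r0: int
    r1: int
    c0: int
    c1: int
    total: float
    children: Optional[Tuple["Bucket", "Bucket"]] = None

    def area(self) -> int:
        return (self.r1 - self.r0) * (self.c1 - self.c0)


class HierarchicalBinaryHistogram:
    """Hypothetical greedy binary partitioning of a 2-D array into <= max_buckets leaves."""

    def __init__(self, grid: Grid, max_buckets: int):
        self.p, self.q = prefix_table(grid), prefix_table(grid, square=True)
        self.root = self._bucket(0, len(grid), 0, len(grid[0]))
        leaves = [self.root]
        while len(leaves) < max_buckets:
            # Pick the leaf whose best binary cut removes the most squared error.
            best = None
            for leaf in leaves:
                cand = self._best_split(leaf)
                if cand is not None and (best is None or cand[0] > best[0]):
                    best = (cand[0], leaf, cand[1])
            if best is None or best[0] <= 1e-12:
                break  # no cut improves the approximation any further
            _, leaf, kids = best
            leaf.children = kids
            leaves.remove(leaf)
            leaves.extend(kids)

    def _bucket(self, r0: int, r1: int, c0: int, c1: int) -> Bucket:
        return Bucket(r0, r1, c0, c1, rect_sum(self.p, r0, r1, c0, c1))

    def _sse(self, b: Bucket) -> float:
        # Squared error of replacing every cell of b with the bucket mean.
        return rect_sum(self.q, b.r0, b.r1, b.c0, b.c1) - b.total ** 2 / b.area()

    def _best_split(self, b: Bucket):
        """Try every row cut and column cut; return (gain, (lo, hi)) or None if 1x1."""
        cuts = [(self._bucket(b.r0, k, b.c0, b.c1), self._bucket(k, b.r1, b.c0, b.c1))
                for k in range(b.r0 + 1, b.r1)]
        cuts += [(self._bucket(b.r0, b.r1, b.c0, k), self._bucket(b.r0, b.r1, k, b.c1))
                 for k in range(b.c0 + 1, b.c1)]
        base, best = self._sse(b), None
        for lo, hi in cuts:
            gain = base - self._sse(lo) - self._sse(hi)
            if best is None or gain > best[0]:
                best = (gain, (lo, hi))
        return best

    def leaves(self, b: Optional[Bucket] = None) -> Iterator[Bucket]:
        b = self.root if b is None else b
        if b.children is None:
            yield b
        else:
            for k in b.children:
                yield from self.leaves(k)

    def range_sum(self, r0: int, r1: int, c0: int, c1: int, b: Optional[Bucket] = None) -> float:
        """Estimate the sum over rows [r0, r1) x columns [c0, c1)."""
        b = self.root if b is None else b
        h = min(r1, b.r1) - max(r0, b.r0)
        w = min(c1, b.c1) - max(c0, b.c0)
        if h <= 0 or w <= 0:
            return 0.0  # the query does not touch this bucket
        if h * w == b.area():
            return b.total  # bucket fully inside the query: stored sum is exact
        if b.children is None:
            return b.total * h * w / b.area()  # uniform-spread assumption in a leaf
        return sum(self.range_sum(r0, r1, c0, c1, k) for k in b.children)
```

For example, on the 4x4 array `[[1, 1, 8, 8], [1, 1, 8, 8], [0, 0, 2, 2], [0, 0, 2, 2]]` with `max_buckets=4`, the greedy cuts recover the four constant quadrants, so `range_sum(1, 3, 1, 3)` returns the exact value 11; with `max_buckets=1` the single bucket spreads the total of 44 evenly, so `range_sum(0, 2, 0, 2)` returns 11 instead of the true 4. In this sketch every node keeps its full coordinates for readability; a compact encoding could record only the cut axis and cut position per internal node, since each child's boundaries follow from its parent's. That is the kind of saving the abstract attributes to the hierarchical organization, but the paper's own representation model is not reproduced here.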

Information

Published In

SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
March 2005
1814 pages
ISBN: 1581139640
DOI: 10.1145/1066677
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. histograms
  2. multi-dimensional data
  3. range queries

Qualifiers

  • Article

Conference

SAC '05: The 2005 ACM Symposium on Applied Computing
March 13-17, 2005
Santa Fe, New Mexico

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 Feb 2025

Cited By

  • (2023) iDMS: An Index-Based Framework for Tracking Distributed Multidimensional Data Streams. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), pp. 1381-1388. DOI: 10.1109/CSCE60160.2023.00231. Online publication date: 24-Jul-2023.
  • (2018) Approximate range-sum query answering on data cubes with probabilistic guarantees. Journal of Intelligent Information Systems, 28:2, pp. 161-197. DOI: 10.1007/s10844-006-0007-y. Online publication date: 28-Dec-2018.
  • (2017) Reducing the complexity of an adaptive radial basis function network with a histogram algorithm. Neural Computing and Applications, 28:1, pp. 365-378. DOI: 10.5555/3041299.3168974. Online publication date: 1-Jan-2017.
  • (2010) LSA-Based Compression of Data Cubes for Efficient Approximate Range-SUM Query Answering in OLAP. Advances in Intelligent Information Systems, pp. 111-145. DOI: 10.1007/978-3-642-05183-8_5. Online publication date: 2010.
  • (2009) Histogram-Based Compression of Databases and Data Cubes. Encyclopedia of Information Science and Technology, Second Edition, pp. 1743-1752. DOI: 10.4018/978-1-60566-026-4.ch274. Online publication date: 2009.
  • (2009) Optimality and scalability in lattice histogram construction. Proceedings of the VLDB Endowment, 2:1, pp. 670-681. DOI: 10.14778/1687627.1687703. Online publication date: 1-Aug-2009.
  • (2008) Hierarchical synopses with optimal error guarantees. ACM Transactions on Database Systems, 33:3, pp. 1-53. DOI: 10.1145/1386118.1386124. Online publication date: 3-Sep-2008.
  • (2008) Lattice Histograms. Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pp. 247-256. DOI: 10.1109/ICDE.2008.4497433. Online publication date: 7-Apr-2008.
  • (2006) Accuracy Control in Compressed Multidimensional Data Cubes for Quality of Answer-based OLAP Tools. Proceedings of the 18th International Conference on Scientific and Statistical Database Management, pp. 301-310. DOI: 10.1109/SSDBM.2006.10. Online publication date: 3-Jul-2006.