Analyses of multi-level and multi-component compressed bitmap indexes

Published: 15 February 2010

Abstract

Bitmap indexes are known as the most effective indexing methods for range queries on append-only data, and many different bitmap indexes have been proposed in the research literature. However, only two of the simplest ones are used in commercial products. To better understand the benefits offered by the more sophisticated variations, we conduct an analytical comparison of well-known bitmap indexes, most of which are in the class of multi-component bitmap indexes. Our analysis is the first to fully incorporate the effects of compression on their performance. We produce closed-form formulas for both the index sizes and the query processing costs for the worst cases. One surprising finding is that the two simple indexes are in fact the best among multi-component indexes. Additionally, we investigate a number of novel variations in a class of multi-level indexes, and find that they answer queries faster than the best of multi-component indexes. More specifically, some two-level indexes are predicted by analyses and verified with experiments to be 5 to 10 times faster than well-known indexes. Furthermore, these two-level indexes have the optimal computational complexity for answering queries.
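As a rough illustration of the terms used in the abstract, the sketch below builds a basic (one-component, equality-encoded) bitmap index and a two-component variant over a small integer column. It is a minimal sketch under stated assumptions, not the authors' implementation: bitmaps are plain Python integers, no compression is applied (so the compression effects the paper analyzes are not modeled), and all function names and the sample data are hypothetical.

```python
def build_equality_index(values, cardinality):
    """One bitmap per distinct value; bit i is set when values[i] == v."""
    bitmaps = [0] * cardinality
    for row, v in enumerate(values):
        bitmaps[v] |= 1 << row
    return bitmaps


def range_query(bitmaps, lo, hi):
    """Rows with lo <= value <= hi: OR together the bitmaps of every value in range."""
    result = 0
    for v in range(lo, hi + 1):
        result |= bitmaps[v]
    return result


def build_two_component_index(values, base):
    """Split each value into (high, low) digits in `base`; index each digit separately."""
    cardinality = max(values) + 1
    n_high = (cardinality + base - 1) // base  # number of distinct high digits
    high = build_equality_index([v // base for v in values], n_high)
    low = build_equality_index([v % base for v in values], base)
    return high, low


def equality_query_two_component(index, base, v):
    """A row matches v only if both of its digits match, so AND the two bitmaps."""
    high, low = index
    return high[v // base] & low[v % base]


def rows(bitmap):
    """Decode a bitmap into the sorted list of row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Hypothetical sample column with values in 0..9.
values = [3, 7, 0, 5, 7, 2, 9, 5]
basic = build_equality_index(values, 10)
two = build_two_component_index(values, base=4)

print(rows(range_query(basic, 5, 7)))                  # rows holding 5, 6, or 7
print(rows(equality_query_two_component(two, 4, 7)))   # rows holding exactly 7
print(len(basic), len(two[0]) + len(two[1]))           # bitmaps stored by each index
```

In this toy setting the two-component index stores fewer bitmaps than the basic index but needs an extra AND per equality query. That is the kind of size-versus-query-cost trade-off the paper analyzes, there with compression taken into account.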

Supplementary Material

Wu Appendix (a2-wu-apndx.pdf)
Online appendix to analyses of multi-level and multi-component compressed bitmap indexes on article 02.

Published In

ACM Transactions on Database Systems, Volume 35, Issue 1
February 2010
310 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1670243
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Accepted: 01 July 2009
Revised: 01 March 2009
Received: 01 June 2008
Published: 15 February 2010
Published in TODS Volume 35, Issue 1

Author Tags

  1. multi-component bitmap index
  2. compression
  3. multi-level bitmap index
  4. performance analysis

Qualifiers

  • Research-article
  • Research
  • Refereed
