skip to main content
article

Optimizing bitmap indices with efficient compression

Published: 01 March 2006 Publication History

Abstract

Bitmap indices are efficient for answering queries on low-cardinality attributes. In this article, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high-cardinality attributes. We further prove that the new compressed bitmap index, like the best variants of the B-tree index, is optimal for one-dimensional range queries. More specifically, the time required to answer a one-dimensional range query is a linear function of the number of hits. This strongly supports the well-known observation that compressed bitmap indices are efficient for multidimensional range queries because results of one-dimensional range queries computed with bitmap indices can be easily combined to answer multidimensional range queries. Our timing measurements on range queries not only confirm the linear relationship between the query response time and the number of hits, but also demonstrate that WAH compressed indices answer queries faster than the commonly used indices including projection indices, B-tree indices, and other compressed bitmap indices.

References

[1]
Amer-Yahia, S. and Johnson, T. 2000. Optimizing queries on compressed bitmaps. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan Kaufmann, San Francisco, CA, 329--338.]]
[2]
Antoshenkov, G. 1994. Byte-aligned bitmap compression. Tech. rep. Oracle Corp., Redwood Shores, CA. U.S. Patent number 5,363,098.]]
[3]
Antoshenkov, G. and Ziauddin, M. 1996. Query processing and optimization in ORACLE RDB. VLDB J. 5, 229--237.]]
[4]
Chan, C.-Y. and Ioannidis, Y. E. 1998. Bitmap index design and evaluation. In Proceedings of the 1998 ACM SIGMOD: International Conference on Management of Data. ACM Press, New York, NY, 355--366.]]
[5]
Chan, C. Y. and Ioannidis, Y. E. 1999. An efficient bitmap encoding scheme for selection queries. In SIGMOD 1999, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadelphia, Pennsylvania, USA, A. Delis, C. Faloutsos, and S. Ghandeharizadeh, Eds. ACM Press, New York, NY, 215--226.]]
[6]
Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and OLAP technology. ACM SIGMOD Rec. 26, 1 (Mar.), 65--74.]]
[7]
Comer, D. 1979. The ubiquitous B-tree. Comput. Surv. 11, 2, 121--137.]]
[8]
Czech, Z. J. and Majewski, B. S. 1993. A linear time algorithm for finding minimal perfect hash functions. Comput. J. 36, 6 (Dec.), 579--587.]]
[9]
Fox, E. A., Chen, Q. F., Daoud, A. M., and Heath, L. S. 1991. Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Syst. 9, 3, 281--308.]]
[10]
Furuse, K., Asada, K., and Iizawa, A. 1995. Implementation and performance evaluation of compressed bit-sliced signature files. In Information Systems and Data Management, 6th International Conference, CISMOD'95, Bombay, India, November 15--17, 1995, Proceedings, S. Bhalla, Ed. Lecture Notes in Computer Science, vol. 1006. Springer, Berlin, Germany, 164--177.]]
[11]
Gailly, J. and Adler, M. 1998. zlib 1.1.3 manual. Source code available online at http://-www.info-zip.org/pub/-infozip/zlib.]]
[12]
Ishikawa, Y., Kitagawa, H., and Ohbo, N. 1993. Evalution of signature files as set access facilities in OODBs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (Washington, DC May 26--28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY, 247--256.]]
[13]
Johnson, T. 1999. Performance measurements of compressed bitmap indices. In VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, Eds. Morgan Kaufmann, San Francisco, CA, 278--289. A longer version appeared as AT&T report number AMERICA112.]]
[14]
Jürgens, M. and Lenz, H.-J. 1999. Tree based indexes vs. bitmap indexes---a performance study. In Proceedings of the International Workshop on Design and Management of Data Warehouses, DMDW'99, Heidelberg, Germany, June 14-15, 1999, S. Gatziu, M. A. Jeusfeld, M. Staudt, and Y. Vassiliou, Eds.]]
[15]
Knuth, D. E. 1998. The Art of Computer Programming, 2nd ed. Vol. 3. Addison Wesley, Reading, MA.]]
[16]
Koudas, N. 2000. Space efficient bitmap indexing. In Proceedings of the Ninth International Conference on Information Knowledge Management (CIKM 2000, November 6--11, McLean, VA). ACM Press, New York, NY, 194--201.]]
[17]
Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Trans. Knowl. Data Eng. 7, 3, 423--435.]]
[18]
Moffat, A. and Zobel, J. 1992. Parameterised compression for sparse bitmaps. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval, (Copenhagen, Denmark, June), N. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, New York, NY, 274--285.]]
[19]
O'Neil, P. 1987. Model 204 architecture and performance. In 2nd International Workshop in High Performance Transaction Systems, Asilomar, CA. Lecture Notes in Computer Science, vol. 359. Springer-Verlag, Berlin, Germany, 40--59.]]
[20]
O'Neil, P. and O'Neil, E. 2000. Database: Principles, Programming, and Performance, 2nd ed. Morgan Kaugmann, San Francisco, CA.]]
[21]
O'Neil, P. and Quass, D. 1997. Improved query performance with variant indices. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, May 13--15), J. Peckham, Ed. ACM Press, New York, NY, 38--49.]]
[22]
O'Neil, P. E. and Graefe, G. 1995. Multi-table joins through bitmapped join indices. SIGMOD Rec. 24, 3, 8--11.]]
[23]
Shoshani, A., Bernardo, L. M., Nordberg, H., Rotem, D., and Sim, A. 1999. Multidimensional indexing and query coordination for tertiary storage management. In Proceedings of the 11th International Conference on Scientific and Statistical Database Management (Cleveland, OH, 28--30 July). IEEE Computer Society Press, Los Alamitos, CA, 214--225.]]
[24]
Stockinger, K., Duellmann, D., Hoschek, W., and Schikuta, E. 2000. Improving the performance of high-energy physics analysis through bitmap indices. In Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA 2000, London, Greenwich, UK).]]
[25]
Stockinger, K., Wu, K., and Shoshani, A. 2002. Strategies for processing ad hoc queries on large data warehouses. In Proceedings of DOLAP'02 (McLean, VA), 72--79. A draft appeared as Tech rep. LBNL-51791, Lawrence Berkeley National Laboratory, Berkeley, CA.]]
[26]
Wong, H. K. T., Liu, H.-F., Olken, F., Rotem, D., and Wong, L. 1985. Bit transposed files. In Proceedings of VLDB 85 (Stockholm, Sweden). 448--457.]]
[27]
Wu, K.-L. and Yu, P. 1996. Range-based bitmap indexing for high cardinality attributes with skew. Tech. rep. RC 20449. IBM Watson Research Division, Yorktown Heights, NY.]]
[28]
Wu, M.-C. and Buchmann, A. P. 1998. Encoded bitmap indexing for data warehouses. In Proceedings of the Fourteenth International Conference on Data Engineering (February 23--27, Orlando, FL). IEEE Computer Society, ACM Press, Los Alamitos, CA, 220--230.]]
[29]
Wu, K., Koegler, W., Chen, J., and Shoshani, A. 2003. Using bitmap index for interactive exploration of large datasets. In Proceedings of SSDBM 2003 (Cambridge, MA), 65--74. A draft appeared as Tech rep. LBNL-52535, Lawrence Berkeley National Laboratory, Berkeley, CA.]]
[30]
Wu, K., Otoo, E. J., and Shoshani, A. 2001a. A performance comparison of bitmap indexes. In Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management (Atlanta, GA, November 5--10). ACM Press, New York, NY, 559--561.]]
[31]
Wu, K., Otoo, E. J., and Shoshani, A. 2002. Compressing bitmap indexes for faster search operations. In Proceedings of SSDBM'02 (Edinburgh, Scotland), 99--108. Also published as Tech rep. LBNL-49627, Lawrence Berkeley National Laboratory, Berkeley, CA.]]
[32]
Wu, K., Otoo, E. J., and Shoshani, A. 2004. On the performance of bitmap indices for high-cardinality attributes. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31-September 3, 2004, M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. Morgan Kaufmann, San Francisco, CA, 24--35.]]
[33]
Wu, K., Otoo, E. J., Shoshani, A., and Nordberg, H. 2001b. Notes on design and implementation of compressed bit vectors. Tech. rep. LBNL/PUB-3161. Lawrence Berkeley National Laboratory, Berkeley, CA.]]
[34]
Ziv, J. and Lempel, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theor. 23, 3, 337--343.]]

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 31, Issue 1
March 2006
438 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1132863
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2006
Published in TODS Volume 31, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Compression
  2. bitmap index
  3. query processing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)56
  • Downloads (Last 6 weeks)6
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)An Accurate and Invertible Sketch for Super Spread DetectionElectronics10.3390/electronics1301022213:1(222)Online publication date: 3-Jan-2024
  • (2024)CUBIT: Concurrent Updatable Bitmap IndexingProceedings of the VLDB Endowment10.14778/3705829.370585418:2(399-412)Online publication date: 1-Oct-2024
  • (2024)Revisiting B-tree Compression: An Experimental StudyProceedings of the ACM on Management of Data10.1145/36549722:3(1-25)Online publication date: 30-May-2024
  • (2024)BigSet: An Efficient Set Intersection ApproachIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.343259536:12(7677-7691)Online publication date: Dec-2024
  • (2024)MulRF: A Multi-Dimensional Range Filter for Sublinear Time Range Query ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.339731336:11(6600-6613)Online publication date: 1-Nov-2024
  • (2024)A 3.55 mJ/frame Energy-efficient Mixed-Transformer based Semantic Segmentation Accelerator for Mobile Devices2024 IEEE International Symposium on Circuits and Systems (ISCAS)10.1109/ISCAS58744.2024.10558649(1-5)Online publication date: 19-May-2024
  • (2024)pbins: private bins for top-k semantic search over encrypted data using transformersInternational Journal of Information Security10.1007/s10207-024-00911-224:1Online publication date: 12-Nov-2024
  • (2023)An Index for Set Intersection With Post-FilteringIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.332914536:7(2862-2876)Online publication date: 1-Nov-2023
  • (2023)Lossy Scientific Data Compression With SPERR2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS54959.2023.00104(1007-1017)Online publication date: May-2023
  • (2023)Towards Optimizing Storage Costs on the Cloud2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00223(2919-2932)Online publication date: Apr-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media