DOI: 10.1145/2882903.2915964
research-article
Open access

UpBit: Scalable In-Memory Updatable Bitmap Indexing

Published: 14 June 2016

Editorial Notes

Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper.

Abstract

Bitmap indexes are widely used in both scientific and commercial databases. They bring fast read performance for specific types of queries, such as equality and selective range queries. A major drawback of bitmap indexes, however, is that supporting updates is particularly costly. Bitmap indexes are kept compressed to minimize storage footprint; as a result, updating a bitmap index requires the expensive step of decoding and then encoding a bitvector. Today, more and more applications need support for both reads and writes, blurring the boundaries between analytical processing and transaction processing. This requires new system designs and access methods that support general updates and, at the same time, offer competitive read performance. In this paper, we propose scalable in-memory Updatable Bitmap indexing (UpBit), which offers efficient updates, without hurting read performance. UpBit relies on two design points. First, in addition to the main bitvector for each domain value, UpBit maintains an update bitvector, to keep track of updated values. Effectively, every update can now be directed to a highly-compressible, easy-to-update bitvector. While update bitvectors double the amount of uncompressed data, they are sparse, and as a result their compressed size is small. Second, we introduce fence pointers in all update bitvectors which allow for efficient retrieval of a value at an arbitrary position. Using both synthetic and real-life data, we demonstrate that UpBit significantly outperforms state-of-the-art bitmap indexes for workloads that contain both reads and writes. In particular, compared to update-optimized bitmap index designs UpBit is 15-29x faster in terms of update time and 2.7x faster in terms of read performance. In addition, compared to read-optimized bitmap index designs UpBit achieves efficient and scalable updates (51-115x lower update latency), while allowing for comparable read performance, having up to 8% overhead.
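The two design points in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the current state of a value's bitmap is the XOR of its main bitvector and its update bitvector (the abstract does not spell out how the two are combined), it stores bitvectors as uncompressed Python integers, and it uses a toy run-length encoding to show what a fence pointer buys. The names `UpBitSketch`, `RunLengthBitvector`, and `fence_every` are hypothetical.

```python
from bisect import bisect_right


class UpBitSketch:
    """Toy index: one main and one update bitvector per domain value."""

    def __init__(self, column):
        self.n = len(column)
        self.vb = {}  # main bitvectors, written only at build time
        for pos, value in enumerate(column):
            self.vb[value] = self.vb.get(value, 0) | (1 << pos)
        self.ub = {value: 0 for value in self.vb}  # update bitvectors, initially empty

    def current(self, value):
        # Assumption: the up-to-date bitmap is main XOR update.
        return self.vb[value] ^ self.ub[value]

    def get_value(self, pos):
        for value in self.vb:
            if (self.current(value) >> pos) & 1:
                return value
        raise IndexError(pos)

    def update(self, pos, new_value):
        # Only existing domain values are supported in this sketch.
        old_value = self.get_value(pos)
        if old_value != new_value:
            self.ub[old_value] ^= 1 << pos  # cancels the old bit
            self.ub[new_value] ^= 1 << pos  # sets the new bit

    def search(self, value):
        bits = self.current(value)
        return [p for p in range(self.n) if (bits >> p) & 1]


class RunLengthBitvector:
    """Toy run-length-encoded bitvector with fence pointers (non-empty input)."""

    def __init__(self, bits, fence_every=2):
        self.runs = []  # (start position, bit value, run length)
        for pos, bit in enumerate(bits):
            if self.runs and self.runs[-1][1] == bit:
                start, value, length = self.runs[-1]
                self.runs[-1] = (start, value, length + 1)
            else:
                self.runs.append((pos, bit, 1))
        # A fence pointer records where every fence_every-th run starts.
        self.fence_idx = list(range(0, len(self.runs), fence_every))
        self.fence_pos = [self.runs[i][0] for i in self.fence_idx]

    def get(self, pos):
        # Jump to the nearest fence at or before pos, then scan a few runs.
        i = self.fence_idx[bisect_right(self.fence_pos, pos) - 1]
        while self.runs[i][0] + self.runs[i][2] <= pos:
            i += 1
        return self.runs[i][1]
```

In this sketch an update flips one bit in each of two update bitvectors and never touches a main bitvector, which is why the update bitvectors stay sparse. A lookup in the run-length vector decodes at most `fence_every` runs instead of scanning from the start of the vector.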

Supplementary Material

ReadMe (read.txt)
Rights information
Reproducibility (upbit-repro-source-scripts_v1.0.1.zip)
Scripts, Source Files

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. bitmap index
  2. efficient updates
  3. fence pointers
  4. upbit
  5. update bitvectors

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 234
  • Downloads (Last 6 weeks): 43
Reflects downloads up to 14 Feb 2025

Cited By

  • (2022) In-Place Updates in Tree-Encoded Bitmaps. Proceedings of the 34th International Conference on Scientific and Statistical Database Management, pages 1-4. DOI: 10.1145/3538712.3538745. Online publication date: 6-Jul-2022
  • (2022) BoDS: A Benchmark on Data Sortedness. Performance Evaluation and Benchmarking, pages 17-32. DOI: 10.1007/978-3-031-29576-8_2. Online publication date: 5-Sep-2022
  • (2021) A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues. Future Internet, 14(1):19. DOI: 10.3390/fi14010019. Online publication date: 31-Dec-2021
  • (2021) Updatable Materialization of Approximate Constraints. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 1991-1996. DOI: 10.1109/ICDE51399.2021.00189. Online publication date: Apr-2021
  • (2020) Cuckoo index. Proceedings of the VLDB Endowment, 13(13):3559-3572. DOI: 10.14778/3424573.3424577. Online publication date: 27-Oct-2020
  • (2020) Tree-Encoded Bitmaps. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 937-967. DOI: 10.1145/3318464.3380588. Online publication date: 11-Jun-2020
  • (2019) Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment, 12(13):2393-2407. DOI: 10.14778/3358701.3358707. Online publication date: 1-Sep-2019
  • (2019) FITing-Tree. Proceedings of the 2019 International Conference on Management of Data, pages 1189-1206. DOI: 10.1145/3299869.3319860. Online publication date: 25-Jun-2019
  • (2019) Upbit with Parallelized Merge. 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pages 625-629. DOI: 10.1109/CONFLUENCE.2019.8776903. Online publication date: Jan-2019
  • (2019) Adaptive partitioning and indexing for in situ query processing. The VLDB Journal: The International Journal on Very Large Data Bases, 29(1):569-591. DOI: 10.1007/s00778-019-00580-x. Online publication date: 15-Nov-2019
