Abstract
Bit-vectors are widely used for indexing and summarizing data because modern computers process them efficiently. Sparse bit-vectors can be compressed to reduce their space requirements further. Special compression schemes based on run-length encoding have been designed to avoid explicit decompression and to minimize decoding overhead during query execution, and highly compressed bit-vectors can even be queried faster than uncompressed ones. For hard-to-compress bit-vectors, however, compression does not speed up queries and can add considerable overhead, so such bit-vectors are often stored verbatim (uncompressed). Queries are answered by executing a cascade of bit-wise operations over indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse, so query performance could be improved by compressing these intermediate bit-vectors as the query executes. Doing so requires operating on verbatim and compressed bit-vectors together. In this paper, we propose a hybrid framework in which compressed and verbatim bitmaps coexist, and we design algorithms to execute queries under this hybrid model. Our query optimizer decides at run time when to compress or decompress a bit-vector. Our heuristics show that applications using higher-density bitmaps can benefit from this hybrid model, improving both query time and memory utilization.
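The core idea above (ANDing verbatim and run-length-compressed bit-vectors together, and compressing an intermediate result once it becomes sparse) can be illustrated with a minimal Python sketch. It assumes a simplified word-level run-length encoding (runs of all-0 or all-1 64-bit words, plus literal words), which is not WAH, VAL, or the encoding used in the paper. It also uses a fixed density threshold as a hypothetical stand-in for the paper's query optimizer. The names `Run`, `compress`, `and_hybrid`, `and_cascade`, and `threshold` are illustrative only.

```python
from typing import List, NamedTuple

WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1  # a 64-bit word with every bit set


class Run(NamedTuple):
    """One unit of a simplified, hypothetical word-level RLE format."""
    is_fill: bool  # True: a run of identical all-0 or all-1 words
    value: int     # fill bit (0 or 1) for a fill; the raw word for a literal
    n_words: int   # number of 64-bit words covered (always 1 for a literal)


def compress(words: List[int]) -> List[Run]:
    """Run-length encode a verbatim bitmap given as a list of 64-bit words."""
    runs: List[Run] = []
    for w in words:
        if w == 0 or w == ALL_ONES:
            bit = 1 if w == ALL_ONES else 0
            if runs and runs[-1].is_fill and runs[-1].value == bit:
                # Extend the current fill instead of starting a new run.
                runs[-1] = Run(True, bit, runs[-1].n_words + 1)
            else:
                runs.append(Run(True, bit, 1))
        else:
            runs.append(Run(False, w, 1))
    return runs


def decompress(runs: List[Run]) -> List[int]:
    """Expand runs back into verbatim 64-bit words."""
    words: List[int] = []
    for r in runs:
        if r.is_fill:
            words.extend([ALL_ONES if r.value else 0] * r.n_words)
        else:
            words.append(r.value)
    return words


def and_hybrid(verbatim: List[int], runs: List[Run]) -> List[int]:
    """AND a verbatim bitmap with a compressed one without decompressing it.

    Both operands are assumed to cover the same number of words.
    """
    out: List[int] = []
    pos = 0
    for r in runs:
        if r.is_fill and r.value == 1:
            # x AND 1 == x: copy the verbatim words unchanged.
            out.extend(verbatim[pos:pos + r.n_words])
        elif r.is_fill:
            # x AND 0 == 0: the verbatim words are never read.
            out.extend([0] * r.n_words)
        else:
            out.append(verbatim[pos] & r.value)
        pos += r.n_words
    return out


def density(words: List[int]) -> float:
    """Fraction of set bits in a verbatim bitmap."""
    if not words:
        return 0.0
    return sum(bin(w).count("1") for w in words) / (len(words) * WORD_BITS)


def and_cascade(bitmaps: List[List[int]], threshold: float = 0.01) -> List[int]:
    """AND verbatim bitmaps left to right under a hypothetical run-time policy.

    After each step, the intermediate result is compressed if its density
    falls below `threshold` and kept verbatim otherwise.
    """
    result, is_compressed = bitmaps[0], False
    for bm in bitmaps[1:]:
        if is_compressed:
            words = and_hybrid(bm, result)  # verbatim operand, compressed result
        else:
            words = [x & y for x, y in zip(result, bm)]
        is_compressed = density(words) < threshold
        result = compress(words) if is_compressed else words
    return decompress(result) if is_compressed else result
```

Whatever the threshold, `and_cascade` returns the same words as a plain element-wise AND of its inputs. Only the representation of the intermediate result changes. A zero fill lets `and_hybrid` skip reading the matching verbatim words, which is one reason a sparse intermediate result can make the remaining operations cheaper. For simplicity, this sketch re-compresses the verbatim output of each step rather than producing compressed output directly.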

Cite this article
Guzun, G., Canahuate, G. Hybrid query optimization for hard-to-compress bit-vectors. The VLDB Journal 25, 339–354 (2016). https://doi.org/10.1007/s00778-015-0419-9
DOI: https://doi.org/10.1007/s00778-015-0419-9