Hybrid query optimization for hard-to-compress bit-vectors

  • Regular Paper
The VLDB Journal

Abstract

Bit-vectors are widely used for indexing and summarizing data because modern computers process them efficiently. Sparse bit-vectors can be further compressed to reduce their space requirement. Special compression schemes based on run-length encoders have been designed to avoid explicit decompression and minimize the decoding overhead during query execution. Moreover, highly compressed bit-vectors can exhibit a faster query time than non-compressed ones. However, for hard-to-compress bit-vectors, compression does not speed up queries and can add considerable overhead. In these cases, bit-vectors are often stored verbatim (non-compressed). Queries, however, are answered by executing a cascade of bit-wise operations involving indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse. Compressing these intermediate bit-vectors as the query is executed could therefore improve query performance. Doing so requires operating on verbatim and compressed bit-vectors together. In this paper, we propose a hybrid framework where compressed and verbatim bitmaps can coexist, and we design algorithms to execute queries under this hybrid model. Our query optimizer is able to decide at run time when to compress or decompress a bit-vector. Our heuristics show that applications using higher-density bitmaps can benefit from this hybrid model, improving both their query time and memory utilization.
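The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or encoding: the run-list representation, the names `Verbatim`, `Compressed`, and `hybrid_and`, and the `DENSITY_THRESHOLD` value are all assumptions chosen for illustration. The sketch ANDs two bit-vectors in whichever mix of representations they arrive in, then picks the representation of the result from its density at run time.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Verbatim:
    bits: int    # bit i of the integer is bit i of the vector
    length: int


@dataclass(frozen=True)
class Compressed:
    # Sorted, non-overlapping (start, run_length) pairs, one per run of 1s.
    runs: Tuple[Tuple[int, int], ...]
    length: int


BitVector = Union[Verbatim, Compressed]
DENSITY_THRESHOLD = 0.05  # hypothetical cut-off, not taken from the paper


def compress(v: Verbatim) -> Compressed:
    runs, i = [], 0
    while i < v.length:
        if (v.bits >> i) & 1:
            start = i
            while i < v.length and (v.bits >> i) & 1:
                i += 1
            runs.append((start, i - start))
        else:
            i += 1
    return Compressed(tuple(runs), v.length)


def decompress(c: Compressed) -> Verbatim:
    bits = 0
    for start, n in c.runs:
        bits |= ((1 << n) - 1) << start
    return Verbatim(bits, c.length)


def density(v: BitVector) -> float:
    if isinstance(v, Verbatim):
        ones = bin(v.bits).count("1")
    else:
        ones = sum(n for _, n in v.runs)
    return ones / v.length if v.length else 0.0


def _and_runs(a: Compressed, b: Compressed) -> Compressed:
    # Two-pointer intersection of run lists; no decompression needed.
    out, i, j = [], 0, 0
    while i < len(a.runs) and j < len(b.runs):
        (s1, n1), (s2, n2) = a.runs[i], b.runs[j]
        lo, hi = max(s1, s2), min(s1 + n1, s2 + n2)
        if lo < hi:
            out.append((lo, hi - lo))
        if s1 + n1 < s2 + n2:
            i += 1
        else:
            j += 1
    return Compressed(tuple(out), a.length)


def _and_mixed(c: Compressed, v: Verbatim) -> Verbatim:
    # Only the verbatim bits that fall inside a run of 1s can survive.
    out = 0
    for start, n in c.runs:
        out |= v.bits & (((1 << n) - 1) << start)
    return Verbatim(out, v.length)


def hybrid_and(a: BitVector, b: BitVector,
               threshold: float = DENSITY_THRESHOLD) -> BitVector:
    assert a.length == b.length, "operands must have the same length"
    if isinstance(a, Compressed) and isinstance(b, Compressed):
        result: BitVector = _and_runs(a, b)
    elif isinstance(a, Compressed):
        result = _and_mixed(a, b)
    elif isinstance(b, Compressed):
        result = _and_mixed(b, a)
    else:
        result = Verbatim(a.bits & b.bits, a.length)
    # Run-time decision: sparse results are compressed, dense ones kept verbatim.
    sparse = density(result) < threshold
    if sparse and isinstance(result, Verbatim):
        return compress(result)
    if not sparse and isinstance(result, Compressed):
        return decompress(result)
    return result
```

A density test is only one plausible decision rule. The abstract does not state which criteria the paper's optimizer uses, so the threshold here should be read as a placeholder.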

Author information

Correspondence to Gheorghi Guzun.

Cite this article

Guzun, G., Canahuate, G. Hybrid query optimization for hard-to-compress bit-vectors. The VLDB Journal 25, 339–354 (2016). https://doi.org/10.1007/s00778-015-0419-9

  • DOI: https://doi.org/10.1007/s00778-015-0419-9
