DOI:10.1145/2513591.2513641
research-article

Dynamic bitmap index recompression through workload-based optimizations

Published: 09 October 2013

Abstract

Many large-scale read-only databases and data warehouses use bitmap indices to speed up data analysis. These indices are both highly compressible and able to exploit fast bit-wise operations for query processing. Numerous hybrid run-length encoding schemes have been proposed that greatly compress the index and allow it to be queried without decompression. Typically, these schemes align their compression with the computer architecture's word size to further accelerate queries.
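As a rough illustration of the two properties named above, the following Python sketch builds an uncompressed bitmap index, answers queries with bit-wise operations, and run-length encodes one bitmap. It is not any of the published schemes: the helper names (`build_bitmap_index`, `rows_of`, `run_length_encode`) and the toy data are assumptions made for this example, and the plain run-length encoding is neither word-aligned nor queryable without decoding.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bits are set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


def run_length_encode(bitmap, n_rows):
    """Encode the first n_rows bits as (bit, run_length) pairs, lowest row first."""
    runs = []
    for row in range(n_rows):
        bit = (bitmap >> row) & 1
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


# Toy table with two attributes (illustrative data only).
color = ["red", "blue", "red", "red", "green", "blue"]
size = ["S", "S", "L", "S", "L", "L"]
color_idx = build_bitmap_index(color)
size_idx = build_bitmap_index(size)

# color = 'red' AND size = 'S' becomes a single bit-wise AND.
red_and_small = color_idx["red"] & size_idx["S"]
# color = 'red' OR color = 'green' becomes a single bit-wise OR.
red_or_green = color_idx["red"] | color_idx["green"]

print(rows_of(red_and_small))                    # rows matching both predicates
print(rows_of(red_or_green))                     # rows matching either color
print(run_length_encode(color_idx["red"], 6))    # runs of equal bits
```

Long runs of identical bits, which are common in sparse bitmaps, collapse into a single pair here; hybrid schemes exploit the same redundancy while keeping the encoded form directly usable by bit-wise operators.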
Previously, we introduced Variable Length Compression (VLC), which uses a general encoding that can achieve better compression than word-aligned schemes. However, VLC's querying efficiency can vary widely due to mismatched alignment of compressed columns. In this paper, we present an optimizer that recompresses the bitmap over time. Based on query history, our approach lets the VLC user specify the priority of compression versus query efficiency, and then recompresses the bitmap accordingly if needed. In an empirical study using scientific data sets, we show that our approach achieves both better compression ratios and query speedup over Word-Aligned Hybrid (WAH) and Position List Word-Aligned Hybrid (PLWAH). On the largest data set, our VLC optimizer compressed up to 1.73x better than WAH and 1.46x better than PLWAH. We also show a slight improvement in query efficiency in most experiments, and substantial (11x to 16x) speedups in special cases.
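The abstract does not give the optimizer's cost model, so the following Python sketch only illustrates the general idea of trading compression against query efficiency under a user-supplied priority. Everything in it is hypothetical: the `Candidate` fields, the weight `alpha`, the `min_gain` threshold, and the numbers are assumptions for this example, not the paper's method or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical label for an encoding configuration
    size_bytes: float  # estimated compressed size of the index
    query_cost: float  # estimated cost of replaying the recorded query history


def score(candidate, baseline, alpha):
    """Weighted cost relative to the baseline (lower is better).

    alpha = 1.0 considers only size; alpha = 0.0 considers only query cost.
    """
    size_term = candidate.size_bytes / baseline.size_bytes
    query_term = candidate.query_cost / baseline.query_cost
    return alpha * size_term + (1.0 - alpha) * query_term


def choose(current, candidates, alpha, min_gain=0.05):
    """Return the best candidate, or `current` if none improves the score by min_gain."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    best = min(candidates, key=lambda c: score(c, current, alpha))
    # The current configuration scores exactly 1.0 against itself.
    if 1.0 - score(best, current, alpha) >= min_gain:
        return best
    return current


# Made-up estimates, for illustration only.
current = Candidate("current", size_bytes=100.0, query_cost=10.0)
smaller = Candidate("smaller", size_bytes=70.0, query_cost=14.0)
faster = Candidate("faster", size_bytes=120.0, query_cost=6.0)

print(choose(current, [smaller, faster], alpha=0.9).name)  # size prioritized
print(choose(current, [smaller, faster], alpha=0.1).name)  # query speed prioritized
```

The `min_gain` threshold stands in for the "recompress only if needed" decision: a candidate that barely beats the current configuration is not worth the cost of recompressing.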



    Published In

    IDEAS '13: Proceedings of the 17th International Database Engineering & Applications Symposium
    October 2013
    222 pages
    ISBN:9781450320252
    DOI:10.1145/2513591

    Sponsors

    • UPC: Technical University of Catalunya
    • BytePress
    • Concordia University: Concordia University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bitmap
    2. compression
    3. indexing
    4. modeling
    5. optimization


    Acceptance Rates

    IDEAS '13 Paper Acceptance Rate 9 of 51 submissions, 18%;
    Overall Acceptance Rate 74 of 210 submissions, 35%
