
Optimistically Compressed Hash Tables & Strings in the USSR

Published: 17 June 2021

Abstract

Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation. Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices; moving the infrequently-accessed slices out of the hash table record improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by building an on-the-fly dictionary of the most frequent strings. This allows many string operations to be executed with integer logic and reduces memory pressure.
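As a minimal sketch of the bit-packing idea behind Domain-Guided Prefix Suppression, assume each key or value column has a known domain [lo, hi], for example from min/max statistics. Subtracting lo leaves offsets whose leading zero bits can be suppressed, so each column needs only as many bits as its largest offset, and the fields of one record can be concatenated into a single integer. The names `Domain`, `pack_record`, `unpack_record`, and `record_width`, as well as the example domains, are hypothetical and do not describe Vectorwise's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical column domain [lo, hi], e.g. from min/max statistics."""
    lo: int
    hi: int

    @property
    def bits(self) -> int:
        # Width of the largest offset (hi - lo); at least 1 bit per column.
        return max(1, (self.hi - self.lo).bit_length())


def record_width(domains: list[Domain]) -> int:
    """Total packed width in bits for one record."""
    return sum(d.bits for d in domains)


def pack_record(values: list[int], domains: list[Domain]) -> int:
    """Subtract each column's minimum and concatenate the offsets into one word."""
    if len(values) != len(domains):
        raise ValueError("one value per domain expected")
    word, shift = 0, 0
    for v, d in zip(values, domains):
        if not d.lo <= v <= d.hi:
            raise ValueError(f"{v} is outside domain [{d.lo}, {d.hi}]")
        word |= (v - d.lo) << shift  # place the offset in its bit field
        shift += d.bits
    return word


def unpack_record(word: int, domains: list[Domain]) -> list[int]:
    """Invert pack_record: extract each bit field and add the minimum back."""
    values = []
    for d in domains:
        values.append((word & ((1 << d.bits) - 1)) + d.lo)
        word >>= d.bits
    return values


# Illustrative TPC-H-like domains: order year, quantity, nation key.
domains = [Domain(1992, 1998), Domain(1, 50), Domain(0, 24)]
word = pack_record([1995, 17, 24], domains)
assert unpack_record(word, domains) == [1995, 17, 24]
print(record_width(domains))  # 14 bits, versus 96 bits as three 32-bit integers
```

Under these assumed domains a record shrinks to 14 bits. A real engine must also cope with values that fall outside the assumed domain; this sketch simply rejects them.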

We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2–4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× performance improvement, and in micro-benchmarks we observed speedups of up to 25×.
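The core idea of the USSR described in the abstract is to replace frequent strings with small integer codes so that comparisons and hashing become integer operations. It can be sketched with an ordinary bounded dictionary. This is an illustration under stated assumptions, not the USSR itself. The name suggests a dedicated memory region, and its layout and self-alignment are not modeled here. The class name `StringDictionary`, its `capacity` parameter, and the first-come admission policy, which stands in for selecting the most frequent strings, are all hypothetical.

```python
from __future__ import annotations


class StringDictionary:
    """Hypothetical bounded, on-the-fly dictionary of frequent strings.

    Admission is first come, first served until `capacity` is reached; later
    strings stay unencoded. This policy is an assumption made for illustration.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.code_of: dict[str, int] = {}
        self.strings: list[str] = []
        self.hashes: list[int] = []  # hash computed once at insertion

    def encode(self, s: str) -> int | str:
        """Return an integer code if `s` is (or can be) in the dictionary, else `s`."""
        code = self.code_of.get(s)
        if code is not None:
            return code
        if len(self.strings) < self.capacity:
            code = len(self.strings)
            self.code_of[s] = code
            self.strings.append(s)
            self.hashes.append(hash(s))
            return code
        return s  # dictionary full: fall back to the raw string

    def decode(self, v: int | str) -> str:
        return self.strings[v] if isinstance(v, int) else v

    def equals(self, a: int | str, b: int | str) -> bool:
        if isinstance(a, int) and isinstance(b, int):
            return a == b  # fast path: one integer comparison
        return self.decode(a) == self.decode(b)  # slow path: compare characters

    def hash_of(self, v: int | str) -> int:
        # Dictionary strings reuse the stored hash instead of rehashing.
        return self.hashes[v] if isinstance(v, int) else hash(v)


d = StringDictionary(capacity=2)
germany, france = d.encode("GERMANY"), d.encode("FRANCE")
assert d.encode("GERMANY") == germany  # repeated strings map to the same code
assert d.encode("PERU") == "PERU"      # capacity exhausted: stays a raw string
assert d.equals(germany, d.encode("GERMANY")) and not d.equals(germany, france)
```

Only strings admitted to the dictionary receive codes. Equality between two coded values is therefore a single integer comparison, and a hash lookup reuses the stored hash. Anything involving a raw string falls back to ordinary character comparison.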
