Abstract
Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation. Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values, and the operations on them, into frequently and infrequently accessed value slices. By removing the infrequently accessed slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows many string operations to be executed with integer logic and reduces memory pressure.
We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2-4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance, and in micro-benchmarks we observed speedups of up to 25×.
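To make the idea of bit-packing keys and values more concrete, the following is a minimal sketch of one plausible reading of Domain-Guided Prefix Suppression: if the minimum and maximum of each column are known, the minimum can be subtracted and only the bits needed for the remaining range stored. The function names, the column domains, and the packing layout are assumptions made for illustration; the abstract does not specify them, and this is not the Vectorwise implementation.

```python
def bits_needed(lo, hi):
    """Bits required to store any value in [lo, hi] after subtracting lo."""
    return max(1, (hi - lo).bit_length())


def packed_width(domains):
    """Total record width in bits for the given (lo, hi) column domains."""
    return sum(bits_needed(lo, hi) for lo, hi in domains)


def pack(record, domains):
    """Pack one record of integers into a single integer word.

    Column 0 occupies the lowest bits. Each value is stored as an
    offset from its column minimum (hypothetical layout).
    """
    word = 0
    shift = 0
    for value, (lo, hi) in zip(record, domains):
        if not lo <= value <= hi:
            raise ValueError(f"{value} is outside the domain [{lo}, {hi}]")
        word |= (value - lo) << shift
        shift += bits_needed(lo, hi)
    return word


def unpack(word, domains):
    """Inverse of pack: recover the original column values."""
    values = []
    for lo, hi in domains:
        width = bits_needed(lo, hi)
        values.append((word & ((1 << width) - 1)) + lo)
        word >>= width
    return tuple(values)


# Hypothetical domains: a year, a small quantity, and an ID in a narrow range.
domains = [(1992, 1998), (0, 50), (1_000_000, 1_000_255)]
record = (1995, 42, 1_000_100)
packed = pack(record, domains)
restored = unpack(packed, domains)
```

With these assumed domains, the three columns need 3, 6, and 8 bits, so a record fits in 17 bits instead of three full-width integers. A narrower record means more hash table entries per cache line, which is the kind of effect the abstract attributes to reduced record width.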