research-article

Optimistically Compressed Hash Tables & Strings in the USSR

Authors: Tim Gubner (CWI), Viktor Leis (FSU Jena), Peter Boncz (CWI)

ACM SIGMOD Record, Volume 50, Issue 1, March 2021, pp. 60–67. https://doi.org/10.1145/3471485.3471500

Published: 17 June 2021

Abstract

Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation. Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and the operations on them) into frequently and infrequently accessed value slices (and operations on those slices); by removing the infrequently accessed slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows many string operations to be executed with integer logic and reduces memory pressure.
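As a rough illustration of the bit-packing idea behind Domain-Guided Prefix Suppression, the following minimal sketch stores each value as its offset from a known column minimum and packs the offsets into a single integer word. The abstract does not specify the actual record layout, so the `Column` class, the `pack` and `unpack` helpers, and the example domains are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical per-column domain metadata (assumed known min and max)."""
    lo: int
    hi: int

    @property
    def bits(self) -> int:
        # Bits needed to represent any offset in [0, hi - lo].
        return max(1, (self.hi - self.lo).bit_length())


def pack(values, columns):
    """Pack one value per column into a single integer word."""
    word, shift = 0, 0
    for v, c in zip(values, columns):
        if not c.lo <= v <= c.hi:
            raise ValueError(f"{v} outside domain [{c.lo}, {c.hi}]")
        word |= (v - c.lo) << shift  # keep only the offset from the minimum
        shift += c.bits
    return word


def unpack(word, columns):
    """Recover the original values from a packed word."""
    values = []
    for c in columns:
        mask = (1 << c.bits) - 1
        values.append((word & mask) + c.lo)
        word >>= c.bits
    return values


# Illustrative domains: a year, a zero-based month, and a narrow key range.
cols = [Column(1992, 1998), Column(0, 11), Column(1_000_000, 1_000_255)]
row = [1995, 7, 1_000_042]
packed = pack(row, cols)
total_bits = sum(c.bits for c in cols)  # 3 + 4 + 8 bits
```

In this sketch the three values fit into 15 bits instead of three full-width integers, which is the kind of record-width reduction the abstract refers to; how the real system derives domains and handles values outside them is not covered here.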

We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2–4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance, and in micro-benchmarks we observed speedups of up to 25×.
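The on-the-fly string dictionary mentioned in the abstract can be pictured with a small sketch. This is not the USSR data structure itself, whose layout the abstract does not describe. It is a simplified, hypothetical bounded dictionary (`FrequentStringDict`) that assigns integer codes to the first strings it sees, rather than the most frequent ones, and falls back to ordinary string comparison once it is full.

```python
class FrequentStringDict:
    """Hypothetical bounded dictionary mapping strings to small integer codes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.codes = {}

    def encode(self, s):
        """Return an integer code for s, or None if s is absent and the dictionary is full."""
        code = self.codes.get(s)
        if code is None and len(self.codes) < self.capacity:
            code = len(self.codes)
            self.codes[s] = code
        return code


def equal(d, a, b):
    """Compare two strings, preferring integer comparison of their codes."""
    ca, cb = d.encode(a), d.encode(b)
    if ca is not None and cb is not None:
        return ca == cb  # fast path: integer logic only
    return a == b        # fallback: ordinary string comparison


d = FrequentStringDict(capacity=2)
code_de = d.encode("DE")  # first string seen gets code 0
code_nl = d.encode("NL")  # second string gets code 1
code_fr = d.encode("FR")  # dictionary is full, so no code is assigned
```

Strings that received a code can be compared as integers, while strings that did not fit still compare correctly through the fallback path. This mirrors the optimistic flavor of the approach: the common case is fast and the uncommon case remains correct.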

Published in: ACM SIGMOD Record, Volume 50, Issue 1, March 2021, 90 pages
ISSN: 0163-5808
DOI: 10.1145/3471485
Editors: Rada Chirkova (North Carolina State University), Vanessa Braganholo (Universidade Federal Fluminense), Wim Martens (University of Bayreuth), Divesh Srivastava (AT&T Research), Marcelo Arenas (Research Highlights), Marianne Winslett (University of Illinois), Jun Yang (Duke University), Azza Abouzied (NYU), Lyublena Antova (Datometry), Aaron J. Elmore (University of Chicago), Kyriakos Mouratidis (Singapore Management University), Dan Olteanu (University of Oxford), Immanuel Trummer (Cornell University), Yannis Velegrakis (Utrecht University), Renata Borovica-Gajic (Surveys)
Copyright © 2021. Copyright is held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States