Adaptive Hybrid Indexes

Authors:
Christoph Anneser, Technical University of Munich, Munich, Germany
Andreas Kipf, Massachusetts Institute of Technology, Cambridge, MA, USA
Huanchen Zhang, Tsinghua University, Peking, China
Thomas Neumann, Technical University of Munich, Munich, Germany
Alfons Kemper, Technical University of Munich, Munich, Germany

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, June 2022, Pages 1626–1639. https://doi.org/10.1145/3514221.3526121

Published: 11 June 2022

ABSTRACT

While index structures are crucial components of high-performance query processing systems, they occupy a large fraction of the available memory. Recently proposed compact indexes reduce this space overhead and thus speed up queries by allowing the database to keep larger working sets in memory. These compact indexes, however, are slower than performance-optimized in-memory indexes because they adopt encodings that trade performance for memory efficiency. Applying different encodings within a single index might allow optimizing both dimensions at the same time; however, it is not clear at build time which encodings should be applied to which index parts.

To take advantage of multiple encodings in one index structure, we present a new framework that forms the basis of workload-adaptive hybrid indexes and moves encoding decisions to run-time instead. By sampling incoming queries adaptively, it tracks accesses to index parts and keeps fine-grained statistics, which are used for space- and performance-optimized encoding migrations. We evaluate our framework using B+-trees and tries, and examine the adaptation process and the space/performance trade-off for real-world and synthetic workloads. For skewed workloads, our framework can reduce space consumption by up to 82% while retaining more than 90% of the original performance.
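
The idea of tracking sampled accesses and migrating encodings at run-time can be illustrated with a minimal sketch. This is not the paper's implementation: the names `AccessTracker` and `plan_migrations`, the two encoding labels `"fast"` and `"compact"`, the fixed sampling rate, and the simple top-k policy are all assumptions made for illustration, whereas the framework described above adapts its sampling and uses finer-grained statistics.

```python
import random
from collections import Counter


class AccessTracker:
    """Hypothetical tracker: counts accesses to index parts for sampled queries."""

    def __init__(self, sample_rate, seed=0):
        self.sample_rate = sample_rate  # fraction of queries that are tracked (fixed here)
        self.rng = random.Random(seed)
        self.counts = Counter()         # node_id -> number of sampled accesses

    def record(self, node_id):
        # Only a sample of the queries is tracked, to keep the overhead low.
        if self.rng.random() < self.sample_rate:
            self.counts[node_id] += 1


def plan_migrations(encodings, counts, hot_budget):
    """Return {node_id: target_encoding} for nodes whose encoding should change.

    Assumed policy: the `hot_budget` most frequently sampled nodes use the
    "fast" encoding, every other node uses the "compact" encoding.
    """
    hot = {node for node, _ in counts.most_common(hot_budget)}
    plan = {}
    for node, current in encodings.items():
        target = "fast" if node in hot else "compact"
        if current != target:
            plan[node] = target
    return plan


# Skewed toy workload: nodes 3 and 5 receive almost all lookups.
encodings = {node: "compact" for node in range(8)}
tracker = AccessTracker(sample_rate=1.0)  # track every query in this toy run
for node in [3] * 50 + [5] * 30 + [0, 1, 2, 4, 6, 7]:
    tracker.record(node)

plan = plan_migrations(encodings, tracker.counts, hot_budget=2)
print(plan)
```

In this toy run only the two hot nodes are scheduled for expansion into the fast encoding, while the six cold nodes stay compact, which mirrors why skewed workloads leave the most room for space savings.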

Supplemental Material

SIGMOD22-moddm280.mp4 (mp4, 152 MB)

Index Terms

1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods
      2. Data layout

Published in
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
General Chair: Zachary Ives, University of Pennsylvania (USA)
Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adaptive index
hybrid index
space-efficient index
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%