ABSTRACT
While index structures are crucial components in high-performance query processing systems, they occupy a large fraction of the available memory. Recently-proposed compact indexes reduce this space overhead and thus speed up queries by allowing the database to keep larger working sets in memory. These compact indexes, however, are slower than performance-optimized in-memory indexes because they adopt encodings that trade performance for memory efficiency. Applying different encodings within a single index might allow optimizing both dimensions at the same time - however, it is not clear which encodings should be applied to which index parts at build-time.
To take advantage of multiple encodings in one index structure, we present a new framework forming the basis of workload-adaptive hybrid indexes which moves encoding decisions to run-time instead. By sampling incoming queries adaptively, it tracks accesses to index parts and keeps fine-grained statistics which are used for space- and performance-optimized encoding migrations. We evaluated our framework using B+-trees and tries, and examine the adaptation process and space/performance trade-off for real-world and synthetic workloads. For skewed workloads, our framework can reduce the space by up to 82% while retaining more than 90% of the original performance.
Supplemental Material
- AWS EC2 Instances. https://aws.amazon.com/en/ec2/instance-types/high-memory [accessed 2021-03-01].Google Scholar
- LevelDB. https://github.com/google/leveldb [accessed 2021-03-01].Google Scholar
- RocksDB. https://rocksdb.org/ [accessed 2021-03-01].Google Scholar
- S2 Geometry Library. https://s2geometry.io/ [accessed 2021-03-01].Google Scholar
- Tape is dead, Disk is tape, Flash is disk, RAM locality is king. http://research.microsoft.com/en-us/um/people/gray/talks/Flash_is_Good.ppt [accessed 2021-03-01].Google Scholar
- C++ HopscotchMap. https://github.com/Tessil/hopscotch-map [accessed 2021-03-01].Google Scholar
- Rachit Agarwal, Anurag Khandelwal, and Ion Stoica. 2015. Succinct: Enabling Queries on Compressed Data. In NSDI. USENIX Association, 337--350.Google Scholar
- Adnan Alhomssi and Viktor Leis. 2021. Contention and Space Management in B-Trees. In CIDR. 26--37.Google Scholar
- Christoph Anneser, Andreas Kipf, Harald Lang, Thomas Neumann, and Alfons Kemper. 2020. The Case for Hybrid Succinct Data Structures. In EDBT. 391--394.Google Scholar
- Nikolas Askitis and Ranjan Sinha. 2007. HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings. In ACSC. 97--105.Google Scholar
- Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In EDBT. 461--466.Google Scholar
- David Benoit, Erik D. Demaine, J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. 2005. Representing Trees of Higher Degree. Algorithmica 43, 4 (Nov. 2005), 275--292.Google ScholarCross Ref
- Matthias Böhm, Benjamin Schlegel, Peter Benjamin Volk, Ulrike Fischer, Dirk Habich, and Wolfgang Lehner. 2011. Efficient In-Memory Indexing with Generalized Prefix Trees. In BTW. 227--246.Google Scholar
- Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. 2020. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In USENIX. 209--223.Google Scholar
- Kun-Ta Chuang, Jiun-Long Huang, and Ming-Syan Chen. 2008. Mining top-k frequent patterns in the presence of the memory constraint. The VLDB Journal 17, 5 (Aug. 2008), 1321--1344. https://doi.org/10.1007/s00778-007-0078--6Google ScholarCross Ref
- Edith Cohen, Nadav Grossaug, and Haim Kaplan. 2006. Processing Top k Queries from Samples. In CoNEXT.Google Scholar
- Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In SoCC. 143--154. https://doi.org/10.1145/1807128.1807152Google ScholarDigital Library
- Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, et al. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD. 969--984. https://doi.org/10.1145/3318464.3389711Google ScholarDigital Library
- Philipp Fent, Michael Jungmair, Andreas Kipf, and Thomas Neumann. 2020. START-Self-Tuning Adaptive Radix Tree. In ICDEW. IEEE, 147--153. https://doi.org/10.1109/ICDEW49219.2020.00015Google ScholarCross Ref
- Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 13, 8 (2020), 1162--1175. https://doi.org/10.14778/3389133.3389135Google ScholarDigital Library
- Florian Funke, Alfons Kemper, and Thomas Neumann. 2012. Compacting Transactional Data in Hybrid OLTP&OLAP Databases. VLDB 5, 11 (2012). https://doi.org/10.14778/2350229.2350258Google ScholarDigital Library
- Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In SIGMOD. ACM, 1189--1206. https://doi.org/10.1145/3299869.3319860Google ScholarDigital Library
- Roberto Grossi and Giuseppe Ottaviano. 2014. Fast Compressed Tries through Path Decompositions. ACM J. Exp. Algorithmics 19, 1 (2014). https://doi.org/10.1145/2656332Google ScholarDigital Library
- Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. 35, 2 (2005), 378--407. https://doi.org/10.1137/S0097539702402354Google ScholarDigital Library
- Anurag Khandelwal, Rachit Agarwal, and Ion Stoica. 2016. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In NSDI. USENIX Association, 485--500.Google Scholar
- Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems (Dec. 2019). http://arxiv.org/abs/1911.13014Google Scholar
- Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. In aiDM. 1--5.Google Scholar
- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD. ACM, 489--504. https://doi.org/10.1145/3183713.3196909Google ScholarDigital Library
- Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD. ACM, 311--326.Google ScholarDigital Library
- Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In ICDE. IEEE, 185--196. https://doi.org/10.1109/ICDE.2018.00026Google Scholar
- Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In ICDE, Vol. 13. 38--49.Google Scholar
- Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of Practical Synchronization. In DaMoN. 1--8.Google Scholar
- Justin J Levandoski, Per-Åke Larson, and Radu Stoica. 2013. Identifying Hot and Cold Data in Main-Memory Databases. In ICDE. IEEE, 26--37.Google Scholar
- Xiaozhou Li, David G Andersen, Michael Kaminsky, and Michael J Freedman. 2014. Algorithmic Improvements for Fast Concurrent Cuckoo Hashing. In EuroSys. 1--14.Google Scholar
- Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-Value Storage. In EuroSys. 183--196.Google Scholar
- Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow. 14, 1 (2020), 1--13. https://doi.org/10.14778/3421424.3421425Google ScholarDigital Library
- Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient Computation of Frequent and Top-k Elements in Data Streams. In ICDT. Springer, 398--412.Google Scholar
- Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias. 2006. Continuous Monitoring of Top-k Queries over Sliding Windows. In SIGMOD. 635--646.Google Scholar
- Gonzalo Navarro. 2016. Compact Data Structures - A Practical Approach. Cambridge University Press.Google Scholar
- Thomas Neumann and Michael J Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR.Google Scholar
- Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al . 2017. Self- Driving Database Management Systems. In CIDR.Google Scholar
- Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. In SIGMOD. 61--72.Google Scholar
- Andrea Pietracaprina, Matteo Riondato, Eli Upfal, and Fabio Vandin. 2010. Mining top-K frequent itemsets through progressive sampling. Data Mining and Knowledge Discovery 21, 2 (2010), 310--326.Google ScholarDigital Library
- Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. 2007. Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. ACM Trans. Algorithms 3, 4 (2007), 43.Google ScholarDigital Library
- Wolf Rödiger, Sam Idicula, Alfons Kemper, and Thomas Neumann. 2016. Flow-Join: Adaptive Skew Handling for Distributed Joins over High-Speed Networks. In ICDE. IEEE, 1194--1205.Google Scholar
- Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, and Tim Kraska. 2021. Bounding the Last Mile: Efficient Learned String Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).Google Scholar
- Mihail Stoian, Andreas Kipf, Ryan Marcus, and Tim Kraska. 2021. PLEX: Towards Practical Learned Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).Google Scholar
- Michael Stonebraker, Lawrence A Rowe, and Michael Hirohama. 1990. The implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering 2, 1 (1990), 125--142.Google ScholarDigital Library
- Jeffrey S Vitter. 1985. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37--57.Google ScholarDigital Library
- Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. In SIGMOD. 473--488.Google Scholar
- Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. VLDB 14, 8 (2021), 1276--1288.Google ScholarDigital Library
- Qiumin Xu, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. 2015. Performance Analysis of NVMe SSDs and their Implication on Real World Databases. In SYSTOR. 1--11.Google Scholar
- Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD. ACM, 1567--1581.Google Scholar
- Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. TKDE 27, 7 (2015), 1920--1948.Google ScholarDigital Library
- Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD. 323--336.Google Scholar
- Huanchen Zhang, Xiaoxuan Liu, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2020. Order-Preserving Key Compression for In-Memory Search Trees. In SIGMOD. 1601--1615.Google Scholar
Index Terms
- Adaptive Hybrid Indexes
Recommendations
Hybrid Indexes for Spatial-Visual Search
Thematic Workshops '17: Proceedings of the on Thematic Workshops of ACM Multimedia 2017Due to the growth of geo-tagged images, recent web and mobile applications provide search capabilities for images that are similar to a given query image and simultaneously within a given geographical area. In this paper, we focus on designing index ...
Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes
SIGMOD '16: Proceedings of the 2016 International Conference on Management of DataUsing indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a ...
A Hybrid BitFunnel and Partitioned Elias-Fano Inverted Index
WWW '19: The World Wide Web ConferenceSearch engines encounter a time vs. space trade-off: search responsiveness (i.e., a short query response time) comes at the cost of increased index storage. We propose a hybrid method which uses both (a) the recently published mapping-matrix-style index ...
Comments