skip to main content
10.1145/3514221.3526121acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Adaptive Hybrid Indexes

Published:11 June 2022Publication History

ABSTRACT

While index structures are crucial components in high-performance query processing systems, they occupy a large fraction of the available memory. Recently-proposed compact indexes reduce this space overhead and thus speed up queries by allowing the database to keep larger working sets in memory. These compact indexes, however, are slower than performance-optimized in-memory indexes because they adopt encodings that trade performance for memory efficiency. Applying different encodings within a single index might allow optimizing both dimensions at the same time - however, it is not clear which encodings should be applied to which index parts at build-time.

To take advantage of multiple encodings in one index structure, we present a new framework forming the basis of workload-adaptive hybrid indexes which moves encoding decisions to run-time instead. By sampling incoming queries adaptively, it tracks accesses to index parts and keeps fine-grained statistics which are used for space- and performance-optimized encoding migrations. We evaluated our framework using B+-trees and tries, and examine the adaptation process and space/performance trade-off for real-world and synthetic workloads. For skewed workloads, our framework can reduce the space by up to 82% while retaining more than 90% of the original performance.

Skip Supplemental Material Section

Supplemental Material

SIGMOD22-moddm280.mp4

mp4

152 MB

References

  1. AWS EC2 Instances. https://aws.amazon.com/en/ec2/instance-types/high-memory [accessed 2021-03-01].Google ScholarGoogle Scholar
  2. LevelDB. https://github.com/google/leveldb [accessed 2021-03-01].Google ScholarGoogle Scholar
  3. RocksDB. https://rocksdb.org/ [accessed 2021-03-01].Google ScholarGoogle Scholar
  4. S2 Geometry Library. https://s2geometry.io/ [accessed 2021-03-01].Google ScholarGoogle Scholar
  5. Tape is dead, Disk is tape, Flash is disk, RAM locality is king. http://research.microsoft.com/en-us/um/people/gray/talks/Flash_is_Good.ppt [accessed 2021-03-01].Google ScholarGoogle Scholar
  6. C++ HopscotchMap. https://github.com/Tessil/hopscotch-map [accessed 2021-03-01].Google ScholarGoogle Scholar
  7. Rachit Agarwal, Anurag Khandelwal, and Ion Stoica. 2015. Succinct: Enabling Queries on Compressed Data. In NSDI. USENIX Association, 337--350.Google ScholarGoogle Scholar
  8. Adnan Alhomssi and Viktor Leis. 2021. Contention and Space Management in B-Trees. In CIDR. 26--37.Google ScholarGoogle Scholar
  9. Christoph Anneser, Andreas Kipf, Harald Lang, Thomas Neumann, and Alfons Kemper. 2020. The Case for Hybrid Succinct Data Structures. In EDBT. 391--394.Google ScholarGoogle Scholar
  10. Nikolas Askitis and Ranjan Sinha. 2007. HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings. In ACSC. 97--105.Google ScholarGoogle Scholar
  11. Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In EDBT. 461--466.Google ScholarGoogle Scholar
  12. David Benoit, Erik D. Demaine, J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. 2005. Representing Trees of Higher Degree. Algorithmica 43, 4 (Nov. 2005), 275--292.Google ScholarGoogle ScholarCross RefCross Ref
  13. Matthias Böhm, Benjamin Schlegel, Peter Benjamin Volk, Ulrike Fischer, Dirk Habich, and Wolfgang Lehner. 2011. Efficient In-Memory Indexing with Generalized Prefix Trees. In BTW. 227--246.Google ScholarGoogle Scholar
  14. Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. 2020. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In USENIX. 209--223.Google ScholarGoogle Scholar
  15. Kun-Ta Chuang, Jiun-Long Huang, and Ming-Syan Chen. 2008. Mining top-k frequent patterns in the presence of the memory constraint. The VLDB Journal 17, 5 (Aug. 2008), 1321--1344. https://doi.org/10.1007/s00778-007-0078--6Google ScholarGoogle ScholarCross RefCross Ref
  16. Edith Cohen, Nadav Grossaug, and Haim Kaplan. 2006. Processing Top k Queries from Samples. In CoNEXT.Google ScholarGoogle Scholar
  17. Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In SoCC. 143--154. https://doi.org/10.1145/1807128.1807152Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, et al. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD. 969--984. https://doi.org/10.1145/3318464.3389711Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Philipp Fent, Michael Jungmair, Andreas Kipf, and Thomas Neumann. 2020. START-Self-Tuning Adaptive Radix Tree. In ICDEW. IEEE, 147--153. https://doi.org/10.1109/ICDEW49219.2020.00015Google ScholarGoogle ScholarCross RefCross Ref
  20. Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 13, 8 (2020), 1162--1175. https://doi.org/10.14778/3389133.3389135Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Florian Funke, Alfons Kemper, and Thomas Neumann. 2012. Compacting Transactional Data in Hybrid OLTP&OLAP Databases. VLDB 5, 11 (2012). https://doi.org/10.14778/2350229.2350258Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In SIGMOD. ACM, 1189--1206. https://doi.org/10.1145/3299869.3319860Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Roberto Grossi and Giuseppe Ottaviano. 2014. Fast Compressed Tries through Path Decompositions. ACM J. Exp. Algorithmics 19, 1 (2014). https://doi.org/10.1145/2656332Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. 35, 2 (2005), 378--407. https://doi.org/10.1137/S0097539702402354Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Anurag Khandelwal, Rachit Agarwal, and Ion Stoica. 2016. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In NSDI. USENIX Association, 485--500.Google ScholarGoogle Scholar
  26. Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems (Dec. 2019). http://arxiv.org/abs/1911.13014Google ScholarGoogle Scholar
  27. Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. In aiDM. 1--5.Google ScholarGoogle Scholar
  28. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD. ACM, 489--504. https://doi.org/10.1145/3183713.3196909Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD. ACM, 311--326.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In ICDE. IEEE, 185--196. https://doi.org/10.1109/ICDE.2018.00026Google ScholarGoogle Scholar
  31. Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In ICDE, Vol. 13. 38--49.Google ScholarGoogle Scholar
  32. Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of Practical Synchronization. In DaMoN. 1--8.Google ScholarGoogle Scholar
  33. Justin J Levandoski, Per-Åke Larson, and Radu Stoica. 2013. Identifying Hot and Cold Data in Main-Memory Databases. In ICDE. IEEE, 26--37.Google ScholarGoogle Scholar
  34. Xiaozhou Li, David G Andersen, Michael Kaminsky, and Michael J Freedman. 2014. Algorithmic Improvements for Fast Concurrent Cuckoo Hashing. In EuroSys. 1--14.Google ScholarGoogle Scholar
  35. Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-Value Storage. In EuroSys. 183--196.Google ScholarGoogle Scholar
  36. Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow. 14, 1 (2020), 1--13. https://doi.org/10.14778/3421424.3421425Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient Computation of Frequent and Top-k Elements in Data Streams. In ICDT. Springer, 398--412.Google ScholarGoogle Scholar
  38. Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias. 2006. Continuous Monitoring of Top-k Queries over Sliding Windows. In SIGMOD. 635--646.Google ScholarGoogle Scholar
  39. Gonzalo Navarro. 2016. Compact Data Structures - A Practical Approach. Cambridge University Press.Google ScholarGoogle Scholar
  40. Thomas Neumann and Michael J Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR.Google ScholarGoogle Scholar
  41. Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al . 2017. Self- Driving Database Management Systems. In CIDR.Google ScholarGoogle Scholar
  42. Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. In SIGMOD. 61--72.Google ScholarGoogle Scholar
  43. Andrea Pietracaprina, Matteo Riondato, Eli Upfal, and Fabio Vandin. 2010. Mining top-K frequent itemsets through progressive sampling. Data Mining and Knowledge Discovery 21, 2 (2010), 310--326.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. 2007. Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. ACM Trans. Algorithms 3, 4 (2007), 43.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Wolf Rödiger, Sam Idicula, Alfons Kemper, and Thomas Neumann. 2016. Flow-Join: Adaptive Skew Handling for Distributed Joins over High-Speed Networks. In ICDE. IEEE, 1194--1205.Google ScholarGoogle Scholar
  46. Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, and Tim Kraska. 2021. Bounding the Last Mile: Efficient Learned String Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).Google ScholarGoogle Scholar
  47. Mihail Stoian, Andreas Kipf, Ryan Marcus, and Tim Kraska. 2021. PLEX: Towards Practical Learned Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).Google ScholarGoogle Scholar
  48. Michael Stonebraker, Lawrence A Rowe, and Michael Hirohama. 1990. The implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering 2, 1 (1990), 125--142.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Jeffrey S Vitter. 1985. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37--57.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. In SIGMOD. 473--488.Google ScholarGoogle Scholar
  51. Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. VLDB 14, 8 (2021), 1276--1288.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Qiumin Xu, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. 2015. Performance Analysis of NVMe SSDs and their Implication on Real World Databases. In SYSTOR. 1--11.Google ScholarGoogle Scholar
  53. Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD. ACM, 1567--1581.Google ScholarGoogle Scholar
  54. Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. TKDE 27, 7 (2015), 1920--1948.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD. 323--336.Google ScholarGoogle Scholar
  56. Huanchen Zhang, Xiaoxuan Liu, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2020. Order-Preserving Key Compression for In-Memory Search Trees. In SIGMOD. 1601--1615.Google ScholarGoogle Scholar

Index Terms

  1. Adaptive Hybrid Indexes

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
        June 2022
        2597 pages
        ISBN:9781450392495
        DOI:10.1145/3514221

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 June 2022

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader