DOI: 10.1145/3605573.3605590
research-article

DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management

Published: 13 September 2023

ABSTRACT

Learned indexes that use machine learning models can offer significant performance advantages over traditional indexes. However, existing learned indexes suffer from space-performance tradeoffs and do not scale well on machines with multiple NUMA nodes. These issues hinder the use of learned indexes in production environments. In this paper, we propose DiffLex, a high-performance, memory-efficient, and NUMA-aware learned index. The core idea of DiffLex is to manage keys differently based on their hotness. To achieve high performance, DiffLex stores newly inserted keys in sparse deltas and frequently accessed keys in a sparse hot cache. Cold keys, which take up most of the storage space, are stored in dense arrays to reduce memory cost. DiffLex also makes the sparse deltas and the hot cache NUMA-aware by partitioning the deltas and replicating the hot cache across NUMA nodes. Our evaluation shows that DiffLex outperforms the state-of-the-art ALEX by 3.88x for insert operations and 1.82x for search operations, while maintaining a small index size.
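The hotness-based split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `TieredIndexSketch`, the parameters `hot_threshold` and `delta_limit`, and the merge policy are all hypothetical, binary search stands in for the learned model, and NUMA partitioning and replication are not modeled at all.

```python
import bisect
from collections import Counter


class TieredIndexSketch:
    """Toy three-tier index (hypothetical; not DiffLex itself).

    cold tier : dense sorted arrays, searched with bisect (a stand-in
                for a learned model's position prediction)
    delta tier: buffer holding newly inserted keys
    hot tier  : cache holding copies of frequently read entries
    """

    def __init__(self, sorted_items, hot_threshold=3, delta_limit=4):
        # Cold tier: keys and values packed into dense parallel lists.
        self.keys = [k for k, _ in sorted_items]
        self.vals = [v for _, v in sorted_items]
        self.delta = {}        # delta tier
        self.hot = {}          # hot tier
        self.hits = Counter()  # per-key read counts (assumed hotness metric)
        self.hot_threshold = hot_threshold
        self.delta_limit = delta_limit

    def insert(self, key, value):
        # New keys go to the delta, so the dense arrays are not shifted.
        self.delta[key] = value
        self.hot.pop(key, None)  # drop any stale cached copy
        if len(self.delta) >= self.delta_limit:
            self._merge()

    def _merge(self):
        # Fold the delta back into the dense arrays (assumed policy).
        merged = dict(zip(self.keys, self.vals))
        merged.update(self.delta)
        self.keys = sorted(merged)
        self.vals = [merged[k] for k in self.keys]
        self.delta.clear()

    def get(self, key):
        # Probe order: hot cache, then delta, then dense arrays.
        if key in self.hot:
            return self.hot[key]
        if key in self.delta:
            value = self.delta[key]
        else:
            i = bisect.bisect_left(self.keys, key)
            if i == len(self.keys) or self.keys[i] != key:
                return None
            value = self.vals[i]
        # Promote the entry once it has been read often enough.
        self.hits[key] += 1
        if self.hits[key] >= self.hot_threshold:
            self.hot[key] = value
        return value
```

In this sketch, reads of a popular key are served from the small hot tier after a few accesses, inserts never touch the dense arrays until a merge, and the bulk of the keys stays in compact sorted storage. How DiffLex actually measures hotness, sizes the deltas, or merges them is not stated in the abstract.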


Published in

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023
858 pages
ISBN: 9798400708435
DOI: 10.1145/3605573

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 91 of 313 submissions, 29%