ABSTRACT
Learned indexes, which rely on machine learning models, can offer significant performance advantages over traditional indexes. However, existing learned indexes suffer from space-performance tradeoffs and do not scale well on machines with multiple NUMA nodes. These issues limit the use of learned indexes in production environments. In this paper, we propose DiffLex, a high-performance, memory-efficient, and NUMA-aware learned index. The core idea of DiffLex is to manage keys differently according to their hotness. To achieve high performance, DiffLex stores newly inserted keys in sparse deltas and frequently accessed keys in a sparse hot cache. Cold keys, which take up most of the storage space, are instead stored in dense arrays to reduce memory cost. DiffLex also makes the sparse deltas and the hot cache NUMA-aware by partitioning the deltas and replicating the hot cache across NUMA nodes. Our evaluation shows that DiffLex outperforms the state-of-the-art ALEX by 3.88x for insert operations and 1.82x for search operations, while maintaining a small index size.
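The differentiated management described above can be illustrated with a minimal Python sketch. It is not the DiffLex implementation: the class name `TieredIndexSketch`, the `hot_threshold` promotion policy, the use of plain dictionaries for the sparse delta and hot cache, and the single endpoint-fitted linear model are all assumptions made for illustration. NUMA partitioning and replication are not modeled.

```python
import bisect
from collections import Counter


class TieredIndexSketch:
    """Toy hotness-differentiated index (illustrative only, not DiffLex)."""

    def __init__(self, sorted_items, hot_threshold=3):
        # Cold tier: densely packed, sorted parallel arrays with no gaps.
        self.keys = [k for k, _ in sorted_items]
        self.vals = [v for _, v in sorted_items]
        # Delta tier: newly inserted keys (a dict stands in for a sparse delta).
        self.delta = {}
        # Hot tier: frequently read keys (a dict stands in for a sparse hot cache).
        self.hot = {}
        self.hits = Counter()
        self.hot_threshold = hot_threshold  # hypothetical promotion policy
        # Linear model pos ~ slope * key + intercept, fitted through the endpoints.
        n = len(self.keys)
        if n > 1 and self.keys[-1] != self.keys[0]:
            self.slope = (n - 1) / (self.keys[-1] - self.keys[0])
        else:
            self.slope = 0.0
        self.intercept = -self.slope * self.keys[0] if n else 0.0

    def _search_dense(self, key):
        n = len(self.keys)
        if n == 0:
            return None
        # Predict a position, clamp it, then widen a bracket exponentially.
        pos = max(0, min(n - 1, int(self.slope * key + self.intercept)))
        lo = hi = pos
        step = 1
        if self.keys[pos] < key:
            while hi < n - 1 and self.keys[hi] < key:
                lo, hi = hi, min(n - 1, hi + step)
                step *= 2
        else:
            while lo > 0 and self.keys[lo] > key:
                hi, lo = lo, max(0, lo - step)
                step *= 2
        # Binary search inside the bracket [lo, hi].
        i = bisect.bisect_left(self.keys, key, lo, hi + 1)
        if i < n and self.keys[i] == key:
            return self.vals[i]
        return None

    def get(self, key):
        # Check the hot cache first, then the delta, then the dense arrays.
        if key in self.hot:
            return self.hot[key]
        if key in self.delta:
            return self.delta[key]
        val = self._search_dense(key)
        if val is not None:
            self.hits[key] += 1
            if self.hits[key] >= self.hot_threshold:
                self.hot[key] = val  # promote a frequently read cold key
        return val

    def insert(self, key, val):
        # New and updated keys go to the delta; drop any stale cached copy.
        self.delta[key] = val
        self.hot.pop(key, None)


idx = TieredIndexSketch([(k, k * 10) for k in range(0, 100, 2)], hot_threshold=2)
idx.insert(5, 55)
print(idx.get(4), idx.get(5), idx.get(7))
```

In this sketch, reads pay the model-guided search cost only for cold keys, while inserts never shift elements in the dense arrays. A real design would also need to merge deltas back into the dense arrays and evict cache entries, which the sketch leaves out.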
Index Terms
- DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management