research-article

DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management

Authors:
Lixiao Cui, College of Computer Science, Nankai University, China (ORCID: 0000-0002-4017-0974)
Kedi Yang, College of Computer Science, Nankai University, China (ORCID: 0009-0004-9770-9987)
Yusen Li, College of Computer Science, Nankai University, China (ORCID: 0000-0001-6623-350X)
Gang Wang, College of Computer Science, Nankai University, China (ORCID: 0000-0003-0387-2501)
Xiaoguang Liu, College of Computer Science, Nankai University, China (ORCID: 0000-0002-9010-3278)
ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing, August 2023, Pages 62–71. https://doi.org/10.1145/3605573.3605590

Published: 13 September 2023


ABSTRACT

Learned indexes, which use machine learning models, can offer significant performance advantages over traditional indexes. However, existing learned indexes face a tradeoff between space and performance, and they do not scale well on machines with multiple NUMA nodes. These issues limit the deployment of learned indexes in production environments. In this paper, we propose DiffLex, a high-performance, memory-efficient and NUMA-aware learned index. The core idea of DiffLex is to manage keys differently according to their hotness. To achieve high performance, DiffLex stores newly inserted keys in sparse deltas and frequently accessed keys in a sparse hot cache. Cold keys, which take up most of the storage space, are instead stored in dense arrays to reduce memory cost. DiffLex also makes the sparse deltas and the hot cache NUMA-aware by partitioning the sparse deltas and replicating the hot cache across NUMA nodes. Our evaluation shows that DiffLex outperforms the state-of-the-art ALEX by 3.88x and 1.82x for insert and search operations, respectively, while maintaining a small index size.
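The division of labor described in the abstract can be illustrated with a small single-process model. This is a hypothetical sketch, not DiffLex's implementation: the class name `DiffIndexSketch`, the hash-based choice of delta partition, the read-count promotion rule (`hot_threshold`, `hot_capacity`), the `merge` step, and the use of binary search in place of a learned model are all assumptions made for illustration. NUMA nodes are simulated by list indices rather than real memory placement.

```python
import bisect
from collections import Counter


class DiffIndexSketch:
    """Toy model of hotness-differentiated key management (hypothetical, not DiffLex's code).

    Cold keys live in dense sorted arrays, new keys in sparse delta partitions
    (one home partition per key), and frequently read keys in a small hot cache
    that is replicated on every simulated NUMA node.
    """

    def __init__(self, items, num_nodes=2, hot_threshold=3, hot_capacity=4):
        items = sorted(items)
        self.cold_keys = [k for k, _ in items]   # dense array: keys packed without gaps
        self.cold_vals = [v for _, v in items]
        self.num_nodes = num_nodes
        self.deltas = [{} for _ in range(num_nodes)]  # partitioned: each key has one home
        self.hot = [{} for _ in range(num_nodes)]     # replicated: one copy per node
        self.reads = Counter()                        # per-key read counts (hotness)
        self.hot_threshold = hot_threshold
        self.hot_capacity = hot_capacity

    def _home(self, key):
        # Assumption: the delta partition is chosen by hashing the key.
        return hash(key) % self.num_nodes

    def insert(self, key, value):
        # New or updated keys go to their home delta partition only.
        self.deltas[self._home(key)][key] = value
        # Replicas must stay consistent, so updating a hot key touches every node.
        for replica in self.hot:
            if key in replica:
                replica[key] = value

    def lookup(self, key, node):
        replica = self.hot[node]  # local replica: no cross-node access on a hit
        if key in replica:
            return replica[key]
        delta = self.deltas[self._home(key)]
        if key in delta:          # deltas hold newer values than the cold arrays
            value = delta[key]
        else:
            # Binary search stands in for a learned model's position prediction.
            i = bisect.bisect_left(self.cold_keys, key)
            if i == len(self.cold_keys) or self.cold_keys[i] != key:
                return None
            value = self.cold_vals[i]
        self.reads[key] += 1
        # Promote a key into every node's replica once it is read often enough.
        if self.reads[key] >= self.hot_threshold and len(replica) < self.hot_capacity:
            for r in self.hot:
                r[key] = value
        return value

    def merge(self):
        """Fold all delta partitions back into the dense cold arrays."""
        merged = dict(zip(self.cold_keys, self.cold_vals))
        for delta in self.deltas:
            merged.update(delta)
            delta.clear()
        self.cold_keys = sorted(merged)
        self.cold_vals = [merged[k] for k in self.cold_keys]
```

In this sketch, partitioning means each inserted key is written to exactly one delta, while replication lets every node serve hot keys from a local copy at the cost of updating all copies when a hot key changes. That cost is small only if hot keys are read far more often than they are written, which is the assumption behind caching them.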


Index Terms

1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains



Published in

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023, 858 pages
ISBN: 9798400708435
DOI: 10.1145/3605573

Copyright © 2023 ACM

Publisher
Association for Computing Machinery, New York, NY, United States
Author Tags
Index structure
NUMA-aware index
learned index
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 91 of 313 submissions, 29%