Abstract
The learned index is a new index structure that uses a trained model to predict the position of a key directly, and thus achieves high query performance. However, static learned indexes cannot handle insert operations. Although the dynamic PGM-index augments static PGM-indexes with a dynamic data structure to support inserts, it suffers from severe read amplification under read-write workloads, because the inefficient lookup process over the buffers erodes the advantage of the learned indexes. Moreover, this design requires periodic retraining of the internal PGM-indexes, since the buffers and the learned indexes are tightly coupled, which is unacceptable for static learned indexes that require tuning. Such a structure is therefore not an ideal general framework. In this paper, we propose a two-layer Hybrid Index Framework (HIF) to address these issues. Specifically, the dynamic layer serves as a buffer for inserts, while the static layer, consisting of static learned indexes, handles lookups only. HIF effectively alleviates read amplification by searching the static layer directly, and its hierarchical structure isolates the learned indexes from insert operations. HIF can thus avoid retraining the learned indexes entirely through a transformation strategy that moves data from the dynamic layer to the static layer. Moreover, we provide a self-tuning algorithm for learned indexes that cannot be built in a single pass over the data, allowing them to be applied to dynamic workloads with low training overhead. Experiments on multiple datasets and workloads show that, on average, three HIF-based static learned indexes (HLI, PGM, and RMI) achieve up to 1.8×, 1.7×, and 1.5× higher throughput than the original dynamic PGM-index when the insert ratio is below 70%.
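The two-layer design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the buffer threshold, and the use of a single error-bounded linear model as a stand-in for a real static learned index (such as a PGM-index or RMI) are all assumptions made for illustration. It shows the key ideas only: lookups go to the static layer first, inserts go to a small dynamic buffer, and the buffer is periodically transformed into a fresh static layer rather than retraining the model on every insert.

```python
import bisect

class HybridIndexSketch:
    """Illustrative two-layer index: a dynamic buffer for inserts and a
    static, model-backed sorted array for lookups. The 'learned index'
    is simplified here to one linear model with a recorded error bound."""

    def __init__(self, keys, buffer_limit=4):
        self.buffer = []                # dynamic layer: small sorted list
        self.buffer_limit = buffer_limit
        self._build_static(sorted(keys))

    def _build_static(self, keys):
        # Static layer: fit key -> position with a linear model and record
        # the maximum prediction error, so a lookup only binary-searches a
        # bounded window (stand-in for a real static learned index).
        self.static = keys
        n = len(keys)
        if n >= 2 and keys[-1] != keys[0]:
            self.slope = (n - 1) / (keys[-1] - keys[0])
        else:
            self.slope = 0.0
        self.base = keys[0] if keys else 0
        self.err = 0
        for i, k in enumerate(keys):
            pred = round((k - self.base) * self.slope)
            self.err = max(self.err, abs(pred - i))

    def insert(self, key):
        bisect.insort(self.buffer, key)
        if len(self.buffer) > self.buffer_limit:
            # Transformation step: merge the buffer into the static layer
            # and rebuild it in one pass. (HIF's actual strategy avoids
            # retraining; a full rebuild is used here only for brevity.)
            merged = sorted(self.static + self.buffer)
            self.buffer = []
            self._build_static(merged)

    def lookup(self, key):
        # Search the static layer directly; the buffer is consulted only
        # on a miss, which limits read amplification.
        if self.static:
            pred = round((key - self.base) * self.slope)
            lo = max(0, pred - self.err)
            hi = max(lo, min(len(self.static), pred + self.err + 1))
            i = bisect.bisect_left(self.static, key, lo, hi)
            if i < len(self.static) and self.static[i] == key:
                return True
        i = bisect.bisect_left(self.buffer, key)
        return i < len(self.buffer) and self.buffer[i] == key
```

Under this sketch, a read-heavy workload touches only the error-bounded static search, while inserts cost a cheap sorted-list insertion until the buffer overflows and is absorbed into the static layer.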
References
Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: SIGMOD, pp. 489–504 (2018)
Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: Fiting-tree: A data-aware index structure. In: SIGMOD, pp. 1189–1206 (2019)
Ferragina, P., Vinciguerra, G.: The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 13, 1162–1175 (2020)
Ding, Y., Zhao, X., Jin, P.: An error-bounded space-efficient hybrid learned index with high lookup performance. In: DEXA, pp. 216–228. Springer (2022)
Bingmann, T.: STX B+ Tree (2013). https://panthema.net/2007/stx-btree
Marcus, R., et al.: Benchmarking learned indexes. Proc. VLDB Endow. 14, 1–13 (2020)
Wongkham, C., Lu, B., Liu, C., Zhong, Z., Lo, E., Wang, T.: Are updatable learned indexes ready? Proc. VLDB Endow. 15, 3004–3017 (2022)
Xie, Q., Pang, C., Zhou, X., Zhang, X., Deng, K.: Maximum error-bounded piecewise linear representation for online stream approximation. VLDB J. 23, 915–937 (2014)
Li, X., Li, J., Wang, X.: Aslm: Adaptive single layer model for learned index. In: DASFAA Workshops, pp. 80–95 (2019)
Li, P., Hua, Y., Jia, J., Zuo, P.: Finedex: a fine-grained learned index scheme for scalable and concurrent memory systems. Proc. VLDB Endow. 15, 321–334 (2021)
Ding, J., et al.: ALEX: an updatable adaptive learned index. In: SIGMOD, pp. 969–984 (2020)
Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C.: Updatable learned index with precise positions. Proc. VLDB Endow. 14, 1276–1288 (2021)
Tang, C., et al.: Xindex: a scalable learned index for multicore data storage. In: PPoPP, pp. 308–320 (2020)
Lu, B., Ding, J., Lo, E., Minhas, U.F., Wang, T.: Apex: a high-performance learned index on persistent memory. Proc. VLDB Endow. 15, 597–610 (2021)
Zhang, Z., et al.: Plin: a persistent learned index for non-volatile memory with high performance and instant recovery. Proc. VLDB Endow. 16, 243–255 (2022)
Zhang, J., Gao, Y.: Carmi: a cache-aware learned index with a cost-based construction algorithm. Proc. VLDB Endow. 15, 2679–2691 (2021)
Kipf, A., Marcus, R., van Renen, A., Stoian, M., Kemper, A., Kraska, T., Neumann, T.: RadixSpline: a single-pass learned index. In: aiDM@SIGMOD, pp. 1–5 (2020)
Acknowledgements
This paper is supported by the Humanities and Social Sciences Foundation of the Ministry of Education (17YJCZH260), and the Sichuan Science and Technology Program (2020YFS0057).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Ding, Y., Zhao, X. (2024). A High-Performance Hybrid Index Framework Supporting Inserts for Static Learned Indexes. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14333. Springer, Singapore. https://doi.org/10.1007/978-981-97-2387-4_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2386-7
Online ISBN: 978-981-97-2387-4
eBook Packages: Computer Science (R0)