COLIN: A Cache-Conscious Dynamic Learned Index with High Read/Write Performance

Zhang, Zhou; Jin, Pei-Quan; Wang, Xiao-Liang; Lv, Yan-Qi; Wan, Shou-Hong; Xie, Xi-Ke

doi:10.1007/s11390-021-1348-2

Regular Paper
Published: 30 July 2021

Journal of Computer Science and Technology, Volume 36, pages 721–740 (2021)

Zhou Zhang^1,2,
Pei-Quan Jin^1,2,
Xiao-Liang Wang^1,2,
Yan-Qi Lv^1,2,
Shou-Hong Wan^1,2 &
Xi-Ke Xie^1

Abstract

The recently proposed learned index offers higher query performance and better space efficiency than the conventional B+-tree. However, the original learned index supports neither insertions nor bounded query complexity. Some variants use an out-of-place strategy and a bottom-up build strategy to accelerate insertions and bound query complexity, but they introduce additional query costs and frequent node splits. Moreover, none of the existing learned indices is cache-friendly. In this paper, we propose COLIN (Cache-cOnscious Learned INdex), a new learned index that aims to support efficient queries and insertions while also offering bounded query complexity. Unlike previous solutions that use an out-of-place strategy, COLIN handles insertions in place and reserves some empty slots in each node to optimize the node’s data placement. In particular, through model-based data placement and a cache-conscious data layout, COLIN decouples the local-search boundary from the maximum error of the model. Experimental results on five workloads and three datasets show that COLIN achieves the best read/write performance among all compared indices and outperforms the second-best index by 18.4%, 6.2%, and 32.9% on the three datasets, respectively.
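
As a rough illustration of the ideas named in the abstract (model-based placement, reserved empty slots, in-place insertion), the Python sketch below builds a toy node. It is not COLIN's actual algorithm, which this page does not describe. The class name `GappedNode`, the `fill` parameter, the linear model, and the rule of shifting toward a free slot (right side first) are all assumptions made for illustration, and the cache-conscious layout is not modelled.

```python
class GappedNode:
    """Toy node (hypothetical, not COLIN's design): a linear model maps each
    key to a slot in an array that reserves empty slots for later inserts."""

    def __init__(self, keys, fill=0.5):
        keys = sorted(set(keys))              # requires at least one key
        n = len(keys)
        self.capacity = max(n, int(n / fill))
        self.slots = [None] * self.capacity
        self.lo = keys[0]
        span = keys[-1] - keys[0]
        self.slope = (self.capacity - 1) / span if span else 0.0
        last = -1
        for i, k in enumerate(keys):
            # Place at the predicted slot, but keep sorted order (last + 1)
            # and leave enough room for the keys that are still to come.
            pos = min(max(self.predict(k), last + 1), self.capacity - n + i)
            self.slots[pos] = k
            last = pos

    def predict(self, key):
        """Predicted slot for key, clamped to the array bounds."""
        p = round(self.slope * (key - self.lo))
        return min(max(p, 0), self.capacity - 1)

    def find(self, key):
        """Return the slot holding key, or -1. Scans outward from the
        predicted slot; each side stops once it passes where key could be."""
        left = self.predict(key)
        right = left + 1
        while left >= 0 or right < self.capacity:
            if left >= 0:
                k = self.slots[left]
                if k == key:
                    return left
                left = -1 if (k is not None and k < key) else left - 1
            if right < self.capacity:
                k = self.slots[right]
                if k == key:
                    return right
                right = self.capacity if (k is not None and k > key) else right + 1
        return -1

    def insert(self, key):
        """Insert key in place and return its slot; raise OverflowError if full."""
        pos = self.find(key)
        if pos != -1:
            return pos                        # already present
        # Neighbouring occupied slots. Linear scans are used for brevity only.
        a = max((i for i, k in enumerate(self.slots)
                 if k is not None and k < key), default=-1)
        b = min((i for i, k in enumerate(self.slots)
                 if k is not None and k > key), default=self.capacity)
        if b - a > 1:                         # a reserved gap lies between them
            pos = min(max(self.predict(key), a + 1), b - 1)
        else:
            right = next((i for i in range(b, self.capacity)
                          if self.slots[i] is None), None)
            left = next((i for i in range(a, -1, -1)
                         if self.slots[i] is None), None)
            if right is not None:             # shift successors one slot right
                self.slots[b + 1:right + 1] = self.slots[b:right]
                pos = b
            elif left is not None:            # shift predecessors one slot left
                self.slots[left:a] = self.slots[left + 1:a + 1]
                pos = a
            else:
                raise OverflowError("node full; a real index would split or expand")
        self.slots[pos] = key
        return pos
```

For example, `GappedNode([10, 20, 30, 40, 50])` allocates ten slots for five keys, so a later `insert(35)` can usually land in a reserved gap near its predicted position without moving any other key. In this toy version the lookup cost depends on how far a key sits from its predicted slot, not on a separately stored model error bound.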

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026, China
Zhou Zhang, Pei-Quan Jin, Xiao-Liang Wang, Yan-Qi Lv, Shou-Hong Wan & Xi-Ke Xie
Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei, 230026, China
Zhou Zhang, Pei-Quan Jin, Xiao-Liang Wang, Yan-Qi Lv & Shou-Hong Wan

Corresponding author

Correspondence to Pei-Quan Jin.

Supplementary Information

ESM 1

(PDF 158 kb)

About this article

Cite this article

Zhang, Z., Jin, PQ., Wang, XL. et al. COLIN: A Cache-Conscious Dynamic Learned Index with High Read/Write Performance. J. Comput. Sci. Technol. 36, 721–740 (2021). https://doi.org/10.1007/s11390-021-1348-2

Received: 01 February 2021
Accepted: 13 June 2021
Published: 30 July 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11390-021-1348-2
