LIFOSS: a learned index scheme for streaming scenarios

Yu, Tong; Liu, Guanfeng; Liu, An; Li, Zhixu; Zhao, Lei

doi:10.1007/s11280-022-01021-6

LIFOSS: a learned index scheme for streaming scenarios

Published: 21 February 2022

Volume 26, pages 501–518, (2023)
Cite this article

World Wide Web Aims and scope Submit manuscript

Tong Yu¹,
Guanfeng Liu²,
An Liu ORCID: orcid.org/0000-0002-6368-576X¹,
Zhixu Li³ &
…
Lei Zhao¹

574 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

Recently, researches on dynamic decision-making based on streaming data are in full swing. As an indispensable technology for data management and analysis, indexing methods are also evolving. The indexing paradigm named learned index has been proposed to replace the traditional B-tree family in some scenarios. It has been proved that learned indexes can provide higher lookup efficiency and lower storage cost overhead than traditional indexes. Usually, learned indexes assume that the data is static or at least the data distribution is unchanged. However, the streaming scenarios break the strong assumption. This paper presents a learned index scheme for streaming scenarios (LIFOSS for short), where the workloads insert, delete, and query data arbitrarily. Precisely, LIFOSS consists of three parts: a) an adaptive packed-memory array which stores data and handles updates with lower bound of performance guaranteed; b) a middle-layer model group, used to fit the cumulative distribution function of data; c) a feedback mechanism designed to update parameters of the model group above in real-time locally. Extensive experiments on two public datasets show that LIFOSS performs better than the state-of-the-art dynamic learned index method. LIFOSS reduces the lookup latency by at least \(6\%\), and its dynamic performance is more stable, requiring only a tiny amount of extra space.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Big data analytics on Apache Spark

Article 13 October 2016

Stratified random sampling from streaming and stored data

Article 23 October 2020

References

Bender, M.A., Hu, H.: An adaptive packed-memory array. ACM Trans. Database Syst. 32(4), 26 (2007)
Article Google Scholar
BFerragina, P., HVinciguerra, G.: The pgm-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB 13(8), 1162–1175 (2020)
Google Scholar
Chen, J., Zhong, M., Li, J., Wang, D., Qian, T., Tu, H.: Effective deep attributed network representation learning with topology adapted smoothing. IEEE Trans Cybern., https://doi.org/10.1109/TCYB.2021.3064092 (2021)
Davitkova, A., Milchevski, E., Michel, S.: The ML-Index: a multidimensional, learned index for point, range, and nearest-neighbor queries. In: EDBT, pp. 407–410 (2020)
Ding, J., Minhas, U.F., Yu, J., Wang, C., Do J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D.B., Kraska, T., Fu, X., Xu, J., Lu, H.: ALEX: An updatable adaptive learned index. In: SIGMOD, pp. 969–984 (2019)
Ding, J., Nathan, V., Alizadeh, M., Kraska, T.: Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB 14(2), 74–86 (2020)
Google Scholar
Ferragina, P., Lillo, F., Vinciguerra, G.: Why are learned indexes so effective?. In: ICML, pp. 3123–3132 (2020)
Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-Tree: a data-aware index structure. In: SIGMOD, pp. 1189–1206 (2019)
Gao, Y., Ye, J., Gao, X., Chen, G.: Middle layer based scalable learned index scheme. Journal of Software 31(3), 620–633 (2020)
Google Scholar
He, Y., Sick, B.: Clear: An adaptive continual learning framework for regression tasks. arxiv:2101.00926 (2021)
Huszár, R.: On quadratic penalties in elastic weight consolidation. arxiv:1712.03847 (2017)
Kipf, A., Marcus, R., Renen, A.V., Stoian, M., Kemper, A., Kraska, T., Neumann, T.: Sosd: A benchmark for learned indexes. arxiv:1911.13014 (2019)
Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Kristo, A., Leclerc, G., Madden, S., Mao, H., Nathan, V.: SageDB: a learned database system. In: CIDR (2019)
Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: SIGMOD, pp. 489–504 (2018)
Leo, D.D., Boncz, P.A.: Packed memory arrays - rewired. In: ICDE, pp. 830–841 (2019)
Li, J., Cai, T., Ajmal, M., Li, R., Timos, S., Yu, J.: Holistic influence maximization for targeted advertisements in spatial social. In: ICDE, pp. 1340–1343 (2018)
Li, P., Lu, H., Zheng, Q., Yang, L., Pan, G.: LISA: A learned index structure for spatial data. In: SIGMOD, pp. 2119–2133 (2020)
Li, G., Zhou, X., Cao, L.: AI meets database: AI4DB and DB4AI. In: SIGMOD, pp. 2859–2866 (2021)
Li, Z., Wang, X., Li, J., Chen, Y., Zhang, Q.: Deep attributed network representation learning of complex coupling interaction. Knowl. Based Syst 212, 106618 (2021)
Article Google Scholar
Nathan, V., Ding, J., Alizadeh, M., Kraska, T.: Learning multi-dimensional indexes. In: SIGMOD, pp. 985–1000 (2020)
Pandey, V., Renen, A.V., Kipf, A., Ding, J., Sabek, I., Kemper, A.: The case for learned spatial indexes. In: VLDB, pp. 2119–2133 (2020)
Patrick, E., Cheng, E., Gawlick, D., Elizabeth, J.: The log-structured merge-tree (lsm-tree). Acta Informatica 33(4), 351–385 (1996)
Article MATH Google Scholar
Schwarz, J., Czarnecki, W., Luketina, J., Teh, Y.W., Pascanu, R., Hadsell, R.: Progress & Compress: A scalable framework for continual learning. In: ICML, pp. 4535–4544 (2018)
Wang, H., Fu, X., Xu, J., Lu, H.: Learned index for spatial queries. In: MDM, pp. 569–574 (2019)
Wu, J., Zhang, Y., Chen, S., Chen, Y., Wang, J., Xing, C.: Updatable learned index with precise positions. VLDB 14(8), 1276–1288 (2021)
Google Scholar
Xue, G., Zhong, M., Li, J., Chen, J., Zhai, C., Kong, R.: Dynamic network embedding survey. arxiv:2103.15447 (2021)
Yang, Y., Guan, Z., Li, J., Zhao, W., Cui, J., Wang, Q.: Interpretable and efficient heterogeneous graph convolutional network. In: TKDE, https://doi.org/10.1109/TKDE.2021.3101356 (2021)

Download references

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province, China (Grant numbers BK20211307) and the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China (Grant Nos. 18KJA520010, 19KJA610002).

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, China
Tong Yu, An Liu & Lei Zhao
Department of Computing, Macquarie University, Sydney, Australia
Guanfeng Liu
School of Computer Science, Fudan University, Shanghai, China
Zhixu Li

Authors

Tong Yu
View author publications
You can also search for this author in PubMed Google Scholar
Guanfeng Liu
View author publications
You can also search for this author in PubMed Google Scholar
An Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhixu Li
View author publications
You can also search for this author in PubMed Google Scholar
Lei Zhao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to An Liu.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

This article belongs to the Topical Collection: Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications

Guest Editors: Jianxin Li, Chengfei Liu, Ziyu Guan, and Yinghui Wu

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yu, T., Liu, G., Liu, A. et al. LIFOSS: a learned index scheme for streaming scenarios. World Wide Web 26, 501–518 (2023). https://doi.org/10.1007/s11280-022-01021-6

Download citation

Received: 04 October 2021
Revised: 21 January 2022
Accepted: 02 February 2022
Published: 21 February 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11280-022-01021-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

LIFOSS: a learned index scheme for streaming scenarios

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Big data analytics on Apache Spark

Stratified random sampling from streaming and stored data

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interest

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

LIFOSS: a learned index scheme for streaming scenarios

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Big data analytics on Apache Spark

Stratified random sampling from streaming and stored data

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interest

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation