TinyLFU-based semi-stream cache join for near-real-time data warehousing

Naeem, M. Asif; Waqar, Wasiullah; Mirza, Farhaan; Tahir, Ali

doi:10.1007/s00500-022-07475-0


Focus
Published: 11 September 2022

Soft Computing, Volume 26, pages 11091–11103 (2022)

M. Asif Naeem (ORCID: orcid.org/0000-0001-6785-7875)¹, Wasiullah Waqar¹, Farhaan Mirza² & Ali Tahir³


Abstract

Semi-stream join is an emerging research problem in near-real-time data warehousing. A semi-stream join is a join between a fast stream (S) and a slow disk-based relation (R). Huge amounts of data are now generated every day, and they need to be analyzed immediately to support sound business decisions. With this in mind, a well-known algorithm called CACHEJOIN (Cache Join) was proposed. A limitation of CACHEJOIN is that it does not handle frequently changing trends in stream data efficiently. To overcome this limitation, this paper proposes TinyLFU-CACHEJOIN, a modified version of the original CACHEJOIN algorithm designed to improve its performance. TinyLFU-CACHEJOIN keeps in the cache only those records of R that have a high hit rate in S. This mechanism allows TinyLFU-CACHEJOIN to cope with sudden, abrupt trend changes in S. We developed a cost model for TinyLFU-CACHEJOIN and validated it empirically. We also compared the performance of TinyLFU-CACHEJOIN with that of the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments show that TinyLFU-CACHEJOIN significantly outperforms CACHEJOIN.
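The abstract describes the caching idea only at a high level. The Python sketch below illustrates the general principle under stated assumptions and is not the authors' TinyLFU-CACHEJOIN. It assumes that R fits in a dict standing in for the disk-based relation, that R's join key is unique, and that the cache evicts in LRU order while a TinyLFU-style admission filter decides whether a newly fetched R tuple may replace the eviction victim. Frequencies are tracked with a count-min sketch that halves all counters after a fixed number of additions, which is the aging ("reset") scheme described for TinyLFU; TinyLFU's doorkeeper Bloom filter and the batched disk-probing phase of a real semi-stream join are omitted. The class names and the parameters `capacity`, `width`, `depth`, and `sample_size` are hypothetical.

```python
import hashlib
from collections import OrderedDict


class CountMinSketch:
    """Approximate frequency counter with TinyLFU-style periodic halving (assumed parameters)."""

    def __init__(self, width=1024, depth=4, sample_size=10_000):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.sample_size = sample_size  # additions allowed before all counters are halved
        self.additions = 0

    def _cells(self, key):
        # One column per row, derived from a row-salted hash of the key.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def increment(self, key):
        for row, col in self._cells(key):
            self.table[row][col] += 1
        self.additions += 1
        if self.additions >= self.sample_size:
            self._reset()

    def estimate(self, key):
        # Collisions can only inflate counts, so the minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in self._cells(key))

    def _reset(self):
        # Aging: halve every counter so that stale popularity fades out.
        self.table = [[count // 2 for count in row] for row in self.table]
        self.additions //= 2


class TinyLFUCacheJoin:
    """Hypothetical simplification: stream tuples probe a bounded cache of R tuples,
    fall back to a lookup in R on a miss, and misses are admitted via TinyLFU."""

    def __init__(self, relation, capacity, sketch=None):
        self.relation = relation      # stands in for disk-based R: join key -> R tuple
        self.capacity = capacity
        self.cache = OrderedDict()    # LRU order: least recently used key first
        self.sketch = sketch if sketch is not None else CountMinSketch()
        self.hits = 0
        self.misses = 0

    def process(self, key, stream_value):
        """Join one stream tuple; return (key, stream value, R tuple) or None."""
        self.sketch.increment(key)
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return (key, stream_value, self.cache[key])
        self.misses += 1
        r_value = self.relation.get(key)  # the expensive "disk" access in a real system
        if r_value is None:
            return None  # no matching tuple in R
        self._admit(key, r_value)
        return (key, stream_value, r_value)

    def _admit(self, key, r_value):
        if len(self.cache) < self.capacity:
            self.cache[key] = r_value
            return
        victim = next(iter(self.cache))  # LRU candidate for eviction
        # TinyLFU admission: evict the victim only if the newcomer is estimated
        # to be more frequent in the recent stream.
        if self.sketch.estimate(key) > self.sketch.estimate(victim):
            del self.cache[victim]
            self.cache[key] = r_value


if __name__ == "__main__":
    R = {k: f"customer-{k}" for k in range(100)}
    join = TinyLFUCacheJoin(R, capacity=2)
    for i, k in enumerate([1, 1, 1, 2, 2, 3, 1, 2, 4, 1]):
        join.process(k, f"sale-{i}")
    print(join.hits, join.misses, list(join.cache))
```

On the stream 1, 1, 1, 2, 2, 3, 1, 2, 4, 1 with a capacity of two, the one-off keys 3 and 4 are still joined against R, but they are refused admission because their estimated frequency does not exceed that of the LRU victim, so the frequently seen keys 1 and 2 stay cached. A plain LRU cache of the same size would let each cold key evict a hot one.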

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This research received no funding.

Author information

Authors and Affiliations

School of Computing, National University of Computer and Emerging Sciences (NUCES), H-11/4, Islamabad, 40000, Pakistan
M. Asif Naeem & Wasiullah Waqar
SECMS, Auckland University of Technology, 6 Paul Street, Auckland, 1010, New Zealand
Farhaan Mirza
Institute of Geographical Information Systems, National University of Science and Technology, Islamabad, 40000, Pakistan
Ali Tahir

Contributions

MAN led this research: he proposed the idea and designed the architecture of the proposed approach. WW, as a master's student, implemented the algorithm and produced the initial performance results. FM contributed to performance tuning and proofread the paper. AT helped write the paper.

Corresponding author

Correspondence to M. Asif Naeem.

Ethics declarations

Conflict of interest

The authors have no conflict of interest with any editorial member of the journal.

Ethics approval

The research presented in the paper has no human involvement, and therefore no ethical approval is required.

Consent for publication

The authors consent to the publication of their work in this journal.

Additional information

Communicated by Sara Shahzad.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Naeem, M.A., Waqar, W., Mirza, F. et al. TinyLFU-based semi-stream cache join for near-real-time data warehousing. Soft Comput 26, 11091–11103 (2022). https://doi.org/10.1007/s00500-022-07475-0

Accepted: 29 July 2022
Published: 11 September 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00500-022-07475-0
