An ensembled data frequency prediction based framework for fast processing using hybrid cache optimization

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Technological advances have led to exponential growth in input–output-intensive data, which demands high-performance computing. When this data is accessed, the same queries are fired repeatedly. Predicting and prefetching these frequently used queries can therefore improve performance in terms of execution time and cache hit ratio. This paper proposes a prediction-based framework that first generates memory traces to identify data usage patterns in terms of query frequency. Future query requests are then predicted and classified using an ensembled approach that yields 87.5% accuracy and reduces the error rate by up to 11%. The classified predictions are tagged as hot or cold data on the basis of a threshold frequency. The identified hot data is prefetched into the cache, which provides 96.5% cache hits and a 9.7% decrease in execution time. A hybrid cache replacement algorithm keeps the cache updated with the hot data. Compared with existing frameworks and benchmarks, the experimental results show a 6.8% improvement in accuracy and a 9% increase in cache hits.
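
The hot/cold tagging and prefetching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the threshold value, the ensemble model, or the details of the hybrid replacement algorithm, so the sketch assumes that observed query counts stand in for the predicted frequencies and that a plain LRU policy stands in for the hybrid policy. All names (`tag_hot_cold`, `LRUCache`, `run`) are hypothetical.

```python
from collections import Counter, OrderedDict


def tag_hot_cold(trace, threshold):
    """Tag each distinct query 'hot' if it occurs at least `threshold` times.

    Assumption: raw counts from the trace replace the paper's predicted
    frequencies, and `threshold` is chosen by the caller.
    """
    counts = Counter(trace)
    return {q: ("hot" if n >= threshold else "cold") for q, n in counts.items()}


class LRUCache:
    """Plain LRU cache, used only as a stand-in for the hybrid policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def prefetch(self, key):
        # Load a key ahead of demand; this counts as neither hit nor miss.
        self._insert(key)

    def access(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            self._insert(key)

    def _insert(self, key):
        self.store[key] = True
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(trace, future, threshold, capacity):
    """Tag queries from `trace`, prefetch the hot ones, then replay `future`."""
    tags = tag_hot_cold(trace, threshold)
    cache = LRUCache(capacity)
    for query, tag in tags.items():
        if tag == "hot":
            cache.prefetch(query)
    for query in future:
        cache.access(query)
    return tags, cache.hit_ratio()
```

Calling `run(trace, future, threshold, capacity)` with a recorded trace and a list of later requests returns the hot/cold tags and the hit ratio over the later requests, which is the kind of measurement the abstract reports as cache hits.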

Acknowledgements

One of the authors, Sumedha Arora, offers sincere gratitude to the Council of Scientific and Industrial Research (CSIR), Government of India, for funding the research and providing the resources required to carry out this work (ACK. No. 143253/2K17/1 and File No. 09/677(0030)/2018-EMR-I).

Author information

Corresponding author

Correspondence to Anju Bala.

Cite this article

Arora, S., Bala, A. An ensembled data frequency prediction based framework for fast processing using hybrid cache optimization. J Ambient Intell Human Comput 12, 285–301 (2021). https://doi.org/10.1007/s12652-020-01973-5

  • DOI: https://doi.org/10.1007/s12652-020-01973-5
