Abstract
Big data is usually processed in a decentralized computational environment with a number of distributed storage systems and processing facilities to enable both online and offline data analysis. In such a context, data access is fundamental to enhance processing efficiency as well as the user experience inspecting the data and the caching system is a solution widely adopted in many diverse domains. In this context, the optimization of cache management plays a central role to sustain the growing demand for data. In this article, we propose an autonomous approach based on a Reinforcement Learning technique to implement an agent to manage the file storing decisions. Moreover, we test the proposed method in a real context using the information on data analysis workflows of the CMS experiment at CERN.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Adhikari, V.K., et al.: Unreeling netflix: understanding and improving multi-CDN movie delivery. In: 2012 Proceedings IEEE INFOCOM, pp. 1620–1628. IEEE (2012)
Ali, W., Shamsuddin, S.M., Ismail, A.S., et al.: A survey of web caching and prefetching. Int. J. Adv. Soft Comput. Appl. 3(1), 18–44 (2011)
Bird, I., Campana, S., Girone, M., Espinal, X., McCance, G., Schovancová, J.: Architecture and prototype of a WLCG data lake for HL-LHC. In: EPJ Web of Conferences, vol. 214, p. 04024. EDP Sciences (2019)
Chen, T.: Obtaining the optimal cache document replacement policy for the caching system of an EC website. Eur. J. Oper. Res. 181(2), 828–841 (2007)
Collaboration, C., et al.: The CMS experiment at the CERN LHC (2008)
Fanfani, A., et al.: Distributed analysis in CMS. J. Grid Comput. 8(2), 159–179 (2010)
Fang, H.: Managing data lakes in big data era: what’s a data lake and why has it became popular in data management ecosystem. In: 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 820–824. IEEE (2015)
Herodotou, H.: Autocache: employing machine learning to automate caching in distributed file systems. In: International Conference on Data Engineering Workshops (ICDEW), pp. 133–139 (2019)
Koskela, T., Heikkonen, J., Kaski, K.: Web cache optimization with nonlinear model using object features. Comput. Netw. 43(6), 805–817 (2003)
Kuznetsov, V., Li, T., Giommi, L., Bonacorsi, D., Wildish, T.: Predicting dataset popularity for the CMS experiment. arXiv preprint arXiv:1602.07226, 2016
Lei, L., You, L., Dai, G., Vu, T.X., Yuan, D., Chatzinotas, S.: A deep learning approach for optimizing content delivering in cache-enabled HetNet. In: 2017 International Symposium on Wireless Communication Systems (ISWCS), pp. 449–453. IEEE (2017)
Madera, C., Laurent, A.: The next information architecture evolution: the data lake wave. In: Proceedings of the 8th International Conference on Management of Digital EcoSystems, pp. 174–180 (2016)
Meoni, M., Perego, R., Tonellotto, N.: Dataset popularity prediction for caching of CMS big data. J. Grid Comput. 16(2), 211–228 (2018)
Narayanan, A., Verma, S., Ramadan, E., Babaie, P., Zhang, Z.-L.: Deepcache: a deep learning based framework for content caching. In: Proceedings of the 2018 Workshop on Network Meets AI & ML, pp. 48–53 (2018)
Sadeghi, A., Wang, G., Giannakis, G.B.: Deep reinforcement learning for adaptive caching in hierarchical content delivery networks. IEEE Trans. Cognit. Commun. Netw. 5(4), 1024–1033 (2019)
Skluzacek, T.J., Chard, K., Foster, I.: Klimatic: a virtual data lake for harvesting and distribution of geospatial data. In: 2016 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS), pp. 31–36. IEEE (2016)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Terrizzano, I.G., Schwarz, P.M., Roth, M., Colino, J.E.: Data wrangling: the challenging yourney from the wild to the lake. In: CIDR (2015)
Tian, G., Liebelt, M.: An effectiveness-based adaptive cache replacement policy. Microprocess. Microsyst. 38(1), 98–111 (2014)
Zhong, C., Gursoy, M.C., Velipasalar, S.: A deep reinforcement learning-based framework for content caching. In: 2018 52nd Annual Conference on Information Sciences and Systems (CISS), pp. 1–6. IEEE (2018)
Author information
Authors and Affiliations
Consortia
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tracolli, M., Baioletti, M., Poggioni, V., Spiga, D., on behalf of the CMS Collaboration. (2020). Caching Suggestions Using Reinforcement Learning. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science(), vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_57
Download citation
DOI: https://doi.org/10.1007/978-3-030-64583-0_57
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer ScienceComputer Science (R0)