Abstract
Cloud storage has risen quickly in recent years. Its pay-as-you-go model offers highly reliable, highly available, low-cost, and secure storage services, and it is now widely used by enterprises and individuals. Data replica management is an essential part of a cloud storage system: it improves cluster fault tolerance and availability, and it has therefore attracted much research attention. Replica management copies data blocks and places the copies on different nodes in the cluster, which makes the data more secure and reliable and improves the access rate while keeping the system load balanced. Replica technology spans the whole process from replica creation to consistency maintenance, and each stage affects the performance of the cloud storage system. This article focuses on dynamically deciding the number of data replicas in a cloud storage system. To address the shortcomings of the static replica strategy, it proposes a dynamic decision strategy for the number of replicas based on data popularity (data hotness). The grey prediction model GM(1,1) predicts the future data access frequency, and a Markov model corrects the prediction result; the data are then divided into hot and non-hot data according to the predicted value, which determines the number of replicas. Finally, simulation experiments compare and analyze the static replica strategy against the hotness-based dynamic replica-number strategy.
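The prediction-then-classification idea in the abstract can be illustrated with a minimal sketch. It uses the textbook form of GM(1,1) fitted by least squares and is not the authors' implementation. The Markov correction step is omitted, and the function names, the hotness threshold, and the replica counts (`hot_threshold`, `base_replicas`, `hot_replicas`) are hypothetical values chosen only for illustration.

```python
import math


def gm11_next(history):
    """Predict the next value of a positive series with a basic GM(1,1) model."""
    n = len(history)
    if n < 4 or min(history) <= 0:
        raise ValueError("GM(1,1) needs at least 4 positive observations")

    # Accumulated generating operation (1-AGO): running sums of the raw series.
    x1 = []
    total = 0.0
    for value in history:
        total += value
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = history[1:]
    m = n - 1

    # Least-squares fit of x0[k] + a * z[k] = b via the 2x2 normal equations.
    s_z = sum(z)
    s_zz = sum(v * v for v in z)
    s_y = sum(y)
    s_zy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * s_zz - s_z * s_z
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det

    # A (near-)zero development coefficient means the series is flat.
    if abs(a) < 1e-12:
        return b

    def x1_hat(k):
        # Time-response function of the accumulated series.
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Undo the accumulation to get the next raw value.
    return x1_hat(n) - x1_hat(n - 1)


def decide_replicas(predicted_accesses, hot_threshold=20.0,
                    base_replicas=3, hot_replicas=5):
    """Hypothetical rule: hot data gets more replicas than non-hot data."""
    if predicted_accesses >= hot_threshold:
        return hot_replicas
    return base_replicas


if __name__ == "__main__":
    accesses = [10, 12, 14.4, 17.28, 20.736]  # made-up access counts per period
    forecast = gm11_next(accesses)
    print(forecast, decide_replicas(forecast))
```

A single fixed threshold is the simplest possible classification rule; the article's strategy additionally corrects the forecast with a Markov model before classifying, which this sketch does not attempt.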
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank the anonymous referees for their invaluable suggestions and comments. This work is supported by the National Natural Science Foundation of China (61872284); the Industrial Field of General Projects of the Science and Technology Department of Shaanxi Province (2023-YBGY-203); the Industrialization Project of the Shaanxi Provincial Department of Education (21JC017); the "Thirteenth Five-Year" National Key R&D Program Project (Project Number: 2019YFD1100901); the Yulin Science and Technology Project (2019-175); the Natural Science Foundation of Shaanxi Province, China (2014JM2-6127); and the project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, SEM No. [2014] 1685.
Author information
Contributions
Qinlu He and Zhen Li contributed significantly to the algorithm design. Genqing Bian and Weiqi Zhang performed the performance analysis and wrote the manuscript. Fan Zhang and Chen Chen contributed to manuscript preparation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Conflict of interest
I declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethics approval and consent to participate
The authors declare that they have no conflicts of interest.
Consent for publication
I have read and understood the publishing policy, and submit this manuscript in accordance with this policy. The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration (from you or one of your Contributing Authors) by another publisher.
About this article
Cite this article
He, Q., Zhang, F., Bian, G. et al. Dynamic decision-making strategy of replica number based on data hot. J Supercomput 79, 9584–9603 (2023). https://doi.org/10.1007/s11227-022-05029-7
DOI: https://doi.org/10.1007/s11227-022-05029-7