Abstract
Heterogeneous distributed computing environments are emerging as platforms for data-intensive (big data) applications that need to access huge data files. Effective data management, in particular efficient access and high data availability, has therefore become a critical requirement in these systems. Data replication is an essential technique for achieving these goals by storing multiple replicas judiciously. Existing replication algorithms address metrics such as reliability, availability, bandwidth consumption, storage usage, and response time. In this paper, we present the main issues involved in data replication and discuss the key points of recent algorithms, summarizing their features in tabular form. The review focuses on existing data replication algorithms that are based on meta-heuristic techniques. It shows that previous studies have concentrated on improving response time, whereas energy consumption and security have not been considered adequately. Moreover, the impact of meta-heuristic algorithms on data replication performance is investigated in a simulation study. Finally, open issues and future challenges are presented.

Ethics declarations
Conflict of interest
N. Mansouri and M.M. Javidi declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Mansouri, N., Javidi, M.M. A review of data replication based on meta-heuristics approach in cloud computing and data grid. Soft Comput 24, 14503–14530 (2020). https://doi.org/10.1007/s00500-020-04802-1
DOI: https://doi.org/10.1007/s00500-020-04802-1