Abstract
In recent years, cloud storage systems have emerged as a promising technology for storing data blocks on various cloud servers. One of the main mechanisms in cloud storage systems is data replication, for which various solutions have been proposed. The main goal of data replication is to achieve higher performance for data-intensive applications by addressing critical challenges such as availability, reliability, security, bandwidth, and response time of data access. However, to the best of the authors' knowledge, there is no systematic and comprehensive survey of cloud data replication, despite its impact and maturity. This paper presents a comprehensive survey and classification of state-of-the-art data replication schemes across existing cloud computing solutions, in order to characterize current schemes on the topic and present open issues. The presented classification comprises three main classes: data deduplication schemes, data auditing schemes, and data handling schemes. A thorough comparison of the replication schemes highlights their main properties, such as the classes they use, the type of scheme, the place of implementation, the evaluation tools, and their advantages and weaknesses. Finally, open issues and uncovered or weakly covered research challenges are discussed, and the survey is concluded.
References
Wang, L., Ranjan, R., Chen, J., Benatallah, B. (eds.): Cloud Computing: Methodology, Systems, and Applications. CRC Press, Boca Raton (2017)
Ghobaei-Arani, M., Souri, A.: LP-WSC: a linear programming approach for web service composition in geographically distributed cloud environments. J. Supercomput. 75(5), 2603–2628 (2019)
Rittinghouse, J.W., Ransome, J.F.: Cloud Computing: Implementation, Management, and Security. CRC Press (2016)
Ghobaei-Arani, M., Souri, A., Baker, T., Hussien, A.: ControCity: an autonomous approach for controlling elasticity using buffer Management in Cloud Computing Environment. IEEE Access 7, 106912–106924 (2019)
Aslanpour, M.S., Ghobaei-Arani, M., Toosi, A.N.: Auto-scaling web applications in clouds: a cost-aware approach. J. Netw. Comput. Appl. 95, 26–41 (2017)
Mokadem, R., Hameurlain, A.: A data replication strategy with tenant performance and provider economic profit guarantees in Cloud data centers. J. Syst. Softw. 159, 110447 (2020)
Lee, C.A., Bohn, R.B., Michel, M.: The NIST cloud federation reference architecture. NIST Spec. Publ. 500-332 (2020)
Ghobaei-Arani, M., Shahidinejad, A.: An efficient resource provisioning approach for analyzing cloud workloads: a metaheuristic-based clustering approach. J. Supercomput. 77(1), 711–750 (2021)
Escamilla-Ambrosio, P.J., Rodríguez-Mota, A., Aguirre-Anaya, E., Acosta-Bermejo, R., Salinas-Rosales, M.: Distributing Computing in the internet of things: cloud, fog and edge computing overview. In: NEO 2016 (pp. 87–115). Springer, Cham (2018)
Shakarami, A., Ghobaei-Arani, M., Shahidinejad, A.: A survey on the computation offloading approaches in mobile edge computing: a machine learning-based perspective. Comput. Netw., 107496 (2020)
Shakarami, A., Ghobaei-Arani, M., Masdari, M., Hosseinzadeh, M.: A survey on the computation offloading approaches in mobile edge/cloud computing environment: a stochastic-based perspective. J Grid Comput 1–33 (2020)
Shakarami, A., Shahidinejad, A., Ghobaei-Arani, M.: A review on the computation offloading approaches in mobile edge computing: a game-theoretic perspective. Software Pract. Exper. (2020)
Aslanpour, M.S., Ghobaei-Arani, M., Heydari, M., Mahmoudi, N.: LARPA: A learning automata-based resource provisioning approach for massively multiplayer online games in cloud environments. Int. J. Commun. Syst. 32(14), e4090 (2019)
Wang, B., Wang, C., Huang, W., Song, Y., Qin, X.: A survey and taxonomy on task offloading for edge-cloud computing. IEEE Access 8, 186080–186101 (2020)
Shakarami, A., Shahidinejad, A., Ghobaei-Arani, M.: An autonomous computation offloading strategy in Mobile Edge Computing: A deep learning-based hybrid approach. J. Netw. Comput. Appl. 178, 102974 (2021)
Uthayakumar, J., Vengattaraman, T., Dhavachelvan, P.: A survey on data compression techniques: from the perspective of data quality, coding schemes, data type and applications. J. King Saud Univ. Comput. Inf. Sci. 33, 119 (2018)
Widodo, R.N., Lim, H., Atiquzzaman, M.: SDM: smart deduplication for mobile cloud storage. Futur. Gener. Comput. Syst. 70, 64–73 (2017)
Kaur, R., Chana, I., Bhattacharya, J.: Data deduplication techniques for efficient cloud storage management: a systematic review. J. Supercomput. 74(5), 2035–2085 (2018)
Aslanpour, M.S., Toosi, A.N., Gaire, R., Cheema, M.A.: Auto-scaling of Web Applications in Clouds: A Tail Latency Evaluation. In: 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC) (pp. 186–195). IEEE (2020)
Milani, B.A., Navimipour, N.J.: A comprehensive review of the data replication techniques in the cloud environments: major trends and future directions. J. Netw. Comput. Appl. 64, 229–238 (2016)
Ghobaei-Arani, M.: A workload clustering based resource provisioning mechanism using Biogeography based optimization technique in the cloud based systems. Soft Comput. 25, 1–18 (2020)
Ghobaei-Arani, M., Khorsand, R., Ramezanpour, M.: An autonomous resource provisioning framework for massively multiplayer online games in cloud environment. J. Netw. Comput. Appl. 142, 76–97 (2019)
Ghobaei-Arani, M., Jabbehdari, S., Pourmina, M.A.: An autonomic approach for resource provisioning of cloud services. Clust. Comput. 19(3), 1017–1036 (2016)
Aslanpour, M.S., Dashti, S.E., Ghobaei-Arani, M., Rahmanian, A.A.: Resource provisioning for cloud applications: a 3-D, provident and flexible approach. J. Supercomput. 74(12), 6470–6501 (2018)
Al Ridhawi, I., Al Ridhawi, Y.: A cache-node selection mechanism for data replication and service composition within cloud-based systems. In: 2017 Ninth International Conference on Ubiquitous and Future Networks (ICUFN) (pp. 726–731). IEEE (2017)
Zheng, Z., Lyu, M.R.: A distributed replication strategy evaluation and selection framework for fault tolerant web services. In: 2008 IEEE international conference on web services (pp. 145–152). IEEE (2008)
Ghobaei‐Arani, M., Rahmanian, A.A., Souri, A., Rahmani, A.M.: A moth‐flame optimization algorithm for web service composition in cloud computing: simulation and verification. Software Pract. Exper. 48(10), 1865–1892 (2018). https://doi.org/10.1002/spe.2598
Stiemer, A., Fetai, I., Schuldt, H.: Comparison of eager and quorum-based replication in a cloud environment. In: 2015 IEEE international conference on big data (big data) (pp. 1738–1748). IEEE (2015)
Peluso, S., Romano, P., Quaglia, F.: Score: a scalable one-copy serializable partial replication protocol. In: ACM/IFIP/USENIX international conference on distributed systems platforms and open distributed processing (pp. 456–475). Springer, Berlin (2012)
Ghobaei-Arani, M., Jabbehdari, S., Pourmina, M.A.: An autonomic resource provisioning approach for service-based cloud applications: a hybrid approach. Future Gener. Comput. Syst. 78, 191–210 (2018). https://doi.org/10.1016/j.future.2017.02.022
Huang, W., Wang, H., Zhang, Y., Zhang, S.: A novel cluster computing technique based on signal clustering and analytic hierarchy model using hadoop. Clust. Comput. 22(6), 13077–13084 (2019)
Arthanari, J., Baskaran, R.: Enhancement of video streaming analysis using cluster-computing framework. Clust. Comput. 22(2), 3771–3781 (2019)
Kimura, M., Zhao, X., Nakagawa, T.: Reliability analysis of a cloud computing system with replication: using Markov renewal processes. In: Principles of performance and reliability modeling and evaluation (pp. 401–423). Springer, Cham (2016)
Aslanpour, M.S., Gill, S.S., Toosi, A.N.: Performance evaluation metrics for cloud, fog and edge computing: a review, taxonomy, benchmarks and standards for future research. Internet Things (2020). https://doi.org/10.1016/j.iot.2020.100273
Ulabedin, Z., Nazir, B.: Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform. J. Supercomput. 10, 1–30 (2021). https://doi.org/10.1007/s11227-020-03541-2
Shahidinejad, A., Ghobaei-Arani, M., Masdari, M.: Resource provisioning using workload clustering in cloud computing environment: a hybrid approach. Cluster Comput 24(1), 319–342 (2021). https://doi.org/10.1007/s10586-020-03107-0
Aslanpour, M.S., Toosi, A.N., Taheri, J., Gaire, R.: AutoScaleSim: A simulation toolkit for auto-scaling Web applications in clouds. Simul. Model. Pract. Theory 108, 102245 (2021)
Yi, M., Wei, J., Song, L.: Efficient integrity verification of replicated data in cloud computing system. Comput. Secur. 65, 202–212 (2017)
Mansouri, N., Javidi, M.M.: A review of data replication based on meta-heuristics approach in cloud computing and data grid. Soft Comput. 24, 14503 (2020)
Gill, N.K., Singh, S.: A dynamic, cost-aware, optimized data replication strategy for heterogeneous cloud data centers. Futur. Gener. Comput. Syst. 65, 10–32 (2016)
Li, C., Wang, C., Tang, H., Luo, Y.: Scalable and dynamic replica consistency maintenance for edge-cloud system. Futur. Gener. Comput. Syst. 101, 590–604 (2019)
Eck, O., Schaefer, D.: A semantic file system for integrated product data management. Adv. Eng. Inform. 25(2), 177–184 (2011)
Sehgal, P., Basu, S., Srinivasan, K., Voruganti, K.: An empirical study of file systems on NVM. In: 2015 31st symposium on mass storage systems and technologies (MSST) (pp. 1–14). IEEE (2015)
Fu, S., He, L., Huang, C., Liao, X., Li, K.: Performance optimization for managing massive numbers of small files in distributed file systems. IEEE Trans. Parallel Distrib. Syst. 26(12), 3433–3448 (2014)
Slimani, S., Hamrouni, T., Ben Charrada, F.: Service-oriented replication strategies for improving quality-of-service in cloud computing: a survey. Clust. Comput. 24, 361 (2020)
Li, C., Song, M., Zhang, M., Luo, Y.: Effective replica management for improving reliability and availability in edge-cloud computing environment. J. Parallel Distrib. Comput. 143, 107 (2020)
Shao, Y., Li, C., Tang, H.: A data replica placement strategy for IoT workflows in collaborative edge and cloud environments. Comput. Netw. 148, 46–59 (2019)
Dabas, C., Aggarwal, J.: Delayed replication algorithm with dynamic threshold for cloud datacenters. In: Applications of computing, automation and wireless systems in electrical engineering (pp. 625–637). Springer, Singapore (2019)
Ebadi, Y., Jafari Navimipour, N.: An energy-aware method for data replication in the cloud environments using a Tabu search and particle swarm optimization algorithm. Concurr. Comput. 31(1), e4757 (2019)
Mazumdar, S., Seybold, D., Kritikos, K., Verginadis, Y.: A survey on data storage and placement methodologies for cloud-big data ecosystem. J. Big Data 6(1), 15 (2019)
Hsieh, H.C., Chiang, M.L.: The incremental load balance cloud algorithm by using dynamic data deployment. J. Grid Comput. 17(3), 553–575 (2019)
John, S.N., Mirnalinee, T.T.: A novel dynamic data replication strategy to improve access efficiency of cloud storage. Information Systems and e-Business Management, pp. 1–22 (2019)
Mohammadi, B., Navimipour, N.J.: Data replication mechanisms in the peer-to-peer networks. Int. J. Commun Syst 32(14), e3996 (2019)
Mansouri, N., Javidi, M.M., Zade, B.M.H.: Using data mining techniques to improve replica management in cloud environment. Soft Comput. 24, 7335 (2019)
Campêlo, R.A., Casanova, M.A., Guedes, D.O., Laender, A.H.: A brief survey on replica consistency in cloud environments. J. Internet Serv. Appl. 11(1), 1–13 (2020)
Khelaifa, A., Benharzallah, S., Kahloul, L., Euler, R., Laouid, A., Bounceur, A.: A comparative analysis of adaptive consistency approaches in cloud storage. J. Parallel Distrib. Comput. 129, 36–49 (2019)
Li, C., Bai, J., Chen, Y., Luo, Y.: Resource and replica management strategy for optimizing financial cost and user experience in edge cloud computing system. Inf. Sci. 516, 33–55 (2020)
Li, C., Wang, Y., Chen, Y., Luo, Y.: Energy-efficient fault-tolerant replica management policy with deadline and budget constraints in edge-cloud environment. J. Netw. Comput. Appl. 143, 152–166 (2019)
Guo, J., Li, C., Luo, Y.: Fast replica recovery and adaptive consistency preservation for edge cloud system. Soft Comput. 24, 14943 (2020)
Luo, L., Xing, L., Levitin, G.: Optimizing dynamic survivability and security of replicated data in cloud systems under co-residence attacks. Reliab. Eng. Syst. Saf. 192, 106265 (2019)
Mansouri, N., Rafsanjani, M.K., Javidi, M.M.: DPRS: A dynamic popularity aware replication strategy with parallel download scheme in cloud environments. Simul. Model. Pract. Theory 77, 177–196 (2017)
Li, K., Tang, Y., Chen, J., Yuan, Z., Xu, C., Xu, J.: Cost-effective data feeds to blockchains via workload-adaptive data replication. In: Proceedings of the 21st international middleware conference (pp. 371–385) (2020)
Sun, S., Yao, W., Qiao, B., Zong, M., He, X., Li, X.: RRSD: a file replication method for ensuring data reliability and reducing storage consumption in a dynamic Cloud-P2P environment. Futur. Gener. Comput. Syst. 100, 844–858 (2019)
Limam, S., Mokadem, R., Belalem, G.: Data replication strategy with satisfaction of availability, performance and tenant budget requirements. Clust. Comput. 22(4), 1199–1210 (2019)
Hema, S., Kangaiammal, A.: Distributed storage hash algorithm (DSHA) for file-based deduplication in cloud computing. In: International conference on computer networks and inventive communication technologies (pp. 572–581). Springer, Cham (2019)
Rani, I.S., Venkateswarlu, B.: A systematic review of different data compression technique of cloud big sensing data. In: International conference on computer networks and inventive communication technologies (pp. 222–228). Springer, Cham (2019)
Yin, J., Tang, Y., Deng, S., Bangpeng, Z., Zomaya, A.: MUSE: a multi-tierd and SLA-driven deduplication framework for cloud storage systems. IEEE Trans. Comput. 70, 759 (2020)
Pooranian, Z., Shojafar, M., Garg, S., Taheri, R., Tafazolli, R.: LEVER: secure deduplicated cloud storage with encrypted two-party interactions in cyber-physical systems. IEEE Trans. Ind. Inf. (2020). https://doi.org/10.1109/TII.2020.3021013
Saharan, S., Somani, G., Gupta, G., Verma, R., Gaur, M.S., Buyya, R.: QuickDedup: efficient VM deduplication in cloud computing environments. J. Parallel Distrib. Comput. 139, 18–31 (2020)
Kavya, V., Sumathi, R., Shwetha, A.N.: A survey on data auditing approaches to preserve privacy and data integrity in cloud computing. In: International conference on sustainable communication networks and application (pp. 108–118). Springer, Cham (2019)
Dwivedi, A.K., Kumar, N., Pathela, M.: Distributed and lazy auditing of outsourced data. In: International conference on distributed computing and internet technology (pp. 364–379). Springer, Cham (2020)
Lin, Y., Li, J., Jia, X., Ren, K.: Multiple-replica integrity auditing schemes for cloud data storage. Concurr. Comput. (2019)
Castro-Medina, F., Rodríguez-Mazahua, L., Abud-Figueroa, M.A., Romero-Torres, C., Reyes-Hernández, L.Á., Alor-Hernández, G.: Application of data fragmentation and replication methods in the cloud: a review. In: 2019 international conference on electronics, communications and computers (CONIELECOMP) (pp. 47–54). IEEE (2019)
Souravlas, S., Sifaleras, A.: Trends in data replication strategies: a survey. Int. J. Parallel Emergent Distrib. Syst. 34(2), 222–239 (2019)
Dabas, C., Aggarwal, J.: An intensive review of data replication algorithms for cloud systems. In: Emerging research in computing, information, communication and applications (pp. 25–39). Springer, Singapore (2019)
Tran, T., Pham, D.T., Duong, Q., Mai, A.: An adaptive hash-based text deduplication for ADS-B data-dependent trajectory clustering problem. In: 2019 IEEE-RIVF international conference on computing and communication technologies (RIVF) (pp. 1–6). IEEE (2019)
Wu, S., Li, K.C., Mao, B., Liao, M.: DAC: improving storage availability with deduplication-assisted cloud-of-clouds. Futur. Gener. Comput. Syst. 74, 190–198 (2017)
Zhou, Y., Feng, D., Hua, Y., Xia, W., Fu, M., Huang, F., Zhang, Y.: A similarity-aware encrypted deduplication scheme with flexible access control in the cloud. Futur. Gener. Comput. Syst. 84, 177–189 (2018)
Tang, Y., Yin, J., Deng, S., Li, Y.: DIODE: dynamic inline-offline deduplication providing efficient space-saving and read/write performance for primary storage systems. In: 2016 IEEE 24th international symposium on modeling, analysis and simulation of computer and telecommunication systems (MASCOTS) (pp. 481–486). IEEE (2016)
Li, S., Xu, C., Zhang, Y.: CSED: client-side encrypted deduplication scheme based on proofs of ownership for cloud storage. J. Inf. Secur. Appl. 46, 250–258 (2019)
Widodo, R.N., Lim, H., Atiquzzaman, M.: A new content-defined chunking algorithm for data deduplication in cloud storage. Futur. Gener. Comput. Syst. 71, 145–156 (2017)
Jiang, T., Chen, X., Ma, J.: Public integrity auditing for shared dynamic cloud data with group user revocation. IEEE Trans. Comput. 65(8), 2363–2373 (2015)
Ding, R., Xu, Y., Cui, J., Zhong, H.: A public auditing protocol for cloud storage system with intrusion-resilience. IEEE Syst. J. 14(1), 633–644 (2019)
Hu, C., Xu, Y., Liu, P., Yu, J., Guo, S., Zhao, M.: Enabling cloud storage auditing with key-exposure resilience under continual key-leakage. Inf. Sci. 520, 15–30 (2020)
Patil, J.M., Chaudhari, S.S.: Efficient privacy preserving and dynamic public auditing for storage cloud. In: 2019 international conference on nascent technologies in engineering (ICNTE) (pp. 1–6). IEEE (2019)
Wang, H., Qin, H., Zhao, M., Wei, X., Shen, H., Susilo, W.: Blockchain-based fair payment smart contract for public cloud storage auditing. Inf. Sci. 519, 348–362 (2020)
Yu, H., Cai, Y., Sinnott, R.O., Yang, Z.: ID-based dynamic replicated data auditing for the cloud. Concurr. Comput. 31(11), e5051 (2019)
Ali, S.A., Ramakrishnan, M.: Secure provable data possession scheme with replication support in the cloud using Tweaks. Clust. Comput. 22(1), 1113–1123 (2019)
Zhang, Y., Ni, J., Tao, X., Wang, Y., Yu, Y.: Provable multiple replication data possession with full dynamics for secure cloud storage. Concurr. Comput. 28(4), 1161–1173 (2016)
Li, C., Chen, Y., Tan, P., Yang, G.: Towards comprehensive provable data possession in cloud computing. Wuhan Univ. J. Nat. Sci. 18(3), 265–271 (2013)
Abbes, H., Louati, T., Cérin, C.: Dynamic replication factor model for Linux containers-based cloud systems. J. Supercomput. 76, 7219 (2020)
Ao, W.C., Psounis, K.: Resource-constrained replication strategies for hierarchical and heterogeneous tasks. IEEE Trans. Parallel Distrib. Syst. 31(4), 793–804 (2019)
Boru, D., Kliazovich, D., Granelli, F., Bouvry, P., Zomaya, A.Y.: Models for efficient data replication in cloud computing datacenters. In: 2015 IEEE international conference on communications (ICC) (pp. 6056–6061). IEEE (2015)
Zou, X., Pan, J., Du, W., Chen, S.: Elastic database replication in the cloud. In: International conference on algorithms and architectures for parallel processing (pp. 667–681). Springer, Cham (2015)
Alghamdi, M., Tang, B., Chen, Y.: Profit-based file replication in data intensive cloud data centers. In: 2017 IEEE International Conference on Communications (ICC) (pp. 1–7). IEEE (2017)
Armknecht, F., Barman, L., Bohli, J.M., Karame, G.O.: Mirror: enabling proofs of data replication and retrievability in the cloud. In: 25th USENIX security symposium (USENIX security 16) (pp. 1051–1068) (2016)
Nivetha, N.K., Vijayakumar, D.: Modeling fuzzy based replication strategy to improve data availability in cloud datacenter. In: 2016 international conference on computing technologies and intelligent data engineering (ICCTIDE'16) (pp. 1–6). IEEE (2016)
Jayalakshmi, D.S., Ranjana, T.R., Ramaswamy, S.: Dynamic data replication across geo-distributed cloud data centres. In: International conference on distributed computing and internet technology (pp. 182–187). Springer, Cham (2016)
Jayalakshmi, D.S., TP, RR, Srinivasan, R.: Dynamic data replication strategy in cloud environments. In: 2015 fifth international conference on advances in computing and communications (ICACC) (pp. 102–105). IEEE (2015)
Rajalakshmi, A., Vijayakumar, D., Srinivasagan, K.G.: An improved dynamic data replica selection and placement in cloud. In: 2014 international conference on recent trends in information technology (pp. 1–6). IEEE (2014)
Kumar, K.A., Quamar, A., Deshpande, A., Khuller, S.: SWORD: workload-aware data placement and replica selection for cloud data management systems. VLDB J. 23(6), 845–870 (2014)
Zhang, T.: A QoS-enhanced data replication service in virtualised cloud environments. Int. J. Netw. Virtual Organ. 22(1), 1–16 (2020)
Mansouri, N.: Adaptive data replication strategy in cloud computing for performance improvement. Front. Comp. Sci. 10(5), 925–935 (2016)
Huang, Y., Huang, J., Liu, C., Zhang, C.: PFPMine: a parallel approach for discovering interacting data entities in data-intensive cloud workflows. Futur. Gener. Comput. Syst. 113, 474–487 (2020)
Atrey, A., Van Seghbroeck, G., Mora, H., De Turck, F., Volckaert, B.: SpeCH: a scalable framework for data placement of data-intensive services in geo-distributed clouds. J. Netw. Comput. Appl. 142, 1–14 (2019)
Ko, A.C., Zaw, W.T.: Fault tolerant erasure coded replication for HDFS based cloud storage. In: 2014 IEEE fourth international conference on big data and cloud computing (pp. 104–109). IEEE (2014)
Janpet, J., Wen, Y.F.: Reliable and available data replication planning for cloud storage. In: 2013 IEEE 27th international conference on advanced information networking and applications (AINA) (pp. 772–779). IEEE (2013)
Lin, J.W., Chen, C.H., Chang, J.M.: QoS-aware data replication for data-intensive applications in cloud computing systems. IEEE Trans. Cloud Comput. 1(1), 101–115 (2013)
Fu, S., He, L., Liao, X., Huang, C.: Developing the Cloud-integrated data replication framework in decentralized online social networks. J. Comput. Syst. Sci. 82(1), 113–129 (2016)
Cidon, A., Stutsman, R., Rumble, S., Katti, S., Ousterhout, J., Rosenblum, M.: MinCopysets: derandomizing replication in cloud storage. In: Proc. 10th USENIX Symp. NSDI (pp. 1–5) (2013)
Xie, F., Yan, J., Shen, J.: Towards cost reduction in cloud-based workflow management through data replication. In: 2017 fifth international conference on advanced cloud and big data (CBD) (pp. 94–99). IEEE (2017)
Djebbar, E.I., Belalem, G.: Optimization of tasks scheduling by an efficacy data placement and replication in cloud computing. In: International conference on algorithms and architectures for parallel processing (pp. 22–29). Springer, Cham (2013)
Ibrahim, I.A., Dai, W., Bassiouni, M.: Intelligent data placement mechanism for replicas distribution in cloud storage systems. In: 2016 IEEE international conference on smart cloud (smartcloud) (pp. 134–139). IEEE (2016)
Dai, W., Ibrahim, I., Bassiouni, M.: A new replica placement policy for hadoop distributed file system. In: 2016 IEEE 2nd international conference on big data security on cloud (bigdatasecurity), IEEE international conference on high performance and smart computing (HPSC), and IEEE international conference on intelligent data and security (IDS) (pp. 262–267). IEEE (2016)
Zhang, H., Lin, B., Liu, Z., Guo, W.: Data replication placement strategy based on bidding mode for cloud storage cluster. In: 2014 11th web information system and application conference (pp. 207–212). IEEE (2014)
Khalajzadeh, H., Yuan, D., Grundy, J., Yang, Y.: Improving cloud-based online social network data placement and replication. In: 2016 IEEE 9th international conference on cloud computing (CLOUD) (pp. 678–685). IEEE (2016)
Li, C., Wang, Y., Tang, H., Luo, Y.: Dynamic multi-objective optimized replica placement and migration strategies for SaaS applications in edge cloud. Futur. Gener. Comput. Syst. 100, 921–937 (2019)
da Silva, G.H.G., Holanda, M., Araujo, A.: Data replication policy in a cloud computing environment. In: 2016 11th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). IEEE (2016)
Bacis, E., di Vimercati, S.D.C., Foresti, S., Paraboschi, S., Rosa, M., Samarati, P.: Dynamic allocation for resource protection in decentralized cloud storage. In: 2019 IEEE global communications conference (GLOBECOM) (pp. 1–6). IEEE (2019)
Khalili Azimi, S.: A Bee Colony (Beehive) based approach for data replication in cloud environments. In: Fundamental research in electrical engineering: the selected papers of the first international conference on fundamental research in electrical engineering (pp. 1039–1052). Springer Singapore (2019)
Long, S.Q., Zhao, Y.L., Chen, W.: MORM: a multi-objective optimized replication management strategy for cloud storage cluster. J. Syst. Architect. 60(2), 234–244 (2014)
Liu, J., Shen, H.: A popularity-aware cost-effective replication scheme for high data durability in cloud storage. In: 2016 IEEE international conference on big data (big data) (pp. 384–389). IEEE (2016)
Gill, N.K., Singh, S.: Dynamic cost-aware re-replication and rebalancing strategy in cloud system. In: Proceedings of the 3rd international conference on frontiers of intelligent computing: theory and applications (FICTA) 2014 (pp. 39–47). Springer, Cham (2015)
Lee, H.C., Ahn, H.B., Lee, M.J.: Operation atomicity and storage replication in a collaborative middleware based on cloud storage. In: Future Information Technology (pp. 833–840). Springer, Berlin (2014)
Satpute, S., Deora, B.S.: Efficient replication of cloud data for mobile devices. In: 2014 international conference on issues and challenges in intelligent computing techniques (ICICT) (pp. 299–302). IEEE (2014)
Shwe, T., Aritsugi, M.: Proactive re-replication strategy in HDFS based cloud data center. In: Proceedings of the 10th international conference on utility and cloud computing (pp. 121–130) (2017)
Chen, L., Hoang, D.B.: Adaptive data replicas management based on active data-centric framework in cloud environment. In: 2013 IEEE 10th international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing (pp. 101–108). IEEE (2013)
Huang, K., Li, D.: MRMS: a MOEA-based replication management scheme for cloud storage system. In: 2015 IEEE/CIC international conference on communications in China (ICCC) (pp. 1–6). IEEE (2015)
Liu, X., Harwood, A., Karunasekera, S., Rubinstein, B., Buyya, R.: E-storm: Replication-based state management in distributed stream processing systems. In: 2017 46th international conference on parallel processing (ICPP) (pp. 571–580). IEEE (2017)
Edwin, E.B., Umamaheswari, P., Thanka, M.R.: An efficient and improved multi-objective optimized replication management with dynamic and cost aware strategies in cloud computing data center. Clust. Comput. 22(5), 11119–11128 (2019)
Casas, I., Taheri, J., Ranjan, R., Wang, L., Zomaya, A.Y.: A balanced scheduler with data reuse and replication for scientific workflows in cloud computing systems. Futur. Gener. Comput. Syst. 74, 168–178 (2017)
Sookhtsaraei, R., Artin, J., Ghorbani, A., Faraahi, A., Adineh, H.: A locality-based replication manager for data cloud. Front. Inf. Technol. Electron. Eng. 17(12), 1275–1286 (2016)
Stiemer, A., Fetai, I., Schuldt, H.: Analyzing the performance of data replication and data partitioning in the cloud: the BEOWULF approach. In: 2016 IEEE international conference on big data (Big Data) (pp. 2837–2846). IEEE (2016)
Mostafa, N.: A dynamic approach for consistency service in cloud and fog environment. In: 2020 fifth international conference on fog and mobile edge computing (FMEC) (pp. 28–33). IEEE (2020)
Luo, S., Hou, M., Zhan, S., Lyu, M., Li, M.: Consistency maintenance in replication: a novel strategy based on diamond topology in cloud storage. Chin. J. Electron. 26(1), 192–198 (2017)
Basu, S., Pattnaik, P.K.: A consistency preservation based approach for data-intensive cloud computing environment. In: 2017 8th international conference on computing, communication and networking technologies (ICCCNT) (pp. 1–5). IEEE (2017)
Wang, H., Li, J., Zhang, H., Zhou, Y.: Benchmarking replication and consistency strategies in cloud serving databases: Hbase and cassandra. In: Workshop on big data benchmarks, performance optimization, and emerging hardware (pp. 71–82). Springer, Cham (2014)
Zhu, Z., Qi, G., Zheng, M., Sun, J., Chai, Y.: Blockchain based consensus checking in decentralized cloud storage. Simul. Model. Pract. Theory 102, 101987 (2020)
Mseddi, A., Salahuddin, M.A., Zhani, M.F., Elbiaze, H., Glitho, R.H.: On optimizing replica migration in distributed cloud storage systems. In: 2015 IEEE 4th international conference on cloud networking (CloudNet) (pp. 191–197). IEEE (2015)
Mansouri, Y., Toosi, A.N., Buyya, R.: Cost optimization for dynamic replication and migration of data in cloud data centers. IEEE Trans. Cloud Comput. (2017)
Tripathi, A., Rajappan, G.: Scalable transaction management for partially replicated data in cloud computing environments. In: 2016 IEEE 9th international conference on cloud computing (CLOUD) (pp. 260–267). IEEE (2016)
Al Nuaimi, K., Mohamed, N., Al Nuaimi, M., Al-Jaroodi, J.: Dual direction load balancing and partial replication storage of cloud DaaS. In: 2014 IEEE 3rd international conference on cloud networking (CloudNet) (pp. 432–437). IEEE (2014)
Shen, M., Kshemkalyani, A.D., Hsu, T.Y.: Causal consistency for geo-replicated cloud storage under partial replication. In: 2015 IEEE international parallel and distributed processing symposium workshop (pp. 509–518). IEEE (2015)
Mahmood, T., Puzhavakath Narayanan, S., Rao, S., Vijaykumar, T.N., Thottethodi, M.: Achieving causal consistency under partial replication for geo-distributed cloud storage (2016)
Soyjaudah, K.M.S., Catherine, P.C., Coonjah, I.: Evaluation of UDP tunnel for data replication in data centers and cloud environment. In: 2016 international conference on computing, communication and automation (ICCCA) (pp. 1217–1221). IEEE (2016)
Ramanan, M., Vivekanandan, P.: Efficient data integrity and data replication in cloud using stochastic diffusion method. Clust. Comput. 22(6), 14999–15006 (2019)
Tahir, M., Sardaraz, M., Mehmood, Z., Muhammad, S.: CryptoGA: a cryptosystem based on genetic algorithm for cloud data security. Clust. Comput. 1–14 (2020)
Salunkhe, S.D., Patil, D.: Division and replication for data with public auditing scheme for cloud storage. In: 2016 International Conference on Computing Communication Control and automation (ICCUBEA) (pp. 1–5). IEEE (2016)
Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42, 373–386 (2006)
Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 267–273 (2003)
Burnham, J.F.: Scopus database: a review. Biomed. Digital Librar. 3, 1 (2006)
Appendix 1: Data-driven topic detection for replication schemes in cloud computing topics
This appendix explains how frequently occurring topics were extracted from the set of selected papers on replication schemes. Non-negative matrix factorization (NMF) is used for topic modeling [149]. The main reason for selecting this approach is that it does not require critical parameters to be tuned, which makes it well suited to such applications [150]. NMF approximates a non-negative matrix X (n × m) by a product UV in which both U (n × k) and V (k × m) are non-negative.
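The factorization can be illustrated with a small sketch. The appendix does not say which NMF solver was used, so the code below assumes the standard multiplicative update rules for the squared Frobenius error; the function names `nmf` and `frobenius_error`, the iteration count, and the random initialization are all illustrative choices rather than the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, u, v):
    """Squared Frobenius norm of X - UV."""
    uv = matmul(u, v)
    return sum((a - b) ** 2 for xr, pr in zip(x, uv) for a, b in zip(xr, pr))


def nmf(x, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative X (n x m) into U (n x k) and V (k x m).

    Assumption: multiplicative updates; the source does not name a solver.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    u = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # V <- V * (U^T X) / (U^T U V), element-wise
        ut = transpose(u)
        num = matmul(ut, x)
        den = matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # U <- U * (X V^T) / (U V V^T), element-wise
        vt = transpose(v)
        num = matmul(x, vt)
        den = matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return u, v
```

Because every update multiplies a non-negative entry by a non-negative ratio, U and V stay non-negative throughout, which is the property the topic interpretation relies on.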
In this study, we used the selected research articles from scientific journals and conferences to detect topics related to replication schemes in cloud computing. The abstracts and keywords of the selected reviewed papers were extracted from the Scopus database [151]. We treat the title, abstract, and keywords of each article as a single document. In the pre-processing step, stop words and punctuation are removed, along with words whose frequency is less than two. The result is a TF-IDF frequency matrix (i.e., the matrix X, which holds the weight of each word in each document).
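The pre-processing steps can be sketched as follows. Several details are assumptions, since the appendix does not specify them: the stop-word list here is a tiny hypothetical one, "frequency less than two" is read as the total count over all documents, and the TF-IDF variant (relative term frequency times `log(N/df) + 1`) is only one common choice.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "is", "with"}


def tokenize(text):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_tfidf(documents, min_count=2):
    """Return (vocabulary, X), where X[i][j] is the weight of term j in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    # Assumption: a word is dropped when it occurs fewer than min_count times overall.
    totals = Counter(w for tokens in tokenized for w in tokens)
    vocabulary = sorted(w for w, c in totals.items() if c >= min_count)
    n_docs = len(documents)
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)
        row = [(counts[w] / length) * (math.log(n_docs / doc_freq[w]) + 1.0)
               for w in vocabulary]
        matrix.append(row)
    return vocabulary, matrix


# Each document is the title, abstract, and keywords of one paper joined together.
docs = [
    "Data replication in the cloud.",
    "Cloud data deduplication!",
    "Auditing of replication schemes",
]
vocabulary, X = build_tfidf(docs)
```

In this layout the rows of X are documents and the columns are terms, matching the orientation in which each row of U describes one paper.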
Given \({X}_{n\times m}\approx {U}_{n\times k}{V}_{k\times m}\), we can assume that X consists of k different clusters, such that each row of U represents one research paper (i.e., one document) and each element of that row represents the degree to which the corresponding topic is present in the paper. On the other side of the product, each row of V represents a topic, and each element of that row represents the weight of the corresponding term in the topic.
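A short sketch shows how the two factors are read. The matrices and vocabulary below are made-up toy values, not results from the survey, and the helper names `top_terms` and `dominant_topic` are hypothetical.

```python
def top_terms(v, vocabulary, count=3):
    """For each topic (row of V), return the `count` highest-weighted terms."""
    result = []
    for topic_row in v:
        ranked = sorted(zip(topic_row, vocabulary),
                        key=lambda pair: pair[0], reverse=True)
        result.append([term for _, term in ranked[:count]])
    return result


def dominant_topic(u):
    """For each document (row of U), return the index of its strongest topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Toy values for illustration only.
vocabulary = ["chunk", "integrity", "proof", "hash"]
V = [[0.9, 0.1, 0.0, 0.7],   # topic 0: weights of the four terms
     [0.0, 0.8, 0.6, 0.1]]   # topic 1
U = [[0.7, 0.1],             # document 0: degree of topic 0 and topic 1
     [0.2, 0.9],
     [0.6, 0.3]]

terms_per_topic = top_terms(V, vocabulary, count=2)
topic_per_document = dominant_topic(U)
```

Reading the top terms of each row of V is what allows a human to attach a label to a detected topic.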
With this approach, the best topics can be detected by normalizing the values and applying thresholds. We ran experiments over different numbers of clusters (k = 3, 4, 5, 6, 7, 8, 9, 10) and different thresholds on the values of the V and U matrices (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5). All results were checked manually against a subset of our papers (20 papers chosen at random from the paper list) to verify them and determine the best configuration. We found that three clusters and a threshold of 0.2 gave the best results in terms of classification accuracy. The three detected topics are data deduplication, data auditing, and replication handling. These three topics are therefore taken as the main categories of replication schemes.
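The threshold search can be sketched as below for a fixed U; the same loop would be repeated for each candidate k. The appendix does not specify the normalization or the accuracy measure, so the row-sum normalization, the exact-match agreement score, and the tie-breaking rule are all assumptions, and the input numbers are invented for illustration.

```python
def normalize_rows(matrix):
    """Scale each row so that its entries sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([value / total if total > 0 else 0.0 for value in row])
    return out


def assign_topics(u, threshold):
    """Per document, the set of topics whose normalized weight reaches the threshold."""
    return [{t for t, w in enumerate(row) if w >= threshold}
            for row in normalize_rows(u)]


def agreement(assigned, manual):
    """Fraction of hand-checked documents whose topic set matches the manual labels."""
    hits = sum(1 for doc, labels in manual.items() if assigned[doc] == labels)
    return hits / len(manual)


def best_threshold(u, manual, thresholds):
    """Pick the threshold with the highest agreement; ties go to the smaller value."""
    scored = [(agreement(assign_topics(u, t), manual), -t) for t in thresholds]
    best_score, negated = max(scored)
    return -negated, best_score


# Invented document-topic weights and manual labels (document index -> topic set).
U = [[76, 7, 17], [6, 52, 42], [4, 13, 83]]
manual = {0: {0}, 1: {1, 2}, 2: {2}}
thresholds = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
chosen, score = best_threshold(U, manual, thresholds)
```

A low threshold assigns spurious extra topics to a paper, while a high one drops genuine secondary topics, which is why an intermediate value tends to agree best with manual labels.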
Cite this article
Shakarami, A., Ghobaei-Arani, M., Shahidinejad, A. et al. Data replication schemes in cloud computing: a survey. Cluster Comput 24, 2545–2579 (2021). https://doi.org/10.1007/s10586-021-03283-7
DOI: https://doi.org/10.1007/s10586-021-03283-7