Data replication schemes in cloud computing: a survey

Published in Cluster Computing

Abstract

In recent years, cloud storage systems have emerged as a promising technology for storing data blocks across multiple cloud servers. Data replication is one of the main mechanisms in these systems, and many solutions have been proposed for it. Its main goal is to achieve higher performance for data-intensive applications by addressing critical challenges such as the availability, reliability, security, bandwidth, and response time of data access. However, to the best of our knowledge, there is no systematic and comprehensive survey of cloud data replication, despite its impact and maturity. This paper presents a comprehensive survey and classification of state-of-the-art data replication schemes across existing cloud computing solutions, in the form of a classical classification that defines current schemes on the topic and presents open issues. The classification comprises three main classes: data deduplication schemes, data auditing schemes, and data handling schemes. A comparison of the replication schemes highlights their main properties, such as the classes they use, the type of scheme, the place of implementation, evaluation tools, and their advantages and weaknesses. Finally, open issues and uncovered or weakly covered research challenges are discussed, and the survey is concluded.


References

  1. Wang, L., Ranjan, R., Chen, J., Benatallah, B. (eds.): Cloud Computing: Methodology, Systems, and Applications. CRC Press, Boca Raton (2017)

    Google Scholar 

  2. Ghobaei-Arani, M., Souri, A.: LP-WSC: a linear programming approach for web service composition in geographically distributed cloud environments. J. Supercomput. 75(5), 2603–2628 (2019)

    Article  Google Scholar 

  3. Rittinghouse, J.W., Ransome, J.F.: Cloud Computing: Implementation, Management, and Security. CRC Press (2016)

  4. Ghobaei-Arani, M., Souri, A., Baker, T., Hussien, A.: ControCity: an autonomous approach for controlling elasticity using buffer Management in Cloud Computing Environment. IEEE Access 7, 106912–106924 (2019)

    Article  Google Scholar 

  5. Aslanpour, M.S., Ghobaei-Arani, M., Toosi, A.N.: Auto-scaling web applications in clouds: a cost-aware approach. J. Netw. Comput. Appl. 95, 26–41 (2017)

    Article  Google Scholar 

  6. Mokadem, R., Hameurlain, A.: A data replication strategy with tenant performance and provider economic profit guarantees in Cloud data centers. J. Syst. Softw. 159, 110447 (2020)

    Article  Google Scholar 

  7. Lee, C.A., Bohn, R.B., Michel, M.: The NIST cloud federation reference architecture 5. NIST Spec. Publ. 500, 332 (2020)

    Google Scholar 

  8. Ghobaei-Arani, M., Shahidinejad, A.: An efficient resource provisioning approach for analyzing cloud workloads: a metaheuristic-based clustering approach. J. Supercomput. 77(1), 711–750 (2021)

    Article  Google Scholar 

  9. Escamilla-Ambrosio, P.J., Rodríguez-Mota, A., Aguirre-Anaya, E., Acosta-Bermejo, R., Salinas-Rosales, M.: Distributing Computing in the internet of things: cloud, fog and edge computing overview. In NEO 2016 (pp. 87–115). Springer, Cham (2018)

  10. Shakarami, A., Ghobaei-Arani, M. and Shahidinejad, A.: A survey on the computation offloading approaches in mobile edge computing: a machine learning-based perspective. Computer Networks, p. 107496 (2020)

  11. Shakarami, A., Ghobaei-Arani, M., Masdari, M., Hosseinzadeh, M.: A survey on the computation offloading approaches in mobile edge/cloud computing environment: a stochastic-based perspective. J Grid Comput 1–33 (2020)

  12. Shakarami, A., Shahidinejad, A., Ghobaei-Arani, M.: A review on the computation offloading approaches in mobile edge computing: a game-theoretic perspective. Practice and Experience, Software (2020)

    Google Scholar 

  13. Aslanpour, M.S., Ghobaei-Arani, M., Heydari, M., Mahmoudi, N.: LARPA: A learning automata-based resource provisioning approach for massively multiplayer online games in cloud environments. Int. J. Commun. Syst. 32(14), e4090 (2019)

    Article  Google Scholar 

  14. Wang, B., Wang, C., Huang, W., Song, Y., Qin, X.: A survey and taxonomy on task offloading for edge-cloud computing. IEEE Access 8, 186080–186101 (2020)

    Article  Google Scholar 

  15. Shakarami, A., Shahidinejad, A., Ghobaei-Arani, M.: An autonomous computation offloading strategy in Mobile Edge Computing: A deep learning-based hybrid approach. J. Netw. Comput. Appl. 178, 102974 (2021)

    Article  Google Scholar 

  16. Uthayakumar, J., Vengattaraman, T., Dhavachelvan, P.: A survey on data compression techniques: from the perspective of data quality, coding schemes, data type and applications. J. King Saud Univ. Comput. Inf. Sci. 33, 119 (2018)

    Google Scholar 

  17. Widodo, R.N., Lim, H., Atiquzzaman, M.: SDM: smart deduplication for mobile cloud storage. Futur. Gener. Comput. Syst. 70, 64–73 (2017)

    Article  Google Scholar 

  18. Kaur, R., Chana, I., Bhattacharya, J.: Data deduplication techniques for efficient cloud storage management: a systematic review. J. Supercomput. 74(5), 2035–2085 (2018)

    Article  Google Scholar 

  19. Aslanpour, M.S., Toosi, A.N., Gaire, R., Cheema, M.A.: Auto-scaling of Web Applications in Clouds: A Tail Latency Evaluation. In: 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC) (pp. 186–195). IEEE (2020)

  20. Milani, B.A., Navimipour, N.J.: A comprehensive review of the data replication techniques in the cloud environments: major trends and future directions. J. Netw. Comput. Appl. 64, 229–238 (2016)

    Article  Google Scholar 

  21. Ghobaei-Arani, M.: A workload clustering based resource provisioning mechanism using Biogeography based optimization technique in the cloud based systems. Soft Comput. 25, 1–18 (2020)

    Google Scholar 

  22. Ghobaei-Arani, M., Khorsand, R., Ramezanpour, M.: An autonomous resource provisioning framework for massively multiplayer online games in cloud environment. J. Netw. Comput. Appl. 142, 76–97 (2019)

    Article  Google Scholar 

  23. Ghobaei-Arani, M., Jabbehdari, S., Pourmina, M.A.: An autonomic approach for resource provisioning of cloud services. Clust. Comput. 19(3), 1017–1036 (2016)

    Article  Google Scholar 

  24. Aslanpour, M.S., Dashti, S.E., Ghobaei-Arani, M., Rahmanian, A.A.: Resource provisioning for cloud applications: a 3-D, provident and flexible approach. J. Supercomput. 74(12), 6470–6501 (2018)

    Article  Google Scholar 

  25. Al Ridhawi, I., Al Ridhawi, Y.: A cache-node selection mechanism for data replication and service composition within cloud-based systems. In: 2017 Ninth International Conference on Ubiquitous and Future Networks (ICUFN) (pp. 726–731). IEEE (2017)

  26. Zheng, Z., Lyu, M.R.: A distributed replication strategy evaluation and selection framework for fault tolerant web services. In: 2008 IEEE international conference on web services (pp. 145–152). IEEE (2008)

  27. Ghobaei‐Arani, M., Rahmanian, A.A., Souri, A., Rahmani, A.M.: A moth‐flame optimization algorithm for web service composition in cloud computing: simulation and verification. Software Pract. Exper. 48(10), 1865–1892 (2018). https://doi.org/10.1002/spe.2598

    Article  Google Scholar 

  28. Stiemer, A., Fetai, I., Schuldt, H.: Comparison of eager and quorum-based replication in a cloud environment. In: 2015 IEEE international conference on big data (big data) (pp. 1738–1748). IEEE (2015)

  29. Peluso, S., Romano, P., Quaglia, F.: Score: a scalable one-copy serializable partial replication protocol. In: ACM/IFIP/USENIX international conference on distributed systems platforms and open distributed processing (pp. 456–475). Springer, Berlin (2012)

  30. Ghobaei-Arani, M., Jabbehdari, S., Pourmina, M.A.: An autonomic resource provisioning approach for service-based cloud applications: a hybrid approach. Future Gener. Comput. Syst. 78, 191–210 (2018). https://doi.org/10.1016/j.future.2017.02.022

    Article  Google Scholar 

  31. Huang, W., Wang, H., Zhang, Y., Zhang, S.: A novel cluster computing technique based on signal clustering and analytic hierarchy model using hadoop. Clust. Comput. 22(6), 13077–13084 (2019)

    Article  Google Scholar 

  32. Arthanari, J., Baskaran, R.: Enhancement of video streaming analysis using cluster-computing framework. Clust. Comput. 22(2), 3771–3781 (2019)

    Article  Google Scholar 

  33. Kimura, M., Zhao, X, Nakagawa, T.: Reliability analysis of a cloud computing system with replication: using Markov renewal processes. In: Principles of performance and reliability modeling and evaluation (pp. 401–423). Springer, Cham (2016)

  34. Aslanpour, M.S., Gill, S.S., Toosi, A.N.: Performance evaluation metrics for cloud, fog and edge computing: a review, taxonomy, benchmarks and standards for future research. Internet Things (2020). https://doi.org/10.1016/j.iot.2020.100273

  35. Ulabedin, Z., Nazir, B.: Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform. J. Supercomput. 10, 1–30 (2021). https://doi.org/10.1007/s11227-020-03541-2

    Article  Google Scholar 

  36. Shahidinejad, A., Ghobaei-Arani, M., Masdari, M.: Resource provisioning using workload clustering in cloud computing environment: a hybrid approach. Cluster Comput 24(1), 319–342 (2021). https://doi.org/10.1007/s10586-020-03107-0

    Article  Google Scholar 

  37. Aslanpour, M.S., Toosi, A.N., Taheri, J., Gaire, R.: AutoScaleSim: A simulation toolkit for auto-scaling Web applications in clouds. Simul. Model. Pract. Theory 108, 102245 (2021)

    Article  Google Scholar 

  38. Yi, M., Wei, J., Song, L.: Efficient integrity verification of replicated data in cloud computing system. Comput. Secur. 65, 202–212 (2017)

    Article  Google Scholar 

  39. Mansouri, N., Javidi, M.M.: A review of data replication based on meta-heuristics approach in cloud computing and data grid. Soft Comput. 24, 14503 (2020)

    Article  Google Scholar 

  40. Gill, N.K., Singh, S.: A dynamic, cost-aware, optimized data replication strategy for heterogeneous cloud data centers. Futur. Gener. Comput. Syst. 65, 10–32 (2016)

    Article  Google Scholar 

  41. Li, C., Wang, C., Tang, H., Luo, Y.: Scalable and dynamic replica consistency maintenance for edge-cloud system. Futur. Gener. Comput. Syst. 101, 590–604 (2019)

    Article  Google Scholar 

  42. Eck, O., Schaefer, D.: A semantic file system for integrated product data management. Adv. Eng. Inform. 25(2), 177–184 (2011)

    Article  Google Scholar 

  43. Sehgal, P., Basu, S., Srinivasan, K., Voruganti, K.: An empirical study of file systems on NVM. In: 2015 31st symposium on mass storage systems and technologies (MSST) (pp. 1–14). IEEE (2015)

  44. Fu, S., He, L., Huang, C., Liao, X., Li, K.: Performance optimization for managing massive numbers of small files in distributed file systems. IEEE Trans. Parallel Distrib. Syst. 26(12), 3433–3448 (2014)

    Article  Google Scholar 

  45. Slimani, S., Hamrouni, T., Ben Charrada, F.: Service-oriented replication strategies for improving quality-of-service in cloud computing: a survey. Clust. Comput. 24, 361 (2020)

    Article  Google Scholar 

  46. Li, C., Song, M., Zhang, M., Luo, Y.: Effective replica management for improving reliability and availability in edge-cloud computing environment. J. Parallel Distrib. Comput. 143, 107 (2020)

    Article  Google Scholar 

  47. Shao, Y., Li, C., Tang, H.: A data replica placement strategy for IoT workflows in collaborative edge and cloud environments. Comput. Netw. 148, 46–59 (2019)

    Article  Google Scholar 

  48. Dabas, C., Aggarwal, J.: Delayed replication algorithm with dynamic threshold for cloud datacenters. In: Applications of computing, automation and wireless systems in electrical engineering (pp. 625–637). Springer, Singapore (2019)

  49. Ebadi, Y., Jafari Navimipour, N.: An energy-aware method for data replication in the cloud environments using a Tabu search and particle swarm optimization algorithm. Concurr. Comput. 31(1), e4757 (2019)

    Article  Google Scholar 

  50. Mazumdar, S., Seybold, D., Kritikos, K., Verginadis, Y.: A survey on data storage and placement methodologies for cloud-big data ecosystem. J. Big Data 6(1), 15 (2019)

    Article  Google Scholar 

  51. Hsieh, H.C., Chiang, M.L.: The incremental load balance cloud algorithm by using dynamic data deployment. J. Grid Comput. 17(3), 553–575 (2019)

    Article  Google Scholar 

  52. John, S.N., Mirnalinee, T.T.: A novel dynamic data replication strategy to improve access efficiency of cloud storage. Information Systems and e-Business Management, pp. 1–22 (2019)

  53. Mohammadi, B., Navimipour, N.J.: Data replication mechanisms in the peer-to-peer networks. Int. J. Commun Syst 32(14), e3996 (2019)

    Article  Google Scholar 

  54. Mansouri, N., Javidi, M.M., Zade, B.M.H.: Using data mining techniques to improve replica management in cloud environment. Soft Comput. 24, 7335 (2019)

    Article  Google Scholar 

  55. Campêlo, R.A., Casanova, M.A., Guedes, D.O., Laender, A.H.: A brief survey on replica consistency in cloud environments. J. Internet Serv. Appl. 11(1), 1–13 (2020)

    Article  Google Scholar 

  56. Khelaifa, A., Benharzallah, S., Kahloul, L., Euler, R., Laouid, A., Bounceur, A.: A comparative analysis of adaptive consistency approaches in cloud storage. J. Parallel Distrib. Comput. 129, 36–49 (2019)

    Article  Google Scholar 

  57. Li, C., Bai, J., Chen, Y., Luo, Y.: Resource and replica management strategy for optimizing financial cost and user experience in edge cloud computing system. Inf. Sci. 516, 33–55 (2020)

    Article  MathSciNet  Google Scholar 

  58. Li, C., Wang, Y., Chen, Y., Luo, Y.: Energy-efficient fault-tolerant replica management policy with deadline and budget constraints in edge-cloud environment. J. Netw. Comput. Appl. 143, 152–166 (2019)

    Article  Google Scholar 

  59. Guo, J., Li, C., Luo, Y.: Fast replica recovery and adaptive consistency preservation for edge cloud system. Soft Comput. 24, 14943 (2020)

    Article  Google Scholar 

  60. Luo, L., Xing, L., Levitin, G.: Optimizing dynamic survivability and security of replicated data in cloud systems under co-residence attacks. Reliab. Eng. Syst. Saf. 192, 106265 (2019)

    Article  Google Scholar 

  61. Mansouri, N., Rafsanjani, M.K., Javidi, M.M.: DPRS: A dynamic popularity aware replication strategy with parallel download scheme in cloud environments. Simul. Model. Pract. Theory 77, 177–196 (2017)

    Article  Google Scholar 

  62. Li, K., Tang, Y., Chen, J., Yuan, Z., Xu, C., Xu, J.: Cost-effective data feeds to blockchains via workload-adaptive data replication. In: Proceedings of the 21st international middleware conference (pp. 371–385) (2020)

  63. Sun, S., Yao, W., Qiao, B., Zong, M., He, X., Li, X.: RRSD: a file replication method for ensuring data reliability and reducing storage consumption in a dynamic Cloud-P2P environment. Futur. Gener. Comput. Syst. 100, 844–858 (2019)

    Article  Google Scholar 

  64. Limam, S., Mokadem, R., Belalem, G.: Data replication strategy with satisfaction of availability, performance and tenant budget requirements. Clust. Comput. 22(4), 1199–1210 (2019)

    Article  Google Scholar 

  65. Hema, S., Kangaiammal, A. (2019) Distributed storage hash algorithm (DSHA) for file-based deduplication in cloud computing. In: International conference on computer networks and inventive communication technologies (pp. 572–581). Springer, Cham (2019)

  66. Rani, I.S., Venkateswarlu, B.: A systematic review of different data compression technique of cloud big sensing data. In: International conference on computer networks and inventive communication technologies (pp. 222–228). Springer, Cham (2019)

  67. Yin, J., Tang, Y., Deng, S., Bangpeng, Z., Zomaya, A.: MUSE: a multi-tierd and SLA-driven deduplication framework for cloud storage systems. IEEE Trans. Comput. 70, 759 (2020)

    Article  Google Scholar 

  68. Pooranian, Z., Shojafar, M., Garg, S., Taheri, R., Tafazolli, R.: LEVER: secure deduplicated cloud storage with encrypted two-party interactions in cyber-physical systems. IEEE Trans. Ind. Inf. (2020). https://doi.org/10.1109/TII.2020.3021013

    Article  Google Scholar 

  69. Saharan, S., Somani, G., Gupta, G., Verma, R., Gaur, M.S., Buyya, R.: QuickDedup: efficient VM deduplication in cloud computing environments. J. Parallel Distrib. Comput. 139, 18–31 (2020)

    Article  Google Scholar 

  70. Kavya, V., Sumathi, R., Shwetha, A.N.: A survey on data auditing approaches to preserve privacy and data integrity in cloud computing. In: International conference on sustainable communication networks and application (pp. 108–118). Springer, Cham (2019)

  71. Dwivedi, A.K., Kumar, N., Pathela, M.: Distributed and lazy auditing of outsourced data. In: International conference on distributed computing and internet technology (pp. 364–379). Springer, Cham (2020)

  72. Lin, Y., Li, J., Jia, X., Ren, K.: Multiple-replica integrity auditing schemes for cloud data storage. Concurr. Comput. (2019)

  73. Castro-Medina, F., Rodríguez-Mazahua, L., Abud-Figueroa, M.A., Romero-Torres, C., Reyes-Hernández, L.Á., Alor-Hernández, G.: Application of data fragmentation and replication methods in the cloud: a review. In: 2019 international conference on electronics, communications and computers (CONIELECOMP) (pp. 47–54). IEEE (2019)

  74. Souravlas, S., Sifaleras, A.: Trends in data replication strategies: a survey. Int. J. Parallel Emergent Distrib. Syst. 34(2), 222–239 (2019)

    Article  Google Scholar 

  75. Dabas, C., Aggarwal, J.: An intensive review of data replication algorithms for cloud systems. In: Emerging research in computing, information, communication and applications (pp. 25–39). Springer, Singapore (2019)

  76. Tran, T., Pham, D.T., Duong, Q., Mai, A.: An adaptive hash-based text deduplication for ADS-B data-dependent trajectory clustering problem. In: 2019 IEEE-RIVF international conference on computing and communication technologies (RIVF) (pp. 1–6). IEEE (2019)

  77. Wu, S., Li, K.C., Mao, B., Liao, M.: DAC: improving storage availability with deduplication-assisted cloud-of-clouds. Futur. Gener. Comput. Syst. 74, 190–198 (2017)

    Article  Google Scholar 

  78. Zhou, Y., Feng, D., Hua, Y., Xia, W., Fu, M., Huang, F., Zhang, Y.: A similarity-aware encrypted deduplication scheme with flexible access control in the cloud. Futur. Gener. Comput. Syst. 84, 177–189 (2018)

    Article  Google Scholar 

  79. Tang, Y., Yin, J., Deng, S. and Li, Y.: DIODE: dynamic inline-offline de duplication providing efficient space-saving and read/write performance for primary storage systems. In: 2016 IEEE 24th international symposium on modeling, analysis and simulation of computer and telecommunication systems (MASCOTS) (pp. 481–486). IEEE (2016)

  80. Li, S., Xu, C., Zhang, Y.: CSED: client-side encrypted deduplication scheme based on proofs of ownership for cloud storage. J. Inf. Secur. Appl. 46, 250–258 (2019)

    Google Scholar 

  81. Widodo, R.N., Lim, H., Atiquzzaman, M.: A new content-defined chunking algorithm for data deduplication in cloud storage. Futur. Gener. Comput. Syst. 71, 145–156 (2017)

    Article  Google Scholar 

  82. Jiang, T., Chen, X., Ma, J.: Public integrity auditing for shared dynamic cloud data with group user revocation. IEEE Trans. Comput. 65(8), 2363–2373 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  83. Ding, R., Xu, Y., Cui, J., Zhong, H.: A public auditing protocol for cloud storage system with intrusion-resilience. IEEE Syst. J. 14(1), 633–644 (2019)

    Article  Google Scholar 

  84. Hu, C., Xu, Y., Liu, P., Yu, J., Guo, S., Zhao, M.: Enabling cloud storage auditing with key-exposure resilience under continual key-leakage. Inf. Sci. 520, 15–30 (2020)

    Article  MathSciNet  MATH  Google Scholar 

  85. Patil, J.M., Chaudhari, S.S.: Efficient privacy preserving and dynamic public auditing for storage cloud. In: 2019 international conference on nascent technologies in engineering (ICNTE) (pp. 1–6). IEEE (2019)

  86. Wang, H., Qin, H., Zhao, M., Wei, X., Shen, H., Susilo, W.: Blockchain-based fair payment smart contract for public cloud storage auditing. Inf. Sci. 519, 348–362 (2020)

    Article  MathSciNet  Google Scholar 

  87. Yu, H., Cai, Y., Sinnott, R.O., Yang, Z.: ID-based dynamic replicated data auditing for the cloud. Concurr. Comput. 31(11), e5051 (2019)

    Article  Google Scholar 

  88. Ali, S.A., Ramakrishnan, M.: Secure provable data possession scheme with replication support in the cloud using Tweaks. Clust. Comput. 22(1), 1113–1123 (2019)

    Article  Google Scholar 

  89. Zhang, Y., Ni, J., Tao, X., Wang, Y., Yu, Y.: Provable multiple replication data possession with full dynamics for secure cloud storage. Concurr. Comput. 28(4), 1161–1173 (2016)

    Article  Google Scholar 

  90. Li, C., Chen, Y., Tan, P., Yang, G.: Towards comprehensive provable data possession in cloud computing. Wuhan Univ. J. Nat. Sci. 18(3), 265–271 (2013)

    Article  Google Scholar 

  91. Abbes, H., Louati, T., Cérin, C.: Dynamic replication factor model for Linux containers-based cloud systems. J. Supercomput. 76, 7219 (2020)

    Article  Google Scholar 

  92. Ao, W.C., Psounis, K.: Resource-constrained replication strategies for hierarchical and heterogeneous tasks. IEEE Trans. Parallel Distrib. Syst. 31(4), 793–804 (2019)

    Article  Google Scholar 

  93. Boru, D., Kliazovich, D., Granelli, F., Bouvry, P., Zomaya, A.Y.: Models for efficient data replication in cloud computing datacenters. In: 2015 IEEE international conference on communications (ICC) (pp. 6056–6061). IEEE (2015)

  94. Zou, X., Pan, J., Du, W., Chen, S.: Elastic database replication in the cloud. In: International conference on algorithms and architectures for parallel processing (pp. 667–681). Springer, Cham (2015)

  95. Alghamdi, M., Tang, B., Chen, Y.: Profit-based file replication in data intensive cloud data centers. In: 2017 IEEE International Conference on Communications (ICC) (pp. 1–7). IEEE (2017)

  96. Armknecht, F., Barman, L., Bohli, J.M., Karame, G.O.: Mirror: enabling proofs of data replication and retrievability in the cloud. In: 25th {USENIX} security symposium ({USENIX} security 16) (pp. 1051–1068) (2016)

  97. Nivetha, N.K., Vijayakumar, D.: Modeling fuzzy based replication strategy to improve data availabiity in cloud datacenter. In: 2016 international conference on computing technologies and intelligent data engineering (ICCTIDE'16) (pp. 1–6). IEEE (2016)

  98. Jayalakshmi, D.S., Ranjana, T.R., Ramaswamy, S.: Dynamic data replication across geo-distributed cloud data centres. In: International conference on distributed computing and internet technology (pp. 182–187). Springer, Cham (2016)

  99. Jayalakshmi, D.S., TP, RR, Srinivasan, R.:, Dynamic data replication strategy in cloud environments. In: 2015 fifth international conference on advances in computing and communications (ICACC) (pp. 102–105). IEEE (2015)

  100. Rajalakshmi, A., Vijayakumar, D., Srinivasagan, K.G.: An improved dynamic data replica selection and placement in cloud. In 2014 international conference on recent trends in information technology (pp. 1–6). IEEE (2014)

  101. Kumar, K.A., Quamar, A., Deshpande, A., Khuller, S.: SWORD: workload-aware data placement and replica selection for cloud data management systems. VLDB J. 23(6), 845–870 (2014)

    Article  Google Scholar 

  102. Zhang, T.: A QoS-enhanced data replication service in virtualised cloud environments. Int. J. Netw. Virtual Organ. 22(1), 1–16 (2020)

    Article  Google Scholar 

  103. Mansouri, N.: Adaptive data replication strategy in cloud computing for performance improvement. Front. Comp. Sci. 10(5), 925–935 (2016)

    Article  Google Scholar 

  104. Huang, Y., Huang, J., Liu, C., Zhang, C.: PFPMine: a parallel approach for discovering interacting data entities in data-intensive cloud workflows. Futur. Gener. Comput. Syst. 113, 474–487 (2020)

    Article  Google Scholar 

  105. Atrey, A., Van Seghbroeck, G., Mora, H., De Turck, F., Volckaert, B.: SpeCH: a scalable framework for data placement of data-intensive services in geo-distributed clouds. J. Netw. Comput. Appl. 142, 1–14 (2019)

    Article  Google Scholar 

  106. Ko, A.C., Zaw, W.T.: Fault tolerant erasure coded replication for HDFS based cloud storage. In: 2014 IEEE fourth international conference on big data and cloud computing (pp. 104–109). IEEE (2014)

  107. Janpet, J., Wen, Y.F.: Reliable and available data replication planning for cloud storage. In: 2013 IEEE 27th international conference on advanced information networking and applications (AINA) (pp. 772–779). IEEE (2013)

  108. Lin, J.W., Chen, C.H., Chang, J.M.: QoS-aware data replication for data-intensive applications in cloud computing systems. IEEE Trans. Cloud Comput. 1(1), 101–115 (2013)

    Article  Google Scholar 

  109. Fu, S., He, L., Liao, X., Huang, C.: Developing the Cloud-integrated data replication framework in decentralized online social networks. J. Comput. Syst. Sci. 82(1), 113–129 (2016)

    Article  MathSciNet  Google Scholar 

  110. Cidon, A., Stutsman, R., Rumble, S., Katti, S., Ousterhout, J., Rosenblum, M.: MinCopysets: derandomizing replication in cloud storage. In: Proc. 10th USENIX Symp. NSDI (pp. 1–5) (2013)

  111. Xie, F., Yan, J., Shen, J.: Towards cost reduction in cloud-based workflow management through data replication. In: 2017 fifth international conference on advanced cloud and big data (CBD) (pp. 94–99). IEEE (2017)

  112. Djebbar, E.I., Belalem, G.: Optimization of tasks scheduling by an efficacy data placement and replication in cloud computing. In: International conference on algorithms and architectures for parallel processing (pp. 22–29). Springer, Cham (2013)

  113. Ibrahim, I.A., Dai, W., Bassiouni, M.: Intelligent data placement mechanism for replicas distribution in cloud storage systems. In: 2016 IEEE international conference on smart cloud (smartcloud) (pp. 134–139). IEEE (2016)

  114. Dai, W., Ibrahim, I., Bassiouni, M.: A new replica placement policy for hadoop distributed file system. In: 2016 IEEE 2nd international conference on big data security on cloud (bigdatasecurity), IEEE international conference on high performance and smart computing (HPSC), and IEEE international conference on intelligent data and security (IDS) (pp. 262–267). IEEE (2016)

  115. Zhang, H., Lin, B., Liu, Z., Guo, W.: Data replication placement strategy based on bidding mode for cloud storage cluster. In: 2014 11th web information system and application conference (pp. 207–212). IEEE (2014)

  116. Khalajzadeh, H., Yuan, D., Grundy, J., Yang, Y.: Improving cloud-based online social network data placement and replication. In: 2016 IEEE 9th international conference on cloud computing (CLOUD) (pp. 678–685). IEEE (2016)

  117. Li, C., Wang, Y., Tang, H., Luo, Y.: Dynamic multi-objective optimized replica placement and migration strategies for SaaS applications in edge cloud. Futur. Gener. Comput. Syst. 100, 921–937 (2019)

    Article  Google Scholar 

  118. da Silva, G.H.G., Holanda, M., Araujo, A.: Data replication policy in a cloud computing environment. In: 2016 11th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). IEEE (2016)

  119. Bacis, E., di Vimercati, S.D.C., Foresti, S., Paraboschi, S., Rosa, M., Samarati, P.: Dynamic allocation for resource protection in decentralized cloud storage. In: 2019 IEEE global communications conference (GLOBECOM) (pp. 1–6). IEEE (2019)

  120. Khalili Azimi, S.: A Bee Colony (Beehive) based approach for data replication in cloud environments. In: Fundamental research in electrical engineering: the selected papers of the first international conference on fundamental research in electrical engineering (pp. 1039–1052). Springer Singapore (2019)

  121. Long, S.Q., Zhao, Y.L., Chen, W.: MORM: a multi-objective optimized replication management strategy for cloud storage cluster. J. Syst. Architect. 60(2), 234–244 (2014)

    Article  Google Scholar 

  122. Liu, J., Shen, H.: A popularity-aware cost-effective replication scheme for high data durability in cloud storage. In: 2016 IEEE international conference on big data (big data) (pp. 384–389). IEEE (2016)

  123. Gill, N.K., Singh, S.: Dynamic cost-aware re-replication and rebalancing strategy in cloud system. In: Proceedings of the 3rd international conference on frontiers of intelligent computing: theory and applications (FICTA) 2014 (pp. 39–47). Springer, Cham (2015)

  124. Lee, H.C., Ahn, H.B., Lee, M.J.: Operation atomicity and storage replication in a collaborative middleware based on cloud storage. In: Future Information Technology (pp. 833–840). Springer, Berlin (2014)

  125. Satpute, S., Deora, B.S.: Efficient replication of cloud data for mobile devices. In: 2014 international conference on issues and challenges in intelligent computing techniques (ICICT) (pp. 299–302). IEEE (2014)

  126. Shwe, T., Aritsugi, M.: Proactive re-replication strategy in HDFS based cloud data center. In: Proceedings of the10th international conference on utility and cloud computing (pp. 121–130) (2017)

  127. Chen, L., Hoang, D.B.: Adaptive data replicas management based on active data-centric framework in cloud environment. In: 2013 IEEE 10th international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing (pp. 101–108). IEEE (2013)

  128. Huang, K., Li, D.: MRMS: a MOEA-based replication management scheme for cloud storage system. In: 2015 IEEE/CIC international conference on communications in China (ICCC) (pp. 1–6). IEEE (2015)

  129. Liu, X., Harwood, A., Karunasekera, S., Rubinstein, B., Buyya, R.: E-storm: Replication-based state management in distributed stream processing systems. In: 2017 46th international conference on parallel processing (ICPP) (pp. 571–580). IEEE (2017)

  130. Edwin, E.B., Umamaheswari, P., Thanka, M.R.: An efficient and improved multi-objective optimized replication management with dynamic and cost aware strategies in cloud computing data center. Clust. Comput. 22(5), 11119–11128 (2019)

    Article  Google Scholar 

  131. Casas, I., Taheri, J., Ranjan, R., Wang, L., Zomaya, A.Y.: A balanced scheduler with data reuse and replication for scientific workflows in cloud computing systems. Futur. Gener. Comput. Syst. 74, 168–178 (2017)

    Article  Google Scholar 

  132. Sookhtsaraei, R., Artin, J., Ghorbani, A., Faraahi, A., Adineh, H.: A locality-based replication manager for data cloud. Front. Inf. Technol. Electron. Eng. 17(12), 1275–1286 (2016)

    Article  Google Scholar 

  133. Stiemer, A., Fetai, I., Schuldt, H.: Analyzing the performance of data replication and data partitioning in the cloud: the BEOWULF approach. In: 2016 IEEE international conference on big data (Big Data) (pp. 2837–2846). IEEE (2016)

  134. Mostafa, N.: A dynamic approach for consistency service in cloud and fog environment. In: 2020 fifth international conference on fog and mobile edge computing (FMEC) (pp. 28–33). IEEE (2020)

  135. Luo, S., Hou, M., Zhan, S., Lyu, M., Li, M.: Consistency maintenance in replication: a novel strategy based on diamond topology in cloud storage. Chin. J. Electron. 26(1), 192–198 (2017)

  136. Basu, S., Pattnaik, P.K.: A consistency preservation based approach for data-intensive cloud computing environment. In: 2017 8th international conference on computing, communication and networking technologies (ICCCNT) (pp. 1–5). IEEE (2017)

  137. Wang, H., Li, J., Zhang, H., Zhou, Y.: Benchmarking replication and consistency strategies in cloud serving databases: Hbase and cassandra. In: Workshop on big data benchmarks, performance optimization, and emerging hardware (pp. 71–82). Springer, Cham (2014)

  138. Zhu, Z., Qi, G., Zheng, M., Sun, J., Chai, Y.: Blockchain based consensus checking in decentralized cloud storage. Simul. Model. Pract. Theory 102, 101987 (2020)

  139. Mseddi, A., Salahuddin, M.A., Zhani, M.F., Elbiaze, H., Glitho, R.H.: On optimizing replica migration in distributed cloud storage systems. In: 2015 IEEE 4th international conference on cloud networking (CloudNet) (pp. 191–197). IEEE (2015)

  140. Mansouri, Y., Toosi, A.N., Buyya, R.: Cost optimization for dynamic replication and migration of data in cloud data centers. IEEE Trans. Cloud Comput. (2017)

  141. Tripathi, A., Rajappan, G.: Scalable transaction management for partially replicated data in cloud computing environments. In: 2016 IEEE 9th international conference on cloud computing (CLOUD) (pp. 260–267). IEEE (2016)

  142. Al Nuaimi, K., Mohamed, N., Al Nuaimi, M., Al-Jaroodi, J.: Dual direction load balancing and partial replication storage of cloud DaaS. In: 2014 IEEE 3rd international conference on cloud networking (CloudNet) (pp. 432–437). IEEE (2014)

  143. Shen, M., Kshemkalyani, A.D., Hsu, T.Y.: Causal consistency for geo-replicated cloud storage under partial replication. In: 2015 IEEE international parallel and distributed processing symposium workshop (pp. 509–518). IEEE (2015)

  144. Mahmood, T., Puzhavakath Narayanan, S., Rao, S., Vijaykumar, T.N., Thottethodi, M.: Achieving causal consistency under partial replication for geo-distributed cloud storage (2016)

  145. Soyjaudah, K.M.S., Catherine, P.C., Coonjah, I.: Evaluation of UDP tunnel for data replication in data centers and cloud environment. In: 2016 international conference on computing, communication and automation (ICCCA) (pp. 1217–1221). IEEE (2016)

  146. Ramanan, M., Vivekanandan, P.: Efficient data integrity and data replication in cloud using stochastic diffusion method. Clust. Comput. 22(6), 14999–15006 (2019)

  147. Tahir, M., Sardaraz, M., Mehmood, Z., Muhammad, S.: CryptoGA: a cryptosystem based on genetic algorithm for cloud data security. Clust. Comput. 1–14 (2020)

  148. Salunkhe, S.D., Patil, D.: Division and replication for data with public auditing scheme for cloud storage. In: 2016 International Conference on Computing Communication Control and automation (ICCUBEA) (pp. 1–5). IEEE (2016)

  149. Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42, 373–386 (2006)

  150. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 267–273 (2003)

  151. Burnham, J.F.: Scopus database: a review. Biomed. Digital Librar. 3, 1 (2006)

Author information

Corresponding author

Correspondence to Mostafa Ghobaei-Arani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Data-driven topic detection for replication schemes in cloud computing topics

In this appendix, we explain how frequently occurring topics are extracted from the set of selected papers on replication schemes. To do so, non-negative matrix factorization (NMF) is used for topic modeling [149]. The main reason for selecting this approach is that it does not require critical parameters, which makes it well suited to such applications [150]. NMF approximates a matrix X (m × n) by the product UV, where both U (m × k) and V (k × n) are non-negative.

In this study, we used the selected research articles from scientific journals and conferences to detect topics related to replication schemes in cloud computing. The abstracts and keywords of the selected review papers were extracted from the Scopus database [151]. We treat the title, abstract, and keywords of each article as a single document. In the pre-processing step, stop words and punctuation are removed, and words with a frequency of less than two are discarded. The result is a TF-IDF frequency matrix (i.e., the matrix X, which contains the weight of each word in each document).
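The pre-processing step above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the stop-word list, the function names `tokenize` and `tfidf_matrix`, the reading of "frequency less than two" as a corpus-wide count, and the smoothed IDF formula are all choices made here, since the appendix does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the source does not name one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "on"}

def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def tfidf_matrix(documents, min_count=2):
    """Return (vocab, X) where X[i][j] is the TF-IDF weight of term j in document i.

    Assumptions: terms occurring fewer than `min_count` times in the whole
    corpus are dropped, and IDF uses the smoothed form log((1+N)/(1+df)) + 1.
    """
    tokenized = [tokenize(doc) for doc in documents]
    corpus_counts = Counter(w for doc in tokenized for w in doc)
    vocab = sorted(w for w, c in corpus_counts.items() if c >= min_count)
    n_docs = len(documents)
    doc_freq = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0 for w in vocab}
    X = []
    for doc in tokenized:
        counts = Counter(doc)  # missing terms count as 0
        X.append([counts[w] * idf[w] for w in vocab])
    return vocab, X

# Each "document" stands for the concatenated title, abstract, and keywords.
docs = [
    "Data replication in the cloud.",
    "Replication and auditing of cloud data!",
    "Deduplication of data blocks",
]
vocab, X = tfidf_matrix(docs)
```

Every entry of X is non-negative by construction, which is the property NMF needs from its input.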

Given \(X_{m \times n} \cong U_{m \times k} \times V_{k \times n}\), we can assume that X consists of k different clusters, such that each row of U represents one research paper (i.e., one document) and each element in that row represents the degree to which a topic is present in the paper. On the other side of the multiplication, each row of V represents a topic, and each element of that row represents the degree to which a term belongs to the topic.
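To make the roles of U and V concrete, the following sketch factorizes a small non-negative matrix. The appendix does not state which NMF algorithm was used; the multiplicative update rules minimizing the Frobenius error are assumed here as one common choice, and the names `nmf`, `matmul`, `transpose`, and `frobenius_error` are illustrative only.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, iterations=500, seed=0, eps=1e-9):
    """Approximate X (m x n) by U (m x k) times V (k x n), both non-negative.

    Assumption: multiplicative updates for the squared Frobenius error;
    the source does not name the algorithm it used.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # V <- V * (U^T X) / (U^T U V), element-wise
        Ut = transpose(U)
        num = matmul(Ut, X)
        den = matmul(matmul(Ut, U), V)
        V = [[V[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # U <- U * (X V^T) / (U V V^T), element-wise
        Vt = transpose(V)
        num = matmul(X, Vt)
        den = matmul(U, matmul(V, Vt))
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return U, V

def frobenius_error(X, U, V):
    """Frobenius norm of the residual X - UV."""
    R = matmul(U, V)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)) ** 0.5

# Toy document-term matrix with two obvious groups of documents and terms.
X = [[1, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 1, 1]]
U, V = nmf(X, k=2)
```

In this toy case, row i of U holds the topic weights of document i and row t of V holds the term weights of topic t, matching the interpretation given in the text.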

Applying this approach, we could readily detect the best topics by normalizing the values and applying thresholds. We ran a number of experiments over different numbers of clusters (k = 3, 4, 5, 6, 7, 8, 9, 10) and thresholds for the values of the V and U matrices (0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5). All results were manually checked against a subset of our papers (20 papers randomly chosen from the paper list) in order to verify them and determine the best configuration. Finally, we found that three clusters and a threshold of 0.2 gave the best results in terms of classification accuracy. The three detected topics are data deduplication, data auditing, and replication handling. Therefore, these three topics are considered the main categories of replication schemes.
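The normalization and thresholding step can be illustrated as follows. The appendix does not say how the values were normalized, so this sketch assumes each row of U is scaled to sum to 1 and a document is assigned every topic whose normalized weight reaches the threshold; `normalize_rows`, `assign_topics`, and `GRID` are hypothetical names.

```python
def normalize_rows(M):
    """Scale each row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out

def assign_topics(U, threshold=0.2):
    """For each document (row of U), list the topics whose normalized weight >= threshold."""
    return [[t for t, w in enumerate(row) if w >= threshold]
            for row in normalize_rows(U)]

# The parameter grid described in the text: k = 3..10 and thresholds 0.1..0.5.
THRESHOLDS = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
GRID = [(k, t) for k in range(3, 11) for t in THRESHOLDS]

# Toy document-topic weights for three documents and three topics.
U_example = [[9, 1, 0],
             [3, 3, 4],
             [0, 0, 0]]
labels = assign_topics(U_example, threshold=0.2)
```

With these toy weights, the first document is assigned only topic 0, the second is assigned all three topics, and the all-zero document is assigned none. Each (k, threshold) pair in `GRID` corresponds to one configuration that would then be checked manually against the sampled papers.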

About this article

Cite this article

Shakarami, A., Ghobaei-Arani, M., Shahidinejad, A. et al. Data replication schemes in cloud computing: a survey. Cluster Comput 24, 2545–2579 (2021). https://doi.org/10.1007/s10586-021-03283-7

  • DOI: https://doi.org/10.1007/s10586-021-03283-7

Keywords