Abstract
In recent years, data grids have been deployed and have grown in many scientific experiments and data centers. These environments give grid users access to large amounts of distributed data. Data replication is a key issue in a data grid and should be applied intelligently, because it reduces data access time and bandwidth consumption for each grid site. The area is therefore challenging and leaves considerable scope for improvement. In this paper, we introduce a new dynamic data replication algorithm named Popular File Group Replication (PFGR), which is based on three assumptions: first, users in a grid site (Virtual Organization) have similar interests in files; second, file accesses exhibit temporal locality; and third, all files are read-only. Based on the file access history and the first assumption, PFGR builds a connectivity graph for a group of dependent files in each grid site and replicates the most popular file group to the requesting grid site. Afterwards, when a user of that grid site needs some of these files, they are available locally. The simulation results show that our algorithm improves performance by reducing the mean job execution time and bandwidth consumption and by avoiding unnecessary replication.
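The idea of grouping dependent files from the access history can be illustrated with a minimal Python sketch. It is not the PFGR algorithm itself, since the abstract does not specify how the connectivity graph is built or how popularity is measured. The sketch assumes that the history is a list of per-job file request lists ("sessions"), that two files are connected when they are requested together in at least `min_weight` sessions, and that a group's popularity is the total number of requests for its files. All function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_connectivity_graph(sessions):
    """Count how often each pair of files is requested in the same session."""
    edges = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            edges[(a, b)] += 1
    return edges


def file_groups(edges, min_weight):
    """Group files linked by edges of weight >= min_weight (connected components)."""
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


def most_popular_group(sessions, min_weight=2):
    """Return the group with the most requests in total (assumed popularity measure)."""
    groups = file_groups(build_connectivity_graph(sessions), min_weight)
    if not groups:
        return set()
    access_counts = Counter(f for session in sessions for f in session)
    return max(groups, key=lambda g: sum(access_counts[f] for f in g))


def files_to_replicate(sessions, local_files, min_weight=2):
    """Files of the most popular group that the requesting site does not hold yet."""
    return most_popular_group(sessions, min_weight) - set(local_files)
```

For example, with the history `[["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["x", "y"], ["x", "y"]]` and a site that already stores `a`, `files_to_replicate` returns `{"b"}`: files `a` and `b` form the most requested group, and only `b` is missing locally. Because all files are assumed read-only, the sketch needs no replica consistency handling.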
Cite this article
Rahmani, A., Azari, L. & Daniel, H. A File Group Data Replication Algorithm for Data Grids. J Grid Computing 15, 379–393 (2017). https://doi.org/10.1007/s10723-017-9407-1
DOI: https://doi.org/10.1007/s10723-017-9407-1