
A File Group Data Replication Algorithm for Data Grids

Journal of Grid Computing

Abstract

In recent years, data grids have been deployed and have grown in many scientific experiments and data centers. These environments give grid users access to large volumes of distributed data. Data replication is a key issue in a data grid and should be applied intelligently, because it reduces data access time and bandwidth consumption for each grid site. The area therefore remains challenging and leaves considerable scope for improvement. In this paper, we introduce a new dynamic data replication algorithm named Popular File Group Replication (PFGR), which is based on three assumptions: first, users in a grid site (Virtual Organization) have similar interests in files; second, file accesses exhibit temporal locality; and third, all files are read-only. Based on the file access history and the first assumption, PFGR builds a connectivity graph for a group of dependent files in each grid site and replicates the most popular file group to the requesting grid site. Afterwards, when a user of that grid site needs some of these files, they are available locally. The simulation results show that our algorithm increases performance by minimizing the mean job execution time and bandwidth consumption, and that it avoids unnecessary replication.
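The abstract does not give the details of the connectivity graph or of how popularity is measured, so the following is only a minimal sketch of the general idea, not the authors' PFGR implementation. It assumes that the access history is a list of jobs, each listing the files it read; that two files are "dependent" when at least `min_weight` jobs requested both; and that a group's popularity is the total access count of its files. The function names and the `min_weight` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(jobs):
    """Count per-file accesses and how often each file pair shares a job."""
    edges = Counter()  # (file_a, file_b) -> number of jobs requesting both
    hits = Counter()   # file -> number of jobs requesting it
    for files in jobs:
        unique = sorted(set(files))
        hits.update(unique)
        edges.update(combinations(unique, 2))
    return edges, hits


def file_groups(edges, hits, min_weight):
    """Connected components after dropping edges lighter than min_weight."""
    neighbours = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over the retained edges
            f = stack.pop()
            if f in group:
                continue
            group.add(f)
            stack.extend(neighbours[f] - group)
        seen |= group
        groups.append(group)
    return groups


def most_popular_group(jobs, min_weight=2):
    """Assumed popularity measure: sum of access counts of a group's files."""
    edges, hits = build_graph(jobs)
    groups = file_groups(edges, hits, min_weight)
    return max(groups, key=lambda g: sum(hits[f] for f in g))


# Hypothetical access history for one grid site.
history = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["d", "e"], ["a", "b", "c"]]
candidate = most_popular_group(history)  # group to replicate to the requester
```

In this sketch the whole group is returned as one replication candidate, which mirrors the abstract's point that files requested together should become locally available together. A real strategy would also have to account for storage limits and replica replacement, which the abstract does not describe.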




Author information

Corresponding author

Correspondence to Amir Masoud Rahmani.

About this article

Cite this article

Rahmani, A., Azari, L. & Daniel, H. A File Group Data Replication Algorithm for Data Grids. J Grid Computing 15, 379–393 (2017). https://doi.org/10.1007/s10723-017-9407-1

  • DOI: https://doi.org/10.1007/s10723-017-9407-1
