Abstract
Data grids routinely handle huge amounts of data, and ensuring efficient access to such widely distributed data sets is a fundamental challenge. Creating replicas at suitable sites through a data replication strategy can improve system performance: it shortens data access time and reduces bandwidth consumption. In this paper, a dynamic data replication mechanism called Latest Access Largest Weight (LALW) is proposed. LALW selects a popular file for replication and calculates a suitable number of copies and the grid sites on which to place them. By associating a different weight with each historical data access record, LALW differentiates the importance of the records. A more recent data access record has a larger weight, indicating that the record is more pertinent to the current data access situation. A grid simulator, OptorSim, is used to evaluate the performance of this dynamic replication strategy. The simulation results show that LALW successfully increases the effective network usage. This means that the LALW replication strategy can identify a popular file and replicate it to a suitable site without increasing the network burden too much.
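The weighting idea can be illustrated with a minimal Python sketch. The abstract does not state the weighting formula, so the exponential decay used here (each older time interval counts half as much as the next newer one) is an assumption for illustration only. The function names and the `decay` parameter are hypothetical, not taken from the paper.

```python
from collections import Counter


def weighted_popularity(intervals, decay=0.5):
    """Score each file by its accesses, weighting recent intervals more.

    intervals: list of lists of file IDs, one inner list per time
               interval, ordered from oldest to newest.
    decay:     ASSUMED factor applied per interval of age; the abstract
               only says that more recent records get larger weights.
    """
    scores = Counter()
    newest = len(intervals) - 1
    for t, accesses in enumerate(intervals):
        weight = decay ** (newest - t)  # newest interval has weight 1
        for file_id in accesses:
            scores[file_id] += weight
    return dict(scores)


def most_popular(intervals, decay=0.5):
    """Return the file ID with the largest weighted access score."""
    scores = weighted_popularity(intervals, decay)
    return max(scores, key=scores.get)


# Hypothetical access history: file "A" was popular earlier,
# file "B" is popular now.
history = [
    ["A", "A", "A", "A"],  # oldest interval
    ["A", "B"],
    ["B", "B", "B"],       # newest interval
]

print(weighted_popularity(history))
print(most_popular(history))
```

In this made-up history, file A has more accesses in total (5 versus 4), yet file B scores higher once recent accesses are weighted more heavily. Setting `decay=1.0` gives every record equal weight and reduces the score to a plain access count.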
Cite this article
Chang, RS., Chang, HP. A dynamic data replication strategy using access-weights in data grids. J Supercomput 45, 277–295 (2008). https://doi.org/10.1007/s11227-008-0172-6
DOI: https://doi.org/10.1007/s11227-008-0172-6