Abstract
Data grids support access to widely distributed storage for large numbers of users accessing potentially many large files. Efficient access is hindered by the high latency of the Internet. To improve access time, files may be replicated at nearby sites. Replication also provides high availability, decreased bandwidth use, enhanced fault tolerance, and improved scalability. Resource availability, network latency, and user requests in a grid environment may vary with time, so any replica placement strategy must be able to adapt to such dynamic behavior. In this paper, we describe a new dynamic replica placement algorithm for hierarchical data grids, Popularity Based Replica Placement (PBRP), which is guided by file “popularity”. Our goal is to place replicas close to clients to reduce data access time while still using network and storage resources efficiently. The effectiveness of PBRP depends on the selection of a threshold value related to file popularity. We also present Adaptive-PBRP (APBRP), which determines this threshold dynamically based on data request arrival rates. We evaluate both algorithms using simulation. Results for a range of data access patterns show that our algorithms can shorten job execution time significantly and reduce bandwidth consumption compared to other dynamic replication methods.
Cite this article
Shorfuzzaman, M., Graham, P. & Eskicioglu, R. Adaptive popularity-driven replica placement in hierarchical data grids. J Supercomput 51, 374–392 (2010). https://doi.org/10.1007/s11227-009-0371-9