Dynamic replica placement and selection strategies in data grids: A comprehensive survey

https://doi.org/10.1016/j.jpdc.2013.10.009

Highlights

  • A survey of replica placement and selection strategies in data grids is presented.

  • Parameters that are used to evaluate the grid performance are summarized.

  • Grid architectural models and the simulation tools used are discussed.

Abstract

Data replication techniques are used in data grids to reduce makespan, storage consumption, access latency and network bandwidth consumption. Data replication enhances data availability and thereby increases system reliability. Data replication involves two steps: replica placement and replica selection. Replica placement identifies the best possible node on which to duplicate data, based on network latency and user requests. Replica selection chooses the best replica location from which to access the data for job execution in the data grid. Various replica placement and selection algorithms are available in the literature. These algorithms measure and analyze parameters such as bandwidth consumption, access cost, scalability, execution time, storage consumption and makespan. This paper discusses various replica placement and selection strategies along with their merits and demerits, and analyses their performance with respect to the parameters mentioned above. In particular, it focuses on dynamic replica placement and selection strategies in the data grid environment.

Section snippets

Introduction: replica placement and replica selection

A computational grid [22] is a combination of hardware and software that provides reliable and consistent resources to execute a job in a distributed environment. A data grid is a distributed collection of storage and computational resources located in different geographical locations. In [1], [18], [23], the grid is described as a flexible, secure and co-ordinated resource sharing environment for individuals, institutions and resources. Computationally intensive applications need large amounts of data,

Dynamic replica placement techniques

Both centralized and distributed dynamic replica placement strategies are further classified according to the type of network used for implementation.
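
As a rough illustration of what a placement decision involves, the sketch below picks the node on which to create a new replica by weighing user requests against network latency. It is not one of the surveyed strategies; the function name `place_replica`, the cost formula (request count times latency, summed over requesting nodes) and all the numbers are assumptions made only for this example.

```python
def place_replica(requests, latency, free_space, file_size):
    """Return the node that minimizes request-weighted latency, or None.

    requests:   {client_node: number of requests for the file}   (assumed input)
    latency:    {client_node: {candidate_node: latency}}          (assumed input)
    free_space: {candidate_node: free storage, same unit as file_size}
    """
    # Only nodes with enough free storage can host the new replica.
    candidates = [node for node, space in free_space.items() if space >= file_size]
    if not candidates:
        return None

    def cost(node):
        # Total latency paid by all requesters if the replica lives on `node`.
        return sum(count * latency[client][node] for client, count in requests.items())

    return min(candidates, key=cost)


# Hypothetical three-node grid.
latency = {
    "A": {"A": 0, "B": 10, "C": 30},
    "B": {"A": 10, "B": 0, "C": 15},
    "C": {"A": 30, "B": 15, "C": 0},
}
requests = {"A": 5, "B": 1, "C": 20}
free_space = {"A": 100, "B": 100, "C": 10}

chosen_large = place_replica(requests, latency, free_space, file_size=50)
chosen_small = place_replica(requests, latency, free_space, file_size=5)
chosen_none = place_replica(requests, latency, free_space, file_size=500)
```

With these made-up values, node C issues most of the requests but lacks room for the larger file, so the storage constraint pushes the replica to a neighbouring node. A dynamic strategy would re-run such a decision as request patterns and free storage change.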

Dynamic replica selection techniques

Replica selection is one of the key components of data management in data-intensive applications. It decides which replica location is the best place for users to access the data. If several replicas of a file are available, the optimization algorithm determines which replica should be selected to execute the job. The optimal replica is selected based on the following parameters: access cost, access latency, bandwidth consumption, balanced workload, maintenance cost, job execution time,
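
A minimal sketch of replica selection, assuming only two of the parameters above (access latency and available bandwidth) are known for each replica site. The `Replica` class, the cost model (latency plus transfer time) and the sample sites are hypothetical and are not taken from any surveyed algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    latency_s: float        # access latency to the site, in seconds
    bandwidth_mbps: float   # available bandwidth to the site, in megabits per second


def estimated_access_time(replica: Replica, file_size_mb: float) -> float:
    """Latency plus transfer time in seconds; file size is in megabytes."""
    transfer_s = (file_size_mb * 8) / replica.bandwidth_mbps
    return replica.latency_s + transfer_s


def select_replica(replicas, file_size_mb):
    """Pick the replica with the lowest estimated access time."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_access_time(r, file_size_mb))


# Hypothetical replica sites holding the same file.
replicas = [
    Replica("site-a", latency_s=0.020, bandwidth_mbps=100.0),
    Replica("site-b", latency_s=0.005, bandwidth_mbps=40.0),
    Replica("site-c", latency_s=0.080, bandwidth_mbps=400.0),
]

best_for_small_file = select_replica(replicas, file_size_mb=0.1)
best_for_large_file = select_replica(replicas, file_size_mb=1000.0)
```

Under this assumed cost model, the low-latency site wins for a small file and the high-bandwidth site wins for a large one, which is why selection strategies typically combine several parameters rather than ranking replicas on one.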

Summary of replica placement and selection strategies

In this paper, the various replica placement and selection strategies are summarized based on the following aspects:

  • parameters that are used to evaluate the grid performance;

  • architectural models;

  • assumptions that are made during replication;

  • simulation tools used.

The grid replication and selection strategies are evaluated based on certain performance parameters. Any replica placement and selection strategy tries to improve one or more of the following parameters: makespan, quality assurance,

Conclusion

This paper presents a survey of replica placement and selection strategies for the dynamic data grid environment. Researchers have proposed a range of replica placement and selection strategies. In a dynamic grid configuration, users can join and leave the network at any time. Therefore, no specific grid topology is used for the dynamic data grid. Most of the work done in replica placement and selection is based on the hierarchical and modified hierarchical architectures. The graph

References (54)

  • J.J. Wu et al., Optimal replica placement in hierarchical data grids with locality assurance, J. Parallel Distrib. Comput. (2008)
  • W. Allcock et al., Data management and transfer in high performance computational grid environments, Parallel Comput. J. (2002)
  • W. Allcock et al., Secure, efficient data transport and replica management for high-performance data-intensive computing
  • W. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach, D....
  • R.M. Alumuttairi, R. Wankar, A. Negi, C.R. Rao, Smart Replica Selection for Data Grids using Rough Set Approximation...
  • R.M. Alumuttairi, R. Wankar, A. Negi, C.R. Rao, Replica Selection in Data Grids using Preconditioning of Decision...
  • R.M. Alumuttairi, R. Wankar, A. Negi, C.R. Rao, Rough Set Clustering Approach to Replica Selection in Data Grids...
  • W.H. Bell et al., OptorSim: a grid simulator for studying dynamic data replication strategies, Int. J. High Perform. Comput. Appl. (2003)
  • W.H. Bell et al., Evaluation of an economy based file replication strategy for a data grid
  • G. Bingxiang, Y. Kui, A Global Dynamic Scheduling with Replica Selection Algorithm Using GridFTP, in: Proceedings of...
  • R. Buyya et al.
  • Z. Challal, Bouabana Tebibel, A priori replica placement strategy in data grid, in: Proceedings of International...
  • R. Chang, H. Chang, Y. Wang, A dynamic weighted data replication strategy in data grids, in: Proceedings of AICCSA...
  • K.Y. Cheng, H.H. Wang, C.H. Wen, Y.L. Lin, K.C. Li, C.L. Wang, Dynamic file replica location and selection strategy in...
  • A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kesselman, P. Kunst, M. Ripeanu, B....
  • A. Chervenak et al., The data grid: towards architecture for the distributed management and analysis of large scientific datasets, J. Netw. Comput. Appl. (2001)
  • B. Dhruba, The Hadoop Distributed File System: Architecture and Design....

R. Kingsy Grace graduated with B.E. Computer Science and Engineering in 2003 from Noorul Islam College of Engineering, India and completed M.E. Computer Science and Engineering in 2005 from Karunya Institute of Technology, Coimbatore, India. She is currently pursuing her Ph.D. at Anna University, Chennai, India. Her area of interest includes Grid Computing and her current research focus is on Dynamic replica placement and selection in data grids. She has about 10 years of teaching experience. She is currently working as an Assistant Professor (Sr. Grade) in the Department of Computer Science and Engineering, Sri Ramakrishna Engineering College, Coimbatore, India.

R. Manimegalai is presently working as a Professor and Director–Research, Department of Computer Science and Engineering, Park College Engineering and Technology, Coimbatore, India. She has published many papers in international/national journals and conferences. Her area of interest includes Reconfigurable Computing and Distributed Systems. She has to her credit 19 years of teaching, research and industry experience.
