A novel dynamic network data replication scheme based on historical access record and proactive deletion

Wang, Zhe; Li, Tao; Xiong, Naixue; Pan, Yi

doi:10.1007/s11227-011-0708-z

Published: 19 October 2011

Volume 62, pages 227–250, (2012)

The Journal of Supercomputing

An Erratum to this article was published on 21 December 2011

Abstract

Data replication is becoming a popular technology in many fields, such as cloud storage, data grids, and P2P systems. By replicating files to other servers or nodes, we can reduce network traffic and file access time and increase data availability in the face of natural and man-made disasters. However, more replicas do not always yield better system performance. Replicas do decrease read access time and provide better fault tolerance, but when write access is considered, maintaining a large number of replicas results in a large update overhead. Hence, a trade-off between read access time and write update cost is needed. File popularity is an important factor in making decisions about data replication. To smooth out fluctuations in data access, historical file popularity can be used to select genuinely popular files. In this research, a dynamic data replication strategy is proposed based on two ideas. The first employs historical access records, which are useful for choosing a file to replicate. The second is a proactive deletion method, which controls the number of replicas to reach an optimal balance between read access time and write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm against other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.
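
The abstract does not give the formulas behind the two ideas, so the following is only a minimal sketch of how they could fit together, not the authors' algorithm. It assumes that historical popularity is an exponentially decayed sum of per-interval access counts (a half-life weighting) and that a replica is deleted once its expected update cost exceeds the read cost it saves. All function names, parameters, and the linear cost comparison are hypothetical.

```python
from typing import Dict, List


def historical_popularity(counts: List[int], half_life: float = 2.0) -> float:
    """Decay-weighted sum of per-interval access counts.

    counts[0] is the oldest interval, counts[-1] the most recent.
    An interval that is `half_life` intervals old counts half as much
    as the most recent one (assumed decay rule, not taken from the paper).
    """
    newest = len(counts) - 1
    return sum(c * 0.5 ** ((newest - i) / half_life) for i, c in enumerate(counts))


def pick_file_to_replicate(history: Dict[str, List[int]], half_life: float = 2.0) -> str:
    """Return the file whose decayed access history is largest."""
    return max(history, key=lambda name: historical_popularity(history[name], half_life))


def should_delete_replica(reads_served: float, updates_received: float,
                          read_saving: float, update_cost: float) -> bool:
    """Proactive deletion test (assumed linear cost model).

    Delete the replica when propagating updates to it costs more
    than the read access cost it saves.
    """
    return updates_received * update_cost > reads_served * read_saving


if __name__ == "__main__":
    history = {
        "burst": [0, 0, 0, 30],       # one recent spike of accesses
        "steady": [20, 20, 20, 20],   # sustained accesses over time
    }
    print(pick_file_to_replicate(history))
    print(should_delete_replica(reads_served=5, updates_received=50,
                                read_saving=1.0, update_cost=1.0))
```

In this example the file with the short spike has the higher count in the latest interval, yet the decayed history favours the steadily accessed file, which is the kind of fluctuation the abstract wants to avoid reacting to. The deletion test is the simplest form of the read/write trade-off; a fuller cost model would also account for network distance and storage.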

Author information

Authors and Affiliations

Dept. of Computer Science, Sichuan University, Chengdu, China
Zhe Wang & Tao Li
Dept. of Computer Science, Georgia State University, Atlanta, USA
Naixue Xiong & Yi Pan

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Wang, Z., Li, T., Xiong, N. et al. A novel dynamic network data replication scheme based on historical access record and proactive deletion. J Supercomput 62, 227–250 (2012). https://doi.org/10.1007/s11227-011-0708-z

Published: 19 October 2011
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11227-011-0708-z
