Abstract
Data-sharing scientific communities use storage systems as distributed data stores by replicating content. In such highly replicated environments, a particular dataset can reside at multiple locations and can thus be downloaded from any one of them. Since the datasets of interest are very large, improving download speeds, either by server selection or by co-allocation, can offer substantial benefits. In this paper, we present an architecture for co-allocating Grid data transfers across multiple connections, enabling the parallel download of datasets from multiple servers. We have developed several co-allocation strategies, comprising simple brute-force, predictive, and dynamic load-balancing techniques, both to exploit rate differences among the various client–server links and to address dynamic rate fluctuations. We evaluate our approaches using the GridFTP data movement protocol in a wide-area testbed and present our results.
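The three families of co-allocation strategy named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the byte-range representation, and the simulated transfer rates are all hypothetical, and no GridFTP calls are made. The sketch only shows how a file of a given size might be divided among replica servers under each approach.

```python
import heapq

def brute_force_split(size, n_servers):
    """Hypothetical brute-force strategy: equal contiguous byte ranges."""
    base, extra = divmod(size, n_servers)
    ranges, start = [], 0
    for i in range(n_servers):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length))   # half-open [start, end)
        start += length
    return ranges

def predictive_split(size, predicted_rates):
    """Hypothetical predictive strategy: ranges proportional to predicted rates."""
    total = sum(predicted_rates)
    ranges, start, cumulative = [], 0, 0.0
    last = len(predicted_rates) - 1
    for i, rate in enumerate(predicted_rates):
        cumulative += rate
        # The last server always ends at the file size, so no byte is lost to rounding.
        end = size if i == last else round(size * cumulative / total)
        ranges.append((start, end))
        start = end
    return ranges

def dynamic_split(size, block, rates):
    """Hypothetical dynamic load balancing, simulated: each server pulls the
    next fixed-size block as soon as it finishes its previous one.
    Returns the number of bytes fetched from each server."""
    fetched = [0] * len(rates)
    ready = [(0.0, i) for i in range(len(rates))]  # (time server is free, index)
    heapq.heapify(ready)
    offset = 0
    while offset < size:
        free_at, i = heapq.heappop(ready)          # earliest available server
        length = min(block, size - offset)
        fetched[i] += length
        offset += length
        heapq.heappush(ready, (free_at + length / rates[i], i))
    return fetched

if __name__ == "__main__":
    print(brute_force_split(10, 3))
    print(predictive_split(100, [1, 1, 2]))
    print(dynamic_split(100, 10, [1, 3]))
```

In this sketch the static splits fix each server's share up front, so a server that slows down mid-transfer delays the whole download, whereas the block-pulling variant shifts work toward whichever server is currently faster at the cost of more requests.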
Cite this article
Vazhkudai, S. Distributed Downloads of Bulk, Replicated Grid Data. J Grid Computing 2, 31–42 (2004). https://doi.org/10.1007/s10723-004-5877-z
DOI: https://doi.org/10.1007/s10723-004-5877-z