
Distributed Downloads of Bulk, Replicated Grid Data

Journal of Grid Computing

Abstract

Data-sharing scientific communities use storage systems as distributed data stores by replicating content. In such highly replicated environments, a particular dataset can reside at multiple locations and can thus be downloaded from any one of them. Because the datasets of interest are large, improving download speeds through either server selection or co-allocation can offer substantial benefits. In this paper, we present an architecture for co-allocating Grid data transfers across multiple connections, enabling the parallel download of datasets from multiple servers. We have developed several co-allocation strategies, comprising simple brute-force, predictive, and dynamic load-balancing techniques, both to exploit rate differences among the various client–server links and to address dynamic rate fluctuations. We evaluate our approaches using the GridFTP data movement protocol in a wide-area testbed and present our results.
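The abstract names the co-allocation strategies but does not spell out how they work. As a rough illustration of the general idea only, the sketch below assumes that a brute-force scheme divides the file into equal byte ranges, one per replica server, and that a predictive scheme sizes each range in proportion to a forecast transfer rate for that server. The function names `equal_split` and `proportional_split`, and the rate values used with them, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # half-open interval [start, end)


def equal_split(size: int, n_servers: int) -> List[ByteRange]:
    """Assumed brute-force scheme: near-equal contiguous ranges, one per server."""
    if n_servers <= 0:
        raise ValueError("need at least one server")
    base, extra = divmod(size, n_servers)
    ranges, start = [], 0
    for i in range(n_servers):
        # The first `extra` servers take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def proportional_split(size: int, rates: List[float]) -> List[ByteRange]:
    """Assumed predictive scheme: range sizes proportional to forecast rates."""
    total = sum(rates)
    if not rates or total <= 0:
        raise ValueError("rates must be non-empty with a positive sum")
    ranges, start, cumulative = [], 0, 0.0
    for i, rate in enumerate(rates):
        cumulative += rate
        # The last range always ends at `size`, so rounding never drops bytes.
        end = size if i == len(rates) - 1 else round(size * cumulative / total)
        ranges.append((start, end))
        start = end
    return ranges
```

With equal ranges, the slowest link determines when the download finishes; sizing ranges by forecast rate aims to have all servers finish at about the same time. Neither static split reacts to rate changes during the transfer, which is the gap a dynamic load-balancing scheme would have to address, for example by handing out smaller blocks on demand.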



Author information


Correspondence to Sudharshan Vazhkudai.


About this article

Cite this article

Vazhkudai, S. Distributed Downloads of Bulk, Replicated Grid Data. J Grid Computing 2, 31–42 (2004). https://doi.org/10.1007/s10723-004-5877-z


  • DOI: https://doi.org/10.1007/s10723-004-5877-z
