
Optimizing Data Management in Grid Environments

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2009 (OTM 2009)

Abstract

Grids currently serve as platforms for numerous scientific and business applications that generate and access vast amounts of data. In this paper, we address the need for efficient, scalable and robust data management in Grid environments. We propose a fully decentralized and adaptive mechanism comprising two components: a Distributed Replica Location Service (DRLS) and a data transfer mechanism called GridTorrent. Both adopt Peer-to-Peer techniques to overcome performance bottlenecks and single points of failure. On the one hand, DRLS ensures resilience by relying on a Byzantine-tolerant protocol and can handle large numbers of concurrent requests even during node churn. On the other hand, GridTorrent maximizes bandwidth utilization through collaborative sharing among the various data providers and consumers. The proposed integrated architecture is fully backwards-compatible with already deployed Grids. To demonstrate these points, we conducted experiments in both LAN and WAN environments under various workloads. The evaluation shows that our scheme vastly outperforms conventional mechanisms in both efficiency (up to 10 times faster) and robustness under failures and flash crowds.
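The abstract describes DRLS only at a high level. As a rough illustration of how a DHT-based replica location service can keep answering correctly when some nodes misbehave, the Python sketch below hashes a logical file name to a key, stores the replica list on the k nodes closest to that key under a Kademlia-style XOR metric, and accepts a lookup result only when a strict majority of those k nodes agree. This is a toy under stated assumptions, not the authors' protocol: the in-memory node table, the function names, and the `lfn://` and `gsiftp://` identifiers are hypothetical, and plain majority voting stands in for whatever Byzantine-tolerant scheme DRLS actually uses.

```python
import hashlib
from collections import Counter


def key_id(name: str) -> int:
    """Map a name to a 160-bit integer ID via SHA-1 (Kademlia-style ID space; an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(node_ids, target: int, k: int) -> list:
    """Return the k node IDs closest to `target` under the XOR distance metric."""
    return sorted(node_ids, key=lambda nid: nid ^ target)[:k]


def store(nodes: dict, name: str, replicas: list, k: int) -> None:
    """Write the (sorted) replica list for `name` to each of the k closest nodes."""
    target = key_id(name)
    for nid in closest_nodes(nodes.keys(), target, k):
        nodes[nid][name] = tuple(sorted(replicas))


def lookup(nodes: dict, name: str, k: int):
    """Ask the k closest nodes; accept an answer only if a strict majority return it.

    Hypothetical stand-in for a Byzantine-tolerant read: with up to f faulty
    holders, the honest answer keeps a strict majority only while k >= 2f + 1.
    """
    target = key_id(name)
    answers = [nodes[nid].get(name) for nid in closest_nodes(nodes.keys(), target, k)]
    value, votes = Counter(answers).most_common(1)[0]
    if value is not None and votes > k // 2:
        return list(value)
    return None  # no entry, or no majority agreement


# Hypothetical 16-node overlay; each node is just a dict of name -> replica tuple.
nodes = {key_id(f"node-{i}"): {} for i in range(16)}
K = 5  # tolerates up to f = 2 faulty holders, since K >= 2f + 1
LFN = "lfn://dataset/run42.root"
store(nodes, LFN, ["gsiftp://siteB/run42", "gsiftp://siteA/run42"], K)

# Simulate two Byzantine holders that return a forged replica list.
for nid in closest_nodes(nodes.keys(), key_id(LFN), K)[:2]:
    nodes[nid][LFN] = ("gsiftp://evil/run42",)

print(lookup(nodes, LFN, K))
```

Here three honest holders outvote the two forged answers, so the lookup still returns the genuine replica locations. The same vote would fail with three faulty holders out of five, which is why the sketch sizes K from the number of failures it intends to tolerate.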

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zissimos, A., Doka, K., Chazapis, A., Tsoumakos, D., Koziris, N. (2009). Optimizing Data Management in Grid Environments. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05148-7_38

  • DOI: https://doi.org/10.1007/978-3-642-05148-7_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05147-0

  • Online ISBN: 978-3-642-05148-7

  • eBook Packages: Computer Science, Computer Science (R0)