Chirp: a practical global filesystem for cluster and Grid computing

Journal of Grid Computing

Abstract

Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area Grid computing environments. To address this problem, we have built the Chirp distributed filesystem, which is designed from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges and provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide area networks. We describe three applications in bioinformatics, biometrics, and gamma ray physics that each employ Chirp to attack large-scale, data-intensive problems.

Author information

Corresponding author

Correspondence to Douglas Thain.

About this article

Cite this article

Thain, D., Moretti, C. & Hemmes, J. Chirp: a practical global filesystem for cluster and Grid computing. J Grid Computing 7, 51–72 (2009). https://doi.org/10.1007/s10723-008-9100-5

  • DOI: https://doi.org/10.1007/s10723-008-9100-5
