Abstract
Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide-area Grid computing environments. To address this problem, we present the Chirp distributed filesystem, which is designed from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges and provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide-area networks. We describe three applications in bioinformatics, biometrics, and gamma-ray physics that each employ Chirp to attack large-scale, data-intensive problems.
Cite this article
Thain, D., Moretti, C. & Hemmes, J. Chirp: a practical global filesystem for cluster and Grid computing. J Grid Computing 7, 51–72 (2009). https://doi.org/10.1007/s10723-008-9100-5
DOI: https://doi.org/10.1007/s10723-008-9100-5