CSAR-2: A Case Study of Parallel File System Dependability Analysis

  • Conference paper
High Performance Computing and Communications (HPCC 2005)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 3726)

Abstract

Modern cluster file systems such as PVFS, which stripe files across multiple nodes, have been shown to provide high aggregate I/O bandwidth, but they are prone to data loss because the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs among performance, storage efficiency, and level of fault tolerance. However, the actual level of dependability of an enhanced striped file system is determined by more than just the redundancy scheme adopted; in general it also depends on other factors, such as the type of fault detection mechanism and the nature and speed of recovery. In this paper we address the question of how to assess the dependability of CSAR, a version of PVFS augmented with a RAID5 distributed redundancy scheme that we described in previous work.
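
As a rough illustration of the RAID5-style redundancy mentioned in the abstract, the sketch below stripes data blocks across a set of servers, stores one XOR parity block per stripe at a rotating position, and rebuilds the block of a single failed server from the survivors. It is a generic sketch under simplifying assumptions, not CSAR's actual implementation: the function names (`xor_blocks`, `make_stripe`, `recover_block`), the block sizes, and the parity rotation rule are hypothetical and chosen only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks, stripe_index, num_servers):
    """Lay out one stripe: num_servers - 1 data blocks plus one parity block.

    Assumption for this sketch: the parity position simply rotates with the
    stripe index, so no single server holds all the parity.
    """
    assert len(data_blocks) == num_servers - 1
    parity_pos = stripe_index % num_servers
    stripe = list(data_blocks)
    stripe.insert(parity_pos, xor_blocks(data_blocks))
    return stripe  # stripe[i] is the block stored on server i


def recover_block(stripe, failed_server):
    """Rebuild the block of one failed server by XOR-ing the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_server]
    return xor_blocks(survivors)


# Four servers, three data blocks per stripe; server 2 is assumed to fail.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"], stripe_index=1, num_servers=4)
assert recover_block(stripe, failed_server=2) == stripe[2]
```

The scheme tolerates only one failure per stripe. That is one reason the abstract stresses that dependability also depends on how quickly a fault is detected and repaired before a second failure occurs.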

This work has been partially supported by the Consorzio Interuniversitario Nazionale per l’Informatica (CINI), by the Italian Ministry for Education, University, and Research (MIUR) in the framework of the FIRB Project “Middleware for advanced services over large-scale, wired-wireless distributed systems (WEB-MINDS)”, by the National Partnership for Advanced Computational Infrastructure, by the Ohio Supercomputer Center through grants PAS0036 and PAS0121, and by NSF grant CNS-0403342. M.L. is partially supported by NSF DBI-0317335. Support from Hewlett-Packard is also gratefully acknowledged.

References

  1. Carns, P.H., Ligon III, W.B., Ross, R.B., Thakur, R.: PVFS: a parallel file system for Linux clusters. In: Proc. of the 4th Annual Linux Showcase and Conference, Atlanta, GA, pp. 317–327 (2000) (Best Paper Award)

  2. Stonebraker, M., Schloss, G.A.: Distributed RAID-a new multiple copy algorithm. In: Proceedings of Sixth Int. Conf. on Data Engineering, February 5-9, pp. 430–437 (1990)

  3. Pillai, M., Lauria, M.: CSAR: Cluster Storage with Adaptive Redundancy. In: ICPP 2003, Kaohsiung, Taiwan, ROC, October 2003, pp. 223–230 (2003)

  4. Hwang, K., Jin, H., Ho, R.S.C.: Orthogonal Striping and Mirroring in Distributed RAID for I/O-Centric Cluster Computing. IEEE Trans. on Parallel and Distributed Systems 13(1) (January 2002)

  5. Trivedi, K.S.: SHARPE 2002: Symbolic Hierarchical Automated Reliability and Performance Evaluator. In: Proceedings of Int. Conf. on Dependable Systems and Networks, June 23-26, p. 544 (2002)

  6. Mendiratta, V.B.: Reliability analysis of clustered computing systems. In: Proceedings of the 9th Int. Symp. on Software Reliability Engineering, November 1998, pp. 268–272 (1998)

  7. Smirni, E., Reed, D.A.: Workload Characterization of Input/Output Intensive Parallel Applications. In: Marie, R., Plateau, B., Calzarossa, M.C., Rubino, G.J. (eds.) TOOLS 1997. LNCS, vol. 1245, pp. 169–180. Springer, Heidelberg (1997)

  8. Sun, H., Han, J.J., Levendel, H.: A generic availability model for clustered computing systems. In: Proceedings of Pacific Rim Int. Symposium on Dependable Computing, December 17-19, pp. 241–248 (2001)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cotroneo, D., Paolillo, G., Russo, S., Lauria, M. (2005). CSAR-2: A Case Study of Parallel File System Dependability Analysis. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_23

  • DOI: https://doi.org/10.1007/11557654_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29031-5

  • Online ISBN: 978-3-540-32079-1

  • eBook Packages: Computer Science, Computer Science (R0)
