Top five challenges facing the practice of fault-tolerance

  • Field Experiences with Fault Tolerant Systems
  • Conference paper
Hardware and Software Architectures for Fault Tolerance (Fault Tolerance 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 774)

Abstract

This paper identifies key problem areas for the fault-tolerant community to address. Changes in technology, the expectations of society, and the needs of the market each pressure the design point for fault-tolerance in their own way. A developer, who has only finite resources and limited time, responds to these pressures with a set of priorities. I believe that the top five challenges, which ultimately drive the exploitation of fault-tolerant technology, are: (1) shipping a product on schedule, (2) reducing unavailability, (3) non-disruptive change management, (4) human fault-tolerance, and (5) all over again in the distributed world. Each of these is discussed to explore its influence on the choice for fault-tolerance. Understanding them is key to guiding research investment and maximizing its derivatives.

This paper represents a personal view of the author and should not be interpreted as an official position of the IBM Corporation, either stated or implied.

Editor information

Michel Banâtre, Peter A. Lee

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chillarege, R. (1994). Top five challenges facing the practice of fault-tolerance. In: Banâtre, M., Lee, P.A. (eds) Hardware and Software Architectures for Fault Tolerance. Fault Tolerance 1993. Lecture Notes in Computer Science, vol 774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020018

  • DOI: https://doi.org/10.1007/BFb0020018

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57767-6

  • Online ISBN: 978-3-540-48330-4

  • eBook Packages: Springer Book Archive
