Abstract
This paper identifies key problem areas for the fault-tolerant community to address. Changes in technology, the expectations of society, and the needs of the market each pressure the design point for fault-tolerance in their own way. A developer, who has only a finite set of resources and limited time, responds to these pressures with a set of priorities. I believe that the top five challenges that ultimately drive the exploitation of fault-tolerant technology are: (1) shipping a product on schedule, (2) reducing unavailability, (3) non-disruptive change management, (4) human fault-tolerance, and (5) all over again in the distributed world. Each of these is discussed to explore its influence on the choice for fault-tolerance. Understanding them is key to guiding research investment and maximizing its derivatives.
This paper represents a personal view of the author and should not be interpreted as an official position of the IBM Corporation, either stated or implied.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chillarege, R. (1994). Top five challenges facing the practice of fault-tolerance. In: Banâtre, M., Lee, P.A. (eds) Hardware and Software Architectures for Fault Tolerance. Fault Tolerance 1993. Lecture Notes in Computer Science, vol 774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020018
DOI: https://doi.org/10.1007/BFb0020018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57767-6
Online ISBN: 978-3-540-48330-4
eBook Packages: Springer Book Archive