Abstract
Ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers and system engineers. Meeting this challenge is particularly important for distributed systems with strict dependability and timeliness constraints. This paper presents a technique, called script-driven probing and fault injection, for the evaluation and validation of dependable protocols. The proposed approach can be used to demonstrate three aspects of a target protocol: i) detection of design or implementation errors, ii) identification of violations of protocol specifications, and iii) insight into design decisions made by the implementors. To demonstrate the capabilities of this technique, the paper briefly describes a probing and fault injection tool, called the PFI tool, and several experiments on two protocols: the Transmission Control Protocol (TCP) [4, 24] and the Group Membership Protocol (GMP) [19]. The tool can be used to delay, drop, reorder, duplicate, and modify messages. It can also introduce new messages into the system to probe participants. In the case of TCP, we used the PFI tool to duplicate, in a very short time and without access to the vendors' TCP source code, the experiments reported in [7] on several TCP implementations. We also ran several new experiments that are difficult to perform using past approaches based on packet monitoring and filtering. In the case of GMP, we used the tool to test the fault-tolerance capabilities of an implementation under various failure models, including daemon/link crashes, send/receive omissions, and timing failures. Furthermore, by selectively reordering messages and spontaneously transmitting new messages, we were able to guide a distributed computation into hard-to-reach global states without instrumenting the protocol implementation.
This work is supported in part by a research grant from the U.S. Office of Naval Research.
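The message manipulations listed in the abstract (delay, drop, reorder, duplicate, and the introduction of new probe messages) can be illustrated with a small sketch. This is not the PFI tool or its scripting interface; it is a minimal Python illustration, assuming a hypothetical interposed layer (`FaultInjectionLayer`) that consults a user-supplied script function for every outgoing message. All names here (`Message`, `FaultInjectionLayer`, `example_script`, `run_example`) are invented for illustration, and message modification is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action names a script may return for each intercepted message.
PASS, DROP, DUPLICATE, DELAY = "pass", "drop", "duplicate", "delay"


@dataclass
class Message:
    seq: int
    payload: str


class FaultInjectionLayer:
    """Illustrative layer sitting between a protocol and the network."""

    def __init__(self, script: Callable[[Message], str],
                 deliver: Callable[[Message], None]) -> None:
        self.script = script      # decides the fate of each message
        self.deliver = deliver    # hands a message to the next layer down
        self.held: List[Message] = []

    def send(self, msg: Message) -> None:
        action = self.script(msg)
        if action == DROP:
            return                      # message is silently lost
        if action == DELAY:
            self.held.append(msg)       # held back until release() is called
            return
        self.deliver(msg)
        if action == DUPLICATE:
            self.deliver(msg)           # second copy of the same message

    def release(self) -> None:
        # Delivering held messages after later traffic has the effect of
        # reordering them relative to that traffic.
        held, self.held = self.held, []
        for msg in held:
            self.deliver(msg)

    def inject(self, msg: Message) -> None:
        # A spontaneous message that the protocol itself never sent.
        self.deliver(msg)


def example_script(msg: Message) -> str:
    if msg.seq == 2:
        return DROP
    if msg.seq == 3:
        return DELAY
    if msg.seq == 4:
        return DUPLICATE
    return PASS


def run_example() -> List[int]:
    received: List[Message] = []
    layer = FaultInjectionLayer(example_script, received.append)
    for seq in range(1, 6):
        layer.send(Message(seq, f"data-{seq}"))
    layer.release()                     # message 3 now arrives after 4 and 5
    layer.inject(Message(99, "probe"))  # spontaneous probe message
    return [m.seq for m in received]


if __name__ == "__main__":
    print(run_example())  # [1, 4, 4, 5, 3, 99]
```

In this sketch the receiver sees message 2 dropped, message 4 duplicated, message 3 delivered out of order, and an extra probe at the end, all without changing the code that produced the original messages.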
References
J. Arlat, Y. Crouzet, and J.-C. Laprie. Fault injection for dependability validation of fault-tolerant computing systems. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 348–355, June 1989.
Jean Arlat, Martine Aguera, Yves Crouzet, Jean-Charles Fabre, Eliane Martins, and David Powell. Experimental evaluation of the fault tolerance of an atomic multicast system. IEEE Trans. Reliability, 39(4):455–467, October 1990.
D. Avresky, J. Arlat, J.C. Laprie, and Yves Crouzet. Fault injection for the formal testing of fault tolerance. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 345–354. IEEE, 1992.
R. Braden. RFC-1122: Requirements for internet hosts. Request for Comments, October 1989. Network Information Center.
R. Chillarege and N. S. Bowen. Understanding large system failures — a fault injection experiment. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 356–363, June 1989.
G. Choi, R. Iyer, and V. Carreno. Simulated fault injection: A methodology to evaluate fault tolerant microprocessor architectures. IEEE Trans. Reliability, 39(4):486–490, October 1990.
Douglas E. Comer and John C. Lin. Probing TCP implementations. In Proc. Summer USENIX Conference, June 1994.
F. Cristian. Reaching agreement on processor-group membership in synchronous distributed systems. Distributed Computing, (4):175–187, 1991.
E. Czeck and D. Siewiorek. Effects of transient gate-level faults on program behaviour. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 236–243. IEEE, 1990.
Scott Dawson and Farnam Jahanian. Probing and fault injection of protocol implementations. Technical Report CSE-TR-217-94, The University of Michigan, October 1994.
K. Echtle and Y. Chen. Evaluation of deterministic fault injection for fault-tolerant protocol testing. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 418–425. IEEE, 1991.
Klaus Echtle and Martin Leu. The EFA fault injector for fault-tolerant distributed system testing. In Workshop on Fault-Tolerant Parallel and Distributed Systems, pages 28–35. IEEE, 1992.
G. Finelli. Characterization of fault recovery through fault injection on FTMP. IEEE Trans. Reliability, 36(2):164–170, June 1987.
K. Goswami and R. Iyer. Simulation of software behaviour under hardware faults. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 218–227. IEEE, 1993.
Vassos Hadzilacos and Sam Toueg. Fault-tolerant broadcasts and related problems. In Sape Mullender, editor, Distributed Systems. Addison Wesley, 1993. Second Edition.
Seungjae Han, Harold A. Rosenberg, and Kang G. Shin. DOCTOR: An IntegrateD sOftware fault injeCTOn enviRonment. Technical Report CSE-TR-192-93, The University of Michigan, December 1993.
Norman C. Hutchinson and Larry L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Trans. Software Engineering, 17(1):1–13, January 1991.
David B. Ingham and Graham D. Parrington. Delayline: A wide-area network emulation tool. Computing Systems, 7(3):313–332, Summer 1994.
Farnam Jahanian, Ragunathan Rajkumar, and Sameh Fakhouri. Processor group membership protocols: Specification, design and implementation. In Proceedings of the 12th Symposium on Reliable Distributed Systems, pages 2–11, Princeton, New Jersey, October 1993.
G.A. Kanawati, N.A. Kanawati, and J.A. Abraham. FERRARI: A tool for the validation of system dependability properties. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 336–344. IEEE, 1992.
Steven McCanne and Van Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In Winter USENIX Conference, pages 259–269, January 1993.
Shivakant Mishra, Larry L. Peterson, and Richard D. Schlichting. A membership protocol based on partial order. In Second Working Conference on Dependable Computing for Critical Applications, February 1990.
J. Mogul, R. Rashid, and M. Accetta. The packet filter: An efficient mechanism for user-level network code. In Proc. ACM Symp. on Operating Systems Principles, pages 39–51, Austin, TX, November 1987. ACM.
Jon Postel. RFC-793: Transmission control protocol. Request for Comments, September 1981. Network Information Center.
A. M. Ricciardi and K. P. Birman. Using process groups to implement failure detection in asynchronous environments. In Proceedings of the 11th ACM Symposium on Principles of Distributed Computing, Montreal, Quebec, August 1991.
Z. Segall et al. FIAT: Fault injection based automated testing environment. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 102–107, 1988.
K. G. Shin and Y. H. Lee. Measurement and application of fault latency. IEEE Trans. Computers, C-35(4):370–375, April 1986.
Masanobu Yuhara, Brian N. Bershad, Chris Maeda, and J. Eliot B. Moss. Efficient packet demultiplexing for multiple endpoints and large messages. In Winter USENIX Conference, January 1994.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dawson, S., Jahanian, F. (1995). Deterministic fault injection of distributed systems. In: Birman, K.P., Mattern, F., Schiper, A. (eds) Theory and Practice in Distributed Systems. Lecture Notes in Computer Science, vol 938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60042-6_13
DOI: https://doi.org/10.1007/3-540-60042-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60042-8
Online ISBN: 978-3-540-49409-6