Systematic and design diversity — Software techniques for hardware fault detection

  • Session 8: Software diversity
  • Conference paper
Dependable Computing — EDCC-1 (EDCC 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 852)

Abstract

Most safe systems use static redundancy to detect hardware operational faults; the simplest case is the well-known Duplex System. If design faults must also be detected, design diversity in the software is required as well. We suggest combining so-called systematic diversity with design diversity in a time-redundant system instead of the structurally redundant Duplex System. For this purpose, two diversely designed and systematically transformed variants of an application program are executed sequentially on the same processor. We call this new approach a Virtual Duplex System. In this paper we investigate the safety of a Virtual Duplex System. We propose the use of software diversity techniques (i.e., systematic diversity) to detect nearly all hardware faults in this system. Transient faults are detected effectively through time redundancy, and permanent faults through the new software diversity approach. In addition, software design faults and even design faults in the compiler, libraries, operating system, and underlying hardware can be detected. The proposed software techniques are either new or have never been considered systematically for the detection of hardware faults in a general-purpose system environment with design diversity.
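The sequential execution and comparison described in this paragraph can be illustrated with a minimal Python sketch. It is not the paper's implementation: the two variant functions, the `FaultDetected` exception, and the `virtual_duplex` helper are hypothetical names chosen for illustration, and the "diverse designs" are simply two different algorithms for the same sum of squares.

```python
from typing import Callable, Sequence


class FaultDetected(Exception):
    """Raised when the two variants disagree (hypothetical name)."""


def sum_of_squares_loop(values: Sequence[int]) -> int:
    # Variant A: accumulate each square directly.
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_identity(values: Sequence[int]) -> int:
    # Variant B (diverse design): uses
    # (sum v)^2 = sum v^2 + 2 * sum_{i<j} v_i * v_j.
    s = sum(values)
    cross = 0    # sum of v_i * v_j over all pairs i < j
    running = 0  # sum of the elements seen so far
    for v in values:
        cross += v * running
        running += v
    return s * s - 2 * cross


def virtual_duplex(
    variant_a: Callable[[Sequence[int]], int],
    variant_b: Callable[[Sequence[int]], int],
    data: Sequence[int],
) -> int:
    # Time redundancy: both variants run one after the other
    # on the same processor, then their results are compared.
    result_a = variant_a(data)
    result_b = variant_b(data)
    if result_a != result_b:
        # A mismatch signals a fault; a safe system would now
        # enter its safe state instead of delivering a result.
        raise FaultDetected(f"variants disagree: {result_a} != {result_b}")
    return result_a


if __name__ == "__main__":
    print(virtual_duplex(sum_of_squares_loop, sum_of_squares_identity, [1, 2, 3]))
```

In this sketch a mismatch only detects a fault; it does not say which variant is wrong, which matches the detection-only goal stated in the abstract.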

As an example, the new systematic diversity technique ‘simple register permutation’ was applied to different application programs by means of a simple heuristic. The technique was evaluated experimentally by injecting permanent hardware faults with the fault injection tool ProFI and measuring the safety of Virtual Duplex Systems. The results are compared with Simplex Systems, which use no special fault detection, and with Virtual Duplex Systems that use pure design diversity. The experiments show that even simple systematic diversity detects most permanent hardware faults.
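The idea behind register permutation can be sketched with a toy simulation. Everything here is an assumption made for illustration: the `RegisterFile` class, the stuck-at-0 fault model, the `multiply_add` program, and the two register mappings are hypothetical and do not reproduce the paper's heuristic or the ProFI tool. The sketch only shows why a permanent fault in one physical register can corrupt two variants differently when they assign registers differently.

```python
from typing import List, Optional, Tuple


class RegisterFile:
    """Toy register file; one bit of one physical register may be stuck at 0."""

    def __init__(self, size: int = 4, stuck_reg: Optional[int] = None,
                 stuck_bit: int = 0) -> None:
        self.regs = [0] * size
        self.stuck_reg = stuck_reg
        self.stuck_mask = 1 << stuck_bit

    def write(self, phys: int, value: int) -> None:
        if phys == self.stuck_reg:
            value &= ~self.stuck_mask  # permanent stuck-at-0 fault
        self.regs[phys] = value

    def read(self, phys: int) -> int:
        return self.regs[phys]


def multiply_add(rf: RegisterFile, perm: List[int], a: int, b: int, c: int) -> int:
    """Compute a * b + c; perm maps logical registers to physical ones."""
    r0, r1, r2, r3 = perm
    rf.write(r0, a)
    rf.write(r1, b)
    rf.write(r2, rf.read(r0) * rf.read(r1))
    rf.write(r3, rf.read(r2) + c)
    return rf.read(r3)


IDENTITY = [0, 1, 2, 3]  # register assignment of the first variant
PERMUTED = [1, 0, 3, 2]  # systematically permuted assignment of the second


def run_virtual_duplex(rf: RegisterFile, a: int, b: int, c: int) -> Tuple[int, bool]:
    """Run both variants sequentially; return (first result, fault detected)."""
    first = multiply_add(rf, IDENTITY, a, b, c)
    second = multiply_add(rf, PERMUTED, a, b, c)
    return first, first != second


if __name__ == "__main__":
    healthy = RegisterFile()
    faulty = RegisterFile(stuck_reg=0, stuck_bit=1)
    print(run_virtual_duplex(healthy, 3, 5, 4))  # (19, False)
    print(run_virtual_duplex(faulty, 3, 5, 4))   # (9, True)
```

With the faulty register file, the first variant stores 3 in the damaged register and silently computes 9, which a Simplex System would deliver unnoticed. The permuted variant stores 5 there, a value whose affected bit is already 0, so it still computes 19 and the comparison exposes the fault. A fault that corrupts both variants in the same way would go undetected in this toy model, consistent with the abstract's claim that most, not all, permanent faults are detected.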

Editor information

Klaus Echtle, Dieter Hammer, David Powell

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lovric, T. (1994). Systematic and design diversity — Software techniques for hardware fault detection. In: Echtle, K., Hammer, D., Powell, D. (eds) Dependable Computing — EDCC-1. EDCC 1994. Lecture Notes in Computer Science, vol 852. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58426-9_138

  • DOI: https://doi.org/10.1007/3-540-58426-9_138

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58426-1

  • Online ISBN: 978-3-540-48785-2

  • eBook Packages: Springer Book Archive