The TTA's Approach to Resilience after Transient Upsets

Real-Time Systems

Abstract

The Time-Triggered Architecture (TTA), an architecture for safety-critical real-time applications, incorporates fault-tolerance mechanisms to ensure correct system operation despite failures. Under its primary fault hypothesis, the TTA tolerates either the arbitrary failure of any one of its nodes or the passively arbitrary failure of any one of its communication channels. To cover these failure modes, active redundancy techniques are used; that is, nodes and channels are physically replicated. The primary fault hypothesis is, however, not strong enough for certain applications that have to tolerate transient upsets of multiple, possibly all, components in the system. Such a transient upset may break up the synchrony of the nodes and leave disjoint sets of nodes, each synchronized internally, while overall synchronization is lost. Although the TTA provides a clique avoidance algorithm that is able to correct a wide class of such multiple transient failures, a stronger algorithm is needed for full coverage. In this paper we discuss a secondary fault hypothesis for the TTA that addresses the transient upset of multiple components, and we present a new clique resolving algorithm based on the TTA's integrated diagnosis and startup service.
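
To make the notion of disjoint synchronized sets (cliques) concrete, the following is a minimal toy sketch of a majority test in the spirit of counter-based clique avoidance. It is not the clique resolving algorithm of the paper. The names `group_into_cliques`, `surviving_nodes` and the `time_base_of` mapping are hypothetical, and the model assumes that every node counts the slots of its own clique as accepted and all other slots as failed.

```python
from collections import defaultdict


def group_into_cliques(time_base_of):
    """Nodes that share a (hypothetical) time base label form one clique."""
    cliques = defaultdict(set)
    for node, time_base in time_base_of.items():
        cliques[time_base].add(node)
    return list(cliques.values())


def surviving_nodes(time_base_of):
    """Apply a per-node majority test and return the nodes that stay active.

    Simplifying assumption: a node counts every slot of its own clique
    (its own included) as accepted and every other slot as failed.
    """
    total = len(time_base_of)
    alive = set()
    for clique in group_into_cliques(time_base_of):
        accepted = len(clique)
        failed = total - accepted
        if accepted > failed:  # strict majority required to keep running
            alive |= clique
    return alive


# Five nodes; an upset leaves D and E synchronized to a different time base.
after_upset = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
print(sorted(surviving_nodes(after_upset)))  # ['A', 'B', 'C']

# An even split has no majority clique, so no node passes the test.
even_split = {"A": 0, "B": 0, "C": 1, "D": 1}
print(sorted(surviving_nodes(even_split)))  # []
```

In this toy model an even split leaves no node active. How the TTA actually detects and resolves such situations is the subject of the paper and is not modeled here.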

References

  • Arora, A. and Gouda, M. 1994. Distributed reset. In IEEE Transactions on Computers, IEEE, pp. 1026–1038.

  • Arora, A. and Kulkarni, S. S. 1998. Detectors and correctors: A theory of fault-tolerance components. In Proc. of the 18th International Conference on Distributed Computing Systems, IEEE.

  • Bauer, G., Kopetz, H. and Steiner, W. 2003. The central guardian approach to enforce fault isolation in a time-triggered system. In Proc. of ISADS, IEEE, pp. 37–44.

  • Bauer, G. and Paulitsch, M. 2000. An investigation of membership and clique avoidance in TTP/C. In Proc. of SRDS.

  • Bouajjani, A. and Merceron, A. 2002. Parametric verification of a group membership algorithm. In Proceedings of the 7th International Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems. Lecture Notes in Computer Science; Vol. 2469, Oldenburg, Germany, Springer-Verlag, pp. 311–330.

  • Constantinescu, C. 2003. Trends and challenges in VLSI circuit reliability. IEEE Micro 23(4):14–19.

  • Dijkstra, E. W. 1974. Self-stabilizing systems in spite of distributed control. Communications of the ACM. 17(11):643–644.

  • Hall, B., Driscoll, K., Paulitsch, M. and Dajani-Brown, S. 2005. Ringing out fault tolerance: A new ring network for superior low-cost dependability. In International Conference on Dependable Systems and Networks (DSN'05), pp. 298–307.

  • Heiner, G. and Thurner, T. 1998. Time-triggered architecture for safety-related distributed real-time systems in transportation systems. In Proceedings of the 28th Annual Symposium on Fault-Tolerant Computing, FTCS, IEEE, pp. 402–407.

  • Kopetz, H. and Bauer, G. 2003. The time-triggered architecture. In Proceedings of the IEEE Special Issue on Modeling and Design of Embedded Software.

  • Kopetz, H. and Ochsenreiter, W. 1987. Clock synchronization in distributed real-time systems. IEEE Transactions on Computers C-36(8):933–940.

  • Kopetz, H. 1997. Real-Time Systems. Kluwer Academic Publishers.

  • Kopetz, H., Paulitsch, M., Jones, C., Killijian, M.-O., Marsden, E., Moffat, N., Powell, D., Randell, B., Romanovsky, A. and Stroud, R. 2001. Revised version of DSoS conceptual model. Project Deliverable for DSoS (Dependable Systems of Systems), Research Report 35/2001, Technische Universität Wien, Institut für Technische Informatik, Treitlstr. 1-3/182-1, 1040 Vienna, Austria.

  • Normand, E. 1996. Single event upset at ground level. IEEE Transactions on Nuclear Science 43:2742–2750.

  • O'Gorman, T. J. 1994. The effect of cosmic rays on the soft error rate of a DRAM at ground level. IEEE Transactions on Electron Devices 41:553–557.

  • Paulitsch, M., Morris, J., Hall, B., Driscoll, K., Latronico, E. and Koopman, P. 2005. Coverage and the use of cyclic redundancy codes in ultra-dependable systems. In International Conference on Dependable Systems and Networks (DSN'05), pp. 346–355.

  • Pfeifer, H. 2000. Formal verification of the TTP group membership algorithm. In T. Bolognesi and D. Latella (editors), Formal Methods for Distributed System Development, Proceedings of FORTE XIII/PSTV XX 2000, Pisa, Italy, Kluwer Academic Publishers, pp. 3–18.

  • Pauli, B. and Meyna, A. 1998. Reliability of electronic control units in motor vehicles. SAE Technical Paper Series.

  • Rushby, J. 2001. A comparison of bus architectures for safety-critical embedded systems. CSL Technical Report, SRI International, Menlo Park, CA 94025, USA.

  • Rushby, J. 2002. An overview of formal verification for the time-triggered architecture. In W. Damm and E.-R. Olderog (editors), Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 2469 of Lecture Notes in Computer Science, Oldenburg, Germany: Springer-Verlag, pp. 83–105.

  • Schneider, M. 1993. Self-stabilization. ACM Computing Surveys (CSUR) 25(1):45–67.

  • Steiner, W., Paulitsch, M. and Kopetz, H. 2003. Multiple failure correction in the time-triggered architecture. In Proc. of 9th Workshop on Object-oriented Real-time Dependable Systems (WORDS 2003f).

  • Steiner, W., Rushby, J., Sorea, M. and Pfeifer, H. 2004. Model checking a fault-tolerant startup algorithm: From design exploration to exhaustive fault simulation. In The International Conference on Dependable Systems and Networks (DSN 2004).

  • Steiner, W. 2004. Startup and Recovery of Fault-Tolerant Time-Triggered Communication. PhD thesis, Technische Universität Wien, Institut für Technische Informatik, Treitlstr. 3/3/182-1, 1040 Vienna, Austria.

  • Temple, C. 1998. Avoiding the babbling-idiot failure in a time-triggered communication system. In Proceedings of 28th Annual International Symposium on Fault-Tolerant Computing, pp. 218–227.

  • Wilde, J., Wondrak, W. and Senske, W. 1999. Reliability requirements for microtechnologies used in automotive applications. In Proceedings of the Congress for Microsystems and Precision Engineering, MicroEngineering 99, Stuttgart, Germany, Stuttgarter Messe- und Kongressgesellschaft GmbH.

Additional information

This paper is a revised version of Steiner et al. (2003). This work has been funded by the European Project DECOS (Project number: IST-511764).

Michael Paulitsch is currently affiliated with Honeywell International.

About this article

Cite this article

Steiner, W., Paulitsch, M. & Kopetz, H. The TTA's Approach to Resilience after Transient Upsets. Real-Time Syst 32, 213–233 (2006). https://doi.org/10.1007/s11241-005-4681-6
