Model-Based Failure Management for Distributed Reactive Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4888)

Abstract

Failure management is key to the development of safety-critical, distributed, reactive systems common in such applications as avionics, automotive, and sensor/actuator networks. Specific challenges to effective failure management include (i) developing an understanding of the application domain so as to define what constitutes a failure; (ii) disentangling failure management concepts at design time and at runtime; and (iii) detecting and mitigating failures at the level of systems-of-systems integration. In this paper, we address (i) and (ii) by developing a failure ontology for logical and deployment architectures, respectively, including a mapping between the two. This ontology is based on the interaction patterns (or services) that define the component interplay in a distributed system. We address (iii) by defining detectors and mitigators at the service/interaction level; we discuss how to derive detectors for a significant subset of the failure ontology directly from the interaction patterns. We demonstrate the utility of our techniques using a large-scale oceanographic sensor/actuator network.
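To make the idea of deriving a detector from an interaction pattern more concrete, the following is a minimal illustrative sketch in Python. It is not the paper's formalism or tooling: the `Detector` class, the `derive_detector` helper, the message triples, the deadline parameter, and the sensor/gateway/shore names are all hypothetical and chosen only for illustration. The sketch assumes an interaction pattern can be approximated as a fixed, ordered sequence of messages, and it flags two kinds of deviation: an unexpected message and a message that arrives too late.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A message is modeled as (sender, receiver, label). This triple is an
# assumption made for the sketch, not a structure taken from the paper.
Message = Tuple[str, str, str]


@dataclass
class Detector:
    """Hypothetical detector that tracks one run of an interaction pattern."""
    pattern: List[Message]
    deadline: float                    # max seconds between consecutive messages
    position: int = 0                  # index of the next expected message
    last_time: Optional[float] = None  # timestamp of the last accepted message

    def observe(self, msg: Message, now: float) -> Optional[str]:
        """Return a failure description, or None if msg conforms to the pattern."""
        if self.position >= len(self.pattern):
            return None  # pattern already completed; nothing left to check
        expected = self.pattern[self.position]
        if self.last_time is not None and now - self.last_time > self.deadline:
            return f"timeout before {expected}"
        if msg != expected:
            return f"unexpected {msg}, expected {expected}"
        self.position += 1
        self.last_time = now
        return None

    @property
    def completed(self) -> bool:
        return self.position == len(self.pattern)


def derive_detector(pattern: List[Message], deadline: float) -> Detector:
    """Build a detector whose expectations are read off the pattern itself."""
    return Detector(pattern=list(pattern), deadline=deadline)


# Hypothetical pattern loosely inspired by a sensor/actuator network.
PATTERN: List[Message] = [
    ("sensor", "gateway", "reading"),
    ("gateway", "shore", "forward"),
    ("shore", "gateway", "ack"),
]

if __name__ == "__main__":
    detector = derive_detector(PATTERN, deadline=2.0)
    print(detector.observe(("sensor", "gateway", "reading"), now=0.0))
    print(detector.observe(("shore", "gateway", "ack"), now=1.0))
```

In this sketch the detector needs no knowledge beyond the pattern and a deadline, which mirrors the abstract's point that detectors can be obtained directly from the interaction specification. A mitigator could subscribe to the returned failure descriptions, but that part is left out here.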

Editor information

Fabrice Kordon Oleg Sokolsky

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ermagan, V., Krüger, I., Menarini, M. (2008). Model-Based Failure Management for Distributed Reactive Systems. In: Kordon, F., Sokolsky, O. (eds) Composition of Embedded Systems. Scientific and Industrial Issues. Monterey Workshop 2006. Lecture Notes in Computer Science, vol 4888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77419-8_4

  • DOI: https://doi.org/10.1007/978-3-540-77419-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77418-1

  • Online ISBN: 978-3-540-77419-8

  • eBook Packages: Computer Science, Computer Science (R0)