
A Framework for Reconfiguration-Based Fault-Tolerance in Distributed Systems

  • Conference paper
Architecting Dependable Systems II

Abstract

Many critical services are now provided by complex distributed systems that result from the reuse and integration of a large number of components. Because these components are used in many contexts, they are generally not designed to achieve high dependability by themselves, so their behavior in the presence of faults can vary widely. Nevertheless, it is paramount that such systems survive failures of individual components, as well as attacks and intrusions, even if with degraded functionality. To provide control over unanticipated events, we focus on fault-handling strategies, particularly system reconfiguration. The paper describes a framework that provides fault tolerance for component-based applications by detecting failures through monitoring and recovering through system reconfiguration. The framework is based on Lira, an agent-based distributed infrastructure for remote control and reconfiguration, and on a decision maker that selects suitable new configurations. Lira supports monitoring and reconfiguration at both the component and the application level, while decisions are taken based on the feedback provided by the evaluation of stochastic Petri net models.
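The detect-decide-reconfigure cycle outlined in the abstract can be illustrated with a minimal sketch. It is not Lira's API or the authors' decision maker: every name below (`Configuration`, `detect_failures`, `choose_configuration`) is hypothetical, failure detection is assumed to be a simple heartbeat timeout, and the `reward` field is only a stand-in for a score that the paper would obtain by evaluating a Petri net model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical names throughout: this is an illustration, not Lira's actual API.


@dataclass(frozen=True)
class Configuration:
    name: str
    required: FrozenSet[str]  # components that must be alive for this configuration
    reward: float             # assumed stand-in for a model-derived dependability score


def detect_failures(last_heartbeat: Dict[str, float], now: float,
                    timeout: float) -> FrozenSet[str]:
    """Monitoring step: report components whose last heartbeat is older than `timeout`."""
    return frozenset(name for name, seen in last_heartbeat.items()
                     if now - seen > timeout)


def choose_configuration(candidates: Iterable[Configuration],
                         failed: FrozenSet[str]) -> Optional[Configuration]:
    """Decision step: pick the highest-reward configuration that avoids failed components."""
    feasible = [c for c in candidates if not (c.required & failed)]
    return max(feasible, key=lambda c: c.reward, default=None)


if __name__ == "__main__":
    candidates = [
        Configuration("full", frozenset({"web", "db", "cache"}), reward=1.0),
        Configuration("degraded", frozenset({"web", "db"}), reward=0.6),
        Configuration("minimal", frozenset({"web"}), reward=0.2),
    ]
    heartbeats = {"web": 9.5, "db": 10.0, "cache": 2.0}
    failed = detect_failures(heartbeats, now=10.0, timeout=5.0)
    target = choose_configuration(candidates, failed)
    print("failed:", sorted(failed))
    print("reconfigure to:", target.name if target else "no feasible configuration")
```

In this sketch a degraded configuration is preferred over no service at all, which mirrors the abstract's goal of surviving component failures with reduced functionality. How candidate configurations are actually scored and applied is described in the paper itself.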




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Porcarelli, S., Castaldi, M., Di Giandomenico, F., Bondavalli, A., Inverardi, P. (2004). A Framework for Reconfiguration-Based Fault-Tolerance in Distributed Systems. In: de Lemos, R., Gacek, C., Romanovsky, A. (eds) Architecting Dependable Systems II. Lecture Notes in Computer Science, vol 3069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25939-8_8


  • DOI: https://doi.org/10.1007/978-3-540-25939-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23168-4

  • Online ISBN: 978-3-540-25939-8

  • eBook Packages: Springer Book Archive
