Abstract
The challenges raised by current software-intensive systems, such as large-scale information and computing environments that are highly dynamic, heterogeneous, and unpredictable, have motivated the development of techniques that enhance these systems with autonomous behaviors. Although various concerns about these systems have been studied in depth, their design remains considerably more challenging than that of traditional systems. Self-healing is one of the main features that characterize autonomic computing systems. Failure detection, recovery strategies, and reliability are of paramount importance to ensure continuous operation and correct functioning, even in the presence of up to a given maximum number of faulty components. Most existing research and implementations focus on architecture-specific solutions for introducing self-healing behaviors. As a result, users must tailor their software to architecture-specific fault-tolerance features, which requires considerable effort from developers and users. This paper proposes a distributed formal model for the specification, verification, and analysis of self-healing behaviors in autonomous systems, from failure detection to self-recovery. This high-level model allows users to specify and apply the desired type of failure detection and recovery without requiring any knowledge of its implementation. The model supports not only formal verification of various properties but also performance evaluation. Qualitative properties are verified using state-space exploration tools, and quantitative properties are validated through statistical model checking. All these properties are preserved in the actual implementation by ensuring that the deployed code is consistent with the validated model.
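To give a concrete feel for the failure-detection-to-recovery loop and for the statistical model checking mentioned above, the following is a minimal sketch under stated assumptions; it is not the authors' formal model or toolchain. It simulates a hypothetical component that crashes after an exponentially distributed time, a timeout-based heartbeat detector that suspects the component after a fixed number of missed heartbeats, and a restart of random duration. It then estimates, by Monte Carlo sampling, the probability that a crashed component is back up within a deadline. All names (`simulate_run`, `estimate_probability`, `chernoff_sample_size`) and parameter values are hypothetical, and the Chernoff-Hoeffding bound is used here as one common way to choose the number of simulation runs.

```python
import math
import random


def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Number of runs N such that the estimate p_hat satisfies
    P(|p_hat - p| > epsilon) <= delta (Chernoff-Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_run(rng: random.Random,
                 failure_rate: float = 0.01,
                 heartbeat_period: float = 1.0,
                 timeout_periods: int = 3,
                 restart_mean: float = 2.0,
                 deadline: float = 10.0) -> bool:
    """One stochastic run with hypothetical parameters.

    The component crashes at an exponentially distributed time and stops
    sending heartbeats (sent at multiples of heartbeat_period). The
    detector suspects it once timeout_periods periods have elapsed since
    the last heartbeat, then triggers a restart whose duration is
    exponentially distributed. Returns True if the component is recovered
    within `deadline` time units of the crash.
    """
    crash_time = rng.expovariate(failure_rate)
    # Last heartbeat emitted at or before the crash.
    last_beat = math.floor(crash_time / heartbeat_period) * heartbeat_period
    # Failure detection: timeout after the last received heartbeat.
    detection_time = last_beat + timeout_periods * heartbeat_period
    # Self-recovery: restart with mean duration restart_mean.
    recovery_time = detection_time + rng.expovariate(1.0 / restart_mean)
    return recovery_time - crash_time <= deadline


def estimate_probability(n_runs: int, seed: int = 0, **params) -> float:
    """Monte Carlo estimate of P(recovered within deadline)."""
    rng = random.Random(seed)
    successes = sum(simulate_run(rng, **params) for _ in range(n_runs))
    return successes / n_runs


if __name__ == "__main__":
    n = chernoff_sample_size(epsilon=0.01, delta=0.05)
    p_hat = estimate_probability(n)
    print(f"runs={n}, P(recovered within deadline) ~ {p_hat:.3f}")
```

With `epsilon=0.01` and `delta=0.05` the bound asks for 18445 runs; tightening either parameter increases the number of runs. Statistical model checking tools commonly also offer sequential hypothesis testing as an alternative to fixed-size estimation, which can need far fewer runs when the true probability is far from the tested threshold.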
Cite this article
Hafaiedh, I.B., Slimane, M.B. A distributed formal-based model for self-healing behaviors in autonomous systems: from failure detection to self-recovery. J Supercomput 78, 18725–18753 (2022). https://doi.org/10.1007/s11227-022-04614-0
DOI: https://doi.org/10.1007/s11227-022-04614-0