Software health management: a necessity for safety critical systems

  • SI: SwHM
Innovations in Systems and Software Engineering

Abstract

As software and software-intensive systems become increasingly ubiquitous, the impact of failures can be tremendous. In industries such as aerospace, medical devices, and automotive, such failures can cost lives or endanger mission success. Software faults can arise from the interaction between the software, the hardware, and the operating environment. Unanticipated environmental changes lead to software anomalies that may have a significant impact on the overall success of the mission. Latent coding errors can trigger faults at any time during system operation, even though significant effort has usually been expended on verification and validation (V&V) of the software system. It is becoming increasingly apparent that pre-deployment V&V is not enough to guarantee that a complex software system meets all safety, security, and reliability requirements. Software Health Management (SWHM) is a new field concerned with the development of tools and technologies that enable automated detection, diagnosis, prediction, and mitigation of adverse events due to software anomalies while the system is in operation. The prognostic capability of SWHM to detect and diagnose failures before they happen will yield safer and more dependable systems for the future. This paper addresses the motivation, needs, and requirements of software health management as a new discipline and makes the case for SWHM in safety-critical applications.

Notes

  1. In this article, the host system is the system that is undergoing health management. The host system may comprise hardware, software, or a combination thereof.

  2. http://www.aerosocietychannel.com/aerospace-insight/2010/12/exclusive-qantas-qf32-flight-from-the-cockpit/.

Acknowledgments

The authors would like to thank Eric Cooper, Paul Miner, Robert Mah, Claudia Meyer, Serdar Uckun, Gabor Karsai, and the NASA partners working on software health management. The authors would also like to thank the reviewers for valuable comments. This article was written under the support of the NASA Aviation Safety Program Integrated Vehicle Health Management project and NASA’s OSMA SARP project “Advanced tools and techniques for V&V of IVHM systems”. This paper is a substantially revised and extended version of a paper presented at SMC-IT 2011.

Author information

Corresponding author

Correspondence to Johann Schumann.

About this article

Cite this article

Srivastava, A.N., Schumann, J. Software health management: a necessity for safety critical systems. Innovations Syst Softw Eng 9, 219–233 (2013). https://doi.org/10.1007/s11334-013-0212-0