Availability of enterprise IT systems: an expert-based Bayesian framework

Franke, Ulrik; Johnson, Pontus; König, Johan; Marcks von Würtemberg, Liv

doi:10.1007/s11219-011-9141-z

Published: 13 May 2011

Software Quality Journal, Volume 20, pages 369–394 (2012)

Abstract

Ensuring the availability of enterprise IT systems is a challenging task. The factors that can bring systems down are numerous, and their impact on various system architectures is difficult to predict. At the same time, maintaining high availability is crucial in many applications, ranging from control systems in the electric power grid and electronic trading systems on the stock market to specialized command and control systems for military and civilian purposes. This paper describes a Bayesian decision support model designed to help enterprise IT system decision-makers evaluate the consequences of their decisions by analyzing various scenarios. The model is based on expert elicitation from 50 experts on IT systems availability, obtained through an electronic survey. The Bayesian model uses a leaky Noisy-OR method to weigh together the expert opinions on 16 factors affecting systems availability. Using this model, the effect of changes to a system can be estimated beforehand, providing decision support for improving enterprise IT systems availability. The Bayesian model thus obtained is then integrated into a standard mathematical model, in the style of reliability block diagrams, for assessing availability at the architecture level. In this model, the IT systems play the role of building blocks. The overall assessment framework thus addresses measures to ensure high availability both at the level of individual systems and at the level of the entire enterprise architecture. Examples are presented to illustrate how the framework can be used by practitioners aiming to ensure high availability.
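The two building blocks named in the abstract, a leaky Noisy-OR gate and a reliability block diagram calculation, can be illustrated with a minimal Python sketch. It is not the authors' model: the function names, the leak value and the causal strengths are hypothetical, and the gate uses the common textbook form P(effect) = 1 - (1 - leak) * prod(1 - p_i) over the active causes. The series and parallel formulas additionally assume that blocks fail independently.

```python
from math import prod


def leaky_noisy_or(leak, strengths, active):
    """Probability of the effect under a leaky Noisy-OR gate.

    leak      -- probability of the effect when no modeled cause is active
    strengths -- p_i, probability that cause i alone produces the effect
    active    -- booleans, True where cause i is present
    (All names and parameters are illustrative assumptions.)
    """
    if len(strengths) != len(active):
        raise ValueError("strengths and active must have the same length")
    # Probability that neither the leak nor any active cause triggers the effect.
    no_effect = 1.0 - leak
    for p, on in zip(strengths, active):
        if on:
            no_effect *= 1.0 - p
    return 1.0 - no_effect


def series_availability(availabilities):
    """Blocks in series: every block must be up (independence assumed)."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Redundant blocks: at least one must be up (independence assumed)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper's survey.
    p_down = leaky_noisy_or(0.01, [0.2, 0.5], [True, False])
    print(f"P(effect) with one active cause: {p_down:.3f}")

    # Two redundant servers behind a single database, all values made up.
    servers = parallel_availability([0.9, 0.9])
    print(f"Architecture availability: {series_availability([servers, 0.98]):.4f}")
```

In this sketch each IT system would contribute one availability figure to the block diagram, which mirrors the abstract's description of systems as building blocks. How the paper maps the 16 elicited factors onto gate parameters is not stated in the abstract and is left open here.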

References

Ashrafi, N., Berman, O., & Cutler, M. (2002). Optimal design of large software-systems using N-version programming. IEEE Transactions on Reliability, 43(2), 344–350.
Askåker, J., & Kulle, M. (2008, June 4). Miljardaffärer gick förlorade. Dagens Industri, pp. 6–7 (in Swedish).
Baecher, G. (1988). Judgemental probability in geotechnical risk assessment. Technical report, The Office of the Chief, US Army Corps of Engineers.
Blaxter, L., Hughes, C., & Tight, M. (2006). How to research. Buckingham: Open University Press.
Campbell, K., Gordon, L. A., Loeb, M. P., & Zhou, L. (2003). The economic cost of publicly announced information security breaches: Empirical evidence from the stock market. Journal of Computer Security, 11(3), 431–448.
Chen, L., & Avizienis, A. (1978). N-version programming: A fault-tolerance approach to reliability of software operation. In Proceedings of the 8th IEEE international symposium on fault-tolerant computing (FTCS-8), pp. 3–9.
Cooke, R. M. (1991). Experts in uncertainty. Opinion and subjective probability in science. Oxford: Oxford University Press.
Czaja, R., & Blair, J. (2005). Designing surveys. Beverly Hills: Sage Publications Inc.
Druzdzel, M. J. (1999). Genie: A development environment for graphical decision-analytic models. In Proceedings of the 1999 annual symposium of the American medical informatics association (AMIA-1999), p. 1206, more information available on http://genie.sis.pitt.edu/about.htm.
Fenton, N. E., & Pfleeger, S. L. (1997). Software Metrics (2nd ed.). Boston, MA: PWS Publishing Company.
Fowler, F. J. J. (2002). Survey research methods. Beverly Hills: Sage Publications Inc.
Franke, U., Johnson, P., König, J., & von Würtemberg, L. M. (2010a). Availability of enterprise IT systems—an expert-based Bayesian model. In Proceedings of the fourth international workshop on software quality and maintainability (SQM 2010), Madrid.
Franke, U., Lagerström, R., Ekstedt, M., Saat, J., & Winter, R. (2010b). Trends in enterprise architecture practice—A survey. In Proceedings of the 5th trends in enterprise architecture research (TEAR2010) workshop.
Friedman, N., & Goldszmidt, M. (1999). Learning Bayesian networks with local structure. In M. I. Jordan (Ed.), Graphical models (pp. 421–459). Cambridge, MA: MIT Press.
Friedman, N., Linial, M., & Nachman, I. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7, 601–620.
Garthwaite, P. H., Kadane, J. B., & O’Hagan, A. (2005). Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100, 680–701. http://ideas.repec.org/a/bes/jnlasa/v100y2005p680-701.htm.
Henrion, M. (1989). Some practical issues in constructing belief networks. In L. Kanal, T. Levitt, & J. Lemmer (Eds.), Uncertainty in artificial intelligence 3 (pp. 161–173). North Holland: Elsevier Science Publishers B.V.
IBM Global Services. (1998). Improving systems availability. Technical report, IBM Global Services.
International Organization for Standardization. (2003). Software engineering—Product quality—Part 2: External metrics. International standard ISO/IEC TR 9126-2:2003(E), International Organization for Standardization.
Jensen, F. V. (2001). Bayesian networks and decision graphs. Secaucus, NJ: Springer.
Johansson, F., & Falkman, G. (2006). Implementation and integration of a Bayesian network for prediction of tactical intention into a ground target simulator. In 2006 9th international conference on information fusion, pp. 1–7. doi:10.1109/ICIF.2006.301605.
Lagerström, R., Franke, U., Johnson, P., & Ullberg, J. (2009a). A method for creating enterprise architecture metamodels—Applied to systems modifiability analysis. International Journal of Computer Science & Applications, VI, 89–120.
Lagerström, R., Johnson, P., Höök, D., & König, J. (2009b). Software change project cost estimation—A Bayesian network and a method for expert elicitation. In Third international workshop on software quality and maintainability (SQM 2009).
Laird, L., & Brennan, C. (2006). Software measurement and estimation: A practical approach. New York: IEEE Computer Society/Wiley.
Malek, M., Milic, B., & Milanovic, N. (2008). Analytical availability assessment of IT services. In ISAS, pp. 207–224.
Malik, B. (2009). Q&A: How much does an hour of downtime cost? Technical report, Gartner, Inc.
Mangione, T. W. (1995). Mail surveys: Improving the quality. Beverly Hills: Sage Publications Inc.
Marcus, E., & Stern, H. (2003). Blueprints for high availability (2nd ed.). Indianapolis, IN: Wiley.
Milanovic, N., Milic, B., & Malek, M. (2008). Modeling business process availability. In SERVICES ’08: Proceedings of the 2008 IEEE congress on services—Part I (pp. 315–321). Washington, DC: IEEE Computer Society. doi:10.1109/SERVICES-1.2008.
Musa, J. (1999). Software reliability engineering: More reliable software, faster development and testing. New York: McGraw-Hill.
Neapolitan, R. E. (2003). Learning Bayesian networks. Upper Saddle River, NJ: Prentice-Hall Inc.
Onisko, A., Druzdzel, M. J., & Wasyluk, H. (2001). Learning Bayesian network parameters from small data sets: Application of noisy-or gates. International Journal of Approximate Reasoning, 27(2), 165–182. doi:10.1016/S0888-613X(01)00039-1.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann Publishers Inc.
Pham, H. (2000). Software reliability. Singapore: Springer.
Rausand, M., & Høyland, A. (2004). System reliability theory: Models, statistical methods, and applications (2nd ed.). Hoboken, NJ: Wiley. http://www.ntnu.no/ross/srt.
Renooij, S. (2001). Qualitative approaches to quantifying probabilistic networks. PhD thesis, Utrecht University, Utrecht, The Netherlands.
Sallak, M., Simon, C., & Aubry, J. F. (2006). Optimal design of safety instrumented systems. In Workshop on advanced control and diagnosis, ACD, Nancy.
Scott, D. (2009). How to assess your IT service availability levels. Technical report, Gartner, Inc.
Shachter, R. D. (1988). Probabilistic inference and influence diagrams. Operations Research, 36(4), 589–604.
Woodberry, O., Nicholson, A. E., Korb, K. B., & Pollino, C. (2005). Parameterising Bayesian networks. In AI 2004: Advances in artificial intelligence (pp. 1101–1107). Berlin: Springer.
Zhang, R., Cope, E., Heusler, L., & Cheng, F. (2009). A Bayesian network approach to modeling IT service availability using system logs. In Workshop on the analysis of system logs.
Zhang, X., & Pham, H. (2000). An analysis of factors affecting software reliability. Journal of Systems and Software, 50(1), 43–56. doi:10.1016/S0164-1212(99)00075-8.

Acknowledgments

The authors are grateful to those colleagues and respondents who helped improve the survey before it was sent out, as well as to all the researchers who responded when invited. Special thanks are due to Lars Nordström for his contribution and support throughout the project. Three anonymous reviewers provided useful comments on the paper. A previous version of this paper was presented at the Workshop on Software Quality and Maintainability (SQM 2010) (Franke et al. 2010a), and the authors would also like to thank the anonymous reviewers of that version for their comments.

Author information

Authors and Affiliations

Industrial Information and Control Systems, Royal Institute of Technology, Stockholm, Sweden
Ulrik Franke, Pontus Johnson, Johan König & Liv Marcks von Würtemberg

Corresponding author

Correspondence to Ulrik Franke.

About this article

Cite this article

Franke, U., Johnson, P., König, J. et al. Availability of enterprise IT systems: an expert-based Bayesian framework. Software Qual J 20, 369–394 (2012). https://doi.org/10.1007/s11219-011-9141-z

Issue Date: June 2012
