Availability of enterprise IT systems: an expert-based Bayesian framework

Software Quality Journal

Abstract

Ensuring the availability of enterprise IT systems is a challenging task. The factors that can bring systems down are numerous, and their impact on various system architectures is difficult to predict. At the same time, maintaining high availability is crucial in many applications, ranging from control systems in the electric power grid and electronic trading systems on the stock market to specialized command and control systems for military and civilian purposes. This paper describes a Bayesian decision support model designed to help enterprise IT system decision-makers evaluate the consequences of their decisions by analyzing various scenarios. The model is based on expert elicitation from 50 experts on IT systems availability, obtained through an electronic survey. It uses a leaky Noisy-OR method to weigh together the expert opinions on 16 factors affecting systems availability. Using this model, the effect of changes to a system can be estimated beforehand, providing decision support for improving enterprise IT systems availability. The Bayesian model is then integrated within a standard mathematical model, in the style of reliability block diagrams, for assessing availability on the architecture level; in this model, the IT systems play the role of building blocks. The overall assessment framework thus addresses measures to ensure high availability both on the level of individual systems and on the level of the entire enterprise architecture. Examples illustrate how practitioners aiming to ensure high availability can use the framework.
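
The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common leaky Noisy-OR form in which the effect (here taken to be the system being unavailable, an assumption of this sketch) is absent only if the leak and every active cause all fail to produce it, and it assumes independent block failures for the series and parallel formulas. The function names and all numbers are hypothetical and are not taken from the paper or its survey.

```python
from math import prod


def leaky_noisy_or(leak, causes):
    """Return P(effect) under a leaky Noisy-OR gate.

    leak   -- probability of the effect when no modeled cause is active
    causes -- iterable of (active, strength) pairs, where strength is the
              probability that this cause alone produces the effect
    """
    p_no_effect = 1.0 - leak
    for active, strength in causes:
        if active:
            p_no_effect *= 1.0 - strength
    return 1.0 - p_no_effect


def series(availabilities):
    """Availability of blocks that must all be up (series arrangement)."""
    return prod(availabilities)


def parallel(availabilities):
    """Availability of redundant blocks (at least one must be up)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Illustrative numbers only; none of them come from the paper or its survey.
p_down = leaky_noisy_or(
    leak=0.001,
    causes=[(True, 0.004), (False, 0.010), (True, 0.002)],
)
system_availability = 1.0 - p_down

# Two redundant copies of that system in series with one 0.999 component.
architecture_availability = series(
    [parallel([system_availability, system_availability]), 0.999]
)
print(f"system: {system_availability:.6f}")
print(f"architecture: {architecture_availability:.6f}")
```

Changing which causes are marked active and recomputing gives a simple what-if comparison of the kind the abstract describes, first for a single system and then for an architecture built from such systems.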

References

  • Ashrafi, N., Berman, O., & Cutler, M. (2002). Optimal design of large software-systems using N-version programming. IEEE Transactions on Reliability, 43(2), 344–350.

  • Askåker, J., & Kulle, M. (2008, June 4). Miljardaffärer gick förlorade. Dagens Industri, pp. 6–7 (in Swedish).

  • Baecher, G. (1988). Judgemental probability in geotechnical risk assessment. Technical report, The Office of the Chief, US Army Corps of Engineers.

  • Blaxter, L., Hughes, C., & Tight, M. (2006). How to research. Buckingham: Open University Press.

  • Campbell, K., Gordon, L. A., Loeb, M. P., & Zhou, L. (2003). The economic cost of publicly announced information security breaches: Empirical evidence from the stock market. Journal of Computer Security, 11(3), 431–448.

  • Chen, L., & Avizienis, A. (1978). N-version programming: A fault-tolerance approach to reliability of software operation. In Proceedings of the 8th IEEE international symposium on fault-tolerant computing (FTCS-8), pp. 3–9.

  • Cooke, R. M. (1991). Experts in uncertainty. Opinion and subjective probability in science. Oxford: Oxford University Press.

  • Czaja, R., & Blair, J. (2005). Designing surveys. Beverly Hills: Sage Publications Inc.

  • Druzdzel, M. J. (1999). Genie: A development environment for graphical decision-analytic models. In Proceedings of the 1999 annual symposium of the American medical informatics association (AMIA-1999), p. 1206, more information available on http://genie.sis.pitt.edu/about.htm.

  • Fenton, N. E., & Pfleeger, S. L. (1997). Software Metrics (2nd ed.). Boston, MA: PWS Publishing Company.

  • Fowler, F. J. J. (2002). Survey research methods. Beverly Hills: Sage Publications Inc.

  • Franke, U., Johnson, P., König, J., & von Würtemberg, L. M. (2010a). Availability of enterprise IT systems—an expert-based Bayesian model. In Proceedings of the fourth international workshop on software quality and maintainability (SQM 2010), Madrid.

  • Franke, U., Lagerström, R., Ekstedt, M., Saat, J., & Winter, R. (2010b). Trends in enterprise architecture practice—A survey. In Proceedings of the 5th trends in enterprise architecture research (TEAR2010) workshop.

  • Friedman, N., & Goldszmidt, M. (1999). Learning Bayesian networks with local structure. In M. I. Jordan (Ed.), Graphical models (pp. 421–459). Cambridge, MA: MIT Press.

  • Friedman, N., Linial, M., & Nachman, I. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7, 601–620.

  • Garthwaite, P. H., Kadane, J. B., & O’Hagan, A. (2005). Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100, 680–701. http://ideas.repec.org/a/bes/jnlasa/v100y2005p680-701.htm.

  • Henrion, M. (1989). Some practical issues in constructing belief networks. In L. Kanal, T. Levitt, & J. Lemmer (Eds.), Uncertainty in artificial intelligence 3 (pp. 161–173). North Holland: Elsevier Science Publishers B.V.

  • IBM Global Services. (1998). Improving systems availability. Technical report, IBM Global Services.

  • International Organization for Standardization. (2003). Software engineering—Product quality—Part 2: External metrics. International standard ISO/IEC TR 9126-2:2003(E), International Organization for Standardization.

  • Jensen, F. V. (2001). Bayesian networks and decision graphs. Secaucus, NJ: Springer.

  • Johansson, F., & Falkman, G. (2006). Implementation and integration of a Bayesian network for prediction of tactical intention into a ground target simulator. In 2006 9th international conference on information fusion, pp. 1–7. doi:10.1109/ICIF.2006.301605.

  • Lagerström, R., Franke, U., Johnson, P., & Ullberg, J. (2009a). A method for creating enterprise architecture metamodels—Applied to systems modifiability analysis. International Journal of Computer Science & Applications, VI, 89–120.

  • Lagerström, R., Johnson, P., Höök, D., & König, J. (2009b). Software change project cost estimation—A Bayesian network and a method for expert elicitation. In Third international workshop on software quality and maintainability (SQM 2009).

  • Laird, L., & Brennan, C. (2006). Software measurement and estimation: A practical approach. New York: IEEE Computer Society/Wiley.

  • Malek, M., Milic, B., & Milanovic, N. (2008). Analytical availability assessment of IT services. In ISAS, pp. 207–224.

  • Malik, B. (2009). Q&A: How much does an hour of downtime cost? Technical report, Gartner, Inc.

  • Mangione, T. W. (1995). Mail surveys: Improving the quality. Beverly Hills: Sage Publications Inc.

  • Marcus, E., & Stern, H. (2003). Blueprints for high availability (2nd ed.). Indianapolis, IN: Wiley.

  • Milanovic, N., Milic, B., & Malek, M. (2008). Modeling business process availability. In SERVICES ’08: Proceedings of the 2008 IEEE congress on services—Part I (pp. 315–321). Washington, DC: IEEE Computer Society. doi:10.1109/SERVICES-1.2008.

  • Musa, J. (1999). Software reliability engineering: More reliable software, faster development and testing. New York: McGraw-Hill.

  • Neapolitan, R. E. (2003). Learning Bayesian networks. Upper Saddle River, NJ: Prentice-Hall Inc.

  • Onisko, A., Druzdzel, M. J., & Wasyluk, H. (2001). Learning Bayesian network parameters from small data sets: Application of noisy-or gates. International Journal of Approximate Reasoning, 27(2), 165–182. doi:10.1016/S0888-613X(01)00039-1.

  • Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann Publishers Inc.

  • Pham, H. (2000). Software reliability. Singapore: Springer.

  • Rausand, M., & Høyland, A. (2004). System reliability theory: Models, statistical methods, and applications (2nd ed.). Hoboken, NJ: Wiley. http://www.ntnu.no/ross/srt.

  • Renooij, S. (2001). Qualitative approaches to quantifying probabilistic networks. PhD thesis, Utrecht University, Utrecht, The Netherlands.

  • Sallak, M., Simon, C., & Aubry, J. F. (2006). Optimal design of safety instrumented systems. In Workshop on advanced control and diagnosis, ACD, Nancy.

  • Scott, D. (2009). How to assess your IT service availability levels. Technical report, Gartner, Inc.

  • Shachter, R. D. (1988). Probabilistic inference and influence diagrams. Operations Research, 36(4), 589–604.

  • Woodberry, O., Nicholson, A. E., Korb, K. B., & Pollino, C. (2005). Parameterising Bayesian networks. In AI 2004: Advances in artificial intelligence (pp. 1101–1107). Berlin: Springer.

  • Zhang, R., Cope, E., Heusler, L., & Cheng, F. (2009). A Bayesian network approach to modeling IT service availability using system logs. In Workshop on the analysis of system logs.

  • Zhang, X., & Pham, H. (2000). An analysis of factors affecting software reliability. Journal of Systems and Software, 50(1), 43–56. doi:10.1016/S0164-1212(99)00075-8.

Acknowledgments

The authors are grateful to those colleagues and respondents who helped improve the survey before it was sent out, as well as to all the researchers who responded when invited. Special thanks are due to Lars Nordström for his contribution and support throughout the project. Three anonymous reviewers provided useful comments on the paper. A previous version of this paper was presented at the Workshop on Software Quality and Maintainability (SQM 2010) (Franke et al. 2010a), and the authors would also like to thank the anonymous reviewers of that version for their comments.

Author information

Corresponding author

Correspondence to Ulrik Franke.

About this article

Cite this article

Franke, U., Johnson, P., König, J. et al. Availability of enterprise IT systems: an expert-based Bayesian framework. Software Qual J 20, 369–394 (2012). https://doi.org/10.1007/s11219-011-9141-z

  • DOI: https://doi.org/10.1007/s11219-011-9141-z
