Abstract
Fault management and security of computer networks present new challenges of increasing complexity. Decision procedures for fault diagnosis, security-related threats, and follow-up actions must, however, be evaluated on the basis of sound theoretical foundations and economic costs of various strategies. This paper presents a minimum expected cost solution for fault diagnosis and corrective actions. Several notions new to fault management are introduced. The methodology is applicable to both non-malicious and malicious faults. As a novel security-related application, the problem of choosing between two strategies for containing the spread of network worms is discussed.
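As an illustration only, the minimum expected cost criterion named above can be sketched in its textbook Bayes decision form: given posterior probabilities over network states and a cost for each (action, state) pair, choose the action whose expected cost is smallest. This is not the paper's own formulation, which the abstract does not reproduce. The function name, the state and action labels, and all numbers below are hypothetical.

```python
from typing import Dict, Tuple


def min_expected_cost_action(
    posterior: Dict[str, float],
    cost: Dict[str, Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the action with the smallest expected cost, plus all expected costs.

    posterior[state]: probability of each network state given the observations.
    cost[action][state]: cost of taking `action` when `state` is the true state.
    """
    # Expected cost of an action = sum over states of P(state) * cost(action, state).
    expected = {
        action: sum(posterior[state] * c for state, c in per_state.items())
        for action, per_state in cost.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical posterior and cost table, for illustration only.
posterior = {"normal": 0.9, "faulty": 0.1}
cost = {
    "do_nothing": {"normal": 0.0, "faulty": 100.0},
    "repair": {"normal": 5.0, "faulty": 5.0},
}

best_action, expected_costs = min_expected_cost_action(posterior, cost)
print(best_action, expected_costs)
```

With these assumed numbers, a fault that is only 10% likely still justifies the repair action, because a missed fault is assumed to cost far more than an unnecessary repair.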

Acknowledgments
We wish to thank the Editor-in-Chief and the three referees for their most constructive comments on the original version of this paper. This work was supported by NSF-0325378ITR.
Cite this article
Kirmani, E., Hood, C.S. A Decision-Theoretic Approach to Network Fault Diagnosis and Follow-up Action. J Netw Syst Manage 25, 159–179 (2017). https://doi.org/10.1007/s10922-016-9386-8
DOI: https://doi.org/10.1007/s10922-016-9386-8