Abstract
Fault localization is a central element of network fault management. This paper takes a weighted bipartite graph as the fault propagation model and presents a heuristic fault localization algorithm based on the idea of incremental coverage, which is resilient to an inaccurate fault propagation model and to a noisy environment. Furthermore, a sliding window mechanism is proposed to address the algorithm's loss of accuracy when time windows are improperly chosen. In the simulation study, our scheme achieves a higher detection rate and a lower false positive rate than current fault localization algorithms, both in a noisy environment and in the presence of inaccurate windows.
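To make the idea of incremental coverage over a weighted bipartite graph concrete, the following is a minimal sketch of a generic greedy coverage heuristic. It is not the algorithm from the paper, which the abstract does not specify. The model layout (a mapping from each fault to its symptoms, weighted by an assumed probability that the fault causes the symptom), the function name `greedy_localize`, and the fault and symptom labels are all hypothetical choices for illustration.

```python
from typing import Dict, List, Set

# Hypothetical fault propagation model (an assumption, not from the paper):
# fault -> {symptom: assumed probability that the fault causes the symptom}.
Model = Dict[str, Dict[str, float]]


def greedy_localize(model: Model, observed: Set[str]) -> List[str]:
    """Pick faults one at a time by largest incremental coverage.

    The incremental coverage of a fault is the sum of its edge weights to
    observed symptoms that no previously chosen fault has explained yet.
    """
    unexplained = set(observed)
    hypothesis: List[str] = []
    while unexplained:
        best_fault, best_gain = None, 0.0
        for fault in sorted(model):  # sorted for deterministic tie-breaking
            if fault in hypothesis:
                continue
            gain = sum(w for s, w in model[fault].items() if s in unexplained)
            if gain > best_gain:
                best_fault, best_gain = fault, gain
        if best_fault is None:
            # No remaining fault explains anything new: the leftover
            # symptoms are treated as noise (spurious observations).
            break
        hypothesis.append(best_fault)
        unexplained -= set(model[best_fault])
    return hypothesis


if __name__ == "__main__":
    model = {
        "f1": {"s1": 0.9, "s2": 0.8},
        "f2": {"s2": 0.6, "s3": 0.7},
        "f3": {"s3": 0.2},
    }
    print(greedy_localize(model, {"s1", "s2", "s3"}))
```

In this sketch a symptom that no fault can explain is left over when the loop stops, which is one simple way to tolerate spurious symptoms. A sliding window scheme would decide which observed symptoms are passed to such a routine at each step; that part is not shown here.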
Cite this article
Zhang, C., Liao, J., Li, T. et al. Probabilistic fault localization with sliding windows. Sci. China Inf. Sci. 55, 1186–1200 (2012). https://doi.org/10.1007/s11432-012-4567-x
DOI: https://doi.org/10.1007/s11432-012-4567-x