Abstract
Network maintenance and failure localization are essential to highly reliable network operation. In this work, we propose an importance indicator based on reliability theory to measure the importance of each link, and we design a corresponding Monte Carlo algorithm. We also use a back-propagation (BP) neural network to address failure localization by predicting the failure probability of each link. Simulations indicate that the approach achieves high localization accuracy and effectively reduces the amount of monitoring equipment required. Whether a located failed link is worth maintaining is determined by its importance indicator. Service providers can use the proposed approach to reduce the cost of network failure localization and maintenance while keeping the network operating with high reliability. Because the approach is not restricted to specific network technologies, it can be applied to a wide range of network types.
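As a rough illustration of how a reliability-based link importance can be estimated by Monte Carlo simulation, the sketch below computes the classical Birnbaum importance of a link for two-terminal connectivity. The abstract does not define the proposed indicator, so this is a standard textbook measure substituted for illustration, not the authors' indicator or algorithm. The five-link bridge topology, the uniform link reliability of 0.9, and the names `connected` and `birnbaum_importance` are all assumptions made for the example.

```python
import random

def connected(nodes, up_edges, s, t):
    """Return True if t is reachable from s using only the working links."""
    adj = {n: [] for n in nodes}
    for u, v in up_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def birnbaum_importance(nodes, edges, p_up, link, s, t, trials=20000, seed=0):
    """Monte Carlo estimate of
    P(s-t connected | link up) - P(s-t connected | link down)."""
    rng = random.Random(seed)
    others = [e for e in edges if e != link]
    gain = 0
    for _ in range(trials):
        # Sample the state of every other link once and reuse the same
        # sample for both cases, which reduces the estimator's variance.
        up = [e for e in others if rng.random() < p_up[e]]
        with_link = connected(nodes, up + [link], s, t)
        without_link = connected(nodes, up, s, t)
        gain += int(with_link) - int(without_link)
    return gain / trials

# Hypothetical bridge network: source A, terminal D, bridge link B-C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C")]
p_up = {e: 0.9 for e in edges}  # assumed link reliabilities

imp_ab = birnbaum_importance(nodes, edges, p_up, ("A", "B"), "A", "D")
imp_bc = birnbaum_importance(nodes, edges, p_up, ("B", "C"), "A", "D")
print(f"importance of A-B: {imp_ab:.4f}")
print(f"importance of B-C: {imp_bc:.4f}")
```

In this toy network the link A-B lies on a direct source-to-terminal path and scores noticeably higher than the bridge link B-C, which only matters when both direct paths are partly broken. A ranking of this kind is one way a repair decision could be prioritized under the stated assumptions.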
Data Availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of Interest
The authors declare no conflicts of interest regarding this work.
Cite this article
Zheng, Q., Shao, F. Intelligent failure localization and maintenance of network based on reliability. J Supercomput 79, 389–418 (2023). https://doi.org/10.1007/s11227-022-04653-7
DOI: https://doi.org/10.1007/s11227-022-04653-7