Intelligent failure localization and maintenance of network based on reliability

The Journal of Supercomputing

Abstract

Network maintenance and failure localization are essential to keeping a network highly reliable. In this work, we propose an importance indicator based on reliability theory to measure the importance of each link, and we design a corresponding Monte Carlo algorithm to compute it. We also use a back-propagation (BP) neural network to localize failures by predicting the failure probability of each link. Simulations indicate that the approach achieves high localization accuracy while effectively reducing the amount of monitoring equipment required. The importance indicator of a located failed link then determines whether that link is worth maintaining. Service providers can use the proposed approach to reduce the cost of failure localization and maintenance while keeping the network highly reliable. Because the approach is not restricted to specific network technologies, it can be applied to a wide range of network types.
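
As a rough illustration of how a Monte Carlo estimate of link importance might look, the sketch below uses the classical Birnbaum importance as a stand-in, since the indicator proposed in the paper is not defined in this abstract. The bridge topology, the uniform link reliability `p_up`, and the names `LINKS`, `connected`, and `birnbaum_importance` are all assumptions made for the example, not the authors' algorithm.

```python
import random

# Hypothetical 4-node bridge network (an assumption for illustration).
# Each link is an undirected pair (u, v); "s" is the source, "t" the target.
LINKS = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]


def connected(up_links, source="s", target="t"):
    """Return True if target is reachable from source over the working links."""
    adj = {}
    for u, v in up_links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def birnbaum_importance(links, p_up, trials=20000, seed=0):
    """Monte Carlo estimate of the Birnbaum importance of every link.

    For link i this is P(network works | i up) - P(network works | i down),
    i.e. the probability that the state of link i alone decides whether
    source and target are connected. Every link is assumed to be up
    independently with the same probability p_up.
    """
    rng = random.Random(seed)
    critical = [0] * len(links)
    for _ in range(trials):
        # Sample one random up/down state for all links.
        state = [rng.random() < p_up for _ in links]
        for i in range(len(links)):
            # Working links other than i in this sample.
            others = [l for j, l in enumerate(links) if j != i and state[j]]
            works_with_i = connected(others + [links[i]])
            works_without_i = connected(others)
            # Counts 1 only when link i is critical in this sample.
            critical[i] += works_with_i - works_without_i
    return [c / trials for c in critical]


if __name__ == "__main__":
    for link, score in zip(LINKS, birnbaum_importance(LINKS, p_up=0.9)):
        print(link, round(score, 4))
```

On this toy topology the middle link `("a", "b")` should score well below the four outer links, which is the kind of ranking a maintenance decision could draw on. Reusing the same sampled state for the "up" and "down" cases of each link keeps the variance of the difference low.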

Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Fangming Shao.

Ethics declarations

Conflict of Interest

The authors declare no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, Q., Shao, F. Intelligent failure localization and maintenance of network based on reliability. J Supercomput 79, 389–418 (2023). https://doi.org/10.1007/s11227-022-04653-7

  • DOI: https://doi.org/10.1007/s11227-022-04653-7
