Abstract
Links inevitably fail as networks expand, causing user-perceived service interruptions. Localizing link failures quickly and accurately is therefore essential, and route-aware active probing makes it possible. Given limited routing capacity and high traffic overhead, cross verification enables a lightweight probing scheme: it verifies reachability over distinct subsets of crossed paths to pinpoint the exact faulty links. To optimize the crossed path design quickly, we propose the pruning genetic algorithm (PGA), which builds a pruning module on top of a genetic algorithm. By eliminating redundant paths, PGA consistently produces high-quality solutions across various networks and avoids slow convergence in an exponentially large solution space. PGA also introduces extra repair operations to guarantee solution feasibility after crossover and mutation. Our experimental results on real-world network topologies demonstrate that PGA reduces probing cost by 23.0% to 58.3% and forwarding cost by 23.0% to 45.3% compared to its counterparts, while running in seconds or even milliseconds.
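The idea of cross verification and of pruning redundant paths can be illustrated with a small sketch. It is not the authors' PGA: it assumes a single link failure, models each probing path as a plain list of links, and uses a simple greedy pass in place of the genetic search. The names `signatures`, `is_feasible`, `localize`, and `prune` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
Path = List[Link]


def signatures(paths: List[Path]) -> Dict[Link, FrozenSet[int]]:
    """Map each link to the set of probe-path indices that traverse it."""
    sig: Dict[Link, set] = {}
    for i, path in enumerate(paths):
        for link in path:
            sig.setdefault(link, set()).add(i)
    return {link: frozenset(s) for link, s in sig.items()}


def is_feasible(paths: List[Path], links: List[Link]) -> bool:
    """Feasible if every link is probed and no two links share a signature."""
    sig = signatures(paths)
    if any(link not in sig for link in links):
        return False
    return len({sig[link] for link in links}) == len(links)


def localize(paths: List[Path], failed: FrozenSet[int]) -> Optional[Link]:
    """Return the link whose signature equals the set of failed probes."""
    for link, s in signatures(paths).items():
        if s == failed:
            return link
    return None


def prune(paths: List[Path], links: List[Link]) -> List[Path]:
    """Greedily drop any path whose removal keeps the design feasible.

    Illustrative stand-in for a pruning step, not the paper's module.
    """
    kept = list(paths)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if is_feasible(candidate, links):
            kept = candidate  # path i was redundant
        else:
            i += 1
    return kept


# Toy triangle topology (assumed for illustration only).
AB, BC, CA = ("a", "b"), ("b", "c"), ("c", "a")
LINKS = [AB, BC, CA]
PATHS = [[AB, BC], [BC, CA], [AB]]

if __name__ == "__main__":
    kept = prune(PATHS, LINKS)
    print(len(kept), "paths kept")
    print("faulty link:", localize(kept, frozenset({0, 1})))
```

In this toy design, two crossed paths are enough: a failure seen only on the first path, only on the second, or on both maps to a different link in each case, so the third path is redundant and is pruned.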
Availability of data and materials
The real-world topology dataset used for evaluation can be accessed at www.topology-zoo.org.
Funding
This work is supported by the National Key Research and Development Program of China (No. 2022YFB4500702); the Shandong Provincial Natural Science Foundation (project ZR2022LZH018); the National Natural Science Foundation of China under grants 62141218 and 62372322; and the open project of Zhejiang Lab (2021DA0AM01/003).
Author information
Contributions
Hongyun Gao wrote the main manuscript text and prepared all the experiments and figures. Laiping Zhao guided the core logic of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
Not applicable.
Ethics approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, H., Zhao, L., Chen, S. et al. Low-cost crossed probing path planning for network failure localization. World Wide Web 26, 3891–3914 (2023). https://doi.org/10.1007/s11280-023-01206-7
DOI: https://doi.org/10.1007/s11280-023-01206-7