Elsevier

Computer Communications

Volume 36, Issue 6, 15 March 2013, Pages 645-655

Optimizing IGP link costs for improving IP-level resilience with Loop-Free Alternates

https://doi.org/10.1016/j.comcom.2012.09.004

Abstract

The IP Fast ReRoute-Loop-Free Alternates (LFA) standard is a simple and easily deployable technique to provide fast failure protection right in the IP layer. Today, most major IP device vendors have products on the market that support LFA out of the box. Unfortunately, LFA usually cannot protect all possible failure scenarios in a general network topology. Therefore, it is crucial to develop LFA-based network optimization tools in order to assist operators in deciding whether deploying LFA in their network will supply sufficient resiliency. In this paper, we give a new graph theoretical framework for analyzing LFA failure case coverage, and then we investigate how to optimize the Interior Gateway Protocol (IGP) link costs in order to maximize the number of protected failure scenarios. We show that this problem is NP-complete even in a very restricted formulation, and we give an exact algorithm as well as a complete family of heuristics to solve it. Our simulation studies indicate that a deliberate tuning of the approximation strategy can significantly improve the quality of the IGP link costs, and we conclude that LFA cost optimization has the potential to significantly boost LFA-based resilience in most operational networks.

Introduction

Today, the Internet Protocol (IP) suite has become the de facto standard for large-scale inter-networking throughout the world. The protocol suite, with its accompanying control plane protocols, has come a long way to become a viable bearing platform for commercial telecom services. Unfortunately, IP still lacks functionality, which makes it difficult to sustain the transmission quality required by multimedia applications, like VoIP, IPTV, online gaming, etc., in an IP environment. One of the most prominent shortcomings today is the slow reaction to device and link failures. Interior Gateway Protocols (IGPs), like the Open Shortest Path First (OSPF, [1]) or the Integrated IS–IS (IS–IS, [2]) routing protocol, adopt a restoration-based resilience approach, based on a global flooding of failure information and a lengthy network-wide re-convergence process. This slow reaction to failures, inherent to the traditional IP control plane, not only hinders operators providing telecom services over pure IP; a growing number of service providers that have switched to MultiProtocol Label Switching–Label Distribution Protocol (MPLS/LDP) also suffer, because MPLS/LDP also relies on the IP control plane for routing information.

The key to the slow convergence of IGPs is the global, reactive response philosophy they adopt: failure information is distributed to all routers in the administrative scope, which in turn react by recomputing their routing tables and refreshing their forwarding information bases in accordance with the changed network topology. This often leads to convergence times ranging from a few hundred milliseconds to several seconds, and even a very careful adjustment of the IGP parameters [3] is insufficient to decrease this to less than 50 ms, which is commonly used as a rough estimate of the longest outage a modern multimedia application can tolerate.

In order to achieve a sub-50 ms convergence time, one needs to go beyond conventional IGP-based restoration and invoke a proactive, local protection method, called IP Fast ReRoute (IPFRR, [4]). In IPFRR, routers precompute alternate next-hops proactively, and traffic is instantly switched to these secondary next-hops should the primary next-hop become unavailable. This ensures that traffic flows without interruption until the IGP converges in the background. Note that in IPFRR only the routers in the immediate vicinity of the failed component participate in the failure recovery process, and routers several hops away do not even get notified about the outage. This saves the time needed for global failure notification, one of the most time-consuming steps in IGP-based restoration.
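The local switchover described above can be illustrated with a minimal Python sketch. It assumes a toy forwarding table in which every destination carries a primary next-hop and an optional precomputed backup; the names `FibEntry`, `Router`, `install`, `neighbor_down` and `next_hop` are hypothetical and do not correspond to any router implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    """Hypothetical forwarding entry: primary next-hop plus precomputed backup."""
    primary: str
    backup: Optional[str] = None


class Router:
    """Toy router that repairs failures locally, without waiting for the IGP."""

    def __init__(self):
        self.fib = {}                  # destination -> FibEntry
        self.failed_neighbors = set()  # neighbors detected as down locally

    def install(self, destination, primary, backup=None):
        # Both next-hops are installed before any failure occurs.
        self.fib[destination] = FibEntry(primary, backup)

    def neighbor_down(self, neighbor):
        # Only this router needs to learn about the failure.
        self.failed_neighbors.add(neighbor)

    def next_hop(self, destination):
        entry = self.fib[destination]
        if entry.primary not in self.failed_neighbors:
            return entry.primary
        if entry.backup is not None and entry.backup not in self.failed_neighbors:
            return entry.backup
        return None  # unprotected: traffic is lost until the IGP re-converges


router = Router()
router.install("d", primary="e", backup="n")
before = router.next_hop("d")  # primary next-hop while everything is up
router.neighbor_down("e")
after = router.next_hop("d")   # precomputed backup takes over immediately
```

The point of the sketch is that the backup is chosen ahead of time, so the repair is a single table lookup; how a safe backup is selected is a separate question.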

It turned out, however, that combining local protection with IP’s intrinsic destination-based forwarding scheme is notoriously difficult. This is because a router not immediately adjacent to the failure, not knowing that a failure has in fact occurred, has no way to decide whether a received packet is traveling on its default shortest path to the destination, or whether it is actually being routed around a failure and so out-of-order forwarding rules should be applied to it. Therefore, any IPFRR mechanism that does not adopt special remedies to this problem is prone to either producing micro-loops or being unable to handle certain failure cases [5]. To avoid this, IPFRR proposals either apply explicit or implicit failure signaling [6], [7], [8], or alter IP’s destination-based forwarding [9], or introduce tunnels to route around the failed component [10], [11], [12]. Deploying these IPFRR mechanisms, however, would either demand non-trivial modifications to the essential IP infrastructure or impose considerable management burden on network operations [13] (or both), making network device vendors reluctant to implement them and discouraging operators from deploying IPFRR altogether.

To date, only a single IPFRR specification has found its way into commercial IP routers: Loop-Free Alternates (LFA, [14]). This can be attributed to the fact that LFA is a clever trade-off between simplicity and protection capability, in that LFA has never been intended to provide 100% protection for all possible failure cases because, as we argued above, this would require widespread modifications to the IP infrastructure and so would hinder deployment. Instead, LFA is as simple as it can get: traffic impacted by a failure is passed on to an alternate next-hop (called a Loop-Free Alternate) that still has an intact path to the destination. When the aim is merely to protect against link outages, it is enough to ensure that the detour bypasses the link to the next-hop, while node-protecting LFAs are required to avoid both the link to the next-hop and the next-hop itself. LFA can be implemented with a straightforward upgrade to IGPs, without special staff training and extensive pilot deployments, and so it can be introduced incrementally. On the other hand, as the price of this simplicity, depending on the network topology and IGP link costs, very often not all routers have LFAs to all destinations, making it impossible to repair certain failure scenarios rapidly with LFA.
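The two protection levels can be made concrete with the loop-free inequalities of RFC 5286: a neighbor n of source s is a link-protecting alternate towards d if dist(n,d) < dist(n,s) + dist(s,d), and it is additionally node-protecting for primary next-hop e if dist(n,d) < dist(n,e) + dist(e,d). The Python sketch below checks these conditions on a small example. The helper names, the example graph, the name-based tie-break among equal-cost next-hops (ECMP is ignored) and the fallback to the link-protecting test when the next-hop is the destination itself are all assumptions made for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbor: cost}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def loop_free_alternates(graph, s, d, node_protecting=False):
    """Neighbors of s usable as LFAs towards d (graph assumed connected)."""
    dist = {u: dijkstra(graph, u) for u in graph}
    # Primary next-hop: neighbor on a shortest path, ties broken by name.
    primary = min(graph[s], key=lambda e: (graph[s][e] + dist[e][d], e))
    alternates = []
    for n in graph[s]:
        if n == primary:
            continue
        # Link-protecting condition: n's shortest path to d avoids s.
        if not dist[n][d] < dist[n][s] + dist[s][d]:
            continue
        # Node-protecting condition: n's shortest path to d avoids the primary.
        if node_protecting and primary != d:
            if not dist[n][d] < dist[n][primary] + dist[primary][d]:
                continue
        alternates.append(n)
    return sorted(alternates)


# Hypothetical topology: n can reach d only through e.
graph = {
    "s": {"e": 1, "n": 2},
    "e": {"s": 1, "n": 1, "d": 1},
    "n": {"s": 2, "e": 1},
    "d": {"e": 1},
}
link_lfas = loop_free_alternates(graph, "s", "d")
node_lfas = loop_free_alternates(graph, "s", "d", node_protecting=True)
```

In this example `link_lfas` is `['n']` while `node_lfas` is empty: n bypasses the link s-e, but its own shortest path to d still traverses e, so it cannot protect against the failure of node e.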

Consequently, many operators hesitate to enable LFA, trying to weigh the expected benefits against the additional costs. In this paper, we seek ways to assist in making this important decision. In the first part, we give new graph theoretical tools for analyzing LFA failure case coverage in operational networks. Similar protectability analyses are already available for some non-standardized IPFRR mechanisms: [15] considers the O2 method and [16] discusses a centralized destination-based routing scheme. For LFA, only simulation-based reports have been available thus far [17], [18], [19], [20], and mathematical analysis has been confined to the link-protection case [21], [22]. Below, we extend previous work on mathematical LFA-coverage analysis with new tools for studying both the link-protection and the node-protection cases.

Initial deployments as well as numerical analyses confirmed that in many operational networks LFA indeed does not guarantee protection for all failure scenarios [19]. This calls for developing network optimization tools to tune the network so as to increase the number of failure cases protectable by LFA. There are various approaches to reach this end. One way is LFA network design, which aims to design LFA-friendly network topologies right from the outset [20]. Another approach is LFA graph extension, where the task is to alter the network topology to boost LFA coverage [21]. Third, LFA cost optimization asks to construct IGP link costs so as to maximize the number of possible failure cases protectable by LFA [23], [24], [22]. This LFA cost optimization problem is the main focus of the second part of this paper. While improving IP resilience is a recurring theme in the literature (see [25] for deflection routing, [15] for O2, or [16] for a review), for the specific case of LFA only the joint optimization of network performance and resilience has been investigated previously [23], [24]. Thus, at the moment very little understanding is available as to how well suited LFA-based IP Fast ReRoute is to protecting an IP network and to what extent this can be improved by optimizing link costs.
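To illustrate how link costs alone can change the number of protectable failure cases, the Python sketch below measures link-protecting LFA coverage and searches a tiny cost space exhaustively. It rests on several assumptions: coverage is taken to be the fraction of ordered source-destination pairs for which the source has at least one alternate satisfying the RFC 5286 loop-free inequality, equal-cost next-hops are tie-broken by name, and the function names and the three-node topology are hypothetical. The exhaustive search is a plain brute force for illustration only, not the exact algorithm or the heuristics of this paper.

```python
import heapq
from fractions import Fraction
from itertools import product


def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps node -> {neighbor: cost}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adj[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lfa_coverage(link_costs):
    """Fraction of ordered (s, d) pairs with a link-protecting LFA.

    link_costs maps undirected links (u, v) to positive integer costs;
    the resulting graph is assumed to be connected.
    """
    adj = {}
    for (u, v), cost in link_costs.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    dist = {u: dijkstra(adj, u) for u in adj}
    protected = total = 0
    for s in adj:
        for d in adj:
            if s == d:
                continue
            total += 1
            primary = min(adj[s], key=lambda e: (adj[s][e] + dist[e][d], e))
            if any(n != primary and dist[n][d] < dist[n][s] + dist[s][d]
                   for n in adj[s]):
                protected += 1
    return Fraction(protected, total)


def brute_force_costs(links, cost_values):
    """Try every cost assignment; exponential in the number of links."""
    best_cov, best_costs = None, None
    for costs in product(cost_values, repeat=len(links)):
        candidate = dict(zip(links, costs))
        cov = lfa_coverage(candidate)
        if best_cov is None or cov > best_cov:
            best_cov, best_costs = cov, candidate
    return best_cov, best_costs


links = [("a", "b"), ("b", "c"), ("a", "c")]
skewed = lfa_coverage(dict(zip(links, (1, 1, 3))))  # one expensive link
best_cov, best_costs = brute_force_costs(links, (1, 2, 3))
```

On this triangle the skewed setting protects 4 of the 6 source-destination pairs (`skewed == Fraction(2, 3)`), whereas the search finds a setting with full coverage. The topology is unchanged; only the costs differ. The brute force scales as the number of cost values raised to the number of links, which is why it is usable only on toy instances.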

The main contributions of this paper are as follows.

  • We develop a comprehensive graph theoretical LFA analysis framework, for the first time considering both the link-protection and node-protection cases.

  • We study the LFA cost optimization problem in great detail. We show that this problem is NP-complete, and we give an exact algorithm of exponential complexity as well as a family of heuristics with tunable performance and running time. Our selection of heuristics makes it possible to pick the right approximation algorithm for the particular problem under consideration.

  • We provide a comprehensive numerical evaluation of LFA cost optimization methods to compare their performance on a wide range of artificial and realistic graph topologies.

The rest of this paper is organized as follows. After reviewing the related literature in Section 2 and introducing the notations and the model in Section 3, we first discuss LFA failure coverage analysis (see Section 4). Then, in Section 5 we turn to discuss the LFA cost optimization problem. In Section 6, we evaluate the proposed algorithms numerically and finally we conclude our work with Section 7.

Section snippets

Related works

The IP Fast ReRoute framework was initiated by the Internet Engineering Task Force in [4], and the Loop-Free Alternates standard, as the basic specification for IPFRR, was subsequently documented in [14]. The IETF made it clear from the very beginning that LFA does not guarantee fast protection for all possible failure scenarios in all network topologies. This was later confirmed by extensive simulation studies, which indicated that, depending on the topology and link cost settings, LFA

Model and problem formulation

We model the network with a connected, undirected graph G(V,E), where the set of nodes is denoted by V (|V| = n) and the set of edges by E (|E| = m). Let N_i denote the set of neighbors of some node i ∈ V. IGP link costs are represented by an edge cost function c: E → ℕ. The cost of an edge (i,j) is denoted by c(i,j). We presume that the network topology G(V,E) and the cost function c are readily available to the network nodes through the IGP, using which all routers can compute the shortest path distance between

LFA failure coverage analysis

Before turning to discuss how to solve the LFA cost optimization problem, first we show some simple theoretical limits on LFA coverage. In particular, we give tight graph theoretical lower and upper bounds on the LFA coverage achievable in a given graph under any selection of link costs. We shall discuss both the link-protecting and the node-protecting cases.

Our analysis is intended to help operators quickly assess the benefits LFA-based fast protection can bring in their network as

LFA cost optimization

Next, we turn to the LFA cost optimization problem. This problem asks for an IGP link cost setting that maximizes the LFA coverage, given the inherent limitations of the network topology under consideration. First, we characterize the extent to which such an optimization can improve LFA coverage, then we discuss the complexity and the algorithmic aspects of the problem. Most of the observations apply to LFAs generally, without regard to link-protection or node-protection, so, unless otherwise

Numerical evaluations

In the course of our numerical studies, first we were curious as to how close the approximate LFA cost optimization algorithms can get to the optimum. Therefore, we implemented the ILP (6), (7), (8), (9), (10), (11) and the approximation framework described in Section 5. Below, only results for the greedy cost selection rule (choose_greedy_cost(c)) and the temperature-proportional acceptance rule (proportional_test(Δη,T)) are given, with a tabu list of size 20, no restarting and no quantum

Conclusions

In this paper, we have assessed the possibilities of improving fast resilience in operational IP networks using the Loop-Free Alternates IPFRR technique. The motivation for choosing LFA over its alternatives is its simplicity, easy deployability, and availability in IP routers. We presented new tools to quickly estimate LFA failure case coverage both in the link-protecting and the node-protecting cases, and we sought ways to improve it by carefully adjusting IGP link costs. We showed that this

Acknowledgements

G.R. was supported by the János Bolyai Fellowship of the Hungarian Academy of Sciences. J.T. was supported by the Magyary Zoltán program. The project was supported by TÁMOP 4.2.2.B-10/1-2010-0009 grant.

References (44)

  • M. Menth et al., Loop-free alternates and not-via addresses: a proper combination for IP fast reroute?, Comput. Netw. (2010)
  • J. Moy, OSPF Version 2, RFC 2328, April...
  • R. Callon, Use of OSI IS-IS for routing in TCP/IP and dual environments, RFC 1195 (December...
  • P. Francois et al., Achieving sub-second IGP convergence in large IP networks, SIGCOMM Comput. Commun. Rev. (2005)
  • M. Shand et al., IP fast reroute framework, RFC (2010)
  • G. Enyedi, G. Rétvári, T. Cinkler, A novel loop-free IP fast reroute algorithm, in: EUNICE, 2007, pp....
  • I. Hokelek et al., Loop-free IP fast Reroute using local and remote LFAPs, Internet Draft (2008)
  • A. Li, X. Yang, D. Wetherall, SafeGuard: safe forwarding during route changes, in: ACM CoNEXT, 2009, pp....
  • A. Kvalbein et al., Multiple routing configurations for fast IP network recovery, IEEE/ACM Trans. Netw. (2009)
  • S. Lee, Y. Yu, S. Nelakuditi, Z.-L. Zhang, C.-N. Chuah, Proactive vs reactive approaches to failure resilient routing,...
  • S. Bryant et al., IP fast reroute using tunnels, Internet Draft (2007)
  • S. Bryant et al., IP fast reroute using not-via addresses, Internet Draft (2010)
  • G. Enyedi, P. Szilágyi, G. Rétvári, A. Császár, IP fast ReRoute: lightweight not-via without additional addresses, in:...
  • A. Li, P. Francois, X. Yang, On improving the efficiency and manageability of NotVia, in: ACM CoNEXT, 2007, pp....
  • A. Atlas, A. Zinin, Basic specification for IP fast reroute: loop-free alternates, RFC 5286...
  • C. Reichert, T. Magedanz, Topology requirements for resilient IP networks, in: MMB, 2004, pp....
  • K.-W. Kwong, L. Gao, R. Guerin, Z.-L. Zhang, On the feasibility and efficacy of protection routing in IP networks, in:...
  • P. Francois, O. Bonaventure, An evaluation of IP-based fast reroute techniques, in: ACM CoNEXT, 2005, pp....
  • S. Previdi, IP fast ReRoute technologies, APRICOT (2006)
  • M. Gjoka, V. Ram, X. Yang, Evaluation of IP fast reroute proposals, in: IEEE Comsware, 2007, pp....
  • C. Filsfils, LFA applicability in SP networks, Internet Draft (2010)
  • G. Rétvári, J. Tapolcai, G. Enyedi, A. Császár, IP fast ReRoute: loop free alternates revisited, in: INFOCOM, 2011, pp....