Elsevier

Computer Networks

Volume 97, 14 March 2016, Pages 1-30
Building Nemo, a system to monitor IP routing and traffic paths in real time

https://doi.org/10.1016/j.comnet.2015.12.011

Abstract

Efficiently operating and quickly troubleshooting Internet backbones require timely and accurate information about the routing system, the traffic demands, and how they are routed. Unfortunately, the dynamism and complexity of routing protocols, their lack of monitoring features and intricate interaction, together with the volume of routing state, make network-wide monitoring (e.g. across an AS or ISP) hard, while the narrow scope of the traffic data provided by existing measurement tools and scalability issues make it difficult to infer the spatial distribution of the traffic. We present the design of a Network Monitoring tool (Nemo) conceived to offer two critical functions: first, monitoring nodes’ routing and forwarding state in nearly real time; second, deriving forwarding paths, which yields spatial representations of the traffic when combined with traffic measurement data. We discuss the hardest challenges in supporting these functions and how we addressed them, and show preliminary results obtained in two operational networks to illustrate the feasibility and utility of the tool. Nemo was conceived to work with legacy equipment, relying only on standard, widely-supported functions. Nevertheless, we identify downsides of recent monitoring features and suggest simple modifications to router behavior that could greatly ease monitoring the routing system.

Section snippets

Introduction, motivation and contribution

IP routing is the most critical component of the Internet, and offers users and applications the illusion of being connected to a single network. Although routing protocols run autonomously without human intervention, efficiently operating complex IP backbones requires timely and accurate tracking of the routing system. Internal routing protocols (IGPs) can automatically react to topology changes (node or link failures), while the paths to external domains inadvertently swing for reasons out of

IP forwarding and routing in the Internet

Internet routing obeys the CIDR paradigm, a strategy for allocating and routing addresses that eliminated the idea of classes. When a router receives a packet, it consults its forwarding table (or Forwarding Information Base, FIB) that tells how to treat it according to the destination. Routes associate prefixes (of some length l) with an outgoing interface (Oif) on which packets are to be sent or the address of the next device en route (next-hop).
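The lookup described above can be illustrated with a minimal sketch of longest-prefix matching, the rule CIDR forwarding uses to pick among overlapping routes. The table contents, interface names and the `lookup` helper are hypothetical and are not taken from Nemo. The linear scan is for clarity only; real forwarding tables use tries or dedicated hardware.

```python
import ipaddress

# Hypothetical FIB: prefix -> (outgoing interface, next-hop address or None).
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): ("ge-0/0/0", "192.0.2.1"),
    ipaddress.ip_network("10.0.0.0/8"): ("ge-0/0/1", "192.0.2.5"),
    ipaddress.ip_network("10.1.0.0/16"): ("ge-0/0/2", None),  # directly connected
}


def lookup(fib, destination):
    """Return (prefix, route) for the longest prefix covering destination.

    Returns None when no route in the table covers the address.
    """
    addr = ipaddress.ip_address(destination)
    # Collect every prefix whose address range contains the destination.
    matches = [prefix for prefix in fib if addr in prefix]
    if not matches:
        return None
    # The most specific route is the one with the longest prefix length.
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return best, fib[best]
```

With this table, a packet to 10.1.2.3 matches all three routes and is forwarded on the /16 entry, whereas a packet to 10.2.0.1 falls back to the /8 entry.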

A prefix P/l represents a range of $2^{32-l}$

Design constraints, challenges and requirements

Nemo was conceived with two constraints in mind: it had to work with legacy equipment and be as vendor-independent as possible. Thus, the tool not only had to reside outside the routers but also had to rely only on standard, widely-supported functions.

Tracking R&F state in real time poses many challenges. An Internet router can keep over 500K IPv4 routes. Acquiring such a number of routes can be costly (in terms of router CPU and bandwidth), or too lengthy for real-time monitoring. Further, the collected

Related work

Several tools exist to monitor IP routing in an AS (for internal operations) or the Internet (open to the community). Many ISPs and public networks use route servers and looking glasses with which users can inspect BGP tables, issue traceroute commands or CLI queries to routers. The Routeviews [33] project and RIPE RIS service [34] systematize the collection of BGP routes with open-source daemons that peer with nodes in major ISPs and Internet Exchanges (IXPs) [35], exposing RIB dumps and

On acquiring (or inferring) routers’ R&F state

Router R&F data can be obtained in several ways. One is opening a telnet, ssh or rsh session to the device, issuing CLI commands, and displaying or teeing the output. While used by looking glasses or screen-scraping scripts to pull fresh information, this is unsuitable for systematic acquisition and processing of data. CLI syntaxes and output formats vary across vendors, which requires tailoring commands and parsers, while refreshing the data entails issuing new commands periodically (or in

Overview of Nemo: components and internal structure

Nemo has three components: a modified Quagga bgpd daemon to handle the monitoring sessions, a core process, and a web server producing the GUI. bgpd is set up to store the BGP messages and FSM state changes of each session in a file, in MRT format. Our modifications include: (1) advertising the add-path capability, (2) disabling the processing of BGP updates and (3) an additional thread to handle a TCP connection with the core.
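As an illustration of what reading such a file involves, here is a minimal sketch that walks the records of an MRT dump. It assumes only the MRT common header defined in RFC 6396 (a 4-byte timestamp, 2-byte type, 2-byte subtype and 4-byte length, all in network byte order). The function name `iter_mrt_records` is hypothetical and is not part of Quagga or Nemo, and the sketch does not decode the BGP payload itself.

```python
import struct

# MRT common header (RFC 6396): timestamp, type, subtype, length.
MRT_HEADER = struct.Struct("!IHHI")

# MRT type code for BGP4MP records, which carry BGP messages and state changes.
BGP4MP = 16


def iter_mrt_records(data):
    """Yield (timestamp, type, subtype, body) for each complete MRT record.

    `data` is the raw content of an MRT file as bytes. A truncated
    trailing record is silently ignored.
    """
    offset = 0
    while offset + MRT_HEADER.size <= len(data):
        timestamp, mrt_type, subtype, length = MRT_HEADER.unpack_from(data, offset)
        start = offset + MRT_HEADER.size
        end = start + length
        if end > len(data):
            break  # the body of the last record is incomplete
        yield timestamp, mrt_type, subtype, data[start:end]
        offset = end
```

A consumer could filter on `mrt_type == BGP4MP` and hand each body to a BGP decoder; how Nemo's core actually processes the stream is not described in this excerpt.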

These modifications serve the following. As bgpd processes no update

The topological database (TDB) and how it is built

The TDB is similar to a link-state database, but IGP-agnostic. Initially, it contains data for the set of local routers (known by configuration), such as a nickname, an integer ID, an SNMP community and an address (of a loopback interface). The core uses this data to query each router via SNMP and to determine which router the BGP messages relayed by bgpd pertain to.
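A minimal sketch of how such per-router records might be held and indexed follows. The class and field names (`Router`, `TopologyDB`, `owner_of`, and so on) are hypothetical and do not come from Nemo's code; only the four configured attributes are taken from the text. The sketch also assumes that interface addresses learned later are indexed the same way, and it performs no SNMP queries.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Configured data for one local router (fields named in the text)."""
    nickname: str
    router_id: int
    snmp_community: str
    loopback: str
    # Assumed extension: interface address -> interface name, filled in later.
    interfaces: dict = field(default_factory=dict)


class TopologyDB:
    """Hypothetical index of routers by ID and by any address they own."""

    def __init__(self, routers):
        self.by_id = {r.router_id: r for r in routers}
        # Loopbacks are known from configuration, so index them up front.
        self.by_address = {r.loopback: r for r in routers}

    def add_interface(self, router_id, address, ifname):
        """Record an interface address discovered for a known router."""
        router = self.by_id[router_id]
        router.interfaces[address] = ifname
        self.by_address[address] = router

    def owner_of(self, address):
        """Return the router owning `address`, or None if it is unknown."""
        return self.by_address.get(address)
```

Indexing by address keeps the mapping from a BGP session's peer address to its router a constant-time dictionary lookup.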

To build the TDB, Nemo requests from each router a list of its interfaces and addresses via SNMP. These are needed to determine the router that an IP

The routing store (RS)

The RS is critical as it must keep R&F data for all routers, each potentially having several hundred thousand routes. Its design must balance several opposing needs: to be as informative as possible while compact in terms of memory (to scale with the number of routes and nodes), allowing for fast updates, lookups and trajectory computations. Next, we describe a first implementation that we used to check the correctness of Nemo in Rediris and GRnet. A more memory-efficient, but far less

Detecting R&F changes and keeping the RS in synch

The RS should mirror routers’ R&F state. Otherwise, Nemo could report misleading data or output incorrect paths, defeating its purpose. But R&F state is dynamic and may change: IGPs may react to failures, engineers may add static routes or adjust link costs or BGP policies, and BGP routes may be announced anew or withdrawn. Keeping the RS in sync entails handling R&F changes, which must first be detected. To see how changes are dealt with, we must first understand whether changes in

Nemo’s graphical user interface

Visualization aids can greatly assist network operations [3]. Since, by design, we wanted Nemo to expose routers’ R&F data not only timely and accurately, but also comprehensibly, we implemented a CLI and a web-based GUI, with which data in the RS and TDB can be consulted and path queries made. To illustrate the utility of the tool, next we briefly discuss the GUI and show snapshots obtained in GRnet. There, we used the eBGP multihop setup, without redistributing the IGP (OSPF, 1 area) or

Summary and conclusion

Efficiently operating and troubleshooting large IP backbones require monitoring the routing system that determines how the traffic is steered and the network resources used. However, routing protocols offer few monitoring functions. Knowing how the traffic is switched and comprehending why requires visibility into both the data and control planes: forwarding state reflects how packets are switched, but offers no insight into route selection; better suited for diagnosis, routing state is less

Acknowledgments

The author is indebted to Alberto Escolano for his help in testing early versions of Nemo in Rediris, and Yannis Mitsos, Andreas Polyrakis, Afrodite Sevasti and the NOC team for their support and help when testing it in GRnet. He is also thankful to David Rincon and the anonymous reviewers for their valuable comments. This work was partially funded by the GN3 project of the EU FP7-ICT Programme (ref. 238875) and by the Ministry of Economy and Competitiveness of the Spanish Government (ref.

References (61)

  • A. Sahoo et al., BGP convergence delay after multiple simultaneous router failures: characterization and solutions, Comput. Commun. (2009)
  • A. Flavel et al., BGP route prediction within ISPs, Comput. Commun. (2010)
  • BGPMON, Route Monitoring. http://www.bgpmon.net,...
  • K. Butler et al., A survey of BGP security issues and solutions, Proc. IEEE (2010)
  • H. Ballani et al., A study of prefix hijacking and interception in the internet, SIGCOMM Comput. Commun. Rev. (2007)
  • E. Biersack et al., Visual analytics for BGP monitoring and prefix hijacking identification, IEEE Netw. (2012)
  • A. Feldmann et al., Locating internet routing instabilities, SIGCOMM Comput. Commun. Rev. (2004)
  • X. Wang et al., Stabilizing BGP routing without harming convergence, 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (2011)
  • A. Fabrikant et al., There’s something about MRAI: timing diversity can exponentially worsen BGP convergence, 2011 Proceedings IEEE INFOCOM (2011)
  • N. Feamster et al., Network-wide prediction of BGP routes, IEEE/ACM Trans. Netw. (2007)
  • S. Vissicchio et al., Improving network agility with seamless BGP reconfigurations, IEEE/ACM Trans. Netw. (2013)
  • M. Caesar et al., BGP routing policies in ISP networks, IEEE Netw. (2005)
  • N. Feamster et al., Detecting BGP configuration faults with static analysis, Proceedings of the Second Conference on Symposium on Networked Systems Design & Implementation - Volume 2, NSDI’05 (2005)
  • A. Lakhina et al., Diagnosing network-wide traffic anomalies, SIGCOMM Comput. Commun. Rev. (2004)
  • R. Teixeira et al., A measurement framework for pin-pointing routing changes, Proceedings of the ACM SIGCOMM Workshop on Network Troubleshooting: Research, Theory and Operations Practice Meet Malfunctioning Reality, NetT ’04 (2004)
  • O. Akashi et al., Diagnosis of IP-service anomalies based on BGP-update temporal analysis
  • A. Markopoulou et al., Characterization of failures in an operational IP backbone network, IEEE/ACM Trans. Netw. (2008)
  • Packet Design, Route Explorer. http://www.packetdesign.com/products/route-explorer,...
  • IETF, Interface to the Routing System (I2RS) WG,...
  • A. Feldmann et al., Netscope: traffic engineering for IP networks, IEEE Netw. (2000)
  • A. Feldmann et al., Deriving traffic demands for operational IP networks: methodology and experience, IEEE/ACM Trans. Netw. (2001)
  • A. Medina et al., Traffic matrix estimation: existing techniques and new directions, SIGCOMM Comput. Commun. Rev. (2002)
  • C. Estan et al., Building a better netflow, SIGCOMM Comput. Commun. Rev. (2004)
  • N.G. Duffield et al., Trajectory sampling for direct traffic observation, IEEE/ACM Trans. Netw. (2001)
  • B. Claise, A. Johnson, J. Quittek, Packet Sampling (PSAMP) Protocol Specifications, RFC 5476,...
  • V. Sekar et al., CSAMP: a system for network-wide flow monitoring, Proceedings of USENIX NSDI’08 (2008)
  • Cisco Systems, http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/12512-41.html,...
  • Ext. Discussion. http://raspall.net/nemoExt.pdf,...
  • https://github.com/Fredi-raspall/Nemo,...
  • R. Zhang, M. Bartell, BGP Design and Implementation,...
Frederic Raspall received M.Sc. (2001) and Ph.D. (2009) degrees in telecommunication engineering at the Technical University of Catalonia (UPC), Barcelona. From 2000 to 2004, he was with NEC Network Laboratories Heidelberg (Germany), where he developed network prototypes and participated in research projects funded by the European Union. Since 2004, he has been a lecturer at two engineering schools within the UPC and participated in research projects funded by the EU and the Spanish Ministry of Science and Education. His research interests include network monitoring and management, traffic measurements, Internet routing, network algorithms, estimation problems and software-defined networking.