Building Nemo, a system to monitor IP routing and traffic paths in real time
Section snippets
Introduction, motivation and contribution
IP routing is the most critical component of the Internet, and offers users and applications the illusion of being connected to a single network. Although routing protocols run autonomously without human intervention, efficiently operating complex IP backbones requires tracking the routing system in a timely and accurate manner. Internal routing protocols (IGPs) can automatically react to topology changes (node or link failures), while the paths to external domains can swing unexpectedly for reasons out of
IP forwarding and routing in the Internet
Internet routing follows the CIDR paradigm, a strategy for allocating and routing addresses that eliminated the notion of address classes. When a router receives a packet, it consults its forwarding table (or Forwarding Information Base, FIB), which tells it how to handle the packet according to its destination. Routes associate prefixes (of some length l) with an outgoing interface (Oif) on which packets are to be sent or with the address of the next device en route (next-hop).
A prefix P/l represents a range of
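To make the forwarding decision concrete, the following Python sketch performs a longest-prefix match over a toy FIB: among all routes whose prefix covers the destination, the most specific one wins. The interface names and addresses (taken from documentation ranges) are hypothetical, and the linear scan is only illustrative; production routers use tries or TCAMs for this lookup.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence

class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    oif: str                 # outgoing interface (Oif)
    next_hop: Optional[str]  # None for a directly connected prefix

def lookup(fib: Sequence[Route], dst: str) -> Optional[Route]:
    """Return the most specific (longest-prefix) route covering dst, or None."""
    addr = ipaddress.IPv4Address(dst)
    best: Optional[Route] = None
    for route in fib:
        if addr in route.prefix and (best is None or route.prefix.prefixlen > best.prefix.prefixlen):
            best = route
    return best

# Hypothetical FIB: a default route, a /24 and a more specific /25.
fib = [
    Route(ipaddress.IPv4Network("0.0.0.0/0"), "ge-0/0/0", "192.0.2.1"),
    Route(ipaddress.IPv4Network("198.51.100.0/24"), "ge-0/0/1", "192.0.2.5"),
    Route(ipaddress.IPv4Network("198.51.100.128/25"), "ge-0/0/2", None),
]
print(lookup(fib, "198.51.100.200").oif)  # ge-0/0/2 (the /25 is most specific)
print(lookup(fib, "198.51.100.7").oif)    # ge-0/0/1 (only the /24 and default match)
print(lookup(fib, "203.0.113.9").oif)     # ge-0/0/0 (default route)
```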
Design constraints, challenges and requirements
Nemo was conceived with two constraints in mind: it had to work with legacy equipment and be as vendor-independent as possible. Thus, the tool not only had to reside outside the routers but also had to rely solely on standard, widely supported functions.
Tracking R&F state in real time poses many challenges. An Internet router can keep over 500K IPv4 routes. Acquiring that many routes can be costly (in terms of router CPU and bandwidth) or take too long for real-time monitoring. Further, the collected
Related work
Several tools exist to monitor IP routing in an AS (for internal operations) or in the Internet (open to the community). Many ISPs and public networks operate route servers and looking glasses with which users can inspect BGP tables, issue traceroute commands or send CLI queries to routers. The Routeviews [33] project and the RIPE RIS service [34] systematize the collection of BGP routes with open-source daemons that peer with nodes in major ISPs and Internet Exchange Points (IXPs) [35], exposing RIB dumps and
On acquiring (or inferring) routers’ R&F state
Router R&F data can be obtained in several ways. One is to open a telnet, ssh or rsh session to the device, issue CLI commands, and display or tee the output. While looking glasses and screen-scraping scripts use this approach to pull fresh information, it is unsuitable for systematically acquiring and processing data. CLI syntaxes and output formats vary across brands, which requires tailoring commands and parsers, and refreshing the data entails issuing new commands periodically (or in
Overview of Nemo: components and internal structure
Nemo has three components: a modified Quagga bgpd daemon to handle the monitoring sessions, a core process, and a web server producing the GUI. bgpd is set up to store the BGP messages and FSM state changes of each session in a file, in MRT format. Our modifications include: (1) advertising the add-path capability, (2) disabling the processing of BGP updates, and (3) adding a thread to handle a TCP connection with the core.
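As an illustration of modification (1), the sketch below encodes the ADD-PATH capability as defined in RFC 7911 (capability code 69), which travels in a BGP OPEN message inside a Capabilities optional parameter (RFC 5492). Which Send/Receive value Nemo's bgpd advertises is not stated here; we assume "receive", since the monitor wants each router to send it several paths per prefix (the router would then advertise "send"). The helper names are hypothetical.

```python
import struct
from typing import Iterable, Tuple

ADD_PATH = 69            # capability code assigned to ADD-PATH (RFC 7911)
CAPABILITIES_PARAM = 2   # OPEN optional parameter type carrying capabilities (RFC 5492)
AFI_IPV4, SAFI_UNICAST = 1, 1
RECEIVE, SEND, SEND_RECEIVE = 1, 2, 3

def add_path_capability(entries: Iterable[Tuple[int, int, int]]) -> bytes:
    """Encode the ADD-PATH capability: one (AFI, SAFI, Send/Receive) tuple per address family."""
    value = b"".join(struct.pack("!HBB", afi, safi, mode) for afi, safi, mode in entries)
    return struct.pack("!BB", ADD_PATH, len(value)) + value

def as_open_parameter(capability: bytes) -> bytes:
    """Wrap an encoded capability in an OPEN optional parameter (type 2)."""
    return struct.pack("!BB", CAPABILITIES_PARAM, len(capability)) + capability

# Assumed setting for a monitoring session: receive multiple IPv4 unicast paths.
cap = add_path_capability([(AFI_IPV4, SAFI_UNICAST, RECEIVE)])
print(cap.hex())                     # 450400010101
print(as_open_parameter(cap).hex())  # 0206450400010101
```

Once both sides agree, every announced route carries an additional 4-byte Path Identifier, which is what lets a monitor distinguish several paths for the same prefix.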
These modifications serve the following purposes. As bgpd processes no update
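Because bgpd stores each session's BGP messages and FSM state changes in MRT files, any consumer of those files must at least be able to frame MRT records. The following sketch assumes the common MRT header of RFC 6396 (timestamp, type, subtype, length) and the BGP4MP subtypes defined there; it yields raw record bodies without decoding them and is not taken from Nemo's code.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

MRT_HEADER = struct.Struct("!IHHI")  # timestamp, type, subtype, length (RFC 6396)
BGP4MP = 16
BGP4MP_SUBTYPES = {
    0: "STATE_CHANGE", 1: "MESSAGE", 4: "MESSAGE_AS4",
    5: "STATE_CHANGE_AS4", 6: "MESSAGE_LOCAL", 7: "MESSAGE_AS4_LOCAL",
}

class MrtRecord(NamedTuple):
    timestamp: int
    type: int
    subtype: int
    body: bytes

def read_mrt(stream: BinaryIO) -> Iterator[MrtRecord]:
    """Yield MRT records one at a time; record bodies are left undecoded."""
    while True:
        header = stream.read(MRT_HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < MRT_HEADER.size:
            raise ValueError("truncated MRT header")
        timestamp, rtype, subtype, length = MRT_HEADER.unpack(header)
        body = stream.read(length)  # the length field excludes the 12-byte header
        if len(body) < length:
            raise ValueError("truncated MRT record")
        yield MrtRecord(timestamp, rtype, subtype, body)

# Usage with a synthetic file holding one record with a dummy 3-byte body.
data = MRT_HEADER.pack(1356998400, BGP4MP, 5, 3) + b"abc"
for rec in read_mrt(io.BytesIO(data)):
    if rec.type == BGP4MP:
        print(rec.timestamp, BGP4MP_SUBTYPES.get(rec.subtype, rec.subtype), len(rec.body))
```

Reading record by record keeps memory bounded regardless of file size, which matters when sessions log continuously.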
The topological database (TDB) and how it is built
The TDB is similar to a link-state database, but IGP-agnostic. Initially, it contains data for the set of local routers (known by configuration), such as a nickname, an integer ID, an SNMP community and an address (of a loopback interface), which the core uses to query each router via SNMP and to identify the router to which the BGP messages relayed by bgpd pertain.
To build the TDB, Nemo requests from each router, via SNMP, a list of its interfaces and addresses. These are needed to determine the router that an IP
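A minimal sketch of such a TDB, assuming a simple in-memory layout: each router entry holds the configured nickname, ID, SNMP community and loopback, and an address-to-router map lets the core resolve which local router owns a given address (for instance, a next-hop). The class and field names, the community string and the addresses are hypothetical, and the SNMP polling that would fill in the interfaces is omitted.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class RouterEntry:
    """Per-router data known by configuration (hypothetical field names)."""
    nickname: str
    rid: int        # integer router ID
    community: str  # SNMP community used to poll the router
    loopback: str   # loopback address used to reach the router
    interfaces: Dict[str, ipaddress.IPv4Interface] = field(default_factory=dict)

class TopoDB:
    def __init__(self) -> None:
        self.routers: Dict[int, RouterEntry] = {}
        self.addr_owner: Dict[ipaddress.IPv4Address, int] = {}  # local address -> router ID

    def add_router(self, r: RouterEntry) -> None:
        self.routers[r.rid] = r
        self.addr_owner[ipaddress.IPv4Address(r.loopback)] = r.rid

    def add_interface(self, rid: int, ifname: str, addr_with_len: str) -> None:
        # In Nemo this information is obtained via SNMP; here it is passed in directly.
        iface = ipaddress.IPv4Interface(addr_with_len)
        self.routers[rid].interfaces[ifname] = iface
        self.addr_owner[iface.ip] = rid

    def owner(self, addr: str) -> Optional[int]:
        """Return the ID of the local router that owns addr, or None if it is external."""
        return self.addr_owner.get(ipaddress.IPv4Address(addr))

tdb = TopoDB()
tdb.add_router(RouterEntry("r1", 1, "public", "192.0.2.1"))
tdb.add_router(RouterEntry("r2", 2, "public", "192.0.2.2"))
tdb.add_interface(1, "ge-0/0/0", "10.0.0.1/30")
tdb.add_interface(2, "ge-0/0/0", "10.0.0.2/30")
print(tdb.owner("10.0.0.2"))     # 2: a next-hop of r1 on the shared /30 resolves to r2
print(tdb.owner("203.0.113.9"))  # None: not a local address
```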
The routing store (RS)
The RS is critical, as it must keep R&F data for all routers, each potentially having several hundred thousand routes. Its design must balance several opposing needs: it must be as informative as possible yet compact in memory (to scale with the number of routes and nodes), and it must allow fast updates, lookups and trajectory computations. Next, we describe a first implementation that we used to check the correctness of Nemo in Rediris and GRnet. A more memory-efficient, but far less
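The snippet does not detail how trajectories are computed, but the idea can be sketched as repeated lookups: starting at a router, find the longest-prefix match for the destination in that router's table, map the next-hop to the local router owning it (the role the TDB plays in Nemo), and repeat until the packet is delivered, leaves the domain, hits a loop or finds no route. The dictionary-of-lists store below is a toy stand-in for the RS with hypothetical names; it ignores equal-cost multipath and is in no way memory-efficient.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

Fib = List[Tuple[ipaddress.IPv4Network, Optional[str]]]  # (prefix, next-hop or None if connected)

def lpm(fib: Fib, dst: ipaddress.IPv4Address):
    """Longest-prefix match over one router's (prefix, next-hop) list."""
    matches = [entry for entry in fib if dst in entry[0]]
    return max(matches, key=lambda entry: entry[0].prefixlen, default=None)

def trajectory(rs: Dict[str, Fib], owner: Dict[str, str], src: str, dst: str, max_hops: int = 32):
    """Follow next-hops from router src toward dst. owner maps next-hop addresses to local routers."""
    addr = ipaddress.IPv4Address(dst)
    path, node = [src], src
    for _ in range(max_hops):
        route = lpm(rs.get(node, []), addr)
        if route is None:
            return path, "no route"
        next_hop = route[1]
        if next_hop is None:
            return path, "delivered"  # destination is directly connected here
        nxt = owner.get(next_hop)
        if nxt is None:
            return path, f"exits via {next_hop}"  # next-hop outside the monitored routers
        if nxt in path:
            return path + [nxt], "loop"
        path.append(nxt)
        node = nxt
    return path, "hop limit"

net = ipaddress.IPv4Network
rs = {
    "R1": [(net("0.0.0.0/0"), "10.0.0.2")],
    "R2": [(net("198.51.100.0/24"), "10.0.1.2"), (net("0.0.0.0/0"), "192.0.2.1")],
    "R3": [(net("198.51.100.0/24"), None)],
}
owner = {"10.0.0.2": "R2", "10.0.1.2": "R3"}
print(trajectory(rs, owner, "R1", "198.51.100.10"))  # (['R1', 'R2', 'R3'], 'delivered')
print(trajectory(rs, owner, "R1", "203.0.113.5"))    # (['R1', 'R2'], 'exits via 192.0.2.1')
```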
Detecting R&F changes and keeping the RS in synch
The RS should mirror routers’ R&F state; otherwise, Nemo could report misleading data or output incorrect paths, defeating its purpose. However, R&F state is dynamic: IGPs may react to failures, engineers may add static routes or adjust link costs or BGP policies, and BGP routes may be announced anew or withdrawn. Keeping the RS in sync entails handling R&F changes, which must first be detected. To see how changes are dealt with, we must first understand whether changes in
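Whatever the detection mechanism, the RS must ultimately apply three kinds of change per router: new routes, withdrawn routes, and routes whose next-hop changed. As a hedged illustration (not Nemo's mechanism, which the snippet does not describe), the sketch below classifies the differences between two snapshots of one router's prefix-to-next-hop table and applies them to a stored copy. All names are hypothetical.

```python
from typing import Dict, Tuple

Snapshot = Dict[str, str]            # prefix -> next-hop for one router
Changed = Dict[str, Tuple[str, str]]  # prefix -> (old next-hop, new next-hop)

def diff_routes(old: Snapshot, new: Snapshot) -> Tuple[Snapshot, Snapshot, Changed]:
    """Classify differences between two snapshots of one router's routes."""
    added = {p: new[p] for p in new.keys() - old.keys()}
    withdrawn = {p: old[p] for p in old.keys() - new.keys()}
    changed = {p: (old[p], new[p]) for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, withdrawn, changed

def apply_diff(store: Snapshot, added: Snapshot, withdrawn: Snapshot, changed: Changed) -> None:
    """Bring a stored copy of the router's routes in line with the router."""
    for p in withdrawn:
        del store[p]
    store.update(added)
    store.update({p: nh for p, (_, nh) in changed.items()})

old = {"198.51.100.0/24": "10.0.0.2", "203.0.113.0/24": "10.0.0.6"}
new = {"198.51.100.0/24": "10.0.0.6", "192.0.2.0/24": "10.0.0.2"}
added, withdrawn, changed = diff_routes(old, new)
print(added)      # {'192.0.2.0/24': '10.0.0.2'}
print(withdrawn)  # {'203.0.113.0/24': '10.0.0.6'}
print(changed)    # {'198.51.100.0/24': ('10.0.0.2', '10.0.0.6')}
store = dict(old)
apply_diff(store, added, withdrawn, changed)
print(store == new)  # True
```

Diffing whole snapshots is simple but requires re-acquiring full tables, which is precisely the cost noted earlier for routers holding hundreds of thousands of routes; event-driven sources such as BGP updates avoid that.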
Nemo’s graphical user interface
Visualization aids can greatly assist network operations [3]. Since, by design, we wanted Nemo to expose routers’ R&F data not only timely and accurately, but also comprehensibly, we implemented a CLI and a web-based GUI, with which data in the RS and TDB can be consulted and path queries made. To illustrate the utility of the tool, next we briefly discuss the GUI and show snapshots obtained in GRnet. There, we used the eBGP multihop setup, without redistributing the IGP (OSPF, 1 area) or
Summary and conclusion
Efficiently operating and troubleshooting large IP backbones requires monitoring the routing system that determines how the traffic is steered and how network resources are used. However, routing protocols offer few monitoring functions. Knowing how the traffic is switched and understanding why requires visibility into both the data and control planes: forwarding state reflects how packets are switched but offers no insight into route selection; routing state, which is better suited for diagnosis, is less
Acknowledgments
The author is indebted to Alberto Escolano for his help in testing early versions of Nemo in Rediris, and Yannis Mitsos, Andreas Polyrakis, Afrodite Sevasti and the NOC team for their support and help when testing it in GRnet. He is also thankful to David Rincon and the anonymous reviewers for their valuable comments. This work was partially funded by the GN3 project of the EU FP7-ICT Programme (ref. 238875) and by the Ministry of Economy and Competitiveness of the Spanish Government (ref.
References (61)
- et al. BGP convergence delay after multiple simultaneous router failures: characterization and solutions. Comput. Commun. (2009).
- et al. BGP route prediction within ISPs. Comput. Commun. (2010).
- BGPMON, Route Monitoring. http://www.bgpmon.net,...
- et al. A survey of BGP security issues and solutions. Proc. IEEE (2010).
- et al. A study of prefix hijacking and interception in the internet. SIGCOMM CCR (2007).
- et al. Visual analytics for BGP monitoring and prefix hijacking identification. IEEE Netw. (2012).
- et al. Locating internet routing instabilities. SIGCOMM CCR (2004).
- et al. Stabilizing BGP routing without harming convergence. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (2011).
- et al. There’s something about MRAI: timing diversity can exponentially worsen BGP convergence. 2011 Proceedings IEEE INFOCOM (2011).
- et al. Network-wide prediction of BGP routes. IEEE/ACM Trans. Netw. (2007).
- Improving network agility with seamless BGP reconfigurations. IEEE/ACM Trans. Netw.
- BGP routing policies in ISP networks. IEEE Netw.
- Detecting BGP configuration faults with static analysis. Proceedings of the Second Conference on Symposium on Networked Systems Design & Implementation, Volume 2, NSDI’05.
- Diagnosing network-wide traffic anomalies. SIGCOMM Comput. Commun. Rev.
- A measurement framework for pin-pointing routing changes. Proceedings of the ACM SIGCOMM Workshop on Network Troubleshooting: Research, Theory and Operations Practice Meet Malfunctioning Reality, NetT ’04.
- Diagnosis of IP-service anomalies based on BGP-update temporal analysis.
- Characterization of failures in an operational IP backbone network. IEEE/ACM Trans. Netw.
- Netscope: traffic engineering for IP networks. IEEE Netw.
- Deriving traffic demands for operational IP networks: methodology and experience. IEEE/ACM Trans. Netw.
- Traffic matrix estimation: existing techniques and new directions. SIGCOMM Comput. Commun. Rev.
- Building a better netflow. SIGCOMM Comput. Commun. Rev.
- Trajectory sampling for direct traffic observation. IEEE/ACM Trans. Netw.
- CSAMP: a system for network-wide flow monitoring. Proceedings of USENIX NSDI’08.
Frederic Raspall received M.Sc. (2001) and Ph.D. (2009) degrees in telecommunication engineering at the Technical University of Catalonia (UPC), Barcelona. From 2000 to 2004, he was with NEC Network Laboratories Heidelberg (Germany), where he developed network prototypes and participated in research projects funded by the European Union. Since 2004, he has been a lecturer at two engineering schools within the UPC and participated in research projects funded by the EU and the Spanish Ministry of Science and Education. His research interests include network monitoring and management, traffic measurements, Internet routing, network algorithms, estimation problems and software-defined networking.