An analysis of interdomain availability and causes of failures based on active measurements

Telecommunication Systems

Abstract

To better understand how the global Internet could achieve availability on the order of “five nines”, i.e. be available 99.999% of the time, active measurements were performed between Norway and China through the Global Research Network. End-to-end downtime statistics were collected during two 3-month periods: mid-November 2009 to mid-February 2010, and July 2010 to September 2010. Probe packets were sent every 10 ms between the two measurement systems, supplemented by traceroute measurements every two minutes. The collected data (TTL, timestamps, sequence numbers and traceroute output) enabled identification and characterization of the IP-level paths between the end-points. Causes of the observed network failures were identified, and insight was gained into the processes preceding and following communication downtimes. We distinguish inter- and intradomain failures and, when possible, identify the exact link or Autonomous System where an event occurred. The study shows that end-to-end path availability is mainly affected by interdomain failures and long BGP convergence times, as well as by series of events not straightforwardly explained by the anticipated (re)routing behavior.
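To make the "five nines" target and the 10 ms probing concrete, the following Python sketch shows one way downtime could be derived from gaps in received probe sequence numbers. It is an illustration only, not the authors' analysis code: the function names, the `min_lost` threshold, and the choice to estimate availability as the fraction of probes received are all assumptions introduced here.

```python
from typing import List, Tuple

PROBE_INTERVAL_S = 0.010  # 10 ms probe spacing, as stated in the abstract


def downtime_budget_s(target: float, period_s: float) -> float:
    """Downtime allowed over period_s if availability must be at least target."""
    return (1.0 - target) * period_s


def lost_runs(received_seq: List[int], min_lost: int = 1) -> List[Tuple[int, int]]:
    """Return (first_lost, last_lost) sequence numbers for each run of lost probes.

    min_lost is a hypothetical threshold: runs shorter than this are ignored,
    since a single lost probe need not indicate a path failure.
    """
    ordered = sorted(set(received_seq))  # tolerate reordering and duplicates
    runs = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev - 1 >= min_lost:
            runs.append((prev + 1, cur - 1))
    return runs


def run_duration_s(run: Tuple[int, int]) -> float:
    """Approximate downtime covered by one run of lost probes."""
    first, last = run
    return (last - first + 1) * PROBE_INTERVAL_S


def availability(received_seq: List[int], total_sent: int) -> float:
    """Assumed estimator: fraction of sent probes that arrived."""
    return len(set(received_seq)) / total_sent


if __name__ == "__main__":
    # Five nines over an assumed 90-day period allows roughly 78 s of downtime.
    print(downtime_budget_s(0.99999, 90 * 24 * 3600))
    received = [0, 1, 2, 6, 7, 10]  # made-up sequence numbers
    for run in lost_runs(received):
        print(run, run_duration_s(run))
```

With probes 10 ms apart, the resolution of each downtime estimate is bounded by the probe spacing, and gaps at the very start or end of a trace are not visible to this approach.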



Author information

Corresponding author

Correspondence to Otto J. Wittner.

Additional information

E.S. Myakotnykh, B.E. Helvik, and A. Abdelkefi: “Centre for Quantifiable Quality of Service in Communication Systems, Centre of Excellence” appointed by The Research Council of Norway, funded by the Research Council, NTNU and UNINETT. http://www.q2s.ntnu.no.

O.J. Wittner, J.K. Hellan, O. Kvittem, T. Skjesol, A. Øslebø: UNINETT is a group of companies which supplies network services for universities, university colleges and research institutions in Norway and handles other national ICT tasks. The Group is owned by the Norwegian Ministry of Education and Research. http://www.uninett.no.

About this article

Cite this article

Myakotnykh, E.S., Wittner, O.J., Helvik, B.E. et al. An analysis of interdomain availability and causes of failures based on active measurements. Telecommun Syst 52, 847–860 (2013). https://doi.org/10.1007/s11235-011-9586-1

  • DOI: https://doi.org/10.1007/s11235-011-9586-1
