Abstract
To better understand how the global Internet can achieve an availability on the order of “five nines”, i.e. be available 0.99999 of the time, active measurements were performed between Norway and China through the Global Research Network. End-to-end downtime statistics were collected during two 3-month periods: mid-November 2009 to mid-February 2010, and July 2010 to September 2010. Probe packets were sent every 10 ms between the two measurement systems, supplemented by traceroute measurements every two minutes. The collected data (TTL, timestamps, sequence numbers and traceroute output) enabled identification and characterization of the IP-level paths between the end-points. Causes of observed network failures were identified, and insight was gained into the processes preceding and following communication downtimes. We distinguish inter- and intradomain failures and, when possible, identify the exact link or Autonomous System where an event occurred. The study shows that end-to-end path availability is mainly affected by interdomain failures and long BGP convergence times, as well as by series of events not straightforwardly explained by the anticipated (re)routing behavior.
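To make the "five nines" target and the 10 ms probing interval concrete, the following is a minimal sketch of the arithmetic involved. It is not the authors' analysis code. The function names are hypothetical, and treating each gap in the received sequence numbers as one downtime event is an assumption made purely for illustration.

```python
# Illustrative sketch only: names and the gap-based downtime definition
# are assumptions, not taken from the paper.

PROBE_INTERVAL_S = 0.010  # probes are sent every 10 ms, per the abstract


def allowed_downtime_seconds(availability, period_s):
    """Downtime budget (seconds) for a target availability over a period."""
    return (1.0 - availability) * period_s


def downtime_intervals(received_seq):
    """Return (first_lost_seq, lost_count) for each gap in the
    sequence numbers of the probes that arrived."""
    ordered = sorted(set(received_seq))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        lost = cur - prev - 1
        if lost > 0:
            gaps.append((prev + 1, lost))
    return gaps


def measured_availability(received_seq, total_sent):
    """Fraction of sent probes that were received."""
    return len(set(received_seq)) / total_sent


if __name__ == "__main__":
    year_s = 365 * 24 * 3600
    budget = allowed_downtime_seconds(0.99999, year_s)
    print(f"five-nines budget per year: {budget:.2f} s")

    received = [0, 1, 2, 6, 7, 10]  # toy trace, 11 probes sent (0..10)
    for first_lost, count in downtime_intervals(received):
        print(f"gap at seq {first_lost}: ~{count * PROBE_INTERVAL_S:.3f} s")
    print(f"availability: {measured_availability(received, 11):.3f}")
```

Under these assumptions, five nines leaves a budget of roughly 315 seconds (a little over five minutes) of downtime per year. With probes 10 ms apart, the resolution of a measured outage is bounded by that interval.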
Author information
Additional information
E.S. Myakotnykh, B.E. Helvik, and A. Abdelkefi: “Centre for Quantifiable Quality of Service in Communication Systems, Centre of Excellence” appointed by The Research Council of Norway, funded by the Research Council, NTNU and UNINETT. http://www.q2s.ntnu.no.
O.J. Wittner, J.K. Hellan, O. Kvittem, T. Skjesol, A. Øslebø: UNINETT is a group of companies which supplies network services for universities, university colleges and research institutions in Norway and handles other national ICT tasks. The Group is owned by the Norwegian Ministry of Education and Research. http://www.uninett.no.
About this article
Cite this article
Myakotnykh, E.S., Wittner, O.J., Helvik, B.E. et al. An analysis of interdomain availability and causes of failures based on active measurements. Telecommun Syst 52, 847–860 (2013). https://doi.org/10.1007/s11235-011-9586-1
DOI: https://doi.org/10.1007/s11235-011-9586-1