research-article

Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis

Published: 17 August 2015

Abstract

Can we get network latency between any two servers at any time in large-scale data center networks? The collected latency data can then be used to address a series of challenges: determining whether an application-perceived latency issue is caused by the network, defining and tracking network service level agreements (SLAs), and automating network troubleshooting. We have developed the Pingmesh system for large-scale data center network latency measurement and analysis to answer the above question affirmatively. Pingmesh has been running in Microsoft data centers for more than four years, and it collects tens of terabytes of latency data per day. Pingmesh is widely used not only by network software developers and engineers, but also by application and service developers and operators.

Supplemental Material

p139-guo.webm (WebM, 161.8 MB)

          • Published in

            ACM SIGCOMM Computer Communication Review, Volume 45, Issue 4
            SIGCOMM'15
            October 2015
            659 pages
            ISSN: 0146-4833
            DOI: 10.1145/2829988
            • SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
              August 2015
              684 pages
              ISBN: 9781450335423
              DOI: 10.1145/2785956

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 17 August 2015

            Qualifiers

            • research-article
