DOI: 10.1145/3184407.3184427

Measuring Network Latency Variation Impacts to High Performance Computing Application Performance

Published: 30 March 2018

ABSTRACT

In this paper, we study the impacts of latency variation versus latency mean on application runtime, library performance, and packet delivery. Our contributions include the design and implementation of a network latency injector that is suitable for most QLogic and Mellanox InfiniBand cards. We fit statistical distributions of latency mean and variation to varying levels of network contention for a range of parallel application workloads, and we use these distributions to characterize how latency variation contributes to application degradation. The level of application degradation caused by variation in network latency depends on application characteristics and can be significant: observed degradation ranges from none for applications without communicating processes to 3.5 times slower for communication-intensive parallel applications. We support our results with statistical analysis of our experimental observations. For communication-intensive high performance computing applications, we show statistically significant evidence that changes in performance are more highly correlated with changes in the variation of network latency than with changes in mean network latency alone.
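The comparison described in the abstract, performance correlated against latency variation versus latency mean, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation statistic the authors use, so Kendall's tau is an assumed choice, and the function names `kendall_tau` and `compare_correlations` are hypothetical, as is the input layout (one list of latency samples per run, plus one runtime per run).

```python
from itertools import combinations
from statistics import mean, stdev


def kendall_tau(xs, ys):
    """Kendall's tau-a rank correlation (assumed statistic, not confirmed by the paper).

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def compare_correlations(latency_samples_per_run, runtimes):
    """Return (tau of runtime vs. mean latency, tau of runtime vs. latency std dev).

    latency_samples_per_run: one list of latency measurements per run (hypothetical layout).
    runtimes: application runtime of each run, in the same order.
    """
    means = [mean(samples) for samples in latency_samples_per_run]
    spreads = [stdev(samples) for samples in latency_samples_per_run]
    return kendall_tau(means, runtimes), kendall_tau(spreads, runtimes)
```

Calling `compare_correlations(samples, runtimes)` on measured data returns the two coefficients side by side. A larger magnitude for the second value would be consistent with the abstract's claim, though a significance test would still be needed before drawing that conclusion.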

