On noise and the performance benefit of nonblocking collectives

Widener, Patrick M.; Levy, Scott; Ferreira, Kurt B.; Hoefler, Torsten

doi:10.1177/1094342015611952

Journal Article · Mon Nov 02 00:00:00 EST 2015 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342015611952 · OSTI ID: 1257977

Widener, Patrick M. ^[1]; Levy, Scott ^[2]; Ferreira, Kurt B. ^[1]; Hoefler, Torsten ^[3]

[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[2] Univ. of New Mexico, Albuquerque, NM (United States)
[3] ETH Zurich (Switzerland)

Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet, the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise mitigation effects of idealized nonblocking collectives in workloads where these collectives are a major contributor to total execution time. In conclusion, although nonblocking collectives are unlikely to provide significant noise mitigation to applications in the low-OS-noise environments expected in next-generation HPC systems, we show that they can potentially improve application runtime with respect to other noise types.

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1257977

Report Number(s): SAND-2014-19529J; 641904

Journal Information: International Journal of High Performance Computing Applications, Vol. 30, Issue 1; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 5 works

Citation information provided by
Web of Science

References (20)

Benchmarking the effects of operating system interference on extreme-scale parallel machines Beckman, Pete; Iskra, Kamil; Yoshii, Kazutomo Cluster Computing, Vol. 11, Issue 1 https://doi.org/10.1007/s10586-007-0047-2	journal	January 2008
Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications Brightwell, Ron; Riesen, Rolf; Underwood, Keith D. The International Journal of High Performance Computing Applications, Vol. 19, Issue 2 https://doi.org/10.1177/1094342005054257	journal	May 2005
Communication-Sensitive Static Dataflow for Parallel Message Passing Applications Bronevetsky, Greg 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2009 International Symposium on Code Generation and Optimization https://doi.org/10.1109/CGO.2009.32	conference	March 2009
LogP: towards a realistic model of parallel computation Culler, David; Karp, Richard; Patterson, David ACM SIGPLAN Notices, Vol. 28, Issue 7 https://doi.org/10.1145/173284.155333	journal	July 1993
A higher order estimate of the optimum checkpoint interval for restart dumps Daly, J. T. Future Generation Computer Systems, Vol. 22, Issue 3, p. 303-312 https://doi.org/10.1016/j.future.2004.11.016	journal	February 2006
Characterizing application sensitivity to OS interference using kernel-level noise injection Ferreira, Kurt B.; Bridges, Patrick; Brightwell, Ron 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2008.5219920	conference	November 2008
Understanding the Effects of Communication and Coordination on Checkpointing at Scale Ferreira, Kurt B.; Widener, Patrick; Levy, Scott SC14: International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2014.77	conference	November 2014
Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm Ghysels, P.; Vanroose, W. Parallel Computing, Vol. 40, Issue 7 https://doi.org/10.1016/j.parco.2013.06.001	journal	July 2014
Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK Giampapa, Mark; Gooding, Thomas; Inglett, Todd 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2010.22	conference	November 2010
BoomerAMG: A parallel algebraic multigrid solver and preconditioner Henson, Van Emden; Yang, Ulrike Meier Applied Numerical Mathematics, Vol. 41, Issue 1 https://doi.org/10.1016/S0168-9274(01)00115-5	journal	April 2002
A Case for Standard Non-blocking Collective Operations Hoefler, Torsten; Kambadur, Prabhanjan; Graham, Richard L. Recent Advances in Parallel Virtual Machine and Message Passing Interface https://doi.org/10.1007/978-3-540-75416-9_22	book	January 2007
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation Hoefler, Torsten; Schneider, Timo; Lumsdaine, Andrew 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis https://doi.org/10.1109/SC.2010.12	conference	November 2010
LogGOPSim: simulating large-scale applications in the LogGOPS model Hoefler, Torsten; Schneider, Timo; Lumsdaine, Andrew Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10 https://doi.org/10.1145/1851476.1851564	conference	January 2010
Scalable communication protocols for dynamic sparse data exchange Hoefler, Torsten; Siebert, Christian; Lumsdaine, Andrew Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10 https://doi.org/10.1145/1693453.1693476	conference	January 2010
Time, clocks, and the ordering of events in a distributed system Lamport, Leslie Communications of the ACM, Vol. 21, Issue 7 https://doi.org/10.1145/359545.359563	journal	July 1978
The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q Petrini, Fabrizio; Kerbyson, Darren J.; Pakin, Scott Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03 https://doi.org/10.1145/1048935.1050204	conference	January 2003
Fast Parallel Algorithms for Short-Range Molecular Dynamics Plimpton, Steve Journal of Computational Physics, Vol. 117, Issue 1 https://doi.org/10.1006/jcph.1995.1039	journal	March 1995
Designing and implementing lightweight kernels for capability computing Riesen, Rolf; Brightwell, Ron; Bridges, Patrick G. Concurrency and Computation: Practice and Experience, Vol. 21, Issue 6 https://doi.org/10.1002/cpe.1361	journal	April 2009
Optimization of Collective Communication Operations in MPICH Thakur, Rajeev; Rabenseifner, Rolf; Gropp, William The International Journal of High Performance Computing Applications, Vol. 19, Issue 1 https://doi.org/10.1177/1094342005051521	journal	February 2005
Characterizing the Performance of Big Memory on Blue Gene Linux Yoshii, Kazutomo; Iskra, Kamil; Naik, Harish 2009 International Conference on Parallel Processing Workshops (ICPPW) https://doi.org/10.1109/ICPPW.2009.35	conference	September 2009

Cited By (1)

The unexpected virtue of almost: Exploiting MPI collective operations to approximately coordinate checkpoints Levy, Scott; Ferreira, Kurt B.; Widener, Patrick Concurrency and Computation: Practice and Experience, Vol. 32, Issue 3 https://doi.org/10.1002/cpe.4890	journal	September 2018

Similar Records

HPC-Colony: Services and Interfaces to Support Systems With Very Large Numbers of Processors

Technical Report · Wed Jan 31 00:00:00 EST 2007 · OSTI ID:1257977

Jones, T; Kale, L; Moreira, J; +4 more

Mini-Ckpts: Surviving OS Failures in Persistent Memory

Conference · Fri Jan 01 00:00:00 EST 2016 · OSTI ID:1257977

Fiala, David; Mueller, Frank; Ferreira, Kurt Brian; +1 more

A Fault Oblivious Extreme-Scale Execution Environment

Technical Report · Thu Nov 20 00:00:00 EST 2014 · OSTI ID:1257977

McKie, Jim

Related Subjects

97 MATHEMATICS AND COMPUTING
HPC
collectives
nonblocking
resilience
checkpointing
simulation
