OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: On noise and the performance benefit of nonblocking collectives

Journal Article · International Journal of High Performance Computing Applications
 [1];  [2];  [1];  [3]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Univ. of New Mexico, Albuquerque, NM (United States)
  3. ETH Zurich (Switzerland)

Relaxed synchronization offers the potential of maintaining application scalability by allowing many processes to make independent progress when some processes suffer delays. Yet, the benefits of this approach in important parallel workloads have not been investigated in detail. In this paper, we use a validated simulation approach to explore the noise mitigation effects of idealized nonblocking collectives in workloads where these collectives are a major contributor to total execution time. In conclusion, although nonblocking collectives are unlikely to provide significant noise mitigation to applications in the low-OS-noise environments expected in next-generation HPC systems, we show that they can potentially improve application runtime with respect to other noise types.
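The intuition behind relaxed synchronization can be illustrated with a toy model. The sketch below is not the validated simulation approach used in the paper; it is a minimal illustration under stated assumptions: every process does a fixed amount of work per iteration, noise is a non-negative per-iteration delay, the collective itself costs nothing, and a collective started in iteration k only has to complete before iteration k + lag begins. The names `simulate`, `noise`, `work`, and `lag` are hypothetical and chosen for this example only.

```python
import random


def simulate(noise, work=1.0, lag=1):
    """Return the completion time of a toy bulk-synchronous run.

    noise[k][i] is the extra delay (>= 0) that process i suffers in
    iteration k. With lag=1 every iteration ends in a blocking
    collective, so all processes wait for the slowest one. With lag>1
    the collective is treated as an idealized nonblocking operation:
    the collective of iteration k must only be complete before
    iteration k + lag starts. (Assumed model, not the paper's.)
    """
    n_procs = len(noise[0])
    finish = [0.0] * n_procs   # when each process finished its last iteration
    done = []                  # done[k]: when the collective of iteration k completes
    for k, delays in enumerate(noise):
        # Earliest time any process may start iteration k.
        gate = done[k - lag] if k >= lag else 0.0
        finish = [max(f, gate) + work + d for f, d in zip(finish, delays)]
        done.append(max(finish))
    return done[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # 64 processes, 200 iterations; each process is delayed by 0.5 time
    # units with probability 1% per iteration (arbitrary toy values).
    noise = [[0.5 if rng.random() < 0.01 else 0.0 for _ in range(64)]
             for _ in range(200)]
    print("blocking   :", simulate(noise, lag=1))
    print("nonblocking:", simulate(noise, lag=4))
```

In this model the blocking run pays for the slowest process in every iteration, while the run with slack lets a delayed process catch up before its delay stalls the others. Which noise patterns actually benefit in real workloads is the question the paper examines; the toy model only shows the mechanism.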

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1257977
Report Number(s):
SAND-2014-19529J; 641904
Journal Information:
International Journal of High Performance Computing Applications, Vol. 30, Issue 1; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 5 works
Citation information provided by
Web of Science

Cited By (1)

The unexpected virtue of almost: Exploiting MPI collective operations to approximately coordinate checkpoints journal September 2018
