Demystifying asynchronous I/O Interference in HPC applications
- Univ. of California, Irvine, CA (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
With the increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources such as CPU and memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can change dynamically in response to variations in application behavior and I/O subsystems (e.g., multiple users sharing a parallel file system). Without a thorough understanding of this interference, asynchronous I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, we investigate the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource-sharing scenarios, such as varying the priority and number of background I/O threads and different I/O strategies (sendfile, read/write, mmap/write), underlining the trade-offs. The insights from this study are important both to enable guided optimizations of existing background I/O and to open new opportunities to design advanced asynchronous I/O strategies.
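As a rough illustration of the three I/O strategies named in the abstract, the following Python sketch copies a file using read/write, mmap/write, and sendfile. It is not the authors' benchmark code: the function names, the `CHUNK` size, and the file-to-file copy scenario are assumptions made for illustration, and the sendfile variant assumes a Linux system where `os.sendfile` accepts a regular file as the destination.

```python
import mmap
import os

CHUNK = 1 << 20  # 1 MiB per transfer; an arbitrary illustrative size


def copy_read_write(src, dst):
    """read/write: every chunk passes through a user-space buffer."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK)
            if not buf:
                break
            fout.write(buf)


def copy_mmap_write(src, dst):
    """mmap/write: the source is memory-mapped, then written out in slices."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        if size == 0:
            return  # an empty file cannot be memory-mapped
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for off in range(0, size, CHUNK):
                # Slicing the mapping yields a bytes object for write().
                fout.write(view[off:off + CHUNK])


def copy_sendfile(src, dst):
    """sendfile: the kernel moves the bytes between the two descriptors."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        sent = 0
        while sent < size:
            count = min(CHUNK, size - sent)
            n = os.sendfile(fout.fileno(), fin.fileno(), sent, count)
            if n == 0:
                break  # unexpected end of input
            sent += n
```

To make any of these copies a background operation in the sense discussed in the abstract, one option is to run the function in a `threading.Thread` while the application continues computing. How much that thread then competes with the application for CPU time and memory bandwidth is the question the study examines.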
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1831116
- Journal Information:
- International Journal of High Performance Computing Applications, Vol. 35, Issue 4; ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English