OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA

Conference
 [1];  [1];  [2];  [3];  [4];  [1]
  1. University of Utah
  2. Sandia National Laboratories (SNL)
  3. Texas Advanced Computing Center
  4. ORNL

The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
2000374
Resource Relation:
Journal Volume: 14100; Conference: Euro-Par 2023: European Conference on Parallel Processing, Limassol, Cyprus, 8/28/2023 to 9/1/2023
Country of Publication:
United States
Language:
English

Similar Records

CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows
Journal Article · May 31, 2020 · ACM Transactions on Parallel Computing · OSTI ID:2000374

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Conference · Sep 30, 2018 · OSTI ID:2000374

GPU Direct I/O with HDF5
Conference · Nov 1, 2020 · OSTI ID:2000374

Related Subjects