Performance Profile of Transformer Fine-Tuning in Multi-GPU Cloud Environments

Begoli, Edmon; Lim, Seung-Hwan; Srinivasan, Sudarshan

doi:10.1109/BigData52589.2021.9671389


Conference · December 1, 2021

DOI: https://doi.org/10.1109/BigData52589.2021.9671389 · OSTI ID: 1883970

Begoli, Edmon [1]; Lim, Seung-Hwan [1]; Srinivasan, Sudarshan [1]

[1] ORNL

This study focuses on the performance characteristics and trade-offs of running machine-learning tasks in multi-GPU environments, on both on-site cloud computing resources and commercial cloud services (Azure). Specifically, it examines these trade-offs by evaluating the performance of training and fine-tuning transformer-based deep-learning (DL) networks on clinical notes and data, a task of critical importance in the medical domain. To this end, we perform DL-related experiments on the widely deployed NVIDIA V100 GPUs and on the newer A100 GPUs, connected via NVLink or PCIe. We analyze the execution time of the major operations involved in training DL models and investigate popular options for optimizing each of them. We examine and present findings on the impact that various operations (e.g., data loading into GPUs, training, fine-tuning), optimizations, and system configurations (single vs. multi-GPU, NVLink vs. PCIe) have on overall training performance.


Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1883970

Resource Relation: Conference: 2021 IEEE International Conference on Big Data - Orlando, Florida, United States of America - 12/14/2021 5:00:00 AM-

Country of Publication: United States

Language: English

