Title: Strategies for Integrating Deep Learning Surrogate Models with HPC Simulation Applications

Conference

The emerging convergence of high performance computing (HPC), machine learning/deep learning (ML/DL), and big data analytics presents a host of challenges for large-scale computing campaigns that seek best practices for interleaving traditional scientific simulation-based workloads with ML/DL models. A portfolio of systematic approaches for incorporating deep learning into modeling and simulation serves a vital need in supporting AI for science at a computing facility. In this paper, we evaluate several strategies for deploying deep learning surrogate models in a representative physics application on supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). We discuss a set of recommended deployment architectures and implementation approaches. We analyze and evaluate these alternatives and show their performance and scalability up to 1000 GPUs on two mainstream platforms equipped with different deep learning hardware and software stacks.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1885297
Resource Relation:
Conference: ExSAIS 2022: Workshop on Extreme Scaling of AI for Science, co-located with IPDPS 2022 - Lyons, France - 5/30/2022 to 6/3/2022
Country of Publication:
United States
Language:
English