OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer

Abstract

Optimizing scientific applications on today’s accelerator-based high-performance computing systems can be challenging, especially when multiple GPUs and CPUs with heterogeneous memories and persistent non-volatile memories are present. An example is Summit, an accelerator-based system at the Oak Ridge Leadership Computing Facility (OLCF) that is rated as the world’s fastest supercomputer to date. New strategies are thus needed to expose the parallelism in legacy applications, while being amenable to efficient mapping to the underlying architecture. In this paper we discuss our experiences and strategies to port a scientific application, DCA++, to Summit. DCA++ is a high-performance research application that solves quantum many-body problems with a cutting-edge quantum cluster algorithm, the dynamical cluster approximation. Our strategies aim to synergize the strengths of the different programming models in the code. These include: (a) streamlining the interactions between the CPU threads and the GPUs, (b) implementing computing kernels on the GPUs and decreasing CPU-GPU memory transfers, (c) allowing asynchronous GPU communications, and (d) increasing compute intensity by combining linear algebraic operations. Full-scale production runs using all 4600 Summit nodes attained a peak performance of 73.5 PFLOPS with a mixed-precision implementation. We observed perfect strong and weak scaling for the quantum Monte Carlo solver in DCA++, while encountering about 2× input/output (I/O) and MPI communication overhead on the time-to-solution for the full machine run. Our hardware-agnostic optimizations are designed to alleviate the communication and I/O challenges observed, while improving the compute intensity and obtaining optimal performance on a complex, hybrid architecture like Summit.

Authors:
Balduzzi, Giovanni [1]; Chatterjee, Ronnie [2]; Li, Ying Wai [2]; Doak, Peter W. [2]; Hähner, Urs [3]; D'azevedo, Ed [2]; Maier, Thomas [2]; Schulthess, Thomas [3]
  1. ETH Zurich, Institute of Applied Physics, Switzerland
  2. ORNL
  3. ETH Zurich, Switzerland
Publication Date:
2019-09-01
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1607140
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 28th International Conference on Parallel Architectures and Compilation Techniques (PACT19), Seattle, Washington, United States of America, 9/21/2019 to 9/25/2019
Country of Publication:
United States
Language:
English

Citation Formats

Balduzzi, Giovanni, Chatterjee, Ronnie, Li, Ying Wai, Doak, Peter W., Hähner, Urs, D'azevedo, Ed, Maier, Thomas, and Schulthess, Thomas. Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer. United States: N. p., 2019. Web. doi:10.1109/PACT.2019.00041.
Balduzzi, Giovanni, Chatterjee, Ronnie, Li, Ying Wai, Doak, Peter W., Hähner, Urs, D'azevedo, Ed, Maier, Thomas, & Schulthess, Thomas. Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer. United States. https://doi.org/10.1109/PACT.2019.00041
Balduzzi, Giovanni, Chatterjee, Ronnie, Li, Ying Wai, Doak, Peter W., Hähner, Urs, D'azevedo, Ed, Maier, Thomas, and Schulthess, Thomas. 2019. "Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer". United States. https://doi.org/10.1109/PACT.2019.00041. https://www.osti.gov/servlets/purl/1607140.
@inproceedings{osti_1607140,
title = {Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer},
author = {Balduzzi, Giovanni and Chatterjee, Ronnie and Li, Ying Wai and Doak, Peter W. and Hähner, Urs and D'azevedo, Ed and Maier, Thomas and Schulthess, Thomas},
abstractNote = {Optimizing scientific applications on today’s accelerator-based high-performance computing systems can be challenging, especially when multiple GPUs and CPUs with heterogeneous memories and persistent non-volatile memories are present. An example is Summit, an accelerator-based system at the Oak Ridge Leadership Computing Facility (OLCF) that is rated as the world’s fastest supercomputer to date. New strategies are thus needed to expose the parallelism in legacy applications, while being amenable to efficient mapping to the underlying architecture. In this paper we discuss our experiences and strategies to port a scientific application, DCA++, to Summit. DCA++ is a high-performance research application that solves quantum many-body problems with a cutting-edge quantum cluster algorithm, the dynamical cluster approximation. Our strategies aim to synergize the strengths of the different programming models in the code. These include: (a) streamlining the interactions between the CPU threads and the GPUs, (b) implementing computing kernels on the GPUs and decreasing CPU-GPU memory transfers, (c) allowing asynchronous GPU communications, and (d) increasing compute intensity by combining linear algebraic operations. Full-scale production runs using all 4600 Summit nodes attained a peak performance of 73.5 PFLOPS with a mixed-precision implementation. We observed perfect strong and weak scaling for the quantum Monte Carlo solver in DCA++, while encountering about 2× input/output (I/O) and MPI communication overhead on the time-to-solution for the full machine run. Our hardware-agnostic optimizations are designed to alleviate the communication and I/O challenges observed, while improving the compute intensity and obtaining optimal performance on a complex, hybrid architecture like Summit.},
booktitle = {28th International Conference on Parallel Architectures and Compilation Techniques (PACT19)},
doi = {10.1109/PACT.2019.00041},
url = {https://www.osti.gov/biblio/1607140},
place = {United States},
year = {2019},
month = sep
}

