Comparing the Performance of SYCL Runtimes for Molecular Dynamics Applications

Authors:
Andrey Alekseenko, SciLifeLab, KTH Royal Institute of Technology, Sweden (ORCID: 0000-0003-4906-7241)
Szilárd Páll, PDC Center for High Performance Computing, KTH Royal Institute of Technology, Sweden (ORCID: 0000-0003-0603-5514)

IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL, April 2023, Article No.: 6, Pages 1–2
https://doi.org/10.1145/3585341.3585350

Published: 18 April 2023

ABSTRACT

SYCL is a cross-platform, royalty-free standard for programming a wide range of hardware accelerators. It is a powerful and convenient way to write standard C++17 code that can take full advantage of available devices. There are already multiple SYCL implementations targeting a wide range of platforms, from embedded systems to HPC clusters. Since several implementations can target the same hardware, application developers and users must know how to choose the runtime that best fits their needs. In this talk, we compare the runtime performance of two major SYCL runtimes targeting GPUs, oneAPI DPC++ and Open SYCL [3], with that of the native implementations in the context of GROMACS, a high-performance molecular dynamics engine.

Molecular dynamics (MD) applications were among the earliest adopters of GPU acceleration, with force calculations being an obvious target for offloading. MD is an iterative algorithm: in its most basic form, each step computes the forces acting between particles and then integrates the equations of motion. As the computational power of GPUs grew, the strong-scaling problem became apparent: the biophysical systems modeled with molecular dynamics typically have fixed sizes, and the goal is to perform more time steps, each taking less than a millisecond of wall time. This places high demands on the underlying GPU framework, which must schedule many small tasks efficiently and with minimal overhead, so that CPU and GPU work can overlap for large systems and the GPU stays occupied for smaller systems. Another requirement is that application developers can control the scheduling to optimize for external dependencies, such as MPI communication.
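The basic MD iteration described above (compute forces, then integrate the equations of motion) can be illustrated with a small sketch. This is a toy model and not GROMACS code: the one-dimensional chain of particles joined by harmonic springs, the semi-implicit Euler integrator, and all names (`compute_forces`, `md_step`, `total_energy`, `run`) and parameter values are assumptions chosen only to keep the example short.

```python
def compute_forces(positions, k=1.0, r0=1.0):
    """Toy force field (assumption): harmonic springs between neighbors in 1D.

    Particles are assumed to stay ordered along the axis.
    """
    forces = [0.0] * len(positions)
    for i in range(len(positions) - 1):
        stretch = (positions[i + 1] - positions[i]) - r0
        f = k * stretch
        forces[i] += f       # pulled toward the neighbor when stretched
        forces[i + 1] -= f   # equal and opposite force on the neighbor
    return forces


def md_step(positions, velocities, masses, dt):
    """One MD step: compute forces first, then integrate the motion."""
    forces = compute_forces(positions)
    # Semi-implicit Euler: update velocities, then positions with new velocities.
    velocities = [v + (f / m) * dt for v, f, m in zip(velocities, forces, masses)]
    positions = [x + v * dt for x, v in zip(positions, velocities)]
    return positions, velocities


def total_energy(positions, velocities, masses, k=1.0, r0=1.0):
    """Kinetic plus spring potential energy, used as a sanity check."""
    kinetic = sum(0.5 * m * v * v for m, v in zip(masses, velocities))
    potential = sum(
        0.5 * k * ((positions[i + 1] - positions[i]) - r0) ** 2
        for i in range(len(positions) - 1)
    )
    return kinetic + potential


def run(n_steps=1000, dt=0.001):
    """Run the iterative loop on three particles starting at rest."""
    positions = [0.0, 1.2, 2.1]
    velocities = [0.0, 0.0, 0.0]
    masses = [1.0, 1.0, 1.0]
    for _ in range(n_steps):
        positions, velocities = md_step(positions, velocities, masses, dt)
    return positions, velocities, masses


if __name__ == "__main__":
    pos, vel, mass = run()
    print("final positions:", pos)
    print("total energy:", total_energy(pos, vel, mass))
```

In a production engine each of these steps would typically be a separate GPU task, so a loop of very short steps like this one is where per-task launch and scheduling overhead becomes visible.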

GROMACS is a widely used MD engine that supports a wide range of hardware and software platforms, from laptops to the largest supercomputers [1]. Portability and performance across multiple architectures have always been among the primary goals of the project, necessary to keep the code not only efficient but also maintainable. Initial support for NVIDIA accelerators, using CUDA, was added to GROMACS in 2010. Since then, heterogeneous parallelization has been a major target for performance optimization, not limited to NVIDIA devices: support was later added for GPUs from other vendors, as well as for Xeon Phi accelerators. GROMACS first adopted SYCL in its 2021 release to replace its previous GPU portability layer, OpenCL [2]. In subsequent releases, the number of offloading modes supported by the SYCL backend steadily increased. As of GROMACS 2023, SYCL support has achieved near feature parity with CUDA while allowing a single codebase to target the GPUs of all three major vendors with minimal specialization.

While this clearly supports the portability promise of modern SYCL implementations, the performance of such portable code remains an open question, especially given the strict requirements of MD algorithms. In this talk, we compare the performance of GROMACS across a wide range of system sizes when using the oneAPI DPC++ and Open SYCL runtimes on high-performance NVIDIA, AMD, and Intel GPUs. Besides analyzing individual kernel performance, we focus on the runtime overhead and the efficiency of task scheduling compared with a highly optimized implementation using the native frameworks. We also discuss possible sources of suboptimal performance and the amount of vendor-specific code, such as intrinsics or workarounds for compiler bugs, required to achieve optimal performance.

References

[1] Mark James Abraham, Teemu Murtola, Roland Schulz, Szilárd Páll, Jeremy C. Smith, Berk Hess, and Erik Lindahl. 2015. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2 (2015), 19–25. https://doi.org/10.1016/j.softx.2015.06.001
[2] Andrey Alekseenko, Szilárd Páll, and Erik Lindahl. 2021. Experiences With Adding SYCL Support to GROMACS. In International Workshop on OpenCL (Munich, Germany) (IWOCL’21). Association for Computing Machinery, New York, NY, USA, Article 17, 1 page. https://doi.org/10.1145/3456669.3456690
[3] Aksel Alpay, Bálint Soproni, Holger Wünsche, and Vincent Heuveline. 2022. Exploring the Possibility of a HipSYCL-Based Implementation of OneAPI. In International Workshop on OpenCL (Bristol, United Kingdom) (IWOCL’22). Association for Computing Machinery, New York, NY, USA, Article 10, 12 pages. https://doi.org/10.1145/3529538.3530005

Published in

IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL
April 2023
133 pages
ISBN: 9798400707452
DOI: 10.1145/3585341

Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GROMACS
SYCL
heterogeneous acceleration
molecular dynamics
performance-portability
Qualifiers
- abstract
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 84 of 152 submissions, 55%