Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations

Gong, Jing; Markidis, Stefano; Laure, Erwin; Otten, Matthew; Fischer, Paul; Min, Misun

doi:10.1007/s11227-016-1744-5

Published: 18 July 2016

The Journal of Supercomputing, Volume 72, pages 4160–4180 (2016)

Jing Gong¹, Stefano Markidis¹, Erwin Laure¹, Matthew Otten², Paul Fischer³,⁴ & Misun Min⁴

Abstract

We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation uses OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which keeps modifications to the existing CPU code to a minimum while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
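To put the headline figure in per-device terms, the short sketch below divides the aggregate performance quoted in the abstract by the number of GPUs. It is only arithmetic on the two published numbers: the result is an average across the machine, not a measured single-GPU rate, and it assumes that 1 Tflops equals 1000 Gflops. The function name `average_gflops_per_gpu` is hypothetical and chosen for illustration.

```python
# Figures quoted in the abstract.
TOTAL_TFLOPS = 552.0   # best aggregate performance reported, in Tflops
NUM_GPUS = 16_384      # number of Titan GPUs used for that run


def average_gflops_per_gpu(total_tflops: float, num_gpus: int) -> float:
    """Return the mean throughput per GPU in Gflops (assumes 1 Tflops = 1000 Gflops)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_tflops * 1000.0 / num_gpus


per_gpu = average_gflops_per_gpu(TOTAL_TFLOPS, NUM_GPUS)
print(f"{per_gpu:.1f} Gflops per GPU on average")  # prints "33.7 Gflops per GPU on average"
```

Because the abstract reports performance of "up to" 552 Tflops, this average should be read as an upper figure for the largest run rather than a typical per-GPU rate.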

Acknowledgments

This material is based upon work supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and partially supported by the Swedish e-Science Research Centre (SeRC). This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. The research also used computing resources of the French Alternative Energies and Atomic Energy Commission (CEA) in France via the Partnership for Advanced Computing in Europe (PRACE).

Author information

Authors and Affiliations

PDC, KTH, Stockholm, Sweden
Jing Gong, Stefano Markidis & Erwin Laure
Cornell University, Ithaca, USA
Matthew Otten
University of Illinois Urbana-Champaign, Champaign, USA
Paul Fischer
Argonne National Laboratory, Lemont, USA
Paul Fischer & Misun Min

Corresponding author

Correspondence to Misun Min.

Additional information

COST IC1305.

About this article

Cite this article

Gong, J., Markidis, S., Laure, E. et al. Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. J Supercomput 72, 4160–4180 (2016). https://doi.org/10.1007/s11227-016-1744-5

Published: 18 July 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11227-016-1744-5
