
Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations

The Journal of Supercomputing

Abstract

We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation uses OpenACC and CUDA Fortran to parallelize locally the compute-intensive matrix–matrix multiplications, an approach that minimizes modifications to the existing CPU code while extending the code's simulation capability to GPU architectures. We also discuss GPU results for OpenACC interoperating with CUDA Fortran and for the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
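
The two computational pieces named above can be illustrated with a small serial sketch. It assumes, as is usual for spectral element codes, that the local matrix–matrix multiplications apply an n×n one-dimensional derivative matrix D along each of the three directions of an element's n³ nodal values, and that gather–scatter sums the values several elements hold for a shared global node and copies the sum back. The function names (`tensor_grad`, `gs_add`, `two_element_gid`) and the two-element numbering are hypothetical; this is a CPU reference in Python, not the paper's Fortran, OpenACC, or CUDA Fortran code, and it omits the MPI/GPUDirect exchange between ranks.

```python
# Illustrative CPU sketch only (hypothetical names); not the paper's
# Fortran/OpenACC/CUDA Fortran implementation.


def tensor_grad(D, u, n):
    """Apply the n x n matrix D along r, s and t to one element's n**3 nodal
    values u (flat list, r index fastest, as in Fortran column-major order).
    Each direction is a small dense matrix-matrix product."""

    def at(i, j, k):
        # Flat index of node (i, j, k) with i varying fastest.
        return i + n * (j + n * k)

    size = n ** 3
    ur, us, ut = [0.0] * size, [0.0] * size, [0.0] * size
    # On a GPU, these loops (together with a loop over elements) are the
    # part one would offload, e.g. with OpenACC directives in Fortran.
    for k in range(n):
        for j in range(n):
            for i in range(n):
                p = at(i, j, k)
                ur[p] = sum(D[i][l] * u[at(l, j, k)] for l in range(n))
                us[p] = sum(D[j][l] * u[at(i, l, k)] for l in range(n))
                ut[p] = sum(D[k][l] * u[at(i, j, l)] for l in range(n))
    return ur, us, ut


def gs_add(v, gid, nglobal):
    """Gather-scatter with addition: sum all local copies that map to the
    same global node id, then write each sum back to every copy."""
    g = [0.0] * nglobal
    for val, gi in zip(v, gid):  # gather
        g[gi] += val
    return [g[gi] for gi in gid]  # scatter


def two_element_gid(n):
    """Hypothetical global numbering for two n**3 elements joined along r:
    the r = n-1 face of element 0 coincides with the r = 0 face of element 1."""
    nx = 2 * n - 1  # distinct global nodes along r
    gid = []
    for e in range(2):
        for k in range(n):
            for j in range(n):
                for i in range(n):
                    gid.append(e * (n - 1) + i + nx * (j + n * k))
    return gid, nx * n * n


if __name__ == "__main__":
    n = 4
    gid, nglobal = two_element_gid(n)
    v = gs_add([1.0] * len(gid), gid, nglobal)
    print(nglobal, len(gid), v.count(2.0), sum(v))  # 112 128 32 160.0
```

Applied to an all-ones field on two n = 4 elements joined along r, `gs_add` leaves 2.0 at the 32 local copies of the 16 shared face nodes and 1.0 at the other 96 local nodes. Storing each element with the r index fastest mirrors Fortran's column-major layout, so the r-direction product reads contiguous memory.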





Acknowledgments

This material is based upon work supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and partially supported by the Swedish e-Science Research Centre (SeRC). This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. The research also used computing resources of the French Alternative Energies and Atomic Energy Commission (CEA) in France via the Partnership for Advanced Computing in Europe (PRACE).

Author information

Correspondence to Misun Min.

Additional information

COST IC1305.


About this article


Cite this article

Gong, J., Markidis, S., Laure, E. et al. Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. J Supercomput 72, 4160–4180 (2016). https://doi.org/10.1007/s11227-016-1744-5


  • DOI: https://doi.org/10.1007/s11227-016-1744-5
