CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units

Chandar, Dominic D. J.; Sitaraman, Jayanarayanan; Mavriplis, Dimitri

doi:10.1007/s11227-013-0985-9

Published: 07 August 2013

Volume 67, pages 47–68 (2014)

The Journal of Supercomputing

Abstract

The application of graphics processing units (GPUs) to solve partial differential equations is gaining popularity with the advent of improved computer hardware. Various lower-level interfaces exist that allow the user to access GPU-specific functions. One such interface is NVIDIA’s Compute Unified Device Architecture (CUDA) library. However, porting existing codes to run on the GPU requires the user to write kernels that execute on multiple cores in Single Instruction Multiple Data (SIMD) fashion. In the present work, a higher-level framework, termed CU++, has been developed that uses object-oriented programming techniques available in C++, such as polymorphism, operator overloading, and template metaprogramming. Using this approach, CUDA kernels can be generated automatically at compile time. Briefly, CU++ allows a code developer with just C/C++ knowledge to write computer programs that will execute on the GPU without any knowledge of specific programming techniques in CUDA. This approach is tremendously beneficial for Computational Fluid Dynamics (CFD) code development because it reduces the need to write hundreds of GPU kernels for various purposes. In its current form, CU++ provides a framework for parallel array arithmetic, simplified data structures to interface with the GPU, and smart array indexing. An implementation of heterogeneous parallelism, i.e., utilizing multiple GPUs to simultaneously process a partitioned grid system with communication at the interfaces using the Message Passing Interface (MPI), has been developed and tested.
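The combination of operator overloading and template metaprogramming mentioned in the abstract is commonly realized with expression templates. The following is a minimal host-only C++ sketch of that general technique, not the CU++ API: the names `Expr`, `BinExpr`, `Array`, `Stored`, and `fused_update` are hypothetical, and the evaluation loop runs on the CPU. It only illustrates how an array statement such as `y = y + x * alpha` can be captured as a type at compile time and evaluated in a single pass without temporary arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: every node of an expression tree derives from Expr<Derived>.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

class Array;  // leaf node holding the data (defined below)

// Storage policy: interior nodes are copied, arrays are held by reference.
template <typename T> struct Stored        { using type = T; };
template <>           struct Stored<Array> { using type = const Array&; };

// Leaf node wrapping a scalar constant.
struct Scalar : Expr<Scalar> {
    double v;
    explicit Scalar(double v_) : v(v_) {}
    double operator[](std::size_t) const { return v; }
};

// Element-wise operations selected at compile time.
struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };

// Interior node: applies Op element-wise to two sub-expressions.
template <typename L, typename R, typename Op>
struct BinExpr : Expr<BinExpr<L, R, Op>> {
    typename Stored<L>::type lhs;
    typename Stored<R>::type rhs;
    BinExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
};

// Overloaded operators build the expression type; no arithmetic happens here.
template <typename L, typename R>
BinExpr<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinExpr<L, R, Add>(l.self(), r.self());
}
template <typename L, typename R>
BinExpr<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinExpr<L, R, Mul>(l.self(), r.self());
}
template <typename L>
BinExpr<L, Scalar, Mul> operator*(const Expr<L>& l, double s) {
    return BinExpr<L, Scalar, Mul>(l.self(), Scalar(s));
}

class Array : public Expr<Array> {
    std::vector<double> data_;
public:
    explicit Array(std::size_t n, double init = 0.0) : data_(n, init) {}
    std::size_t size() const { return data_.size(); }
    double  operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i)       { return data_[i]; }

    // Assignment evaluates the whole expression tree in one fused loop.
    template <typename E>
    Array& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Hypothetical helper: y <- y + x * alpha, written as plain array arithmetic.
inline void fused_update(Array& y, const Array& x, double alpha) {
    assert(x.size() == y.size());
    y = y + x * alpha;
}
```

In a GPU setting, the loop inside `operator=` would plausibly be replaced by a kernel templated on the expression type, with one thread per index `i`, so that each distinct array statement yields its own kernel at compile time. That mapping is an assumption about how frameworks of this kind are typically built, not a description of CU++ internals.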

Acknowledgements

We gratefully acknowledge support from the Office of Naval Research under ONR Grant N00014-09-1-1060.

Author information

Authors and Affiliations

University of Wyoming, 1000 E. University Avenue, Dept. 3295, Laramie, WY, 82072, USA
Dominic D. J. Chandar, Jayanarayanan Sitaraman & Dimitri Mavriplis

Corresponding author

Correspondence to Dominic D. J. Chandar.

About this article

Cite this article

Chandar, D.D.J., Sitaraman, J. & Mavriplis, D. CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units. J Supercomput 67, 47–68 (2014). https://doi.org/10.1007/s11227-013-0985-9

Issue Date: January 2014
DOI: https://doi.org/10.1007/s11227-013-0985-9
