Abstract
Graphics processing units (GPUs) are increasingly used to solve partial differential equations as computer hardware improves. Several low-level interfaces give the user access to GPU-specific functions; one such interface is NVIDIA’s Compute Unified Device Architecture (CUDA) library. However, porting existing codes to the GPU requires the user to write kernels that execute on multiple cores in Single Instruction Multiple Data (SIMD) fashion. In the present work, a higher-level framework, termed CU++, has been developed that uses object-oriented programming techniques available in C++, such as polymorphism, operator overloading, and template metaprogramming. With this approach, CUDA kernels are generated automatically at compile time. In brief, CU++ allows a code developer with only C/C++ knowledge to write programs that execute on the GPU without any knowledge of CUDA-specific programming techniques. This approach is particularly beneficial for Computational Fluid Dynamics (CFD) code development because it removes the need to hand-write hundreds of GPU kernels for various purposes. In its current form, CU++ provides a framework for parallel array arithmetic, simplified data structures to interface with the GPU, and smart array indexing. An implementation of heterogeneous parallelism, i.e., using multiple GPUs to simultaneously process a partitioned grid system, with communication at the interfaces through the Message Passing Interface (MPI), has also been developed and tested.
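To illustrate the general idea of combining operator overloading with template metaprogramming, the following is a minimal host-only sketch of the expression-template technique in C++. It is not the CU++ API: the names `Expr`, `Array`, `Sum`, `Scaled`, and `axpy_example` are hypothetical, and the fused loop in `Array::operator=` runs on the CPU. In a GPU framework of the kind described above, the body of that loop is roughly what a compile-time generated kernel would execute per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: every node of an expression tree derives from Expr<Derived>.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical array type (an assumption, not the CU++ class); host storage only.
class Array : public Expr<Array> {
public:
    explicit Array(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Assigning an expression evaluates the whole tree in one fused loop.
    // On a GPU, this loop body would be the work done by one thread.
    template <typename E>
    Array& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Node representing l + r; nothing is computed until an element is requested.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) { assert(l.size() == r.size()); }
    std::size_t size() const { return l_.size(); }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }

private:
    const L& l_;  // valid for the lifetime of the enclosing full expression
    const R& r_;
};

// Node representing s * e for a scalar s.
template <typename E>
class Scaled : public Expr<Scaled<E>> {
public:
    Scaled(double s, const E& e) : s_(s), e_(e) {}
    std::size_t size() const { return e_.size(); }
    double operator[](std::size_t i) const { return s_ * e_[i]; }

private:
    double s_;
    const E& e_;
};

// Overloaded operators build the expression type at compile time.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename E>
Scaled<E> operator*(double s, const Expr<E>& e) {
    return Scaled<E>(s, e.self());
}

// Example: y = x + 2.0 * z, evaluated in a single pass with no temporary arrays.
inline Array axpy_example(std::size_t n) {
    Array x(n, 1.0), z(n, 3.0), y(n);
    y = x + 2.0 * z;
    return y;
}
```

The user writes ordinary array arithmetic (`y = x + 2.0 * z`), while the compiler builds the type `Sum<Array, Scaled<Array>>` and inlines its element access into a single loop. This is the property that makes the technique attractive for generating kernels without hand-written device code.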
Acknowledgements
We gratefully acknowledge support from the Office of Naval Research under ONR Grant N00014-09-1-1060.
Cite this article
Chandar, D.D.J., Sitaraman, J. & Mavriplis, D. CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units. J Supercomput 67, 47–68 (2014). https://doi.org/10.1007/s11227-013-0985-9
DOI: https://doi.org/10.1007/s11227-013-0985-9