ABSTRACT
TURF sim (Target Unsteady Reacting Flow simulation) is a CFD application that solves engine combustion problems on structured grids. In this paper, we use PGI CUDA Fortran to implement a heterogeneous CPU + GPU parallelization of it. To reduce both the CPU-GPU data transfer overhead and the MPI communication overhead, we design a packing/unpacking method that decreases the volume of data transferred and the number of data transfers, and we use pinned memory to improve the data transfer rate. To overlap CPU computation, GPU computation, and CPU-GPU data transfer, we reorganize the computing process and use asynchronous data transfers and streams. We also apply loop transformations to optimize register utilization and task partitioning on the GPU. Performance is evaluated on two GPU-based platforms. The experimental results show that our GPU optimization techniques are effective and that the resulting GPU version outperforms the original pure CPU version on both platforms. With an NVIDIA GTX1060 GPU, the GPU version achieves a 2.96x speedup over the pure CPU version running on an Intel i7-8700 CPU (6 cores). With an NVIDIA V100 GPU, the GPU version achieves a 1.88x speedup over the pure CPU version running on two Intel Xeon Skylake Gold CPUs (24 cores).
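The abstract does not describe the packing scheme in detail. The following is a minimal sketch of the general idea, written in plain C rather than the paper's CUDA Fortran. It assumes a hypothetical layout in which each flow field is a separate `nx * ny * nz` array with `x` varying fastest; the function names `pack_x_faces` and `unpack_x_faces` are illustrative and not taken from TURF sim. One boundary plane of every field is gathered into a single contiguous buffer, so that one transfer can replace many small strided ones.

```c
#include <stddef.h>

/* Assumed layout (hypothetical): x varies fastest, then y, then z. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return i + nx * (j + ny * k);
}

/* Gather the plane i == i_face of every field into one contiguous buffer.
 * buf must hold at least nfields * ny * nz doubles.
 * Returns the number of doubles written. */
size_t pack_x_faces(double *const *fields, size_t nfields,
                    size_t nx, size_t ny, size_t nz,
                    size_t i_face, double *buf)
{
    size_t n = 0;
    for (size_t f = 0; f < nfields; ++f)
        for (size_t k = 0; k < nz; ++k)
            for (size_t j = 0; j < ny; ++j)
                buf[n++] = fields[f][idx(i_face, j, k, nx, ny)];
    return n;
}

/* Scatter a buffer produced by pack_x_faces into the plane i == i_face
 * of every field, using the same traversal order.
 * Returns the number of doubles read. */
size_t unpack_x_faces(double *const *fields, size_t nfields,
                      size_t nx, size_t ny, size_t nz,
                      size_t i_face, const double *buf)
{
    size_t n = 0;
    for (size_t f = 0; f < nfields; ++f)
        for (size_t k = 0; k < nz; ++k)
            for (size_t j = 0; j < ny; ++j)
                fields[f][idx(i_face, j, k, nx, ny)] = buf[n++];
    return n;
}
```

In a GPU setting, a buffer of this kind would typically be allocated in pinned host memory and moved with a single asynchronous copy on a stream, which matches the techniques the abstract names. How the authors actually organize their buffers is not stated here.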
Index Terms
- Parallelization and Optimization of a Combustion Simulation Application on GPU Platform