
Parallelization and Optimization of a Combustion Simulation Application on GPU Platform

Authors:
Zhuoqian Li

Institute of Quantum Information and State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, China

Yonggang Che

Institute of Quantum Information and State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, China

HP3C 2020: Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications, June 2020, pages 50–55. https://doi.org/10.1145/3407947.3407960

Published: 6 August 2020

ABSTRACT

TURF sim (Target Unsteady Reacting Flow simulation) is a CFD application that solves engine combustion problems on structured grids. In this paper, PGI CUDA Fortran is used to implement a heterogeneous CPU + GPU parallelization. To reduce both the CPU-GPU data transfer overhead and the MPI communication overhead, we design a packing/unpacking based method that decreases the volume of data transferred and the number of transfers, and we use pinned memory to improve the data transfer rate. To overlap CPU computation, GPU computation and CPU-GPU data transfer, we reorganize the computing process and use asynchronous data transfers and CUDA streams. We also employ loop transformations to optimize register utilization and task partitioning on the GPU. Performance is evaluated on two GPU-based platforms. The experimental results show that our GPU optimization techniques are effective and that the resulting GPU version significantly outperforms the original pure CPU version. With an NVIDIA GTX 1060 GPU, the GPU version achieves a 2.96X speedup over the pure CPU version running on a 6-core Intel i7-8700 CPU. With an NVIDIA V100 GPU, it achieves a 1.88X speedup over the pure CPU version running on two Intel Xeon Skylake Gold CPUs (24 cores).
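The packing/unpacking idea can be illustrated with a minimal sketch. It assumes one structured block stored as a flat array with one layer of ghost cells and the i index varying fastest; the layout and the function names `idx`, `pack_i_face`, `unpack_i_face` and `pack_all_faces` are hypothetical and are not taken from TURF sim, which is written in CUDA Fortran. A boundary plane is strided in memory, so sending it directly would take many small transfers; gathering it into one contiguous buffer, and concatenating several planes into one message, lets a single transfer (in a GPU code, typically one asynchronous copy from a pinned host buffer, or one MPI message) replace many.

```python
# Hypothetical flat storage of one structured block: u[(k*nj + j)*ni + i],
# with i varying fastest. A Fortran code would index the other way round,
# but the packing idea is the same.

def idx(ni, nj, i, j, k):
    """Linear offset of cell (i, j, k) in the flat array."""
    return (k * nj + j) * ni + i


def pack_i_face(u, ni, nj, nk, i0):
    """Gather the strided plane i = i0 into one contiguous buffer."""
    return [u[idx(ni, nj, i0, j, k)] for k in range(nk) for j in range(nj)]


def unpack_i_face(u, ni, nj, nk, i0, buf):
    """Scatter a contiguous buffer back into the plane i = i0 (in place)."""
    n = 0
    for k in range(nk):
        for j in range(nj):
            u[idx(ni, nj, i0, j, k)] = buf[n]
            n += 1


def pack_all_faces(u, ni, nj, nk, planes):
    """Concatenate several planes into ONE message and record where each
    plane starts, so one transfer replaces len(planes) transfers."""
    buf, offsets = [], []
    for i0 in planes:
        offsets.append(len(buf))
        buf.extend(pack_i_face(u, ni, nj, nk, i0))
    return buf, offsets


# Usage: copy the last interior plane of src into the ghost plane of dst.
ni, nj, nk = 4, 3, 2
src = [float(n) for n in range(ni * nj * nk)]
dst = [0.0] * (ni * nj * nk)
buf = pack_i_face(src, ni, nj, nk, ni - 2)
unpack_i_face(dst, ni, nj, nk, 0, buf)
print(len(buf), dst[idx(ni, nj, 0, 1, 1)])  # prints: 6 18.0
```

The gather/scatter adds an extra copy pass, so this trade-off pays off when the fixed per-transfer overhead of many small transfers outweighs the cost of the extra copies.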

Index Terms

Computing methodologies → Parallel computing methodologies → Parallel algorithms → Massively parallel algorithms

Published in

HP3C 2020: Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications
June 2020
191 pages
ISBN: 9781450376914
DOI: 10.1145/3407947

Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Combustion simulation
GPU
parallel algorithm
performance evaluation
performance optimization