ABSTRACT
The performance of applications running on GPUs is affected mainly by hardware occupancy and global memory latency. Scientific applications that rely on unstructured-grid analysis could benefit from the high performance that GPUs provide; however, their memory access patterns and algorithms limit the potential benefits.
In this paper, we analyze the algorithm for unstructured grid analysis in terms of hardware occupancy and memory access efficiency. In general, the algorithm can be divided into three stages, which exhibit different memory access patterns: cell-oriented analysis, edge-oriented analysis, and information update. Based on this analysis, we modify the algorithm to make it suitable for GPUs. The proposed algorithm aims for high hardware occupancy and efficient global memory access. Finally, our implementation shows that the design achieves a speedup of up to 88 times over the sequential CPU version.
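The three stages named above can be pictured with a small sequential sketch. This is only an illustration under stated assumptions: the abstract does not specify the per-cell or per-edge operations, so the function names, the toy mesh, and the diffusion-like flux below are all hypothetical and are not the paper's method. The point is the access pattern: the cell stage reads contiguously, while the edge stage and the update reach cells indirectly through an edge-to-cell index list.

```python
# Hypothetical three-stage sweep over a tiny unstructured grid.
# The function names and the toy "physics" are illustrative assumptions.

def cell_stage(cell_values):
    """Cell-oriented stage: one independent computation per cell.
    Cells are visited in storage order, so reads are contiguous."""
    return [2.0 * v for v in cell_values]


def edge_stage(edges, cell_state):
    """Edge-oriented stage: each edge reads its two neighbouring cells
    through an index pair, so reads are indirect (scattered in memory)."""
    return [cell_state[right] - cell_state[left] for left, right in edges]


def update_stage(cell_values, edges, fluxes, step):
    """Information update: accumulate each edge flux into the two cells
    it connects. Several edges may write to the same cell."""
    updated = list(cell_values)
    for (left, right), flux in zip(edges, fluxes):
        updated[left] += step * flux
        updated[right] -= step * flux
    return updated


def sweep(cell_values, edges, step=0.25):
    """Run the three stages once, in order."""
    state = cell_stage(cell_values)
    fluxes = edge_stage(edges, state)
    return update_stage(cell_values, edges, fluxes, step)


# Toy mesh: four cells connected in a ring by four edges.
cells = [1.0, 2.0, 4.0, 8.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = sweep(cells, edges)
```

In this sketch the update stage is where a parallel version would need care, because two edges that share a cell write to the same location. On a GPU, the indirect reads in the edge stage are also the ones least likely to be served by efficient, contiguous global memory transactions.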
Index Terms
- Unstructured grid applications on GPU: performance analysis and improvement