NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model

Xu, Rengan; Tian, Xiaonan; Chandrasekaran, Sunita; Yan, Yonghong; Chapman, Barbara

doi:10.1007/978-3-319-17473-0_5

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

The broad adoption of accelerators has increased interest in accelerator programming. Accelerators such as GPGPUs are optimized for throughput and offer high GFLOPS and memory bandwidth. CUDA has been adopted rapidly, but it is proprietary and applies only to NVIDIA GPUs, and the difficulty of writing efficient CUDA code has created the need for higher-level programming approaches such as OpenACC. Directive-based programming models such as OpenMP and OpenACC let programmers rapidly create prototype applications by adding annotations that guide compiler optimizations. In this paper we study the effectiveness of a high-level directive-based programming model, OpenACC, for parallelizing the NAS Parallel Benchmarks (NPB) on GPGPUs. We apply techniques such as array privatization, memory coalescing, and cache optimization, and examine their impact on the performance of the benchmarks. Choosing the right technique or combination of techniques/hints is crucial for compilers to generate highly efficient code tuned to a particular type of accelerator; a poorly chosen technique or combination can degrade performance. We also propose a new clause, ‘scan’, that handles scan operations on input arrays of arbitrary size. We hope that the practices discussed in this paper will provide useful guidance to users to effectively migrate their sequential/CPU-parallel codes to GPGPU architectures and achieve optimal performance.
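Array privatization, the first technique named above, can be illustrated with the standard OpenACC `private` clause. The following is a minimal sketch under stated assumptions, not the authors' NPB code: the kernel `row_sum_of_squares` and the sizes `ROWS` and `COLS` are hypothetical, chosen only to show a scratch array that is written and read within a single outer iteration. Marking that array private gives every gang its own copy, so concurrent iterations no longer race on shared storage. The proposed `scan` clause is not shown because its syntax does not appear in this abstract.

```c
/* Hypothetical problem size, for illustration only. */
#define ROWS 4
#define COLS 8

/* For each row i, square its entries into the scratch array `work`,
   then sum them into out[i]. `work` is only used inside one outer
   iteration, so private(work) gives each gang its own copy and removes
   the write conflict between iterations. Without an OpenACC compiler
   the pragmas are ignored and the loops simply run serially. */
void row_sum_of_squares(double a[ROWS][COLS], double out[ROWS])
{
    double work[COLS];

    #pragma acc parallel loop gang private(work) \
        copyin(a[0:ROWS][0:COLS]) copyout(out[0:ROWS])
    for (int i = 0; i < ROWS; i++) {
        /* Fill this gang's private scratch array. */
        #pragma acc loop vector
        for (int j = 0; j < COLS; j++)
            work[j] = a[i][j] * a[i][j];

        /* Sum the scratch array with a vector reduction. */
        double s = 0.0;
        #pragma acc loop vector reduction(+:s)
        for (int j = 0; j < COLS; j++)
            s += work[j];
        out[i] = s;
    }
}
```

Compiled without OpenACC support (for example, plain `gcc` without `-fopenacc`), the pragmas are ignored and the function computes the same result serially, which makes it easy to check against a CPU reference. Declaring `work` inside the loop body would make it private implicitly; the explicit clause is shown because it keeps the hoisted declaration intact.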

Author information

Authors and Affiliations

Department of Computer Science, University of Houston, Houston, TX, 77004, USA
Rengan Xu, Xiaonan Tian, Sunita Chandrasekaran, Yonghong Yan & Barbara Chapman

Corresponding author

Correspondence to Rengan Xu.

Editor information

Editors and Affiliations

Intel Corporation, Santa Clara, California, USA
James Brodman
Intel Corporation, Santa Clara, California, USA
Peng Tu

About this paper

Cite this paper

Xu, R., Tian, X., Chandrasekaran, S., Yan, Y., Chapman, B. (2015). NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_5

DOI: https://doi.org/10.1007/978-3-319-17473-0_5
Published: 01 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science, Computer Science (R0)