Abstract
Accelerators, and in particular Graphics Processing Units (GPUs), have emerged as promising computing technologies that may be suitable for future Exascale systems. However, the complexity of their architectures and the impenetrable structure of some large applications make hand-tuning algorithms more challenging and less productive. By contrast, auto-tuning has appeared as a solution to this problem, since it can address the inherent complexity of current and future computer architectures. With auto-tuning, an application can be optimised for a target platform by making optimal choices automatically. To exploit this technology on modern GPUs, we have created an auto-tuned version of Nek5000 based on OpenACC directives, which has been shown to obtain improved results over a hand-tuned, optimised version of the same computation kernels. This paper focuses on the role of auto-tuning in enabling the OpenACC-based Nek5000 to exploit a massively parallel GPU-accelerated system, and thereby in adapting the Nek5000 code for Exascale computation.
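To make the phrase "making optimal choices automatically" concrete, the following is a minimal sketch of empirical auto-tuning by exhaustive search, written in Python for brevity. It is not the DSL-driven tool used for Nek5000 in this paper: the kernel (a tiled matrix multiply), the tuning parameter `block` and its candidate values are hypothetical stand-ins. On a GPU the search space would instead hold parameters such as the values of the `num_gangs` or `vector_length` clauses of an OpenACC `parallel` construct, with each candidate timed on the target device.

```python
import itertools
import time


def blocked_matmul(a, b, n, block):
    """Return C = A * B for n x n matrices stored as flat row-major lists.

    The three outer loops tile the iteration space into block x block
    sub-problems; `block` is the (hypothetical) tuning parameter.
    """
    c = [0.0] * (n * n)
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + block, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c


def time_once(kernel, cfg):
    """Wall-clock time of a single kernel run with configuration cfg."""
    start = time.perf_counter()
    kernel(**cfg)
    return time.perf_counter() - start


def autotune(kernel, search_space, repeats=3):
    """Exhaustively time every point of the search space; return the fastest.

    search_space maps parameter names to lists of candidate values.
    """
    names = list(search_space)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(search_space[p] for p in names)):
        cfg = dict(zip(names, values))
        # Keep the minimum of several runs to damp timing noise.
        elapsed = min(time_once(kernel, cfg) for _ in range(repeats))
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time


# Usage: tune the tile size of a 32 x 32 matrix multiply.
n = 32
a = [float(i % 7) for i in range(n * n)]
b = [float(i % 5) for i in range(n * n)]
space = {"block": [4, 8, 16, 32]}
best, best_time = autotune(lambda block: blocked_matmul(a, b, n, block), space)
print(f"fastest configuration: {best}, {best_time * 1e3:.2f} ms")
```

Exhaustive search is the simplest strategy and is only practical for small search spaces; larger spaces are usually explored with pruning or heuristic search instead.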
Appendix A: Example of DSL Script

Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cebamanos, L., Henty, D., Richardson, H., Hart, A. (2015). Auto-tuning an OpenACC Accelerated Version of Nek5000. In: Markidis, S., Laure, E. (eds) Solving Software Challenges for Exascale. EASC 2014. Lecture Notes in Computer Science, vol 8759. Springer, Cham. https://doi.org/10.1007/978-3-319-15976-8_5
DOI: https://doi.org/10.1007/978-3-319-15976-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15975-1
Online ISBN: 978-3-319-15976-8
eBook Packages: Computer Science, Computer Science (R0)