
Performance of CPU/GPU compiler directives on ISO/TTI kernels

Published in: Computing

Abstract

GPUs are steadily becoming ubiquitous devices in High Performance Computing, as their ability to improve the performance per watt of compute-intensive algorithms relative to multicore CPUs has been widely recognized. The primary shortcoming of a GPU is usability: vendor-specific APIs differ considerably from existing programming languages, and optimizing applications requires substantial knowledge of both the device and its programming interface. Hence, a growing number of higher-level programming models now target GPUs to alleviate this problem. The ultimate goal of a high-level model is to expose an easy-to-use interface through which the user can offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level, directive-based GPU programming models: PGI Accelerator, CAPS HMPP, and OpenACC (the CAPS and PGI implementations), on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite-difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the subsurface. When ported to a single GPU using these directives, we observe an average 1.5–1.8x performance improvement for both the ISO and TTI kernels, compared with optimized multi-threaded CPU implementations using OpenMP.


References

  1. OpenMP ARB (2010) The OpenMP API specification for parallel programming. http://openmp.org/wp/

  2. Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp 23(2):187–198


  3. Ayguadé E, Badia RM, Igual FD, Labarta J, Mayo R, Quintana-Ortí ES (2009) An extension of the StarSs programming model for platforms with multiple GPUs. In: Euro-Par 2009 parallel processing, Springer, New York, pp 851–862

  4. Benkner S, Pllana S, Traff JL, Tsigas P, Dolinsky U, Augonnet C, Bachmayer B, Kessler C, Moloney D, Osipov V (2011) PEPPHER: efficient and productive usage of hybrid computing systems. Micro IEEE 31(5):28–41


  5. Bihan S, Moulard GE, Dolbeau R, Calandra H, Abdelkhalek R (2009) Directive-based heterogeneous programming: a GPU-accelerated RTM use case. In: Proceedings of the 7th international conference on computing, communications and control technologies

  6. Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K (2008) Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE conference on supercomputing, IEEE Press, New York, p 4

  7. Dolbeau R, Bihan S, Bodin F (2007) HMPP: a hybrid multi-core parallel programming environment. In: Workshop on general purpose processing on graphics processing units (GPGPU 2007)

  8. The Portland Group (2010) PGI Accelerator programming model. http://www.pgroup.com/resources/accel.htm

  9. Lee S, Eigenmann R (2010) OpenMPC: extended OpenMP programming and tuning for GPUs. In: Proceedings of the 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis, IEEE Computer Society, pp 1–11

  10. Nvidia (2011) Nvidia CUDA Visual Profiler. http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Compute_Visual_Profiler_User_Guide.pdf

  11. Nvidia (2007) CUDA: Compute Unified Device Architecture programming guide

  12. Nvidia, PGI, CAPS and Cray (2011) OpenACC application programming interface: directives for accelerators. http://www.openacc.org

  13. Stone JE, Gohara D, Shi G (2010) OpenCL: a parallel programming standard for heterogeneous computing systems. Comput Sci Eng 12(3):66


  14. Whitehead N, Fit-Florea A (2011) Precision & performance: floating point and IEEE 754 compliance for Nvidia GPUs. Nvidia white paper


Acknowledgments

We wish to thank Georges-Emmanuel Moulard of CAPS enterprise, Matthew Colgrove of PGI, and Philippe Thierry of Intel Corp., who helped us immensely by answering our questions and suggesting improvements. This work would not have been possible without their guidance. Finally, we would like to thank TOTAL for granting permission to publish this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sayan Ghosh.


About this article

Cite this article

Ghosh, S., Liao, T., Calandra, H. et al. Performance of CPU/GPU compiler directives on ISO/TTI kernels. Computing 96, 1149–1162 (2014). https://doi.org/10.1007/s00607-013-0367-4

