The Implementation of a High Performance GPGPU Compiler

Yang, Yi; Zhou, Huiyang

doi:10.1007/s10766-012-0228-3

Published: 09 November 2012

Volume 41, pages 768–781 (2013)

International Journal of Parallel Programming

Yi Yang¹ &
Huiyang Zhou¹

Abstract

In this paper we present our experience in developing an optimizing compiler for general-purpose computation on graphics processing units (GPGPU), built on the Cetus source-to-source compiler infrastructure. The input to our compiler is a naive GPU kernel procedure, which is functionally correct but written without any consideration for performance. Our compiler applies a set of optimization techniques to the naive kernel and generates an optimized GPU kernel. It supports optimizations for GPU kernels that use either global memory or texture memory. In the Cetus framework, a code transformation is called a pass. We classify the passes used in our work into two categories: functional passes and optimization passes. The functional passes translate input kernels into the desired intermediate representation, which clearly represents memory access patterns and thread configurations. A series of optimization passes then improves the performance of the kernels by adapting them to the target GPGPU architecture. Our experiments show that the optimized code achieves very high performance, either superior or very close to that of highly fine-tuned libraries.
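The two pass categories named in the abstract can be pictured with a minimal sketch. It is written in Python for brevity and is not the Cetus API or the authors' implementation: the `Kernel` class, every pass name, and `run_pipeline` are hypothetical, and each pass only records a note instead of transforming real code. The one idea it is meant to show is the ordering: functional passes first bring the kernel into the desired intermediate form, and optimization passes then run on that form.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Kernel:
    """Toy stand-in for a kernel's intermediate representation (hypothetical)."""
    name: str
    notes: List[str] = field(default_factory=list)


# A pass takes a kernel and returns the transformed kernel.
Pass = Callable[[Kernel], Kernel]


def expose_memory_accesses(kernel: Kernel) -> Kernel:
    # Hypothetical functional pass: make memory access patterns explicit.
    kernel.notes.append("functional: memory access patterns exposed")
    return kernel


def expose_thread_configuration(kernel: Kernel) -> Kernel:
    # Hypothetical functional pass: make the thread configuration explicit.
    kernel.notes.append("functional: thread configuration exposed")
    return kernel


def adapt_to_target(kernel: Kernel) -> Kernel:
    # Hypothetical optimization pass: adapt the kernel to the target GPU.
    kernel.notes.append("optimization: adapted to target architecture")
    return kernel


FUNCTIONAL_PASSES: List[Pass] = [expose_memory_accesses, expose_thread_configuration]
OPTIMIZATION_PASSES: List[Pass] = [adapt_to_target]


def run_pipeline(kernel: Kernel,
                 functional: List[Pass],
                 optimization: List[Pass]) -> Kernel:
    """Apply all functional passes, then all optimization passes, in order."""
    for compiler_pass in list(functional) + list(optimization):
        kernel = compiler_pass(kernel)
    return kernel


if __name__ == "__main__":
    result = run_pipeline(Kernel("naive_kernel"), FUNCTIONAL_PASSES, OPTIMIZATION_PASSES)
    for note in result.notes:
        print(note)
```

Keeping the two categories in separate lists is a design choice of this sketch; it makes the ordering constraint visible and lets optimization passes be added or removed without touching the functional ones.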

Acknowledgments

This work is supported by the National Science Foundation, CAREER award CCF-0968667.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA
Yi Yang & Huiyang Zhou

Corresponding author

Correspondence to Yi Yang.

About this article

Cite this article

Yang, Y., Zhou, H. The Implementation of a High Performance GPGPU Compiler. Int J Parallel Prog 41, 768–781 (2013). https://doi.org/10.1007/s10766-012-0228-3

Received: 06 January 2012
Accepted: 25 October 2012
Published: 09 November 2012
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10766-012-0228-3
