Abstract
We present a framework for representing image processing kernels based on decoupled access/execute metadata, which allow the programmer to specify both the execution constraints and the memory access pattern of a kernel. The framework performs source-to-source translation of kernels expressed in high-level, framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations, such as global memory padding for memory coalescing and optimal memory bandwidth utilization. We evaluate the framework on several image filters, comparing the generated code against highly optimized CPU and GPU versions in the popular OpenCV library.
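To illustrate the global memory padding mentioned above, the following is a minimal host-side sketch and not the framework's generated code. It assumes that each image row should start on a 128-byte boundary; the helper names `padded_stride` and `padded_index` and the 128-byte default are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements per padded row, chosen so that
// every row starts on an alignment boundary. The 128-byte default is an
// assumption; the appropriate value depends on the target device.
inline std::size_t padded_stride(std::size_t width, std::size_t elem_size,
                                 std::size_t alignment_bytes = 128) {
    assert(elem_size > 0 && alignment_bytes % elem_size == 0);
    const std::size_t elems_per_segment = alignment_bytes / elem_size;
    // Round the row width up to the next multiple of elems_per_segment.
    return (width + elems_per_segment - 1) / elems_per_segment * elems_per_segment;
}

// Hypothetical helper: linear index of pixel (x, y) in the padded buffer.
inline std::size_t padded_index(std::size_t x, std::size_t y,
                                std::size_t stride) {
    return y * stride + x;
}
```

Under these assumptions, a row of 1000 four-byte pixels is stored with a stride of 1024 elements, so 24 unused elements follow each row and every row begins on an aligned address.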
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Membarth, R., Lokhmotov, A., Teich, J. (2012). Generating GPU Code from a High-Level Representation for Image Processing Kernels. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29737-3_31
DOI: https://doi.org/10.1007/978-3-642-29737-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29736-6
Online ISBN: 978-3-642-29737-3
eBook Packages: Computer Science (R0)