ABSTRACT
Significant performance gains can be achieved by using hardware architectures that integrate GPUs with conventional CPUs to form a hybrid, highly parallel computational engine. However, programming these novel architectures is tedious and error-prone, hindering their adoption in an even wider range of computationally intensive applications. In this paper we discuss a refactoring, called Extract Kernel, that transforms a loop written in C into a parallel function that uses NVIDIA's CUDA framework to execute on a GPU. We describe the selected approach and the challenges encountered, as well as early results that demonstrate the potential of this refactoring.
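To make the transformation concrete, the following is a hand-written sketch of the kind of output an Extract Kernel refactoring could produce for a simple C loop. The loop, kernel name, and launch parameters are illustrative assumptions, not taken from the paper or its tool:

```cuda
// Original sequential C loop (hypothetical example):
//   for (int i = 0; i < n; i++)
//       y[i] = a * x[i] + y[i];

// The loop body becomes a CUDA kernel; each thread handles one iteration.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: thread count may exceed n
        y[i] = a * x[i] + y[i];
}

// The extracted function allocates device memory, copies data,
// launches the kernel, and copies the result back to the host.
void saxpy(int n, float a, const float *x, float *y) {
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover all i
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);

    cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
}
```

Note that beyond the syntactic rewrite, the refactoring must insert the host/device memory transfers and a launch configuration, and the loop must first be shown free of cross-iteration dependences for the transformation to be safe.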