Abstract
Graphics Processing Units (GPUs) have recently been used to enable parallel application development. The most prominent initiative has come from NVIDIA™ with its CUDA™ architecture, designed for GeForce™ graphics cards. However, even with the C-like CUDA programming language, writing parallel code remains awkward compared with writing sequential code. The programmer still has to deal with low-level hardware details such as the generation and synchronization of threads and GPU tracks and sectors. In this paper, we propose a programmer-friendly interface for CUDA C programming that hides most hardware details from the programmer. We show how code readability is improved without undermining parallel execution performance.
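To make the low-level details concrete, the following is a minimal sketch of an element-wise vector addition in plain CUDA C/C++. It is not the interface proposed in the paper: the names `add_kernel`, `check`, and `vector_add` are hypothetical and chosen only for illustration, and the block size of 256 threads is an assumed value. The kernel shows the manual thread-index arithmetic, and the host function shows the allocation, transfer, launch-configuration, and synchronization steps that a friendlier interface would have to hide.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Kernel: every thread derives its own global index by hand and
// checks that it does not run past the end of the arrays.
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Turns a CUDA error code into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
}

// Hypothetical wrapper (an assumption, not the paper's API): hides device
// allocation, copies, launch configuration, and synchronization behind one call.
// For brevity, device memory is not released if an error is thrown midway.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("input sizes differ");
    }
    const int n = static_cast<int>(a.size());
    std::vector<float> out(a.size());
    if (n == 0) {
        return out;
    }

    const size_t bytes = a.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_out, bytes));

    // Copy the inputs from host memory to device memory.
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    // Launch configuration: enough blocks to cover all n elements.
    const int threads_per_block = 256;  // assumed value
    const int blocks = (n + threads_per_block - 1) / threads_per_block;
    add_kernel<<<blocks, threads_per_block>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError());

    // Wait for the kernel to finish before reading the result.
    check(cudaDeviceSynchronize());
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(d_a));
    check(cudaFree(d_b));
    check(cudaFree(d_out));
    return out;
}
```

With such a wrapper, calling code reduces to a single line such as `auto sum = vector_add(x, y);`, which reads like sequential code while the work still runs on the GPU. This only illustrates the general idea of hiding hardware details; the paper's actual interface may differ.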
Index Terms
- Improving CUDA™ C/C++ encoding readability to foster parallel application development