ABSTRACT
Users always care about performance. Although often it's just a matter of making sure the software is doing only what it should, there are many cases where it is vital to get down to the metal and leverage the fundamental characteristics of the processor.
Until recently, performance improvement was not difficult. Processors just kept getting faster. Waiting a year for the customer's hardware to be upgraded was a valid optimization strategy. Nowadays, however, individual processors don't get much faster; systems just get more of them.
Much has been written about coding paradigms that target multiple processor cores, but the data-parallel paradigm is a newer approach that may turn out to be both easier to program and easier for processor manufacturers to implement.
This article provides a high-level description of data-parallel computing and some practical information on how and where to use it. It also covers data-parallel programming environments, paying particular attention to those based on programmable graphics processors.
Index Terms
- Data-parallel computing