ABSTRACT
The peak performance of graphics processing units (GPUs) has traditionally been raised by adding compute resources and/or increasing their operating frequency. However, both approaches significantly increase GPU power consumption. Consequently, modern high-performance GPUs are power constrained, and future processors must rely on more power-efficient approaches to improve performance. In this paper, we propose three power-efficient techniques for improving the performance of GPUs. First, we observe that many GPGPU applications are integer-instruction intensive. For such applications, we propose to use the fused multiply-add (FMA) units to fuse dependent integer instructions into a composite instruction, improving power efficiency and performance by reducing the number of fetched and executed instructions. Second, GPUs often perform computations that are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar pipeline. Finally, the register file bandwidth in GPUs is a critical resource that is optimized for 32-bit instruction operands, yet many operands require considerably fewer bits for accurate representation and computation. We propose a sliced GPU architecture that improves GPU performance by dual-issuing instructions to two 16-bit execution slices. Overall, our techniques improve power efficiency by more than 25% (geometric mean).
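The three techniques can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the hardware mechanism of the paper: the tuple instruction format, the function names `fuse_mul_add`, `is_uniform`, `fits_16bit`, and `classify_warp`, the 32-lane warp width, and the signed 16-bit range check are all hypothetical choices made for this example.

```python
# Toy model of the three ideas in the abstract. Everything here is an
# illustrative assumption; the abstract does not specify any of these details.

WARP_SIZE = 32  # assumed warp width


def fuse_mul_add(instrs):
    """Fuse `mul t, a, b` directly followed by `add t, t, c` into `mad t, a, b, c`.

    Instructions are (op, dst, src1, src2) tuples. Fusion is applied only when
    the add overwrites the mul result and reads it exactly once, so the
    intermediate product is never observable afterwards.
    """
    fused = []
    i = 0
    while i < len(instrs):
        op, dst, a, b = instrs[i]
        if op == "mul" and i + 1 < len(instrs):
            next_op, next_dst, x, y = instrs[i + 1]
            reads_once = (x == dst) != (y == dst)
            if next_op == "add" and next_dst == dst and reads_once:
                addend = y if x == dst else x
                fused.append(("mad", dst, a, b, addend))
                i += 2
                continue
        fused.append(instrs[i])
        i += 1
    return fused


def is_uniform(lanes):
    """True when every thread in the warp holds the same operand value."""
    return len(set(lanes)) == 1


def fits_16bit(lanes):
    """True when every lane value is representable as a signed 16-bit integer."""
    return all(-32768 <= v <= 32767 for v in lanes)


def classify_warp(operands):
    """Pick an execution path for one instruction given its per-lane operands.

    `operands` is a list with one list of lane values per source operand.
    """
    if all(is_uniform(lanes) for lanes in operands):
        return "scalar"   # duplicated work: one scalar execution suffices
    if all(fits_16bit(lanes) for lanes in operands):
        return "sliced"   # narrow operands: candidate for a 16-bit slice
    return "simd"         # fall back to the full 32-bit SIMD path


program = [("mul", "t", "a", "b"), ("add", "t", "t", "c"), ("sub", "u", "t", "a")]
fused_program = fuse_mul_add(program)
```

In this model, `fused_program` contains two instructions instead of three, which mirrors the claim that fusion reduces the number of fetched and executed instructions. `classify_warp` checks for uniform operands before checking for narrow ones; that ordering is a design choice of the sketch, not something the abstract states.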
Index Terms
Power-efficient computing for compute-intensive GPGPU applications