DOI: 10.1145/2370816.2370888 (PACT conference proceedings)
poster

Power-efficient computing for compute-intensive GPGPU applications

Published: 19 September 2012

ABSTRACT

The peak performance of graphics processing units (GPUs) has traditionally been increased by increasing the number of compute resources and/or their frequency. However, these approaches significantly increase the power consumption of GPUs. Consequently, modern high-performance GPUs are power constrained, and future processors must employ more power-efficient approaches to improve performance. In this paper, we propose three power-efficient techniques for improving the performance of GPUs. First, we observe that many GPGPU applications are integer instruction intensive. For such applications, we propose to utilize the fused multiply-add (FMA) units to fuse dependent integer instructions into a composite instruction, improving power efficiency and performance by reducing the number of fetched/executed instructions. Second, GPUs often perform computations that are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar pipeline. Finally, the register file bandwidth in GPUs is a critical resource that is optimized for 32-bit instruction operands. However, many operands require considerably fewer bits for accurate representation and computation. We propose a sliced GPU architecture that improves the performance of the GPU by dual-issuing instructions to two 16-bit execution slices. Overall, our techniques result in a power-efficiency improvement of more than 25% (geometric mean).
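
The three ideas in the abstract can be made concrete with a small software model. The Python sketch below is an illustrative assumption only, not the hardware mechanism the authors propose. The instruction tuple format, the function names, the 32-thread warp width, and the rule of fusing only a `mul` immediately followed by an `add` that overwrites the product are all hypothetical choices made for the example. It shows (1) fusing a dependent integer multiply/add pair into one composite instruction, (2) detecting an instruction whose source operands are identical in every thread, which makes it a candidate for scalar execution, and (3) detecting operands that fit in 16 bits, which makes them candidates for a 16-bit slice.

```python
from typing import List, Tuple

WARP_SIZE = 32  # assumed warp width; the abstract does not state one

# Hypothetical instruction format: (opcode, dst, src1, src2)
Instr = Tuple[str, ...]


def fuse_mul_add(instrs: List[Instr]) -> List[Instr]:
    """Fuse 'mul t,a,b' followed directly by 'add t,t,c' into 'mad t,a,b,c'.

    Fusion is applied only when the add overwrites the product register and
    reads it exactly once, so dropping the intermediate value is safe.
    """
    out: List[Instr] = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "mul" and nxt is not None and nxt[0] == "add":
            _, t, a, b = cur
            _, d, x, y = nxt
            # Exactly one add source is the product, and the add overwrites it.
            if d == t and (x == t) != (y == t):
                c = y if x == t else x
                out.append(("mad", d, a, b, c))
                i += 2
                continue
        out.append(cur)
        i += 1
    return out


def is_scalar_candidate(operands_per_thread: List[Tuple[int, ...]]) -> bool:
    """True when all threads supply the same source operands, so every
    thread would compute the same result."""
    first = operands_per_thread[0]
    return all(ops == first for ops in operands_per_thread)


def fits_in_16_bits(value: int) -> bool:
    """True when a signed integer is representable in 16 bits."""
    return -(1 << 15) <= value < (1 << 15)


def is_narrow_candidate(operands_per_thread: List[Tuple[int, ...]]) -> bool:
    """True when every operand of every thread fits in 16 bits."""
    return all(fits_in_16_bits(v) for ops in operands_per_thread for v in ops)


if __name__ == "__main__":
    program = [
        ("mul", "r1", "r2", "r3"),
        ("add", "r1", "r1", "r4"),
        ("add", "r5", "r1", "r1"),
    ]
    print(fuse_mul_add(program))

    uniform = [(7, 3)] * WARP_SIZE                       # same in all threads
    per_thread = [(tid, 3) for tid in range(WARP_SIZE)]  # differs per thread
    print(is_scalar_candidate(uniform), is_scalar_candidate(per_thread))
    print(is_narrow_candidate(per_thread))
```

In this model the fused program contains two instructions instead of three, which reflects the abstract's stated benefit of fewer fetched and executed instructions. The scalar and narrow-operand checks are done here on whole-warp operand lists; how the proposed hardware detects these cases dynamically is not described in the abstract.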
