DOI:10.1145/2370816.2370888
poster

Power-efficient computing for compute-intensive GPGPU applications

Published: 19 September 2012

Abstract

The peak performance of graphics processing units (GPUs) has traditionally been raised by adding compute resources and/or increasing their frequency. However, both approaches significantly increase GPU power consumption. Consequently, modern high-performance GPUs are power constrained, and future processors must rely on more power-efficient approaches to improve performance. In this paper, we propose three power-efficient techniques for improving GPU performance. First, we observe that many GPGPU applications are integer-instruction intensive. For such applications, we propose using the fused multiply-add (FMA) units to fuse dependent integer instructions into a composite instruction, which improves power efficiency and performance by reducing the number of fetched and executed instructions. Second, GPUs often perform computations that are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar pipeline. Finally, register file bandwidth is a critical GPU resource that is optimized for 32-bit instruction operands, yet many operands require considerably fewer bits for accurate representation and computation. We propose a sliced GPU architecture that improves performance by dual-issuing instructions to two 16-bit execution slices. Overall, our techniques improve power efficiency by more than 25% (geometric mean).
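
The three observations in the abstract can be illustrated with a small software sketch. This is not the authors' hardware mechanism, which the abstract does not detail; it is a toy model under stated assumptions. The warp width of 32, the `(op, dst, src1, src2)` instruction tuples, the `mad` mnemonic, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

WARP_SIZE = 32  # assumption: a 32-thread warp; the abstract does not state a width

Instr = Tuple[str, ...]  # hypothetical format: (op, dst, src1, src2[, src3])


def fuse_mul_add(instrs: List[Instr]) -> List[Instr]:
    """Fuse `mul t, a, b` directly followed by `add d, t, c` into `mad d, a, b, c`.

    Toy peephole pass. It assumes the temporary `t` is not read again later,
    which a real compiler or decoder would have to verify.
    """
    fused: List[Instr] = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "mul" and nxt is not None and nxt[0] == "add":
            tmp = cur[1]
            srcs = list(nxt[2:4])
            # Fuse only when exactly one add source is the mul result.
            if srcs.count(tmp) == 1:
                other = srcs[1 - srcs.index(tmp)]
                fused.append(("mad", nxt[1], cur[2], cur[3], other))
                i += 2
                continue
        fused.append(cur)
        i += 1
    return fused


def is_uniform(operands: Sequence[int]) -> bool:
    """True when every thread in the warp holds the same operand value,
    making the instruction a candidate for a single scalar execution."""
    return all(v == operands[0] for v in operands)


def fits_in_16_bits(value: int) -> bool:
    """True when a signed value is representable in 16 signed bits,
    making it a candidate for a narrow execution slice."""
    return -(1 << 15) <= value < (1 << 15)


if __name__ == "__main__":
    program = [
        ("mul", "t0", "tid", "stride"),
        ("add", "addr", "t0", "base"),
        ("add", "x", "addr", "one"),
    ]
    print(fuse_mul_add(program))
    print(is_uniform([7] * WARP_SIZE), is_uniform(list(range(WARP_SIZE))))
    print(fits_in_16_bits(1000), fits_in_16_bits(70000))
```

In this sketch the address computation `tid * stride + base` collapses from two instructions to one, a value shared by all 32 threads is flagged as a scalar candidate, and small operands are flagged as fitting a 16-bit slice.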

    Published In

    PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
    September 2012
    512 pages
    ISBN:9781450311823
    DOI:10.1145/2370816

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. gpu
    2. low-power
    3. power efficiency

    Qualifiers

    • Poster

    Conference

    PACT '12
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Cited By

    • (2024) Advanced Dynamic Scalarisation for RISC-V GPGPUs. 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 260-267. DOI:10.1109/ICCD63220.2024.00047. Online publication date: 18-Nov-2024
    • (2023) R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. DOI:10.1145/3579371.3589039. Online publication date: 17-Jun-2023
    • (2017) Decoupled Affine Computation for SIMT GPUs. ACM SIGARCH Computer Architecture News, 45:2, pp. 295-306. DOI:10.1145/3140659.3080205. Online publication date: 24-Jun-2017
    • (2017) Decoupled Affine Computation for SIMT GPUs. Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 295-306. DOI:10.1145/3079856.3080205. Online publication date: 24-Jun-2017
    • (2014) A Survey of Methods for Analyzing and Improving GPU Energy Efficiency. ACM Computing Surveys, 47:2, pp. 1-23. DOI:10.1145/2636342. Online publication date: 25-Aug-2014
    • (2013) Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. Proceedings of the 27th international ACM conference on International conference on supercomputing, pp. 433-442. DOI:10.1145/2464996.2465022. Online publication date: 10-Jun-2013
    • (2013) Computing infrastructure for big data processing. Frontiers of Computer Science: Selected Publications from Chinese Universities, 7:2, pp. 165-170. DOI:10.1007/s11704-013-3900-x. Online publication date: 1-Apr-2013
