poster

Power-efficient computing for compute-intensive GPGPU applications

Authors:

Syed Zohaib Gilani,

Nam Sung Kim,

Michael J. Schulte

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Pages 445 - 446

https://doi.org/10.1145/2370816.2370888

Published: 19 September 2012


Abstract

The peak performance of graphics processing units (GPUs) has traditionally been increased by increasing the number of compute resources and/or their frequency. However, these approaches significantly increase the power consumption of GPUs. Consequently, modern high-performance GPUs are power constrained and must employ more power efficient approaches for performance improvements in future processors. In this paper we propose three power-efficient techniques for improving the performance of GPUs. First, we observe that many GPGPU applications are integer instruction intensive. For such applications, we propose to utilize the fused multiply-add (FMA) units to fuse dependent integer instructions into a composite instruction, improving power efficiency and performance by reducing the number of fetched/executed instructions. Secondly, GPUs often perform computations that are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar pipeline. Finally, the register file bandwidth in GPUs is a critical resource that is optimized for 32-bit instruction operands. However, many operands require considerably fewer bits for accurate representation and computations. We propose a sliced GPU architecture that improves performance of the GPU by dual-issuing instructions to two 16-bit execution slices. Overall, our techniques result in more than a 25% (geometric mean) power efficiency improvement.
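The three ideas in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the abstract does not describe the detection logic, so the function names (`fuse_mul_add`, `is_scalar`, `can_use_16bit_slice`), the tuple-based instruction encoding, the 32-thread warp width, and the signed 16-bit range check are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

WARP_SIZE = 32  # assumed warp width; the abstract does not state one

Instr = Tuple[str, ...]  # hypothetical encoding: (opcode, dst, src1, src2[, src3])


def fuse_mul_add(instrs: List[Instr]) -> List[Instr]:
    """Fuse `mul t, a, b` directly followed by `add r, t, c` into `mad r, a, b, c`.

    Conservative toy rule: fuse only if the temporary `t` is not read by any
    later instruction in the list (assumes `t` is not live after the list).
    """
    fused: List[Instr] = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur[0] == "mul" and nxt is not None and nxt[0] == "add" and cur[1] in nxt[2:]:
            tmp = cur[1]
            read_later = any(tmp in ins[2:] for ins in instrs[i + 2:])
            uses_tmp_twice = nxt[2] == tmp and nxt[3] == tmp
            if not read_later and not uses_tmp_twice:
                addend = nxt[3] if nxt[2] == tmp else nxt[2]
                fused.append(("mad", nxt[1], cur[2], cur[3], addend))
                i += 2  # two instructions replaced by one composite instruction
                continue
        fused.append(cur)
        i += 1
    return fused


def is_scalar(operands_per_thread: Sequence[Tuple[int, ...]]) -> bool:
    """True when every thread supplies identical source operands, so the
    result could be computed once instead of once per thread."""
    first = operands_per_thread[0]
    return all(ops == first for ops in operands_per_thread)


def fits_in_16_bits(value: int) -> bool:
    """True when a signed integer is representable in 16 bits."""
    return -(1 << 15) <= value < (1 << 15)


def can_use_16bit_slice(operands_per_thread: Sequence[Tuple[int, ...]]) -> bool:
    """True when every operand of every thread fits in a 16-bit slice."""
    return all(fits_in_16_bits(v) for ops in operands_per_thread for v in ops)
```

For example, a warp in which every thread reads the same two operands passes `is_scalar`, while a warp whose operands are small per-thread indices fails `is_scalar` but still passes `can_use_16bit_slice`. A real design would also have to check the width of the result, not just the operands, which this sketch leaves out.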


Cited By

Naylor M, Joannou A, Markettos A, Metzger P, Moore S, Jones T (2024). Advanced Dynamic Scalarisation for RISC-V GPGPUs. 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 260-267. Online publication date: 18-Nov-2024.
https://doi.org/10.1109/ICCD63220.2024.00047

Ha D, Oh Y, Ro W, Solihin Y, Heinrich M (2023). R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-14. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3579371.3589039

Wang K, Lin C (2017). Decoupled Affine Computation for SIMT GPUs. ACM SIGARCH Computer Architecture News 45:2, pp. 295-306. Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3140659.3080205

Index Terms

Power-efficient computing for compute-intensive GPGPU applications
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data



Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

September 2012

512 pages

ISBN:9781450311823

DOI:10.1145/2370816

General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)

Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Poster

Conference

PACT '12

Sponsor:

IFIP WG 10.3
SIGARCH
IEEE CS TCPP
IEEE CS TCAA

PACT '12: International Conference on Parallel Architectures and Compilation Techniques

September 19 - 23, 2012

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics

Total Citations: 7
Total Downloads: 621
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Wang K, Lin C (2017). Decoupled Affine Computation for SIMT GPUs. Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 295-306. Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3079856.3080205

Mittal S, Vetter J (2014). A Survey of Methods for Analyzing and Improving GPU Energy Efficiency. ACM Computing Surveys 47:2, pp. 1-23. Online publication date: 25-Aug-2014.
https://dl.acm.org/doi/10.1145/2636342

Xiang P, Yang Y, Mantor M, Rubin N, Hsu L, Zhou H, Malony A, Nemirovsky M, Midkiff S (2013). Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. Proceedings of the 27th international ACM conference on International conference on supercomputing, pp. 433-442. Online publication date: 10-Jun-2013.
https://dl.acm.org/doi/10.1145/2464996.2465022

Liu L (2013). Computing infrastructure for big data processing. Frontiers of Computer Science: Selected Publications from Chinese Universities 7:2, pp. 165-170. Online publication date: 1-Apr-2013.
https://dl.acm.org/doi/10.1007/s11704-013-3900-x
