research-article

Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files

Authors:

Jenq-Kuen LeeAuthors Info & Claims

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 23, Issue 2

Article No.: 18, Pages 1 - 25

https://doi.org/10.1145/3133218

Published: 07 November 2017 Publication History

Get Access

Abstract

A modern GPU can simultaneously process thousands of hardware threads. These threads are grouped into fixed-size SIMD batches executing the same instruction on vectors of data in a lockstep to achieve high throughput and performance. The register files are huge due to each SIMD group accessing a dedicated set of vector registers for fast context switching, and consequently the power consumption of register files has become an important issue. One proposed solution is to replace some of the vector registers by scalar registers, as different threads in a same SIMD group operate on scalar values and so the redundant computations and accesses of these scalar values can be eliminated. However, it has been observed that a significant number of registers containing affine vectors υ such that υ[i] = b + i × s can be represented by base b and stride s. Therefore, this article proposes an affine register file design for GPUs that is energy efficient due to it reducing the redundant executions of both the uniform and affine vectors. This design uses a pair of registers to store the base and stride of each affine vector and provides specific affine ALUs to execute affine instructions. A method of compiler analysis has been developed to detect scalars and affine vectors and annotate instructions for facilitating their corresponding scalar and affine computations. Furthermore, a priority-based register allocation scheme has been implemented to assign scalars and affine vectors to appropriate scalar and affine register files. Experimental results show that this design was able to dispatch 43.56% of the computations to scalar and affine ALUs when using eight scalar and four affine registers per warp. This resulted in the current design also reducing the energy consumption of the register files and ALUs to 21.86% and 26.54%, respectively, and it reduced the overall energy consumption of the GPU by an average of 5.18%.

References

[1]

Mohammad Abdel-Majeed and Murali Annavaram. 2013. Warped register file: A power efficient register file for GPGPUs. In Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA’13). IEEE, Los Alamitos, CA, 412--423.

Abstract

References

Cited By

Index Terms

Recommendations

Achieving spilling-friendly register file assignment for highly distributed register files

Register coalescing techniques for heterogeneous register architecture with copy sifting

Non-Consistent Dual Register Files to Reduce Register Pressure

Comments

Information

Published In

Publisher

Journal Family

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations