LOOG: Improving GPU Efficiency With Light-Weight Out-Of-Order Execution


Abstract:

GPUs are among the most prevalent platforms for accelerating general-purpose workloads due to their intuitive programming model, computing capacity, and cost-effectiveness. GPUs rely on massive multi-threading and fast context switching to overlap computation with memory operations. Among the diverse GPU workloads, there exists a class of kernels that fail to maintain enough active warps to hide the latency of memory operations, and thus suffer from frequent stalls. We observe that these kernels would benefit from increased Instruction-Level Parallelism, and we propose a novel architecture with lightweight Out-Of-Order execution capability. To minimize hardware overhead, we design our extension to reuse the existing micro-architectural structures as much as possible. We show that the proposed architecture outperforms traditional platforms by 15 to 46 percent on average for low-occupancy kernels, with an area overhead of 0.74 to 3.94 percent. Finally, we demonstrate the potential of our proposal as a GPU micro-architecture alternative by providing a 5 percent speedup over a wide collection of 63 general-purpose kernels with as little as 0.74 percent area overhead.
Published in: IEEE Computer Architecture Letters (Volume: 18, Issue: 2, 01 July-Dec. 2019)
Page(s): 166 - 169
Date of Publication: 04 November 2019
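
The latency-hiding argument in the abstract can be illustrated with a toy issue model. The sketch below is not the LOOG design and is not taken from the paper: the names `Instr`, `PROGRAM`, and `simulate`, the 20-cycle load latency, the single issue slot per cycle, and the fixed warp priority are all assumptions chosen for illustration. Each warp runs two independent load/use pairs. With `window=1` a warp may only issue its oldest unissued instruction (in-order issue); with `window=2` it may also issue the next one when the oldest is still waiting on a load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    latency: int          # cycles from issue until the result is available
    deps: tuple = ()      # indices of earlier instructions this one reads


# Hypothetical per-warp program: two independent load/use pairs.
PROGRAM = (
    Instr(latency=20),            # 0: load A
    Instr(latency=1, deps=(0,)),  # 1: use A
    Instr(latency=20),            # 2: load B
    Instr(latency=1, deps=(2,)),  # 3: use B
)


def simulate(num_warps, window, program=PROGRAM):
    """Return the cycle at which the last instruction of any warp completes.

    One instruction is issued per cycle across all warps. A warp may issue
    any of its `window` oldest unissued instructions whose dependencies have
    completed; window=1 therefore models in-order issue.
    """
    n = len(program)
    done = [[None] * n for _ in range(num_warps)]  # completion cycle, None = unissued
    remaining = num_warps * n
    cycle = 0
    finish = 0
    while remaining:
        issued = False
        for w in range(num_warps):  # lower warp index has priority
            pending = [i for i in range(n) if done[w][i] is None][:window]
            for i in pending:
                ready = all(
                    done[w][d] is not None and done[w][d] <= cycle
                    for d in program[i].deps
                )
                if ready:
                    done[w][i] = cycle + program[i].latency
                    finish = max(finish, done[w][i])
                    remaining -= 1
                    issued = True
                    break
            if issued:
                break
        cycle += 1
    return finish


if __name__ == "__main__":
    for warps in (1, 8):
        print(warps, simulate(warps, window=1), simulate(warps, window=2))
```

Under these assumptions, a single warp takes 42 cycles in order and 22 cycles with the two-entry window, because the second load overlaps the first. With 8 warps the figures are 56 and 36 cycles, so the relative gain shrinks as more warps are available to fill the stall cycles, which is the low-occupancy effect the abstract describes.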
