DOI: 10.1145/2897937.2898100
research-article
Public Access

SwiftGPU: fostering energy efficiency in a near-threshold GPU through a tactical performance boost

Published: 05 June 2016

Abstract

In this paper, we investigate the challenges of preserving energy efficiency in a Near-Threshold Computing (NTC) GPU. Two key factors can significantly undermine the efficacy of GPUs at NTC: (a) elongated delays at NTC make GPU applications severely sensitive to Multi-cycle Latency Datapaths (MLDs) within the GPU pipeline; and (b) process variation (PV) at NTC induces a substantial performance variance. To address these emerging challenges, we propose SwiftGPU, an energy-efficient GPU design paradigm at NTC. SwiftGPU dynamically adjusts the degree of parallelization and the speed of the MLDs within each stream core of the GPU. The proposed scheme achieves an average improvement of about 15% in energy efficiency over an ideal PV-free GPU operating in the Super-Threshold regime. SwiftGPU incurs marginal area, wire-length, and power overheads of 0.65%, 2.6%, and 3.7%, respectively.

Information

Published In

DAC '16: Proceedings of the 53rd Annual Design Automation Conference
June 2016
1048 pages
ISBN:9781450342360
DOI:10.1145/2897937

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

DAC '16

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 58
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 07 Mar 2025

Cited By

  • (2021) EFFORT: A Comprehensive Technique to Tackle Timing Violations and Improve Energy Efficiency of Near-Threshold Tensor Processing Units. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29:10 (1790-1799). DOI: 10.1109/TVLSI.2021.3106858. Online publication date: Oct-2021
  • (2020) Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors. Journal of Low Power Electronics and Applications 10:4 (33). DOI: 10.3390/jlpea10040033. Online publication date: 16-Oct-2020
  • (2020) Exploring Warp Criticality in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28:2 (456-466). DOI: 10.1109/TVLSI.2019.2943450. Online publication date: Feb-2020
  • (2020) A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs). 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) (579-588). DOI: 10.1109/CCGrid49817.2020.00-35. Online publication date: May-2020
  • (2019) Predicting Critical Warps in Near-Threshold GPGPU Applications using a Dynamic Choke Point Analysis. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (444-449). DOI: 10.23919/DATE.2019.8715059. Online publication date: Mar-2019
  • (2018) ACE-GPU. Proceedings of the International Symposium on Low Power Electronics and Design (1-6). DOI: 10.1145/3218603.3218644. Online publication date: 23-Jul-2018
  • (2018) GPU NTC Process Variation Compensation With Voltage Stacking. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:9 (1713-1726). DOI: 10.1109/TVLSI.2018.2831665. Online publication date: Sep-2018
