SwiftGPU: fostering energy efficiency in a near-threshold GPU through a tactical performance boost

Authors:

Koushik Chakraborty,

Sanghamitra Roy

DAC '16: Proceedings of the 53rd Annual Design Automation Conference

Article No.: 150, Pages 1 - 6

https://doi.org/10.1145/2897937.2898100

Published: 05 June 2016

Abstract

In this paper, we investigate the challenges of preserving energy efficiency in a Near-Threshold Computing (NTC) GPU. Two key factors can significantly undermine the efficacy of GPUs at NTC: (a) elongated delays at NTC make GPU applications severely sensitive to Multi-cycle Latency Datapaths (MLDs) within the GPU pipeline; and (b) process variation (PV) at NTC induces a substantial performance variance. To address these emerging challenges, we propose SwiftGPU, an energy-efficient GPU design paradigm at NTC. SwiftGPU dynamically adjusts the degree of parallelization and the speed of the MLDs within each stream core of the GPU. The proposed scheme achieves an average improvement of 15% in energy efficiency over an ideal PV-free GPU operating in the Super-Threshold regime. SwiftGPU incurs marginal area, wire-length, and power overheads of 0.65%, 2.6%, and 3.7%, respectively.

Published In

DAC '16: Proceedings of the 53rd Annual Design Automation Conference

June 2016

1048 pages

ISBN:9781450342360

DOI:10.1145/2897937

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

DAC '16

DAC '16: The 53rd Annual Design Automation Conference 2016

June 5 - 9, 2016

Austin, Texas

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

Gundi N., Shabanian T., Basu P., Pandey P., Roy S., Chakraborty K. (2021). EFFORT: A Comprehensive Technique to Tackle Timing Violations and Improve Energy Efficiency of Near-Threshold Tensor Processing Units. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29:10 (1790-1799). Online publication date: Oct-2021.
https://doi.org/10.1109/TVLSI.2021.3106858
Pandey P., Gundi N., Basu P., Shabanian T., Patrick M., Chakraborty K., Roy S. (2020). Challenges and Opportunities in Near-Threshold DNN Accelerators around Timing Errors. Journal of Low Power Electronics and Applications 10:4 (33). Online publication date: 16-Oct-2020.
https://doi.org/10.3390/jlpea10040033
Sanyal S., Basu P., Bal A., Roy S., Chakraborty K. (2020). Exploring Warp Criticality in Near-Threshold GPGPU Applications Using a Dynamic Choke Point Analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28:2 (456-466). Online publication date: Feb-2020.
https://doi.org/10.1109/TVLSI.2019.2943450
Ilager S., Muralidhar R., Rammohanrao K., Buyya R. (2020). A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs). 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) (579-588). Online publication date: May-2020.
https://doi.org/10.1109/CCGrid49817.2020.00-35
Sanyal S., Basu P., Bal A., Roy S., Chakraborty K. (2019). Predicting Critical Warps in Near-Threshold GPGPU Applications using a Dynamic Choke Point Analysis. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (444-449). Online publication date: Mar-2019.
https://doi.org/10.23919/DATE.2019.8715059
Shabanian T., Bal A., Basu P., Chakraborty K., Roy S. (2018). ACE-GPU. Proceedings of the International Symposium on Low Power Electronics and Design (1-6). Online publication date: 23-Jul-2018.
https://dl.acm.org/doi/10.1145/3218603.3218644
Trapani Possignolo R., Ebrahimi E., Ardestani E., Sankaranarayanan A., Briz J., Renau J. (2018). GPU NTC Process Variation Compensation With Voltage Stacking. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:9 (1713-1726). Online publication date: Sep-2018.
https://doi.org/10.1109/TVLSI.2018.2831665
