research-article

GreenTPU: Improving Timing Error Resilience of a Near-Threshold Tensor Processing Unit

Authors: Pramesh Pandey, Koushik Chakraborty, Sanghamitra Roy

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

Article No.: 173, Pages 1 - 6

https://doi.org/10.1145/3316781.3317835

Published: 02 June 2019

Abstract

The emergence of hardware accelerators has improved the speed of deep neural network (DNN) inference by several orders of magnitude. Among such DNN accelerators, the Google Tensor Processing Unit (TPU) has emerged as best-in-class, offering more than 15× speedup over contemporary GPUs. However, the rapid growth of several DNN workloads is escalating the energy consumption of TPU-based data centers. To restrict the energy consumption of TPUs, we propose GreenTPU, a low-power near-threshold computing (NTC) TPU design paradigm. To ensure high inference accuracy at low-voltage operation, GreenTPU identifies patterns in the error-causing activation sequences in the systolic array and prevents further timing errors from the same sequence by intermittently boosting the operating voltage of the specific multiplier-and-accumulator units in the TPU. Compared to a cutting-edge timing error mitigation technique for TPUs, GreenTPU enables 2× to 3× higher performance in an NTC TPU, with minimal loss in prediction accuracy.
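The abstract gives no implementation details, so the following is only a toy sketch of the general idea it describes: a multiplier-and-accumulator (MAC) unit remembers activation sequences that caused a timing error and runs at a boosted voltage when the same sequence recurs. The class `MacUnit`, the voltage values, the bit-toggle delay model, and the choice to drop erroneous products are all illustrative assumptions, not the authors' design.

```python
from dataclasses import dataclass, field

# All constants below are hypothetical values chosen for illustration only.
NTC_VOLTAGE = 0.45    # assumed near-threshold supply (volts)
BOOST_VOLTAGE = 0.60  # assumed boosted supply (volts)
TOGGLE_LIMIT = 5      # assumed: more toggled input bits than this misses timing at NTC


def causes_timing_error(prev_act: int, curr_act: int, voltage: float) -> bool:
    """Toy delay model: many toggling 8-bit input bits fail timing only below the boosted voltage."""
    toggled_bits = bin((prev_act ^ curr_act) & 0xFF).count("1")
    return voltage < BOOST_VOLTAGE and toggled_bits > TOGGLE_LIMIT


@dataclass
class MacUnit:
    """Hypothetical MAC unit that records error-causing activation pairs."""
    weight: int
    accumulator: int = 0
    prev_act: int = 0
    error_patterns: set = field(default_factory=set)
    errors: int = 0
    boosted_cycles: int = 0

    def step(self, act: int) -> None:
        pattern = (self.prev_act, act)
        # Boost the voltage only when this sequence has caused an error before.
        if pattern in self.error_patterns:
            voltage = BOOST_VOLTAGE
            self.boosted_cycles += 1
        else:
            voltage = NTC_VOLTAGE

        if causes_timing_error(self.prev_act, act, voltage):
            # Record the sequence; in this toy model the erroneous product is dropped.
            self.errors += 1
            self.error_patterns.add(pattern)
        else:
            self.accumulator += self.weight * act
        self.prev_act = act


def run(activations, weight=3):
    """Feed an activation stream through one MAC unit and return it for inspection."""
    mac = MacUnit(weight=weight)
    for act in activations:
        mac.step(act)
    return mac


if __name__ == "__main__":
    mac = run([0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF])
    print(mac.errors, mac.boosted_cycles, mac.accumulator)
```

In this sketch, each distinct error-causing sequence fails once and is then handled at the boosted voltage on every recurrence, so the unit spends most cycles at the low voltage while avoiding repeated errors from the same pattern.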


Information & Contributors

Information

Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

June 2019

1378 pages

ISBN: 9781450367257

DOI: 10.1145/3316781

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DAC '19

Sponsor:

SIGDA

DAC '19: The 56th Annual Design Automation Conference 2019

June 2 - 6, 2019

Las Vegas, NV, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Article Metrics

Total Citations: 31
Total Downloads: 416
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 2

Reflects downloads up to 07 Mar 2025

Citations

Cited By

Leon V, Hanif M, Armeniakos G, Jiao X, Shafique M, Pekmestzi K, Soudris D (2025). Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques. ACM Computing Surveys 57:7 (1-36). Online publication date: 12-Feb-2025.
https://dl.acm.org/doi/10.1145/3716845
Leon V, Hanif M, Armeniakos G, Jiao X, Shafique M, Pekmestzi K, Soudris D (2025). Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications. ACM Computing Surveys 57:7 (1-36). Online publication date: 20-Feb-2025.
https://dl.acm.org/doi/10.1145/3711683
Gundaala R, K S (2024). Design of Variation Tolerant Near Threshold Processor Using Artificial Ecosystem Optimizer with Hybrid Deep Learning. Journal of Machine and Computing (841-852). Online publication date: 5-Oct-2024.
https://doi.org/10.53759/7669/jmc202404078
Chamberlin A, Gerber A, Palmer M, Goodale T, Gundi N, Chakraborty K, Roy S (2024). Understanding Timing Error Characteristics from Overclocked Systolic Multiply–Accumulate Arrays in FPGAs. Journal of Low Power Electronics and Applications 14:1 (4). Online publication date: 9-Jan-2024.
https://doi.org/10.3390/jlpea14010004
Gundi N, Roy S, Chakraborty K (2024). STRIVE: Empowering a Low Power Tensor Processing Unit with Fault Detection and Error Resilience. ACM Transactions on Design Automation of Electronic Systems. Online publication date: 2-Dec-2024.
https://doi.org/10.1145/3705003
Miller T, Durlik I, Kostecka E, Mitan-Zalewska P, Sokołowska S, Cembrowska-Lech D, Łobodzińska A (2023). Advancements in Artificial Intelligence Circuits and Systems (AICAS). Electronics 13:1 (102). Online publication date: 26-Dec-2023.
https://doi.org/10.3390/electronics13010102
Nguyen N, Ahmed A, Abdallah A, Dang K (2023). Power-Aware Neuromorphic Architecture With Partial Voltage Scaling 3-D Stacking Synaptic Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:12 (2016-2029). Online publication date: Dec-2023.
https://doi.org/10.1109/TVLSI.2023.3318231
Xue X, Liu C, Liu B, Huang H, Wang Y, Luo T, Zhang L, Li H, Li X (2023). Exploring Winograd Convolution for Cost-Effective Neural Network Fault Tolerance. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 31:11 (1763-1773). Online publication date: Nov-2023.
https://doi.org/10.1109/TVLSI.2023.3306894
Bahoo A, Akbari O, Shafique M (2023). An Energy-Efficient Generic Accuracy Configurable Multiplier Based on Block-Level Voltage Overscaling. IEEE Transactions on Emerging Topics in Computing 11:4 (851-867). Online publication date: Oct-2023.
https://doi.org/10.1109/TETC.2023.3279419
Huang H, Xue X, Liu C, Wang Y, Luo T, Cheng L, Li H, Li X (2023). Statistical Modeling of Soft Error Influence on Neural Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:11 (4152-4163). Online publication date: Nov-2023.
https://doi.org/10.1109/TCAD.2023.3266405
