DOI: 10.1145/3195970.3196033

DyHard-DNN: Even More DNN Acceleration with Dynamic Hardware Reconfiguration

Published: 24 June 2018

Abstract

Deep Neural Networks (DNNs) have demonstrated their utility across a wide range of input data types and are deployed on diverse computing substrates, from edge devices to datacenters. This broad utility has produced a myriad of hardware accelerator architectures. However, DNNs exhibit significant heterogeneity in their computational characteristics, e.g., feature and kernel dimensions, and dramatic variance in computational intensity, even between adjacent layers of a single DNN. Consequently, accelerators with static hardware parameters run sub-optimally and leave energy-efficiency margins unclaimed. We propose DyHard-DNNs, in which accelerator microarchitectural parameters are dynamically reconfigured during DNN execution to significantly improve metrics of interest. We demonstrate the effectiveness of this approach on a configurable SIMD 2D systolic array, showing a 15--65% performance improvement (at iso-power) and a 25--90% energy improvement (at iso-latency) over the best static configuration across six mainstream DNN workloads.
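The core mechanism the abstract describes, choosing array geometry per layer rather than fixing it for the whole network, can be illustrated with a minimal sketch. The following Python model is ours, not the paper's: the PE budget, candidate array shapes, layer dimensions, and cycle-count formula are all hypothetical stand-ins, chosen only to show why a single static shape loses to per-layer reconfiguration.

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class Layer:
    name: str
    m: int  # GEMM output rows (e.g., output channels)
    n: int  # GEMM output cols (e.g., spatial positions)
    k: int  # reduction dimension (e.g., input channels x kernel area)

# Candidate (rows, cols) array shapes sharing one fixed budget of 256 PEs.
# Both the budget and the shapes are hypothetical.
SHAPES = [(8, 32), (16, 16), (32, 8), (64, 4)]

def cycles(layer: Layer, rows: int, cols: int) -> int:
    """Toy cost model: the layer's GEMM is tiled onto a rows x cols array;
    each tile streams the reduction dimension plus pipeline fill/drain."""
    tiles = ceil(layer.m / rows) * ceil(layer.n / cols)
    return tiles * (layer.k + rows + cols)

def best_shape(layer: Layer) -> tuple[int, int]:
    """Per-layer dynamic choice: the shape minimizing this layer's cycles."""
    return min(SHAPES, key=lambda s: cycles(layer, *s))

layers = [
    Layer("conv_early", m=64, n=3136, k=147),  # wide, shallow GEMM
    Layer("conv_late", m=512, n=49, k=4608),   # narrow, deep GEMM
    Layer("fc", m=1000, n=1, k=2048),          # matrix-vector
]

# The single best static shape for the whole network, for comparison.
static = min(SHAPES, key=lambda s: sum(cycles(l, *s) for l in layers))

for l in layers:
    r, c = best_shape(l)
    ratio = cycles(l, *static) / cycles(l, r, c)
    print(f"{l.name}: dynamic {r}x{c} vs static {static[0]}x{static[1]} "
          f"-> static takes {ratio:.2f}x the cycles")
```

Under this toy model, layers with skewed GEMM dimensions (the matrix-vector fully connected layer) prefer a tall, narrow array, while wide early convolutional layers prefer a squarer one, so no single static shape is optimal for all layers.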



Published In

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018, 1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

DAC '18: The 55th Annual Design Automation Conference 2018
June 24 - 29, 2018
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%



Cited By

  • (2023) Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading. IEEE Transactions on Parallel and Distributed Systems 34(2), 475-488. DOI: 10.1109/TPDS.2022.3222509
  • (2023) DyNNamic: Dynamically Reshaping, High Data-Reuse Accelerator for Compact DNNs. IEEE Transactions on Computers 72(3), 880-892. DOI: 10.1109/TC.2022.3184272
  • (2023) Design principles for lifelong learning AI accelerators. Nature Electronics 6(11), 807-822. DOI: 10.1038/s41928-023-01054-3
  • (2022) A Formalism of DNN Accelerator Flexibility. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6(2), 1-23. DOI: 10.1145/3530907
  • (2022) AI accelerator on IBM Telum processor. Proceedings of the 49th Annual International Symposium on Computer Architecture, 1012-1028. DOI: 10.1145/3470496.3533042
  • (2022) Fused-Layer-based DNN Model Parallelism and Partial Computation Offloading. GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 5195-5200. DOI: 10.1109/GLOBECOM48099.2022.10000779
  • (2021) Reconfigurable Architecture and Dataflow for Memory Traffic Minimization of CNNs Computation. Micromachines 12(11), 1365. DOI: 10.3390/mi12111365
  • (2021) Practical Attacks on Deep Neural Networks by Memory Trojaning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40(6), 1230-1243. DOI: 10.1109/TCAD.2020.2995347
  • (2021) RaPiD. Proceedings of the 48th Annual International Symposium on Computer Architecture, 153-166. DOI: 10.1109/ISCA52012.2021.00021
  • (2020) Performance Modeling for CNN Inference Accelerators on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39(4), 843-856. DOI: 10.1109/TCAD.2019.2897634
