DOI: 10.1145/3194554.3194625

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

Published: 30 May 2018

Abstract

Hardware acceleration of deep neural networks (DNNs) has been extensively investigated in both industry and academia. To address the growing demands on computational capability and memory, in this work we propose a structured weight matrices (SWM)-based compression technique for both Field Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. For each layer, in both the training and inference phases, the SWM-based technique reduces computational complexity from O(n²) to O(n log n) and storage complexity from O(n²) to O(n). For FPGA implementations of deep convolutional neural networks (DCNNs), the SWM-based framework achieves at least 152X improvement in performance and 72X improvement in energy efficiency compared with the baseline IBM TrueNorth processor under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 datasets. For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM achieves up to 21X performance enhancement and 33.5X energy-efficiency gains compared with the ESE accelerator. For ASIC implementations, the proposed SWM-based design exhibits clear advantages in power, throughput, and energy efficiency. Experimental results indicate that this method is highly suitable for deploying DNNs on both FPGAs and mobile/IoT devices.
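The complexity reduction above follows from a standard property of circulant matrices: multiplying a b×b circulant block by a vector is a circular convolution, which the FFT evaluates in O(b log b) time while only the length-b defining vector needs to be stored. The following minimal NumPy sketch illustrates the block-circulant matrix-vector product that underlies the SWM framework; the function and variable names are ours for illustration, not from the paper, and the result is checked against an explicit dense construction.

import numpy as np

def block_circulant_matvec(blocks, x, b):
    """Compute y = W x for a (p*b) x (q*b) block-circulant matrix W.

    blocks[i][j] is the length-b defining vector (first column) of the
    b x b circulant block W_ij; each block product is a circular
    convolution done via FFT in O(b log b) instead of O(b^2), and only
    b values per block are stored instead of b^2.
    """
    p, q = len(blocks), len(blocks[0])
    xs = x.reshape(q, b)                  # split the input into q sub-vectors
    y = np.zeros(p * b)
    for i in range(p):
        acc = np.zeros(b, dtype=complex)  # accumulate in the FFT domain
        for j in range(q):
            acc += np.fft.fft(blocks[i][j]) * np.fft.fft(xs[j])
        y[i * b:(i + 1) * b] = np.fft.ifft(acc).real
    return y

# Sanity check against the explicit dense circulant construction.
b, p, q = 4, 2, 3
rng = np.random.default_rng(0)
blocks = [[rng.standard_normal(b) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * b)
dense = np.block([[np.stack([np.roll(w, k) for k in range(b)], axis=1)
                   for w in row] for row in blocks])
assert np.allclose(block_circulant_matvec(blocks, x, b), dense @ x)

For a (p·b) × (q·b) weight matrix this stores p·q·b values instead of p·q·b² weights, which is the per-layer O(n²) → O(n) storage reduction claimed in the abstract.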

Published In

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI
May 2018
533 pages
ISBN: 9781450357241
DOI: 10.1145/3194554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accelerator
  2. asic
  3. deep learning
  4. fpga
  5. structured weight matrices

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation Awards

Conference

GLSVLSI '18: Great Lakes Symposium on VLSI 2018
May 23 - 25, 2018
Chicago, IL, USA

Acceptance Rates

GLSVLSI '18 Paper Acceptance Rate: 48 of 197 submissions, 24%
Overall Acceptance Rate: 312 of 1,156 submissions, 27%

Article Metrics

  • Downloads (Last 12 months): 28
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 28 Feb 2025


Cited By

  • (2024) A Design of Inference Engine for Recurrent Neural Networks Using Block-Circulant Weight Matrices Based on a SoC-FPGA Approach. 2024 10th International Conference on Applied System Innovation (ICASI), 131-133. DOI: 10.1109/ICASI60819.2024.10547870. Online publication date: 17-Apr-2024.
  • (2024) Innovative Insights: A Review of Deep Learning Methods for Enhanced Video Compression. IEEE Access, 12, 125706-125725. DOI: 10.1109/ACCESS.2024.3450814. Online publication date: 2024.
  • (2024) A multi-agent reinforcement learning based approach for automatic filter pruning. Scientific Reports, 14:1. DOI: 10.1038/s41598-024-82562-w. Online publication date: 28-Dec-2024.
  • (2024) A Low-Power Analog Bell-Shaped Classifier Based on Parallel-Connected Gaussian Function Circuits. Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications, 459-470. DOI: 10.1007/978-981-99-9836-4_34. Online publication date: 25-Feb-2024.
  • (2023) Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM. Circuits, Systems, and Signal Processing, 42:11, 6660-6683. DOI: 10.1007/s00034-023-02412-4. Online publication date: 8-Jun-2023.
  • (2022) Hardware-friendly compression and hardware acceleration for transformer: A survey. Electronic Research Archive, 30:10, 3755-3785. DOI: 10.3934/era.2022192. Online publication date: 2022.
  • (2022) A Configurable and Fully Synthesizable RTL-Based Convolutional Neural Network for Biosensor Applications. Sensors, 22:7, 2459. DOI: 10.3390/s22072459. Online publication date: 23-Mar-2022.
  • (2022) Space-efficient optical computing with an integrated chip diffractive neural network. Nature Communications, 13:1. DOI: 10.1038/s41467-022-28702-0. Online publication date: 24-Feb-2022.
  • (2022) Gaussian Mixture Model classifier analog integrated low-power implementation with applications in fault management detection. Microelectronics Journal, 126, 105510. DOI: 10.1016/j.mejo.2022.105510. Online publication date: Aug-2022.
  • (2021) TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 926-931. DOI: 10.23919/DATE51398.2021.9474235. Online publication date: 1-Feb-2021.
