DOI: 10.1145/3194554.3194625

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

Published: 30 May 2018

Abstract

Hardware acceleration of deep neural networks (DNNs) has been extensively investigated in both industry and academia. To address the growing demands on computational capability and memory, in this work we propose a structured weight matrices (SWM)-based compression technique for both Field Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. For each layer, in both the training and inference phases, the SWM-based technique reduces computational complexity from O(n²) to O(n log n) and storage complexity from O(n²) to O(n). For FPGA implementations of deep convolutional neural networks (DCNNs), the SWM-based framework achieves at least 152X improvement in performance and 72X improvement in energy efficiency compared with the baseline IBM TrueNorth processor under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 datasets. For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM achieves up to 21X performance enhancement and 33.5X energy-efficiency gains compared with the ESE accelerator. For ASIC implementations, the proposed SWM-based design exhibits clear advantages in power, throughput, and energy efficiency. Experimental results indicate that this method is highly suitable for deploying DNNs on both FPGAs and mobile/IoT devices.
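The complexity reduction above follows from a standard property of circulant matrices: multiplying a b×b circulant block by a vector is a circular convolution, which the FFT evaluates in O(b log b) time while only the length-b defining vector needs to be stored. The following minimal NumPy sketch illustrates the block-circulant matrix-vector product that underlies the SWM framework; the function and variable names are ours for illustration, not from the paper, and the result is checked against an explicit dense construction.

import numpy as np

def block_circulant_matvec(blocks, x, b):
    """Compute y = W x for a (p*b) x (q*b) block-circulant matrix W.

    blocks[i][j] is the length-b defining vector (first column) of the
    b x b circulant block W_ij; each block product is a circular
    convolution done via FFT in O(b log b) instead of O(b^2), and only
    b values per block are stored instead of b^2.
    """
    p, q = len(blocks), len(blocks[0])
    xs = x.reshape(q, b)                  # split the input into q sub-vectors
    y = np.zeros(p * b)
    for i in range(p):
        acc = np.zeros(b, dtype=complex)  # accumulate in the FFT domain
        for j in range(q):
            acc += np.fft.fft(blocks[i][j]) * np.fft.fft(xs[j])
        y[i * b:(i + 1) * b] = np.fft.ifft(acc).real
    return y

# Sanity check against the explicit dense circulant construction.
b, p, q = 4, 2, 3
rng = np.random.default_rng(0)
blocks = [[rng.standard_normal(b) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * b)
dense = np.block([[np.stack([np.roll(w, k) for k in range(b)], axis=1)
                   for w in row] for row in blocks])
assert np.allclose(block_circulant_matvec(blocks, x, b), dense @ x)

For a (p·b) × (q·b) weight matrix this stores p·q·b values instead of p·q·b² weights, which is the per-layer O(n²) → O(n) storage reduction claimed in the abstract.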

Published In

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI
May 2018
533 pages
ISBN: 9781450357241
DOI: 10.1145/3194554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accelerator
  2. asic
  3. deep learning
  4. fpga
  5. structured weight matrices

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation Awards

Conference

GLSVLSI '18: Great Lakes Symposium on VLSI 2018
May 23 - 25, 2018
Chicago, IL, USA

Acceptance Rates

GLSVLSI '18 Paper Acceptance Rate: 48 of 197 submissions, 24%
Overall Acceptance Rate: 312 of 1,156 submissions, 27%

Article Metrics

  • Downloads (Last 12 months): 28
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 28 Feb 2025


Cited By

  • (2024) A Design of Inference Engine for Recurrent Neural Networks Using Block-Circulant Weight Matrices Based on a SoC-FPGA Approach. 2024 10th International Conference on Applied System Innovation (ICASI), 131-133. DOI: 10.1109/ICASI60819.2024.10547870. Online publication date: 17-Apr-2024.
  • (2024) Innovative Insights: A Review of Deep Learning Methods for Enhanced Video Compression. IEEE Access, 12, 125706-125725. DOI: 10.1109/ACCESS.2024.3450814. Online publication date: 2024.
  • (2024) A multi-agent reinforcement learning based approach for automatic filter pruning. Scientific Reports, 14:1. DOI: 10.1038/s41598-024-82562-w. Online publication date: 28-Dec-2024.
  • (2024) A Low-Power Analog Bell-Shaped Classifier Based on Parallel-Connected Gaussian Function Circuits. Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications, 459-470. DOI: 10.1007/978-981-99-9836-4_34. Online publication date: 25-Feb-2024.
  • (2023) Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM. Circuits, Systems, and Signal Processing, 42:11, 6660-6683. DOI: 10.1007/s00034-023-02412-4. Online publication date: 8-Jun-2023.
  • (2022) Hardware-friendly compression and hardware acceleration for transformer: A survey. Electronic Research Archive, 30:10, 3755-3785. DOI: 10.3934/era.2022192. Online publication date: 2022.
  • (2022) A Configurable and Fully Synthesizable RTL-Based Convolutional Neural Network for Biosensor Applications. Sensors, 22:7, 2459. DOI: 10.3390/s22072459. Online publication date: 23-Mar-2022.
  • (2022) Space-efficient optical computing with an integrated chip diffractive neural network. Nature Communications, 13:1. DOI: 10.1038/s41467-022-28702-0. Online publication date: 24-Feb-2022.
  • (2022) Gaussian Mixture Model classifier analog integrated low-power implementation with applications in fault management detection. Microelectronics Journal, 126, 105510. DOI: 10.1016/j.mejo.2022.105510. Online publication date: Aug-2022.
  • (2021) TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 926-931. DOI: 10.23919/DATE51398.2021.9474235. Online publication date: 1-Feb-2021.
