Open access

EncoDeep: Realizing Bit-flexible Encoding for Deep Neural Networks

Published: 29 September 2020

Abstract

This article proposes EncoDeep, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. EncoDeep incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features require significantly less storage than the raw full-precision activation values; therefore, the execution flow of the EncoDeep hardware engine is performed entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated optimization algorithm that determines the flexible encoding bitwidths across network layers. The EncoDeep full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network and instantiates the corresponding elements from the EncoDeep hardware library for FPGA implementation. Our evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65× throughput improvement over stand-alone weight encoding. We further compare EncoDeep with six FPGA accelerators on ImageNet, showing average improvements of 3.6× in throughput and 2.54× in performance-per-watt.
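To make the activation-encoding idea concrete, the sketch below shows per-layer nonlinear (codebook) encoding with a configurable bitwidth, assuming a simple 1-D k-means-style codebook. This is an illustrative assumption only: the exact encoding scheme, bitwidth-selection algorithm, and hardware mapping used by EncoDeep may differ, and all function names here are hypothetical.

import numpy as np

def build_codebook(activations: np.ndarray, bits: int, iters: int = 20) -> np.ndarray:
    """Learn 2**bits codebook centroids for one layer via simple 1-D k-means."""
    k = 2 ** bits
    # Initialize centroids uniformly over the observed activation range.
    centroids = np.linspace(activations.min(), activations.max(), k)
    flat = activations.ravel()
    for _ in range(iters):
        # Assign each activation to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned activations.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids

def encode(activations: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Replace each activation with the index of its nearest centroid (a `bits`-wide code)."""
    return np.abs(activations[..., None] - centroids).argmin(axis=-1).astype(np.uint8)

def decode(codes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Recover approximate activations from their codebook indices."""
    return centroids[codes]

# Example: 3-bit codes occupy roughly 1/10 the storage of 32-bit floats
# (ignoring the small per-layer codebook), which is why encoded features
# can stay in on-chip buffers instead of off-chip DRAM.
acts = np.random.randn(1, 64, 8, 8).astype(np.float32)
cb = build_codebook(acts, bits=3)
codes = encode(acts, cb)
approx = decode(codes, cb)

In a bit-flexible setting, the `bits` argument would be chosen per layer by the automated optimization step described in the abstract; smaller codes trade reconstruction fidelity for memory and throughput.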






Published In

ACM Transactions on Embedded Computing Systems, Volume 19, Issue 6
Special Issue on LCETES, Part 2, Learning, Distributed, and Optimizing Compilers
November 2020
271 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3427195

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 29 September 2020
Online AM: 07 May 2020
Accepted: 01 March 2020
Revised: 01 March 2020
Received: 01 November 2019
Published in TECS Volume 19, Issue 6


Author Tags

  1. Resource-customized computing
  2. automated optimization
  3. neural network customization

Qualifiers

  • Research-article
  • Research
  • Refereed





Cited By

  • HyDREA: Utilizing Hyperdimensional Computing for a More Robust and Efficient Machine Learning System. ACM Transactions on Embedded Computing Systems 21, 6 (2022), 1-25. DOI: 10.1145/3524067
  • HyDREA: Towards More Robust and Efficient Machine Learning Systems with Hyperdimensional Computing. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 723-728. DOI: 10.23919/DATE51398.2021.9474218
  • Binary Precision Neural Network Manycore Accelerator. ACM Journal on Emerging Technologies in Computing Systems 17, 2 (2021), 1-27. DOI: 10.1145/3423136
  • A Survey of Open-source Tools for FPGA-based Inference of Artificial Neural Networks. 2021 Ivannikov Memorial Workshop (IVMEM), 50-56. DOI: 10.1109/IVMEM53963.2021.00015
  • A Computational Stack for Cross-Domain Acceleration. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 54-70. DOI: 10.1109/HPCA51647.2021.00015
  • Divide and conquer. Proceedings of the 37th International Conference on Machine Learning (2020), 2880-2891. DOI: 10.5555/3524938.3525208
  • Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference. 2020 IEEE 38th International Conference on Computer Design (ICCD), 263-270. DOI: 10.1109/ICCD50377.2020.00053
