Open access

EncoDeep: Realizing Bit-flexible Encoding for Deep Neural Networks

Published: 29 September 2020

Abstract

This article proposes EncoDeep, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. EncoDeep incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features require significantly less storage than the raw full-precision activation values; therefore, the execution flow of the EncoDeep hardware engine is performed entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated optimization algorithm that determines the flexible encoding bitwidths across network layers. The EncoDeep full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network and instantiates the corresponding elements from the EncoDeep hardware library for FPGA implementation. Our evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65× throughput improvement over stand-alone weight encoding. We further compare EncoDeep with six FPGA accelerators on ImageNet, showing average improvements of 3.6× in throughput and 2.54× in performance-per-watt.
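To make the activation-encoding idea concrete, the sketch below shows per-layer nonlinear (codebook) encoding with a configurable bitwidth, assuming a simple 1-D k-means-style codebook. This is an illustrative assumption only: the exact encoding scheme, bitwidth-selection algorithm, and hardware mapping used by EncoDeep may differ, and all function names here are hypothetical.

import numpy as np

def build_codebook(activations: np.ndarray, bits: int, iters: int = 20) -> np.ndarray:
    """Learn 2**bits codebook centroids for one layer via simple 1-D k-means."""
    k = 2 ** bits
    # Initialize centroids uniformly over the observed activation range.
    centroids = np.linspace(activations.min(), activations.max(), k)
    flat = activations.ravel()
    for _ in range(iters):
        # Assign each activation to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned activations.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids

def encode(activations: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Replace each activation with the index of its nearest centroid (a `bits`-wide code)."""
    return np.abs(activations[..., None] - centroids).argmin(axis=-1).astype(np.uint8)

def decode(codes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Recover approximate activations from their codebook indices."""
    return centroids[codes]

# Example: 3-bit codes occupy roughly 1/10 the storage of 32-bit floats
# (ignoring the small per-layer codebook), which is why encoded features
# can stay in on-chip buffers instead of off-chip DRAM.
acts = np.random.randn(1, 64, 8, 8).astype(np.float32)
cb = build_codebook(acts, bits=3)
codes = encode(acts, cb)
approx = decode(codes, cb)

In a bit-flexible setting, the `bits` argument would be chosen per layer by the automated optimization step described in the abstract; smaller codes trade reconstruction fidelity for memory and throughput.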






Published In

ACM Transactions on Embedded Computing Systems, Volume 19, Issue 6
Special Issue on LCETES, Part 2, Learning, Distributed, and Optimizing Compilers
November 2020
271 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3427195

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 29 September 2020
Online AM: 07 May 2020
Accepted: 01 March 2020
Revised: 01 March 2020
Received: 01 November 2019
Published in TECS Volume 19, Issue 6


Author Tags

  1. Resource-customized computing
  2. automated optimization
  3. neural network customization

Qualifiers

  • Research-article
  • Research
  • Refereed





Cited By

  • HyDREA: Utilizing Hyperdimensional Computing for a More Robust and Efficient Machine Learning System. ACM Transactions on Embedded Computing Systems 21, 6 (2022), 1-25. DOI: 10.1145/3524067
  • HyDREA: Towards More Robust and Efficient Machine Learning Systems with Hyperdimensional Computing. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 723-728. DOI: 10.23919/DATE51398.2021.9474218
  • Binary Precision Neural Network Manycore Accelerator. ACM Journal on Emerging Technologies in Computing Systems 17, 2 (2021), 1-27. DOI: 10.1145/3423136
  • A Survey of Open-source Tools for FPGA-based Inference of Artificial Neural Networks. 2021 Ivannikov Memorial Workshop (IVMEM), 50-56. DOI: 10.1109/IVMEM53963.2021.00015
  • A Computational Stack for Cross-Domain Acceleration. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 54-70. DOI: 10.1109/HPCA51647.2021.00015
  • Divide and conquer. Proceedings of the 37th International Conference on Machine Learning (2020), 2880-2891. DOI: 10.5555/3524938.3525208
  • Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference. 2020 IEEE 38th International Conference on Computer Design (ICCD), 263-270. DOI: 10.1109/ICCD50377.2020.00053
