
Accumulation-Aware Shift and Difference-Add Booth Multiplier for Energy-Efficient Convolutional Neural Network Inference

Published in: Circuits, Systems, and Signal Processing

Abstract

In recent years, convolutional neural networks (CNNs) have been applied to many fields due to their high performance in extracting complex features. However, although these CNN models are robust, they come at the cost of high computational complexity. As a result, many studies have investigated various architectures and data flows to optimize throughput and energy efficiency. This paper presents a data-reuse data flow and a shift and difference-add Booth multiplier to reduce energy consumption. The evaluation uses the pre-trained VGG16 model with a batch size of three as a benchmark. The results show that the proposed design reduces the number of state toggles in the Booth multiplier by a factor of 1.96 and reduces DRAM and global buffer accesses to 61.6% and 74.7% of those of prior work, respectively.
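For context, the multiplier named in the title builds on radix-4 (modified) Booth recoding, which encodes the multiplier operand into overlapping 3-bit groups so that each partial product is 0, ±1, or ±2 times the multiplicand, halving the number of partial products compared with a plain shift-and-add scheme. The sketch below is a minimal, generic Python illustration of radix-4 Booth multiplication only; it does not reproduce the paper's accumulation-aware shift and difference-add design, and the function name and default bit width are illustrative assumptions.

# Minimal sketch of radix-4 (modified) Booth multiplication, the textbook
# baseline that shift and difference-add Booth multipliers build on.
# Generic illustration only; not the authors' accumulation-aware design.

def booth_radix4_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Multiply two signed integers of the given bit width via radix-4 Booth recoding."""
    mask = (1 << width) - 1
    # Two's-complement view of the multiplier, with an implicit 0 appended below the LSB.
    m = (multiplier & mask) << 1
    product = 0
    for i in range(width // 2):
        # Overlapping 3-bit group: multiplier bits 2i+1, 2i, and 2i-1.
        group = (m >> (2 * i)) & 0b111
        # Recoding table: each group selects 0, +/-1, or +/-2 times the multiplicand.
        digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                 0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[group]
        # Each partial product carries a weight of 4**i (a left shift by 2i bits).
        product += (digit * multiplicand) << (2 * i)
    return product

if __name__ == "__main__":
    assert booth_radix4_multiply(13, -7) == -91
    assert booth_radix4_multiply(-5, -6) == 30
    print("radix-4 Booth multiply OK")

Because each recoded digit only selects a shifted copy or negation of the multiplicand, the switching activity of the partial-product logic depends on how the recoded digits change between consecutive operations, which is the kind of state-toggle activity the proposed design aims to reduce.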


Data Availability

The input dataset is publicly available and detailed output data are given in the manuscript.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Shanq-Jang Ruan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Z.D., Ruan, S.J. & Yan, B.K. Accumulation-Aware Shift and Difference-Add Booth Multiplier for Energy-Efficient Convolutional Neural Network Inference. Circuits Syst Signal Process 40, 6050–6066 (2021). https://doi.org/10.1007/s00034-021-01751-4
