Abstract
In recent years, convolutional neural networks (CNNs) have been applied to many fields because of their strong ability to extract complex features. However, these CNN models achieve their robustness at the cost of high computational complexity. As a result, many studies have investigated architectures and data flows that optimize throughput and energy efficiency. This paper presents a reused data flow and a shift and difference-add Booth multiplier to reduce energy consumption. The evaluation uses the pre-trained VGG16 model with a batch size of three as a benchmark. The results show that the proposed design reduces the number of state toggles in the Booth multiplier by 1.96 times and reduces DRAM and global buffer accesses to 61.6% and 74.7% of those in prior work, respectively.
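As a rough intuition for the shift-and-add arithmetic such a multiplier builds on, the sketch below models conventional radix-4 (modified) Booth recoding in Python. It is only an illustrative software model under assumed parameters (a 16-bit signed operand width and the hypothetical function name `booth_radix4_multiply`); it does not reproduce the paper's accumulation-aware difference-add optimization, which restructures the hardware operations to cut state toggles.

```python
# Minimal radix-4 (modified) Booth multiplication sketch.
# Illustrative only: this is classic shift-and-add Booth recoding,
# NOT the accumulation-aware difference-add design proposed in the paper.

def booth_radix4_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two signed `width`-bit integers via radix-4 Booth recoding."""
    mask = (1 << width) - 1
    b_u = b & mask                      # two's-complement view of the multiplier
    product = 0
    prev_bit = 0                        # implicit bit to the right of b[0]
    for i in range(0, width, 2):
        # Overlapping 3-bit group: (b[i+1], b[i], b[i-1]).
        group = (((b_u >> i) & 0b11) << 1) | prev_bit
        prev_bit = (b_u >> (i + 1)) & 1
        # Recoded digit in {-2, -1, 0, +1, +2}.
        digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                 0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[group]
        product += digit * (a << i)     # shift-and-add of one partial product
    return product

# Quick self-check against exact products.
assert booth_radix4_multiply(-123, 456) == -123 * 456
assert booth_radix4_multiply(31000, -2) == 31000 * -2
```

Each two-bit step recodes the multiplier into a single digit in {−2, −1, 0, +1, +2}, so only one shifted partial product is added or subtracted per step; this halving of partial products is the property that energy-oriented Booth multiplier designs exploit.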
Data Availability
The input dataset is publicly available and detailed output data are given in the manuscript.
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Cite this article
Wu, ZD., Ruan, SJ. & Yan, BK. Accumulation-Aware Shift and Difference-Add Booth Multiplier for Energy-Efficient Convolutional Neural Network Inference. Circuits Syst Signal Process 40, 6050–6066 (2021). https://doi.org/10.1007/s00034-021-01751-4