
BitMAC: Bit-Serial Computation-Based Efficient Multiply-Accumulate Unit for DNN Accelerator

Published in Circuits, Systems, and Signal Processing

Abstract

Contemporary hardware implementations of deep neural networks suffer from excess area requirements due to resource-intensive elements such as multipliers. Semi-custom ASIC VLSI designs of the multiply-accumulate unit in a deep neural network are constrained by the available chip area. An area- and power-efficient multiply-accumulate architecture is therefore imperative to ease this area burden in digital design exploration. The present work addresses this challenge by proposing an efficient, bit-serial computation-based multiply-accumulate unit. The proposed architecture is verified through simulation, synthesized with Synopsys Design Vision at the 180 nm and 45 nm technology nodes, and its physical parameters are extracted using Cadence Virtuoso. At 45 nm, the design achieves a 34.35% lower area-delay product (ADP) and improves on the state-of-the-art multiply-accumulate unit design by 25.94% in area, 35.65% in power dissipation, and 14.30% in latency. Furthermore, lower technology nodes incur higher leakage power dissipation; to save leakage power, we apply a power-gated design to the proposed architecture. The coarse-grain power-gating technique saves 52.79% of leakage (static) power with minimal area overhead.
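
To make the core idea concrete, the following is a minimal Python sketch of bit-serial multiply-accumulate computation: the multiplier is consumed one bit per cycle, and the shifted multiplicand is conditionally added to a running partial product, so no parallel multiplier array is required. It assumes unsigned fixed-width operands and illustrates only the general shift-and-add principle, not the authors' BitMAC datapath; the function name and parameters are illustrative.

    # Bit-serial MAC sketch: one multiplier bit is examined per
    # "cycle"; when that bit is 1, the multiplicand, shifted left
    # by the cycle index, is added to the partial product.
    def bit_serial_mac(acc: int, a: int, b: int, width: int = 8) -> int:
        """Return acc + a*b for unsigned a, b of at most `width` bits."""
        product = 0
        for cycle in range(width):        # one bit of b per cycle
            if (b >> cycle) & 1:          # serialized multiplier bit
                product += a << cycle     # add shifted multiplicand
        return acc + product

    # Usage: accumulate a small dot product, as a DNN MAC unit would.
    acc = 0
    for a, b in [(3, 5), (2, 7), (4, 1)]:
        acc = bit_serial_mac(acc, a, b)
    assert acc == 3 * 5 + 2 * 7 + 4 * 1   # 33

Processing one multiplier bit per cycle removes the parallel multiplier array, which is the source of the area and power savings the abstract reports.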


Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study; detailed circuit simulation results are given in the manuscript.


Acknowledgements

The authors would like to thank the Ministry of Education (MoE) and the University Grants Commission (UGC), Government of India, for providing financial support. The authors also acknowledge the Special Manpower Development Program for Chip to System Design (SMDP), Department of Electronics and Information Technology (DeitY), under the Ministry of Communications and Information Technology, Government of India, for providing the necessary research facilities.

Author information


Corresponding author

Correspondence to Santosh Kumar Vishvakarma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chhajed, H., Raut, G., Dhakad, N. et al. BitMAC: Bit-Serial Computation-Based Efficient Multiply-Accumulate Unit for DNN Accelerator. Circuits Syst Signal Process 41, 2045–2060 (2022). https://doi.org/10.1007/s00034-021-01873-9

