
BitMAC: Bit-Serial Computation-Based Efficient Multiply-Accumulate Unit for DNN Accelerator

Published in Circuits, Systems, and Signal Processing

Abstract

Contemporary hardware implementations of deep neural networks suffer from excess area requirements due to resource-intensive elements such as multipliers. Semi-custom ASIC VLSI designs of the multiply-accumulate unit in a deep neural network are constrained by the available chip area. An area- and power-efficient multiply-accumulate architecture is therefore imperative to ease this area burden in digital design exploration. The present work addresses this challenge by proposing an efficient, bit-serial computation-based multiply-accumulate unit. The proposed architecture is verified through simulation, synthesized with Synopsys Design Vision at the 180 nm and 45 nm technology nodes, and its physical parameters are extracted using Cadence Virtuoso. At 45 nm, the design achieves a 34.35% lower area-delay product (ADP) and improves on the state-of-the-art multiply-accumulate unit design by 25.94% in area, 35.65% in power dissipation, and 14.30% in latency. Furthermore, lower technology nodes incur higher leakage power dissipation; to save leakage power, we apply a power-gated design to the proposed architecture. The coarse-grain power-gating technique saves 52.79% of leakage (static) power with minimal area overhead.
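
To make the core idea concrete, the following is a minimal Python sketch of bit-serial multiply-accumulate computation: the multiplier is consumed one bit per cycle, and the shifted multiplicand is conditionally added to a running partial product, so no parallel multiplier array is required. It assumes unsigned fixed-width operands and illustrates only the general shift-and-add principle, not the authors' BitMAC datapath; the function name and parameters are illustrative.

    # Bit-serial MAC sketch: one multiplier bit is examined per
    # "cycle"; when that bit is 1, the multiplicand, shifted left
    # by the cycle index, is added to the partial product.
    def bit_serial_mac(acc: int, a: int, b: int, width: int = 8) -> int:
        """Return acc + a*b for unsigned a, b of at most `width` bits."""
        product = 0
        for cycle in range(width):        # one bit of b per cycle
            if (b >> cycle) & 1:          # serialized multiplier bit
                product += a << cycle     # add shifted multiplicand
        return acc + product

    # Usage: accumulate a small dot product, as a DNN MAC unit would.
    acc = 0
    for a, b in [(3, 5), (2, 7), (4, 1)]:
        acc = bit_serial_mac(acc, a, b)
    assert acc == 3 * 5 + 2 * 7 + 4 * 1   # 33

Processing one multiplier bit per cycle removes the parallel multiplier array, which is the source of the area and power savings the abstract reports.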


Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study; detailed circuit simulation results are given in the manuscript.


Acknowledgements

The authors would like to thank the Ministry of Education (MoE) and the University Grants Commission (UGC), Government of India, for providing financial support. The authors also acknowledge the Special Manpower Development Program for Chip to System Design (SMDP), Department of Electronics and Information Technology (DeitY), under the Ministry of Communications and Information Technology, Government of India, for providing the necessary research facilities.

Author information


Corresponding author

Correspondence to Santosh Kumar Vishvakarma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chhajed, H., Raut, G., Dhakad, N. et al. BitMAC: Bit-Serial Computation-Based Efficient Multiply-Accumulate Unit for DNN Accelerator. Circuits Syst Signal Process 41, 2045–2060 (2022). https://doi.org/10.1007/s00034-021-01873-9

