
M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy

  • Short Paper
  • Published in Circuits, Systems, and Signal Processing

Abstract

Convolutional neural networks (CNNs) are among the most dominant deep learning networks owing to their good generalization ability, and their high performance on large and complex learning problems has made them attractive for IoT devices. However, CNN inference involves a substantial number of convolution operations, which demand many power-consuming multipliers; this hinders the deployment of deep CNNs on mobile and IoT edge devices with restricted power and area budgets. In this paper, we propose a low-complexity design methodology named 'minimal modified distributed arithmetic' (M2DA) for CNNs. By exploiting data symmetry, M2DA stores only the unique combinations of kernel coefficients, reducing both the required memory size and the number of multiplication operations and thereby yielding a power- and area-efficient design. For validation, a low-complexity CNN architecture for an activity recognition application is designed and synthesized with Synopsys tools using UMC 65 nm technology; on average, improvements of 36.89% in power and 51.63% in area are achieved over the conventional MDA methodology. To demonstrate the wider applicability of the proposed M2DA methodology, we have also implemented AlexNet, one of the most widely used and publicly available CNN models for image classification.
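To make the memory-versus-multiplier trade-off concrete, the sketch below illustrates classical LUT-based distributed arithmetic (DA), the baseline that MDA and the proposed M2DA refine. It is a minimal Python illustration under our own assumptions (the names build_lut and da_inner_product and the unsigned 8-bit input width are illustrative), not the paper's architecture.

def build_lut(coeffs):
    """Precompute all 2**K partial sums of the K kernel coefficients.

    In hardware this table replaces the multipliers. Note the symmetry
    lut[a] + lut[mask ^ a] == sum(coeffs) with mask = 2**K - 1:
    complementary addresses hold complementary sums, so only half of the
    stored combinations are truly unique -- the kind of redundancy a
    symmetry-aware scheme can avoid storing.
    """
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]

def da_inner_product(xs, lut, nbits=8):
    """Bit-serial DA evaluation of y = sum_k coeffs[k] * xs[k] (unsigned inputs).

    Each step gathers one bit from every input to form a LUT address,
    then shifts and accumulates -- no multiplier is required.
    """
    acc = 0
    for b in range(nbits):
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        acc += lut[addr] << b  # shift-and-add replaces the multiply
    return acc

# Quick check against a direct multiply-accumulate.
coeffs = [3, -1, 4, 2]   # illustrative 1x4 kernel
xs = [10, 7, 255, 0]     # 8-bit unsigned activations
lut = build_lut(coeffs)
assert da_inner_product(xs, lut) == sum(c * x for c, x in zip(coeffs, xs))

Even a 3 x 3 kernel already needs a 2^9 = 512-entry table per filter in plain DA, so pruning redundant coefficient combinations, as M2DA does, pays off directly in memory, power, and area.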


Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.


Acknowledgements

This work is partially funded by the Science and Engineering Research Board (SERB), Government of India, for the project entitled "Intelligent IoT enabled Autonomous Structural Health Monitoring System for Ships, Aeroplanes, Trains and Automobiles" under the Impacting Research Innovation and Technology (IMPRINT) program with the Grant Number IMP/2018/000375. The computational platform was supported by the project i-MOBILYZE funded by Xilinx Inc., USA, with the Grant Number IITH/EE/F091/S81. All the computer-aided design tools are supported under the Special Manpower Development Program (SMDP) of the Ministry of Electronics and Information Technology (MEITY), Government of India. MP and AA would also like to acknowledge the support received under the "Visvesvaraya Fellowship" from the MEITY.

Author information

Corresponding author

Correspondence to Amit Acharyya.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Panwar, M., Sri Hari, N., Biswas, D. et al. M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy. Circuits Syst Signal Process 40, 1542–1567 (2021). https://doi.org/10.1007/s00034-020-01534-3

