Abstract
The convolutional neural network (CNN) is one of the most dominant deep learning networks, with good generalization ability. Its high performance on large and complex learning problems has enabled its use in IoT devices. However, a CNN involves a substantial number of convolution operations, which demand many power-consuming multipliers. This hinders the deployment of deep CNNs on mobile and IoT edge devices owing to their restricted power and area budgets. In this paper, we propose a low-complexity methodology named ‘minimal modified distributed arithmetic’ (M2DA) for CNNs. By exploiting data symmetry and consequently storing only the unique combinations of kernel coefficients, both the required memory size and the number of multiplication operations are reduced, leading to a power- and area-efficient design. For validation, a low-complexity CNN architecture for an activity-recognition application is designed and synthesized in Synopsys using UMC 65 nm technology, achieving average improvements of 36.89% in power and 51.63% in area compared to the conventional MDA methodology. To demonstrate the significance of the proposed M2DA methodology, we have also implemented AlexNet, one of the most widely used publicly available CNN models, for the image classification problem.
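The memory saving that M2DA-style designs build on can be illustrated with a generic distributed-arithmetic (DA) inner product. The sketch below is a minimal illustration, not the paper's exact M2DA scheme: it uses unsigned inputs for simplicity, and the helper names `build_lut` and `da_inner_product` are hypothetical. DA replaces the multipliers in an N-tap inner product y = Σ c[k]·x[k] with a precomputed look-up table indexed by one bit-slice of the inputs per step; the symmetry LUT[b] + LUT[~b] = Σ c[k] means only half of the table entries are unique, so only those need to be stored.

```python
def build_lut(coeffs):
    """Precompute LUT[b] = sum of coeffs[k] for every bit pattern b
    in which bit k is set (2^N entries for N coefficients)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (b >> k) & 1)
            for b in range(1 << n)]

def da_inner_product(coeffs, xs, bits=8):
    """Multiplier-free inner product via distributed arithmetic:
    accumulate shifted LUT values, one input bit-slice per step.
    (Unsigned inputs only; real designs handle two's complement.)"""
    lut = build_lut(coeffs)
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT index.
        index = sum(((x >> j) & 1) << k for k, x in enumerate(xs))
        acc += lut[index] << j
    return acc

coeffs, xs = [3, 5, 7], [2, 4, 6]
print(da_inner_product(coeffs, xs))        # same value as sum(c*x)

# Symmetry: complementary indices always sum to sum(coeffs),
# so the second half of the LUT is derivable from the first.
lut, s = build_lut(coeffs), sum(coeffs)
print(all(lut[b] + lut[len(lut) - 1 - b] == s for b in range(len(lut))))
```

For a 3x3 convolution kernel (N = 9) this symmetry alone halves the 512-entry table; M2DA pushes further by also removing redundant coefficient combinations.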
Availability of data and materials
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Acknowledgements
This work is partially funded by the Science and Engineering Research Board (SERB), Government of India, for the project entitled “Intelligent IoT enabled Autonomous Structural Health Monitoring System for Ships, Aeroplanes, Trains and Automobiles” under the Impacting Research Innovation and Technology (IMPRINT) program, Grant Number IMP/2018/000375. The computational platform was supported by the project i-MOBILYZE, funded by Xilinx Inc., USA, Grant Number IITH/EE/F091/S81. All computer-aided design tools are supported under the Special Manpower Development Program (SMDP) of the Ministry of Electronics and Information Technology (MEITY), Government of India. MP and AA would also like to acknowledge the support received under the “Visvesvaraya Fellowship” from MEITY.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Panwar, M., Sri Hari, N., Biswas, D. et al. M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy. Circuits Syst Signal Process 40, 1542–1567 (2021). https://doi.org/10.1007/s00034-020-01534-3