Abstract
The convolutional neural network (CNN) is one of the most dominant deep learning networks, with good generalization ability. Its high performance on large and complex learning problems has enabled its use in IoT devices. However, a CNN involves a substantial number of convolution operations, which demand many power-consuming multipliers. This hinders the deployment of deep CNNs on mobile and IoT edge devices owing to their restricted power and area budgets. In this paper, we propose a low-complexity methodology named ‘minimal modified distributed arithmetic’ (M2DA) for CNNs. By exploiting data symmetry and consequently storing only the unique combinations of kernel coefficients, both the required memory size and the number of multiplication operations are reduced, leading to a power- and area-efficient design. For validation, a low-complexity CNN architecture for an activity-recognition application is designed and synthesized in Synopsys using UMC 65 nm technology, achieving average improvements of 36.89% in power and 51.63% in area compared to the conventional MDA methodology. To demonstrate the significance of the proposed M2DA methodology, we have also implemented AlexNet, one of the most widely used publicly available CNN models, for the image classification problem.
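The memory saving that M2DA-style designs build on can be illustrated with a generic distributed-arithmetic (DA) inner product. The sketch below is a minimal illustration, not the paper's exact M2DA scheme: it uses unsigned inputs for simplicity, and the helper names `build_lut` and `da_inner_product` are hypothetical. DA replaces the multipliers in an N-tap inner product y = Σ c[k]·x[k] with a precomputed look-up table indexed by one bit-slice of the inputs per step; the symmetry LUT[b] + LUT[~b] = Σ c[k] means only half of the table entries are unique, so only those need to be stored.

```python
def build_lut(coeffs):
    """Precompute LUT[b] = sum of coeffs[k] for every bit pattern b
    in which bit k is set (2^N entries for N coefficients)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (b >> k) & 1)
            for b in range(1 << n)]

def da_inner_product(coeffs, xs, bits=8):
    """Multiplier-free inner product via distributed arithmetic:
    accumulate shifted LUT values, one input bit-slice per step.
    (Unsigned inputs only; real designs handle two's complement.)"""
    lut = build_lut(coeffs)
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT index.
        index = sum(((x >> j) & 1) << k for k, x in enumerate(xs))
        acc += lut[index] << j
    return acc

coeffs, xs = [3, 5, 7], [2, 4, 6]
print(da_inner_product(coeffs, xs))        # same value as sum(c*x)

# Symmetry: complementary indices always sum to sum(coeffs),
# so the second half of the LUT is derivable from the first.
lut, s = build_lut(coeffs), sum(coeffs)
print(all(lut[b] + lut[len(lut) - 1 - b] == s for b in range(len(lut))))
```

For a 3x3 convolution kernel (N = 9) this symmetry alone halves the 512-entry table; M2DA pushes further by also removing redundant coefficient combinations.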
Availability of data and materials
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Acknowledgements
This work is partially funded by the Science and Engineering Research Board (SERB), Government of India, for the project entitled “Intelligent IoT enabled Autonomous Structural Health Monitoring System for Ships, Aeroplanes, Trains and Automobiles” under the Impacting Research Innovation and Technology (IMPRINT) program, Grant Number IMP/2018/000375. The computational platform was supported by the project i-MOBILYZE, funded by Xilinx Inc., USA, Grant Number IITH/EE/F091/S81. All computer-aided design tools are supported under the Special Manpower Development Program (SMDP) of the Ministry of Electronics and Information Technology (MEITY), Government of India. MP and AA would also like to acknowledge the support received under the “Visvesvaraya Fellowship” from MEITY.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Panwar, M., Sri Hari, N., Biswas, D. et al. M2DA: A Low-Complex Design Methodology for Convolutional Neural Network Exploiting Data Symmetry and Redundancy. Circuits Syst Signal Process 40, 1542–1567 (2021). https://doi.org/10.1007/s00034-020-01534-3