Abstract
Deep Neural Networks (DNNs) are popular deep learning models, but their large numbers of learnable parameters impose heavy memory and compute demands during both training and inference. Deploying these models on mobile and edge devices with limited hardware resources and power budgets is therefore a significant challenge, and meeting real-time and energy-efficiency requirements calls for compacting DNN models. This paper proposes a fixed-partition compaction technique that exploits runs of consecutive zero and non-zero weights in sparse DNN models. The approach reduces memory storage, memory transactions, and computation. We implemented convolution and fully connected layers operating on the compacted weights on a Virtex-7 VC707 FPGA. Our experiments demonstrate that compact layers deliver better performance and energy efficiency than layers without compaction. Across several convolution configurations, the compact convolution layers achieve average performance improvements of 32.51% over state-of-the-art SMM and 29.43% over direct convolution, together with energy consumption reductions of 34.14% and 29.58%, respectively. The compact fully connected layers achieve an average performance improvement of 26.61% and an energy consumption reduction of 30.85% over layers without compaction.
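To make the core idea concrete, below is a minimal Python sketch of one possible fixed-partition compaction scheme: each fixed-size partition of the weight vector keeps a 1-bit-per-weight non-zero mask plus only its non-zero values, so runs of consecutive zeros cost no value storage and no multiplications. The partition size (8), function names, and metadata layout are illustrative assumptions, not the exact format proposed in the paper.

```python
# Sketch of fixed-partition weight compaction (illustrative, not the paper's format).
import numpy as np

PART = 8  # assumed fixed partition length


def compact_weights(weights, part=PART):
    """Split a flat weight vector into fixed-size partitions; for each partition
    store a non-zero bitmask (1 bit per weight) and only the non-zero values."""
    flat = np.asarray(weights, dtype=np.float32).ravel()
    pad = (-len(flat)) % part
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    masks, values = [], []
    for start in range(0, len(flat), part):
        block = flat[start:start + part]
        nz = block != 0
        masks.append(np.packbits(nz))   # metadata: 1 bit per weight
        values.append(block[nz])        # consecutive zeros are dropped
    return masks, values


def compact_dot(masks, values, x, part=PART):
    """Dot product against compacted weights: multiply-accumulate only where
    the mask marks a non-zero weight, skipping zero runs entirely."""
    x = np.asarray(x, dtype=np.float32).ravel()
    pad = (-len(x)) % part
    x = np.concatenate([x, np.zeros(pad, dtype=x.dtype)])
    acc = 0.0
    for i, (mbits, vals) in enumerate(zip(masks, values)):
        mask = np.unpackbits(mbits)[:part].astype(bool)
        acc += float(np.dot(vals, x[i * part:(i + 1) * part][mask]))
    return acc


# Usage: a sparse weight row reduces to small bitmasks plus non-zero values only.
w = np.array([0, 0, 0, 0.5, 0, 0, -1.2, 0, 0, 0, 0, 0, 0.3, 0, 0, 0])
x = np.arange(16, dtype=np.float32)
m, v = compact_weights(w)
assert abs(compact_dot(m, v, x) - float(np.dot(w, x))) < 1e-5
```

In a hardware realization such a bitmask would gate the multiply-accumulate units and the value memory would hold only non-zero weights, which is the general source of the memory-transaction and computation savings reported above.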
References
Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., Moshovos, A.: Cnvlutin: ineffectual-neuron-free deep neural network computing. ACM SIGARCH Comput. Archit. News 44(3), 1–13 (2016)
Capra, M., Bussolino, B., Marchisio, A., Masera, G., Martina, M., Shafique, M.: Hardware and software optimizations for accelerating deep neural networks: survey of current trends, challenges, and the road ahead. IEEE Access 8, 225134–225180 (2020)
Chang, S.E., et al.: Mix and match: a novel FPGA-centric deep neural network quantization framework. In: 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). pp. 208–220. IEEE (2021)
Chen, Y.H., Yang, T.J., Emer, J., Sze, V.: Eyeriss v2: a flexible accelerator for emerging deep neural networks on mobile devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9(2), 292–308 (2019)
Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282 (2017)
Chou, S., Kjolstad, F., Amarasinghe, S.: Format abstraction for sparse tensor algebra compilers. Proc. ACM Program. Lang. 2(OOPSLA), 1–30 (2018)
Han, S., et al.: Deep compression and EIE: efficient inference engine on compressed deep neural network. In: Hot Chips Symposium. pp. 1–6 (2016)
Han, S., et al.: EIE: efficient inference engine on compressed deep neural network. ACM SIGARCH Comput. Archit. News 44(3), 243–254 (2016)
Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N., Peste, A.: Sparsity in deep learning: pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22(1), 10882–11005 (2021)
Ofir, A., Ben-Artzi, G.: SMM-Conv: scalar matrix multiplication with zero packing for accelerated convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3067–3075 (2022)
Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Comput. Archit. News 45(2), 27–40 (2017)
PyTorch: Pruning tutorial. https://pytorch.org/tutorials/intermediate/pruning_tutorial.html, Accessed on 04 July 2023
Qasaimeh, M., Zambreno, J., Jones, P.H.: An efficient hardware architecture for sparse convolution using linear feedback shift registers. In: 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP). pp. 250–257. IEEE (2021)
Shafique, M., Marchisio, A., Putra, R.V.W., Hanif, M.A.: Towards energy-efficient and secure edge AI: a cross-layer framework (ICCAD special session paper). In: 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). pp. 1–9. IEEE (2021)
Smith, S., Karypis, G.: Tensor-matrix products with a compressed sparse tensor. In: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms. pp. 1–7 (2015)
Stewart, R., Nowlan, A., Bacchus, P., Ducasse, Q., Komendantskaya, E.: Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm. Electronics 10(4), 396 (2021)
Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017)
Yuan, Z., et al.: Sticker: a 0.41–62.1 TOPS/W 8-bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers. In: 2018 IEEE Symposium on VLSI Circuits. pp. 33–34. IEEE (2018)
Zhang, S., et al.: Cambricon-X: an accelerator for sparse neural networks. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). pp. 1–12. IEEE (2016)
Declarations
This work was funded by MulticoreWare Inc. and by IPTIF, IIT Palakkad, under project no. IPTIF/TD/IP/003.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Baby, B.E., Deb, D., Sharma, B., Vijayakumar, K., Das, S. (2023). Energy Efficient DNN Compaction for Edge Deployment. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_20
DOI: https://doi.org/10.1007/978-3-031-42921-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42920-0
Online ISBN: 978-3-031-42921-7
eBook Packages: Computer Science, Computer Science (R0)