Abstract
Deep Neural Networks (DNNs) are popular deep learning models, but their large numbers of learnable parameters impose heavy memory and compute demands during both training and inference. Deploying these models on mobile and edge devices with limited hardware resources and power budgets is therefore a significant challenge, and meeting real-time and energy-efficiency requirements calls for compacting DNN models. This paper proposes a fixed-partition compaction technique that exploits runs of consecutive zero and non-zero weights in sparse DNN models. The approach reduces memory storage, memory transactions, and computation. We implemented convolution and fully connected layers operating on the compacted weights on a Virtex-7 VC707 FPGA. Our experiments demonstrate that compact layers deliver better performance and energy efficiency than layers without compaction. Across several convolution configurations, the compact convolution layers achieve average performance improvements of 32.51% over state-of-the-art SMM and 29.43% over direct convolution, together with energy consumption reductions of 34.14% and 29.58%, respectively. The compact fully connected layers achieve an average performance improvement of 26.61% and an energy consumption reduction of 30.85% over layers without compaction.
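To make the core idea concrete, below is a minimal Python sketch of one possible fixed-partition compaction scheme: each fixed-size partition of the weight vector keeps a 1-bit-per-weight non-zero mask plus only its non-zero values, so runs of consecutive zeros cost no value storage and no multiplications. The partition size (8), function names, and metadata layout are illustrative assumptions, not the exact format proposed in the paper.

```python
# Sketch of fixed-partition weight compaction (illustrative, not the paper's format).
import numpy as np

PART = 8  # assumed fixed partition length


def compact_weights(weights, part=PART):
    """Split a flat weight vector into fixed-size partitions; for each partition
    store a non-zero bitmask (1 bit per weight) and only the non-zero values."""
    flat = np.asarray(weights, dtype=np.float32).ravel()
    pad = (-len(flat)) % part
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    masks, values = [], []
    for start in range(0, len(flat), part):
        block = flat[start:start + part]
        nz = block != 0
        masks.append(np.packbits(nz))   # metadata: 1 bit per weight
        values.append(block[nz])        # consecutive zeros are dropped
    return masks, values


def compact_dot(masks, values, x, part=PART):
    """Dot product against compacted weights: multiply-accumulate only where
    the mask marks a non-zero weight, skipping zero runs entirely."""
    x = np.asarray(x, dtype=np.float32).ravel()
    pad = (-len(x)) % part
    x = np.concatenate([x, np.zeros(pad, dtype=x.dtype)])
    acc = 0.0
    for i, (mbits, vals) in enumerate(zip(masks, values)):
        mask = np.unpackbits(mbits)[:part].astype(bool)
        acc += float(np.dot(vals, x[i * part:(i + 1) * part][mask]))
    return acc


# Usage: a sparse weight row reduces to small bitmasks plus non-zero values only.
w = np.array([0, 0, 0, 0.5, 0, 0, -1.2, 0, 0, 0, 0, 0, 0.3, 0, 0, 0])
x = np.arange(16, dtype=np.float32)
m, v = compact_weights(w)
assert abs(compact_dot(m, v, x) - float(np.dot(w, x))) < 1e-5
```

In a hardware realization such a bitmask would gate the multiply-accumulate units and the value memory would hold only non-zero weights, which is the general source of the memory-transaction and computation savings reported above.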
References
Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., Moshovos, A.: Cnvlutin: ineffectual-neuron-free deep neural network computing. ACM SIGARCH Comput. Archit. News 44(3), 1–13 (2016)
Capra, M., Bussolino, B., Marchisio, A., Masera, G., Martina, M., Shafique, M.: Hardware and software optimizations for accelerating deep neural networks: survey of current trends, challenges, and the road ahead. IEEE Access 8, 225134–225180 (2020)
Chang, S.E., et al.: Mix and match: a novel FPGA-centric deep neural network quantization framework. In: 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). pp. 208–220. IEEE (2021)
Chen, Y.H., Yang, T.J., Emer, J., Sze, V.: Eyeriss v2: a flexible accelerator for emerging deep neural networks on mobile devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9(2), 292–308 (2019)
Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282 (2017)
Chou, S., Kjolstad, F., Amarasinghe, S.: Format abstraction for sparse tensor algebra compilers. Proc. ACM Program. Lang. 2(OOPSLA), 1–30 (2018)
Han, S., et al.: Deep compression and EIE: efficient inference engine on compressed deep neural network. In: Hot Chips Symposium. pp. 1–6 (2016)
Han, S., et al.: EIE: efficient inference engine on compressed deep neural network. ACM SIGARCH Comput. Archit. News 44(3), 243–254 (2016)
Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N., Peste, A.: Sparsity in deep learning: pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22(1), 10882–11005 (2021)
Ofir, A., Ben-Artzi, G.: SMM-Conv: scalar matrix multiplication with zero packing for accelerated convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3067–3075 (2022)
Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Comput. Archit. News 45(2), 27–40 (2017)
PyTorch: Pruning tutorial. https://pytorch.org/tutorials/intermediate/pruning_tutorial.html, Accessed on 04 July 2023
Qasaimeh, M., Zambreno, J., Jones, P.H.: An efficient hardware architecture for sparse convolution using linear feedback shift registers. In: 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP). pp. 250–257. IEEE (2021)
Shafique, M., Marchisio, A., Putra, R.V.W., Hanif, M.A.: Towards energy-efficient and secure edge AI: a cross-layer framework (ICCAD special session paper). In: 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). pp. 1–9. IEEE (2021)
Smith, S., Karypis, G.: Tensor-matrix products with a compressed sparse tensor. In: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms. pp. 1–7 (2015)
Stewart, R., Nowlan, A., Bacchus, P., Ducasse, Q., Komendantskaya, E.: Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm. Electronics 10(4), 396 (2021)
Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017)
Yuan, Z., et al.: Sticker: a 0.41–62.1 TOPS/W 8-bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers. In: 2018 IEEE Symposium on VLSI Circuits. pp. 33–34. IEEE (2018)
Zhang, S., et al.: Cambricon-X: an accelerator for sparse neural networks. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). pp. 1–12. IEEE (2016)
Declarations
This work was funded by MulticoreWare Inc. and by IPTIF, IIT Palakkad, under project no. IPTIF/TD/IP/003.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Baby, B.E., Deb, D., Sharma, B., Vijayakumar, K., Das, S. (2023). Energy Efficient DNN Compaction for Edge Deployment. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_20
DOI: https://doi.org/10.1007/978-3-031-42921-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42920-0
Online ISBN: 978-3-031-42921-7
eBook Packages: Computer Science, Computer Science (R0)