
Energy Efficient DNN Compaction for Edge Deployment

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14251)


Abstract

Deep Neural Networks (DNNs) are popular deep learning models, but their large numbers of learnable parameters, which must be stored and processed in both the training and inference phases, make deployment on mobile and edge devices with limited hardware resources and power budgets a significant challenge. Compacting DNN models is therefore essential to meet real-time and energy-efficiency requirements. This paper proposes a fixed-partition compaction technique that exploits runs of consecutive zero and non-zero weights/parameters in sparse DNN models, reducing memory storage requirements, memory transactions, and computation. We implemented convolution and fully connected layers with the compacted weights on a Virtex-7 FPGA (VC707). Our experiments demonstrate that compacted layers achieve better performance and energy efficiency than layers without compaction. Across several convolution configurations, the compact convolution layers achieved average performance improvements of 32.51% and 29.43% over state-of-the-art SMM and direct convolution, respectively, along with energy consumption reductions of 34.14% over SMM and 29.58% over direct convolution. The compact fully connected layers achieved an average performance improvement of 26.61% and an energy consumption reduction of 30.85% over layers without compaction.
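The abstract describes the method in prose only. As a concrete illustration, the following is a minimal Python sketch of one plausible reading of fixed-partition compaction, assuming a run-length style encoding of the consecutive zeros inside fixed-size partitions. The names PARTITION_SIZE, compact_weights, and compact_dot are hypothetical, and the authors' exact encoding and FPGA mapping may differ.

```python
# Illustrative sketch of fixed-partition weight compaction (assumed encoding,
# not the authors' implementation). Each fixed-size partition of a sparse
# weight vector is stored as (zeros_before, value) pairs, so runs of
# consecutive zeros cost no stored values and no multiply-accumulate work.

PARTITION_SIZE = 8  # assumed partition width; a real design fixes this in hardware

def compact_partition(chunk):
    """Encode one partition as a list of (preceding_zero_run, value) pairs."""
    pairs, zeros = [], 0
    for w in chunk:
        if w == 0:
            zeros += 1
        else:
            pairs.append((zeros, w))
            zeros = 0  # trailing zeros are implicit: the partition size is fixed
    return pairs

def compact_weights(weights):
    """Split the weight vector into fixed partitions and encode each one."""
    return [compact_partition(weights[i:i + PARTITION_SIZE])
            for i in range(0, len(weights), PARTITION_SIZE)]

def compact_dot(partitions, activations):
    """Dot product with a dense activation vector, skipping every zero weight."""
    acc = 0.0
    for p, pairs in enumerate(partitions):
        idx = p * PARTITION_SIZE       # start index of this partition
        for zeros, value in pairs:
            idx += zeros               # jump over the run of zeros
            acc += value * activations[idx]
            idx += 1
    return acc

if __name__ == "__main__":
    w = [0, 0, 1.5, 0, 0, 0, -2.0, 0,   # partition 0: 2 non-zeros
         0.5, 0, 0, 0, 0, 0, 0, 3.0]    # partition 1: 2 non-zeros
    a = [1.0] * len(w)
    packed = compact_weights(w)
    assert abs(compact_dot(packed, a) - 3.0) < 1e-9  # 1.5 - 2.0 + 0.5 + 3.0
    stored = sum(len(pairs) for pairs in packed)
    print(f"stored {stored} (zero-run, value) pairs instead of {len(w)} weights")
```

One plausible reason for fixing the partition size, rather than run-length encoding the whole tensor, is that it bounds the decoder state per partition and lets independent partitions be fetched and decoded in parallel in hardware; the abstract does not spell this out, so treat it as an inference.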


References

  1. Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., Moshovos, A.: Cnvlutin: ineffectual-neuron-free deep neural network computing. ACM SIGARCH Comput. Archit. News 44(3), 1–13 (2016)

  2. Capra, M., Bussolino, B., Marchisio, A., Masera, G., Martina, M., Shafique, M.: Hardware and software optimizations for accelerating deep neural networks: survey of current trends, challenges, and the road ahead. IEEE Access 8, 225134–225180 (2020)

  3. Chang, S.E., et al.: Mix and match: a novel FPGA-centric deep neural network quantization framework. In: 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 208–220. IEEE (2021)

  4. Chen, Y.H., Yang, T.J., Emer, J., Sze, V.: Eyeriss v2: a flexible accelerator for emerging deep neural networks on mobile devices. IEEE J. Emerg. Sel. Top. Circuits Syst. 9(2), 292–308 (2019)

  5. Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282 (2017)

  6. Chou, S., Kjolstad, F., Amarasinghe, S.: Format abstraction for sparse tensor algebra compilers. Proc. ACM Program. Lang. 2(OOPSLA), 1–30 (2018)

  7. Han, S., et al.: Deep compression and EIE: efficient inference engine on compressed deep neural network. In: Hot Chips Symposium, pp. 1–6 (2016)

  8. Han, S., et al.: EIE: efficient inference engine on compressed deep neural network. ACM SIGARCH Comput. Archit. News 44(3), 243–254 (2016)

  9. Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N., Peste, A.: Sparsity in deep learning: pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22(1), 10882–11005 (2021)

  10. Ofir, A., Ben-Artzi, G.: SMM-Conv: scalar matrix multiplication with zero packing for accelerated convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3067–3075 (2022)

  11. Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Comput. Archit. News 45(2), 27–40 (2017)

  12. PyTorch: Pruning tutorial. https://pytorch.org/tutorials/intermediate/pruning_tutorial.html. Accessed 04 July 2023

  13. Qasaimeh, M., Zambreno, J., Jones, P.H.: An efficient hardware architecture for sparse convolution using linear feedback shift registers. In: 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 250–257. IEEE (2021)

  14. Shafique, M., Marchisio, A., Putra, R.V.W., Hanif, M.A.: Towards energy-efficient and secure edge AI: a cross-layer framework (ICCAD special session paper). In: 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–9. IEEE (2021)

  15. Smith, S., Karypis, G.: Tensor-matrix products with a compressed sparse tensor. In: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, pp. 1–7 (2015)

  16. Stewart, R., Nowlan, A., Bacchus, P., Ducasse, Q., Komendantskaya, E.: Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm. Electronics 10(4), 396 (2021)

  17. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017)

  18. Yuan, Z., et al.: Sticker: a 0.41–62.1 TOPS/W 8-bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers. In: 2018 IEEE Symposium on VLSI Circuits, pp. 33–34. IEEE (2018)

  19. Zhang, S., et al.: Cambricon-X: an accelerator for sparse neural networks. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–12. IEEE (2016)


Author information

Correspondence to Bijin Elsa Baby.


Ethics declarations

This work was funded by MulticoreWare Inc. and the IPTIF, IIT Palakkad project No. IPTIF/TD/IP/003.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Baby, B.E., Deb, D., Sharma, B., Vijayakumar, K., Das, S. (2023). Energy Efficient DNN Compaction for Edge Deployment. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_20


  • DOI: https://doi.org/10.1007/978-3-031-42921-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42920-0

  • Online ISBN: 978-3-031-42921-7

  • eBook Packages: Computer Science (R0)
