Abstract
Equipping edge devices with deep neural networks (DNNs) could transform how humans interact with their surrounding environments, as these devices would be able to perform far more complex tasks. However, DNNs are power-hungry, performing billions of computations per inference. Applying approximate computing techniques reduces the cost of the underlying arithmetic circuits, so that DNN inference can be performed more efficiently in applications where a negligible loss of accuracy is acceptable.
Many approximate multipliers have been proposed for various applications to date. However, only a few of these designs have been evaluated in the context of DNN inference, and little attention has been paid to applying different approximation techniques to different layers of a DNN. In this chapter, a detailed, step-by-step approach for designing a re-configurable approximate Booth multiplier from commonly used approximation techniques is first presented. It is then shown that, to obtain the best accuracy from the available approximation techniques, a re-configurable multiplier is necessary so that different approximation techniques can be applied to different layers of a DNN. Finally, the proposed multiplier is evaluated in a real accelerator and compared with other designs.
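To make the layer-wise reconfiguration idea concrete, the following is a minimal software sketch, not the chapter's actual circuit: it models a radix-4 Booth multiplier whose partial products are truncated by a configurable number of bit columns before accumulation, with the truncation width chosen per layer. The layer names (conv1, conv2, fc) and truncation widths are hypothetical assumptions for illustration only.

import random


def booth_radix4_partial_products(a: int, b: int, n: int = 8):
    """Radix-4 Booth partial products of two signed n-bit integers.

    Returns (weight, value) pairs; the exact product equals
    sum(value << weight for weight, value in pairs).
    """
    pps = []
    prev = 0  # implicit bit b_{-1} = 0
    for i in range(0, n, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev  # Booth digit in {-2, -1, 0, 1, 2}
        prev = b1
        pps.append((i, digit * a))
    return pps


def approx_multiply(a: int, b: int, n: int = 8, trunc: int = 0) -> int:
    """Approximate product: drop the lowest `trunc` bit columns of each partial product."""
    total = 0
    for w, v in booth_radix4_partial_products(a, b, n):
        total += ((v << w) >> trunc) << trunc  # clear the truncated columns
    return total


# Hypothetical per-layer configuration: earlier layers tolerate more
# aggressive truncation than the final classifier layer.
layer_trunc = {"conv1": 4, "conv2": 3, "fc": 0}

random.seed(0)
for layer, t in layer_trunc.items():
    errs = [abs(a * b - approx_multiply(a, b, trunc=t))
            for a, b in ((random.randint(-128, 127), random.randint(-128, 127))
                         for _ in range(1000))]
    print(f"{layer}: trunc={t}, mean abs error = {sum(errs) / len(errs):.1f}")

The sketch only illustrates that the error magnitude grows with the truncation width; a multiplier that can change this width at run time can therefore trade accuracy for energy on a per-layer basis, which is the behaviour the chapter argues for.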
Acknowledgements
This work was supported by NSERC of Canada, the R&D program of MOTIE/KEIT (No. 10077609, Developing Processor Memory Storage Integrated Architecture for Low Power, High Performance Big Data Servers) and Korea Electrotechnology Research Institute (An Energy-Efficient DNN-Based Environmental Sound Classifier).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Asadikouhanjani, M., Zhang, H., Cho, K., Park, YJ., Ko, S.B. (2022). Efficient Approximate DNN Accelerators for Edge Devices: An Experimental Study. In: Liu, W., Lombardi, F. (eds) Approximate Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-98347-5_19
Print ISBN: 978-3-030-98346-8
Online ISBN: 978-3-030-98347-5