
Skipping CNN Convolutions Through Efficient Memoization

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

Convolutional Neural Networks (CNNs) have become a de facto standard for image and video recognition. However, current software and hardware implementations of convolutional operations still struggle to meet tight energy budgets because of the data-intensive processing behavior of CNNs. This paper proposes a software-based memoization technique that skips entire convolution calculations. We demonstrate that, by grouping output values into proximity-based clusters, the memory required to store the memoization tables can be reduced by hundreds of times. We also present a table mapping scheme that indexes the input set of each convolutional layer to its output value. Our experimental results show that, for the YOLOv3-tiny CNN, it is possible to achieve a speedup of up to 3.5× while reducing energy consumption to 22% of the baseline, at the cost of a 7.4% accuracy loss.
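To make the idea concrete, the following minimal Python/NumPy sketch illustrates memoized convolution under simplifying assumptions (single channel, single filter, "valid" padding). The function name memo_conv2d, the quant_step parameter, and the dictionary-based table are illustrative choices for exposition, not the authors' actual implementation.

    # Illustrative sketch of convolution memoization (not the paper's exact scheme):
    # input patches are quantized into proximity clusters and used as table keys,
    # so a table hit returns a cached output and skips the multiply-accumulate work.
    import numpy as np

    def memo_conv2d(x, w, table, quant_step=0.25):
        """Single-channel 'valid' convolution with a patch-level memo table.

        x          : 2-D input feature map (H x W)
        w          : 2-D square kernel (k x k)
        table      : dict mapping quantized-patch keys to cached output values
        quant_step : quantization granularity that defines the proximity clusters
        """
        k = w.shape[0]
        out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
        y = np.empty((out_h, out_w), dtype=np.float32)
        hits = 0
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + k, j:j + k]
                # Proximity-based key: nearby patches collapse onto the same entry.
                key = tuple(np.round(patch / quant_step).astype(np.int32).ravel())
                if key in table:
                    y[i, j] = table[key]            # skip the convolution entirely
                    hits += 1
                else:
                    val = float(np.sum(patch * w))  # fall back to the full dot product
                    table[key] = val
                    y[i, j] = val
        return y, hits

In such a sketch, a coarser quant_step merges more input patches into the same cluster, shrinking the table and raising the hit rate at the cost of output precision, which mirrors the memory-versus-accuracy trade-off behind the reported 3.5× speedup and 7.4% accuracy loss.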

Acknowledgements

This work was supported by CAPES, CNPq, and FAPERGS.

Author information

Corresponding author

Correspondence to Rafael Fão de Moura.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

de Moura, R.F., Santos, P.C., de Lima, J.P.C., Alves, M.A.Z., Beck, A.C.S., Carro, L. (2019). Skipping CNN Convolutions Through Efficient Memoization. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_5

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)
