
Skipping CNN Convolutions Through Efficient Memoization

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

Convolutional Neural Networks (CNNs) have become a de facto standard for image and video recognition. However, current software and hardware implementations of convolutional operations still struggle to meet tight energy budgets because of the data-intensive processing behavior of CNNs. This paper proposes a software-based memoization technique that skips entire convolution calculations. We demonstrate that, by grouping output values into proximity-based clusters, the memory required to store the memoization tables can be reduced by hundreds of times. We also present a table mapping scheme that indexes the input set of each convolutional layer to its output value. Our experimental results show that, for the YOLOv3-tiny CNN, it is possible to achieve a speedup of up to 3.5× while reducing energy consumption to 22% of the baseline, at the cost of a 7.4% accuracy loss.
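To make the idea concrete, the following minimal Python/NumPy sketch illustrates memoized convolution under simplifying assumptions (single channel, single filter, "valid" padding). The function name memo_conv2d, the quant_step parameter, and the dictionary-based table are illustrative choices for exposition, not the authors' actual implementation.

    # Illustrative sketch of convolution memoization (not the paper's exact scheme):
    # input patches are quantized into proximity clusters and used as table keys,
    # so a table hit returns a cached output and skips the multiply-accumulate work.
    import numpy as np

    def memo_conv2d(x, w, table, quant_step=0.25):
        """Single-channel 'valid' convolution with a patch-level memo table.

        x          : 2-D input feature map (H x W)
        w          : 2-D square kernel (k x k)
        table      : dict mapping quantized-patch keys to cached output values
        quant_step : quantization granularity that defines the proximity clusters
        """
        k = w.shape[0]
        out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
        y = np.empty((out_h, out_w), dtype=np.float32)
        hits = 0
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + k, j:j + k]
                # Proximity-based key: nearby patches collapse onto the same entry.
                key = tuple(np.round(patch / quant_step).astype(np.int32).ravel())
                if key in table:
                    y[i, j] = table[key]            # skip the convolution entirely
                    hits += 1
                else:
                    val = float(np.sum(patch * w))  # fall back to the full dot product
                    table[key] = val
                    y[i, j] = val
        return y, hits

In such a sketch, a coarser quant_step merges more input patches into the same cluster, shrinking the table and raising the hit rate at the cost of output precision, which mirrors the memory-versus-accuracy trade-off behind the reported 3.5× speedup and 7.4% accuracy loss.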

Acknowledgements

This work was supported by CAPES, CNPq, and FAPERGS.

Author information

Corresponding author

Correspondence to Rafael Fão de Moura.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

de Moura, R.F., Santos, P.C., de Lima, J.P.C., Alves, M.A.Z., Beck, A.C.S., Carro, L. (2019). Skipping CNN Convolutions Through Efficient Memoization. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_5

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)
