A Heterogeneous and Reconfigurable Embedded Architecture for Energy-Efficient Execution of Convolutional Neural Networks

  • Conference paper
Architecture of Computing Systems – ARCS 2019 (ARCS 2019)

Abstract

Machine-learning-based convolutional neural networks (CNNs) are becoming increasingly popular for identification tasks such as image classification and speech recognition. However, CNNs have high memory and computational demands, which makes it challenging to implement them on cost-efficient and energy-autonomous hardware. To cope with this challenge, we present a heterogeneous and reconfigurable embedded architecture implemented on an inexpensive and widely available entry-level system on chip (SoC). Our architecture combines an ARM CPU and a coarse-grained reconfigurable architecture (CGRA), which execute a CNN in parallel to reach higher energy efficiency. Our results show up to 130% higher performance and 78% better energy efficiency compared with an embedded Nvidia GPU.
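
As a reading aid for the percentages quoted in the abstract, the following minimal sketch converts an "X% higher" figure into a multiplicative ratio relative to the baseline (here, the embedded GPU). The function name `improvement_to_ratio` is hypothetical and is not taken from the paper. The sketch only restates the quoted figures and makes no assumption about how the paper defines or measures performance or energy efficiency.

```python
def improvement_to_ratio(percent_higher: float) -> float:
    """Convert an 'X% higher than baseline' figure into a ratio (baseline = 1.0)."""
    return 1.0 + percent_higher / 100.0


# Figures quoted in the abstract, relative to the embedded Nvidia GPU baseline.
performance_ratio = improvement_to_ratio(130.0)  # 130% higher -> about 2.3x
efficiency_ratio = improvement_to_ratio(78.0)    # 78% better  -> about 1.78x

print(f"performance: {performance_ratio:.2f}x the baseline")
print(f"energy efficiency: {efficiency_ratio:.2f}x the baseline")
```

Read this way, "130% higher performance" means roughly 2.3 times the baseline, not 1.3 times.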

Acknowledgments

This work has been partially funded by the Stiftung Industrieforschung through the scholarship for master’s theses and the German Federal Ministry of Education and Research (BMBF) under grant number 16ES0876 (GENIAL!).

Author information

Corresponding author

Correspondence to Konstantin Lübeck.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lübeck, K., Bringmann, O. (2019). A Heterogeneous and Reconfigurable Embedded Architecture for Energy-Efficient Execution of Convolutional Neural Networks. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science, vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_20

  • DOI: https://doi.org/10.1007/978-3-030-18656-2_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18655-5

  • Online ISBN: 978-3-030-18656-2

  • eBook Packages: Computer Science, Computer Science (R0)
