A streaming architecture for Convolutional Neural Networks based on layer operations chaining

  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

Convolutional Neural Networks (CNNs) have become one of the best-performing machine learning algorithms for content classification of digital images. Their computational complexity is much higher than that of traditional algorithms, which is why Graphics Processing Units (GPUs) and online servers are commonly used to accelerate their operations. However, there is a growing demand for real-time object recognition solutions implemented on embedded systems, which are constrained in both resources and energy consumption. Recently reported works focus on minimizing the required resources through two design strategies. The first implements a single accelerator that can be adapted to the operations of the whole CNN. The second assigns one accelerator to each convolution layer, achieving higher performance when processing multiple images. This paper proposes a new design strategy based on multiple accelerators that use a layer operation chaining scheme to compute the operations of several CNN layers in parallel. The proposed architecture adopts three types of parallel data processing, where the parallelism level of the convolution layers is determined by cost-function-based algorithms. The design strategy is demonstrated by implementing three naive CNNs on a DE2i-150 board, achieving a peak acceleration of 18.04x compared with state-of-the-art design methods without layer operation chaining. Furthermore, design results for a modified AlexNet CNN were obtained. According to these results, the proposed design strategy achieves a shorter processing time than reported works based on the other two design strategies, while obtaining competitive resource utilization for naive CNNs.
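To make the layer operation chaining idea concrete, the following minimal sketch (not the authors' hardware design; the kernel size, input size, and function names are illustrative assumptions) models two convolution layers as row streams: the second layer starts consuming rows of the first layer's output as soon as enough rows are buffered, instead of waiting for the complete feature map.

```python
# Conceptual model of layer operation chaining with line buffers.
# Assumed, illustrative example only -- not the paper's RTL implementation.
import numpy as np

K = 3  # kernel size assumed for both illustrative layers

def stream_conv(row_source, kernel):
    """Yield rows of a 'valid' KxK convolution one at a time, keeping only a
    K-row line buffer instead of storing the whole input feature map."""
    buf = []
    for row in row_source:
        buf.append(np.asarray(row))
        if len(buf) < K:
            continue                      # not enough rows buffered yet
        window = np.stack(buf[-K:])       # the last K input rows
        width = window.shape[1] - K + 1
        yield np.array([np.sum(window[:, c:c + K] * kernel)
                        for c in range(width)])
        buf.pop(0)                        # slide the line buffer down one row

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    k1, k2 = rng.random((K, K)), rng.random((K, K))

    # Chaining: layer 2 pulls rows from layer 1 on demand, so both layers are
    # active at the same time rather than layer 2 waiting for a full feature map.
    layer1_rows = stream_conv(iter(image), k1)
    layer2_rows = stream_conv(layer1_rows, k2)

    print(np.stack(list(layer2_rows)).shape)  # (4, 4) for an 8x8 input
```

In a hardware realization the row generators above would correspond to pipelined accelerators connected by on-chip buffers, which is what allows operations of several layers to proceed in parallel on a single image.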

Notes

  1. The top-5 error is the fraction of test samples for which the correct class is not among the five classes assigned the highest probabilities by the algorithm.
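As a purely illustrative check of this definition (randomly generated scores and labels, not data from the paper), the top-5 error can be computed as follows:

```python
# Illustrative top-5 error computation on synthetic predictions.
import numpy as np

def top5_error(probs, labels):
    # Indices of the five largest scores per sample (argsort is ascending).
    top5 = np.argsort(probs, axis=1)[:, -5:]
    hit = np.any(top5 == labels[:, None], axis=1)  # true label among the top 5?
    return 1.0 - hit.mean()

rng = np.random.default_rng(0)
probs = rng.random((100, 1000))          # 100 samples, 1000 classes
labels = rng.integers(0, 1000, size=100)
print(top5_error(probs, labels))
```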


Author information

Corresponding author

Correspondence to Moisés Arredondo-Velázquez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arredondo-Velázquez, M., Diaz-Carmona, J., Torres-Huitzil, C. et al. A streaming architecture for Convolutional Neural Networks based on layer operations chaining. J Real-Time Image Proc 17, 1715–1733 (2020). https://doi.org/10.1007/s11554-019-00938-y
