A streaming architecture for Convolutional Neural Networks based on layer operations chaining

  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

Convolutional Neural Networks (CNNs) have become one of the best-performing machine learning algorithms for content classification of digital images. Their computational complexity is much higher than that of traditional algorithms, which is why Graphics Processing Units (GPUs) and online servers are commonly used to accelerate their operations. However, there is a growing demand for real-time object recognition solutions implemented on embedded systems, which are constrained in both resources and energy consumption. Recently reported works focus on minimizing the required resources through two design strategies. The first implements a single accelerator that can be adapted to the operations of the whole CNN. The second assigns one accelerator to each convolution layer, achieving higher performance when processing multiple images. This paper proposes a new design strategy based on multiple accelerators that use a layer operation chaining scheme to compute the operations of several CNN layers in parallel. The proposed architecture adopts three types of parallel data processing, where the parallelism level of the convolution layers is determined by cost-function-based algorithms. The design strategy is demonstrated by implementing three naive CNNs on a DE2i-150 board, achieving a peak acceleration of 18.04x compared with state-of-the-art design methods without layer operation chaining. Furthermore, design results for a modified AlexNet CNN were obtained. According to these results, the proposed design strategy achieves a shorter processing time than reported works based on the other two design strategies, while obtaining competitive resource utilization for naive CNNs.
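To make the layer operation chaining idea concrete, the following minimal sketch (not the authors' hardware design; the kernel size, input size, and function names are illustrative assumptions) models two convolution layers as row streams: the second layer starts consuming rows of the first layer's output as soon as enough rows are buffered, instead of waiting for the complete feature map.

```python
# Conceptual model of layer operation chaining with line buffers.
# Assumed, illustrative example only -- not the paper's RTL implementation.
import numpy as np

K = 3  # kernel size assumed for both illustrative layers

def stream_conv(row_source, kernel):
    """Yield rows of a 'valid' KxK convolution one at a time, keeping only a
    K-row line buffer instead of storing the whole input feature map."""
    buf = []
    for row in row_source:
        buf.append(np.asarray(row))
        if len(buf) < K:
            continue                      # not enough rows buffered yet
        window = np.stack(buf[-K:])       # the last K input rows
        width = window.shape[1] - K + 1
        yield np.array([np.sum(window[:, c:c + K] * kernel)
                        for c in range(width)])
        buf.pop(0)                        # slide the line buffer down one row

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    k1, k2 = rng.random((K, K)), rng.random((K, K))

    # Chaining: layer 2 pulls rows from layer 1 on demand, so both layers are
    # active at the same time rather than layer 2 waiting for a full feature map.
    layer1_rows = stream_conv(iter(image), k1)
    layer2_rows = stream_conv(layer1_rows, k2)

    print(np.stack(list(layer2_rows)).shape)  # (4, 4) for an 8x8 input
```

In a hardware realization the row generators above would correspond to pipelined accelerators connected by on-chip buffers, which is what allows operations of several layers to proceed in parallel on a single image.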

Notes

  1. The top-5 error is the fraction of test samples for which the correct class is not among the five classes assigned the highest probabilities by the algorithm.
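As a purely illustrative check of this definition (randomly generated scores and labels, not data from the paper), the top-5 error can be computed as follows:

```python
# Illustrative top-5 error computation on synthetic predictions.
import numpy as np

def top5_error(probs, labels):
    # Indices of the five largest scores per sample (argsort is ascending).
    top5 = np.argsort(probs, axis=1)[:, -5:]
    hit = np.any(top5 == labels[:, None], axis=1)  # true label among the top 5?
    return 1.0 - hit.mean()

rng = np.random.default_rng(0)
probs = rng.random((100, 1000))          # 100 samples, 1000 classes
labels = rng.integers(0, 1000, size=100)
print(top5_error(probs, labels))
```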


Author information

Corresponding author

Correspondence to Moisés Arredondo-Velázquez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arredondo-Velázquez, M., Diaz-Carmona, J., Torres-Huitzil, C. et al. A streaming architecture for Convolutional Neural Networks based on layer operations chaining. J Real-Time Image Proc 17, 1715–1733 (2020). https://doi.org/10.1007/s11554-019-00938-y
