An Efficient Dataflow Mapping Method for Convolutional Neural Networks

Abstract

Convolutional neural networks (CNNs) have been widely used in speech recognition, object detection, and image recognition. During inference, data access operations consume more energy than the computations themselves, so optimizing the dataflow from external storage to the on-chip processing units is an effective way to reduce power consumption. The state-of-the-art row stationary dataflow maximizes data reuse to reduce the number of data movements and thus the power consumption of the system, but it suffers from low processing-unit utilization and poor scalability. In this letter, we propose an enhanced row stationary (ERS) dataflow. By changing the data mapping, ERS maps each channel of the three-dimensional filter and of the input feature map (ifmap) onto a column of processing units. The processing array operates on multiple filters in parallel, which effectively improves the utilization of computing resources. Within each processing unit, one row of filter data and one row of ifmap data are processed at a time, and both rows are reused across multiple convolution operations. In addition, a configurable sliding window model is proposed to solve a case that existing dataflows cannot handle: a computing-array width smaller than the filter width. Simulation results show that for AlexNet and VGG16, the ERS dataflow executes about 60% faster than the row stationary dataflow and improves hardware resource utilization by about 30%; for MobileNet V1, execution speed improves by about 4%.
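
To make the mapping concrete, the sketch below is a minimal functional model in Python of the scheme the abstract describes; it is an illustration under stated assumptions, not the authors' hardware implementation, and the names conv1d_row, conv1d_row_tiled, ers_conv, and all shapes are hypothetical. Each processing unit convolves one filter row with one ifmap row and reuses both across every output position; channels map to columns of processing units with filters running in parallel; and a configurable sliding window splits a filter row into segments when the array is narrower than the filter.

    import numpy as np

    def conv1d_row(ifmap_row, filt_row, stride=1):
        # One processing unit: slide one filter row over one ifmap row.
        # Both rows stay resident and are reused at every output position.
        out_w = (len(ifmap_row) - len(filt_row)) // stride + 1
        return np.array([np.dot(ifmap_row[i*stride : i*stride + len(filt_row)], filt_row)
                         for i in range(out_w)])

    def conv1d_row_tiled(ifmap_row, filt_row, array_w, stride=1):
        # Assumed behavior of the configurable sliding window: if the array holds
        # at most array_w filter taps, split the filter row into segments and
        # accumulate the partial sums produced by each pass.
        out_w = (len(ifmap_row) - len(filt_row)) // stride + 1
        out = np.zeros(out_w)
        for off in range(0, len(filt_row), array_w):
            seg = filt_row[off : off + array_w]
            out += np.array([np.dot(ifmap_row[i*stride + off : i*stride + off + len(seg)], seg)
                             for i in range(out_w)])
        return out

    def ers_conv(ifmap, filters, stride=1, array_w=None):
        # ERS-style mapping: each channel of the 3-D filter/ifmap feeds one column
        # of processing units, each unit in a column handles one filter row, and
        # the filters run in parallel columns (modeled here as ordinary loops).
        m, c, r, s = filters.shape           # num filters, channels, filter H, filter W
        _, h, w = ifmap.shape                # channels, ifmap H, ifmap W
        out = np.zeros((m, (h - r)//stride + 1, (w - s)//stride + 1))
        for f in range(m):                   # parallel filter columns
            for ch in range(c):              # one channel per column
                for y in range(out.shape[1]):
                    for fr in range(r):      # one processing unit per filter row
                        row = ifmap[ch, y*stride + fr]
                        out[f, y] += (conv1d_row(row, filters[f, ch, fr], stride)
                                      if array_w is None else
                                      conv1d_row_tiled(row, filters[f, ch, fr], array_w, stride))
        return out

    # Toy check: the narrow-array (tiled) path matches the direct path.
    x = np.random.rand(3, 8, 8)              # 3-channel 8x8 ifmap
    k = np.random.rand(2, 3, 5, 5)           # two 5x5 filters over 3 channels
    assert np.allclose(ers_conv(x, k), ers_conv(x, k, array_w=3))

In hardware, the loops over filters, channels, and rows would run concurrently across the processing array; the sequential form above is only meant to make the row reuse and partial-sum accumulation explicit.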


Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFE0202800, the National Natural Science Foundation of China under Grants 61634004 and 61934002, the Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars under Grant 2020JC-26, the Fundamental Research Funds for the Central Universities under Grants JB190105 and XJS200119, and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant 2019A01.

Author information

Corresponding author

Correspondence to Huaxi Gu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Z., Gu, H., Zhang, B. et al. An Efficient Dataflow Mapping Method for Convolutional Neural Networks. Neural Process Lett 54, 1075–1090 (2022). https://doi.org/10.1007/s11063-021-10670-z
