UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs

Published in: International Journal of Parallel Programming

Abstract

Convolutional neural networks (CNNs) have been widely applied to image recognition, face detection, and video analysis because they achieve accuracy close to, or even better than, human-level perception. However, the differing characteristics of convolution layers and fully connected layers pose challenges for implementing CNNs on FPGA platforms, because separate accelerator units must typically be designed to process the whole network. To overcome this problem, this work proposes a pipelined accelerator that provides uniformed computing for convolutional neural networks. For the convolution layer, the accelerator repositions the input features into a matrix on-the-fly as they are stored into the FPGA on-chip buffers, so that the convolution layer can be computed as a matrix multiplication. For the fully connected layer, a batch-based method is used to reduce the required memory bandwidth, and this layer can likewise be computed as a matrix multiplication. A pipelined computation method for matrix multiplication is then proposed to increase throughput and reduce buffer requirements. Experimental results show that the proposed accelerator surpasses CPU and GPU platforms in terms of energy efficiency. The proposed accelerator achieves a throughput of 49.31 GFLOPS using only 198 DSP modules. Compared to state-of-the-art implementations, our accelerator has better hardware utilization efficiency.
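The key idea of repositioning input features into a matrix so convolution becomes a matrix multiplication is commonly known as im2col. A minimal software sketch of this general technique (not the authors' hardware pipeline; function names and shapes are illustrative assumptions) looks like:

```python
import numpy as np

def im2col(x, kh, kw):
    """Reposition input feature patches into matrix columns (im2col).
    x: (C, H, W) input feature map; kh, kw: kernel height/width.
    Returns a matrix of shape (C*kh*kw, out_h*out_w)."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1  # stride 1, no padding for simplicity
    cols = np.empty((C * kh * kw, out_h * out_w))
    idx = 0
    for oh in range(out_h):
        for ow in range(out_w):
            # each receptive-field patch becomes one column
            cols[:, idx] = x[:, oh:oh + kh, ow:ow + kw].ravel()
            idx += 1
    return cols

def conv_as_matmul(x, weights):
    """Convolution layer computed as a single matrix multiplication.
    weights: (M, C, kh, kw) for M output channels."""
    M, C, kh, kw = weights.shape
    _, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    wmat = weights.reshape(M, C * kh * kw)   # each filter flattened into a row
    cols = im2col(x, kh, kw)                 # each patch flattened into a column
    return (wmat @ cols).reshape(M, out_h, out_w)
```

A fully connected layer is already a matrix-vector product; batching B inputs into a (features, B) matrix turns it into a matrix-matrix product, so the same multiplier datapath serves both layer types while the weights are reused across the batch, which is the bandwidth reduction the abstract refers to.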



Acknowledgements

This work was supported by the NSFC (No. 61379040), Anhui Provincial NSF (No. 1608085QF12), Suzhou Research Foundation (No. SYG201625), CCF-Venustech Hongyan Research Initiative (No. CCF-VenustechRP1026002), Youth Innovation Promotion Association CAS (No. 2017497), and Fundamental Research Funds for the Central Universities (WK2150110003).

Author information


Corresponding author

Correspondence to Chao Wang.


About this article


Cite this article

Sun, F., Wang, C., Gong, L. et al. UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs. Int J Parallel Prog 46, 776–787 (2018). https://doi.org/10.1007/s10766-017-0522-1
