A high-throughput scalable BNN accelerator with fully pipelined architecture

  • Regular Paper
  • Published in: CCF Transactions on High Performance Computing

Abstract

By replacing multiplications with XNOR operations, Binarized Neural Networks (BNNs) are hardware-friendly and extremely well suited to FPGA acceleration. Previous research has highlighted BNNs' performance potential, but most existing work targets minimizing chip area, achieving excellent energy and resource efficiency on small FPGAs while leaving results on larger FPGAs unsatisfying. We therefore propose a scalable, fully pipelined BNN architecture that targets maximizing throughput while retaining energy and resource efficiency on large FPGAs. By exploiting multiple levels of parallelism and balancing pipeline stages, it achieves excellent performance. Moreover, we share on-chip memory and balance computation resources to further improve resource utilization, and we propose a methodology that explores the design space for the optimal configuration. The design is evaluated on a Xilinx UltraScale XCKU115. The results show that the proposed architecture achieves a 2.24×–11.24× performance improvement and 2.43×–11.79× better resource efficiency compared with other BNN accelerators.
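
To make the XNOR substitution concrete, below is a minimal C sketch (our illustration, not the paper's implementation) of a binarized dot product. Weights and activations are assumed to be packed one bit per value, with bit 1 encoding +1 and bit 0 encoding −1; the dot product then reduces to an XNOR followed by a population count.

    #include <stdint.h>
    #include <stdio.h>

    /* Binarized dot product of n_bits packed +1/-1 values.
     * Encoding assumption: bit 1 -> +1, bit 0 -> -1.
     * XNOR marks positions where the signs agree; the dot product is
     * (#agreements) - (#disagreements) = 2*popcount - n_bits. */
    static inline int bnn_dot(uint64_t act, uint64_t wgt, int n_bits)
    {
        uint64_t agree = ~(act ^ wgt);             /* XNOR replaces multiply */
        if (n_bits < 64)
            agree &= (1ULL << n_bits) - 1;         /* mask off unused bits */
        return 2 * __builtin_popcountll(agree) - n_bits;  /* GCC/Clang builtin */
    }

    int main(void)
    {
        /* act = (+1,-1,+1,+1,-1,-1,+1,-1), wgt = (+1,+1,+1,-1,-1,+1,+1,-1) */
        printf("%d\n", bnn_dot(0xB2, 0xE6, 8));    /* 5 agree, 3 disagree -> 2 */
        return 0;
    }

On an FPGA this maps to a wide XNOR gate array feeding a popcount adder tree rather than DSP multipliers, which is why BNNs are described as hardware-friendly.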


Funding

This work was supported by the National Science and Technology Major Project (2018ZX01028101).

Author information


Corresponding author

Correspondence to Jingfei Jiang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

This paper is submitted for possible publication in the Special Issue on Reliability and Power Efficiency for HPC.


About this article


Cite this article

Han, Z., Jiang, J., Xu, J. et al. A high-throughput scalable BNN accelerator with fully pipelined architecture. CCF Trans. HPC 3, 17–30 (2021). https://doi.org/10.1007/s42514-020-00059-0

