Abstract
By replacing multiplication with the XNOR operation, Binarized Neural Networks (BNNs) are hardware-friendly and extremely well suited to FPGA acceleration. Previous research has highlighted the performance potential of BNNs. However, most existing work has aimed at minimizing chip area: it achieved excellent energy and resource efficiency on small FPGAs, while the results on larger FPGAs were unsatisfying. We therefore propose a scalable, fully pipelined BNN architecture that targets maximum throughput while preserving energy and resource efficiency on large FPGAs. By exploiting multiple levels of parallelism and balancing the pipeline stages, it achieves excellent performance. Moreover, we share on-chip memory and balance the computation resources to further improve resource utilization. We also propose a methodology that explores the design space for the optimal configuration. This work is evaluated on a Xilinx UltraScale XCKU115. The results show that the proposed architecture achieves 2.24×–11.24× higher performance and 2.43×–11.79× higher resource efficiency than other BNN accelerators.
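The abstract's central premise, that XNOR can replace multiplication in a BNN, rests on a standard identity: for vectors with entries in {-1, +1} packed into bit words (bit 1 encoding +1, bit 0 encoding -1), the dot product equals twice the popcount of the bitwise XNOR minus the vector length. The short Python sketch below illustrates that identity only; it is not the paper's FPGA implementation, and the names (dot_xnor, pack, N) are ours, chosen for illustration.

```python
import random

N = 8  # example vector length (our choice, for illustration)

def dot_xnor(a_bits: int, b_bits: int, n: int = N) -> int:
    """Dot product of two {-1,+1} vectors packed as n-bit integers
    (bit = 1 encodes +1, bit = 0 encodes -1)."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs agree
    matches = bin(agree).count("1")     # popcount of the agreements
    return 2 * matches - n              # (#agreements) - (#disagreements)

def pack(v):
    """Pack a {-1,+1} vector into an integer, bit i = 1 iff v[i] == +1."""
    return sum((x == 1) << i for i, x in enumerate(v))

# Cross-check the XNOR/popcount form against plain multiply-accumulate.
a = [random.choice((-1, 1)) for _ in range(N)]
b = [random.choice((-1, 1)) for _ in range(N)]
assert dot_xnor(pack(a), pack(b)) == sum(x * y for x, y in zip(a, b))
```

On an FPGA this substitution is what makes BNNs cheap: the XNOR/popcount pair maps onto LUT logic, so no DSP multipliers are consumed per MAC.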
Funding
This work was supported by the National Science and Technology Major Project 2018ZX01028101.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
This paper is submitted for possible publication in the Special Issue on Reliability and Power Efficiency for HPC.
Cite this article
Han, Z., Jiang, J., Xu, J. et al. A high-throughput scalable BNN accelerator with fully pipelined architecture. CCF Trans. HPC 3, 17–30 (2021). https://doi.org/10.1007/s42514-020-00059-0