Abstract
Object detection using convolutional neural networks (CNNs) has garnered considerable interest owing to its high performance. Yet the large number of operations and memory fetches, to both on-chip and external memory, that such CNNs require results in high latency and power dissipation on resource-constrained edge devices, impeding real-time operation from a battery supply. In this paper, a resource- and cost-efficient hardware accelerator for CNNs is implemented on an FPGA. Using an existing metric, \(\mathrm{DSP}_\mathrm{efficiency}\), and a new metric, \(\mathrm{Cost}_\mathrm{efficiency}\), as the primary optimization variables, algorithms and hardware are explored with a design space exploration tool called ZigZag. An optimized architecture is implemented on a Xilinx XC7Z035 FPGA, and tiny-YOLOv2 is mapped onto it to demonstrate real-time object detection. Compared to the state of the art (SotA), the implementation results show that the hardware achieves the best \(\mathrm{DSP}_\mathrm{efficiency}\) at 90% and \(\mathrm{Cost}_\mathrm{efficiency}\) at 0.146.
Acknowledgements
This work has been supported by the FWO SBO project OmniDrone under agreement S003817N, the Flemish Government under the AI Research Program, and the ISAAC project under the FOD Economie Belgium Energietransitiefonds (oproep II) with Magics Instruments.
About this article
Cite this article
Jain, V., Jadhav, N. & Verhelst, M. Enabling real-time object detection on low cost FPGAs. J Real-Time Image Proc 19, 217–229 (2022). https://doi.org/10.1007/s11554-021-01177-w
DOI: https://doi.org/10.1007/s11554-021-01177-w