Enabling real-time object detection on low cost FPGAs

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Object detection using convolutional neural networks (CNNs) has attracted considerable interest because of its high accuracy. However, such CNNs require a large number of operations and memory fetches, both on-chip and to external memory, which results in high latency and power dissipation on resource-constrained edge devices and impedes real-time operation from a battery supply. In this paper, a resource- and cost-efficient hardware accelerator for CNNs is implemented on an FPGA. Using an existing metric, \(\mathrm{DSP}_\mathrm{efficiency}\), and a new metric, \(\mathrm{Cost}_\mathrm{efficiency}\), as the primary optimization variables, algorithms and hardware are explored with a design space exploration tool called ZigZag. An optimized architecture is implemented on a Xilinx XC7Z035 FPGA, and tiny-YOLOv2 is mapped onto it to demonstrate real-time object detection. Compared to the state-of-the-art (SotA), the implementation results show that the hardware achieves the best \(\mathrm{DSP}_\mathrm{efficiency}\) at 90% and the best \(\mathrm{Cost}_\mathrm{efficiency}\) at 0.146.
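The abstract names the two metrics but does not define them, so the sketch below is only an illustration under stated assumptions. It assumes that DSP efficiency is the ratio of achieved throughput to the peak throughput the DSP slices could deliver, and that cost efficiency is achieved throughput per unit of device cost. The function names, the factor of two operations per DSP per cycle, and all numbers are hypothetical and are not taken from the paper.

```python
def dsp_efficiency(achieved_gops, num_dsps, clock_mhz, ops_per_dsp_per_cycle=2):
    """Achieved throughput divided by peak DSP throughput.

    ASSUMED definition, not taken from the paper. One multiply-accumulate
    per DSP per cycle is counted as 2 operations by default.
    """
    if num_dsps <= 0 or clock_mhz <= 0 or ops_per_dsp_per_cycle <= 0:
        raise ValueError("num_dsps, clock_mhz and ops_per_dsp_per_cycle must be positive")
    # Peak throughput in GOPS: DSPs * ops/cycle * cycles/second, scaled from MHz.
    peak_gops = num_dsps * ops_per_dsp_per_cycle * clock_mhz / 1000.0
    return achieved_gops / peak_gops


def cost_efficiency(achieved_gops, device_cost):
    """Achieved throughput per unit of device cost.

    ASSUMED definition, not taken from the paper; the cost unit is arbitrary.
    """
    if device_cost <= 0:
        raise ValueError("device_cost must be positive")
    return achieved_gops / device_cost


if __name__ == "__main__":
    # Hypothetical accelerator: 250 DSPs at 200 MHz sustaining 90 GOPS,
    # on a device with a hypothetical cost of 600 units.
    print(f"DSP efficiency:  {dsp_efficiency(90.0, 250, 200):.2f}")
    print(f"Cost efficiency: {cost_efficiency(90.0, 600.0):.3f}")
```

Under these assumptions, a design is penalized both for leaving DSP slices idle and for requiring a more expensive device to reach the same throughput, which is one way to read the emphasis on low-cost FPGAs in the title.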

Acknowledgements

This work has been supported by the FWO SBO project OmniDrone under agreement S003817N, the Flemish Government under the AI Research Program and ISAAC project under the FOD Economie Belgium Energietransitiefonds (oproep II) with Magics Instruments.

Corresponding author

Correspondence to Vikram Jain.

Cite this article

Jain, V., Jadhav, N. & Verhelst, M. Enabling real-time object detection on low cost FPGAs. J Real-Time Image Proc 19, 217–229 (2022). https://doi.org/10.1007/s11554-021-01177-w

  • DOI: https://doi.org/10.1007/s11554-021-01177-w
