Enabling real-time object detection on low cost FPGAs

Jain, Vikram; Jadhav, Ninad; Verhelst, Marian

doi:10.1007/s11554-021-01177-w

Original Research Paper
Published: 30 October 2021

Volume 19, pages 217–229, (2022)

Journal of Real-Time Image Processing

Abstract

Object detection using convolutional neural networks (CNNs) has garnered considerable interest owing to its high performance. Yet the large number of operations and memory fetches, to both on-chip and external memory, that such CNNs require results in high latency and power dissipation on resource-constrained edge devices, impeding real-time operation from a battery supply. In this paper, a resource- and cost-efficient hardware accelerator for CNNs is implemented on an FPGA. Using an existing metric, \(\mathrm{DSP}_\mathrm{efficiency}\), and a new metric, \(\mathrm{Cost}_\mathrm{efficiency}\), as the primary optimization variables, algorithms and hardware are explored with a design space exploration tool called ZigZag. An optimized architecture is implemented on a Xilinx XC7Z035 FPGA, and tiny-YOLOv2 is mapped onto it to demonstrate real-time object detection. Compared with the state of the art (SotA), the implementation results show that the hardware achieves the best \(\mathrm{DSP}_\mathrm{efficiency}\) at 90% and \(\mathrm{Cost}_\mathrm{efficiency}\) at 0.146.
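
The abstract names \(\mathrm{DSP}_\mathrm{efficiency}\) without defining it. As a rough illustration only, the sketch below assumes one common reading of such a metric: achieved throughput divided by the peak throughput the DSP slices could deliver at the given clock. That definition, the function name `dsp_efficiency`, the factor of two operations (one multiply, one add) per DSP per cycle, and all numbers are assumptions for illustration and are not taken from the paper. No formula is sketched for \(\mathrm{Cost}_\mathrm{efficiency}\), since this page gives no basis for one.

```python
def dsp_efficiency(achieved_gops: float,
                   num_dsps: int,
                   clock_mhz: float,
                   ops_per_dsp_per_cycle: int = 2) -> float:
    """Ratio of achieved to peak throughput (ASSUMED definition, not the paper's).

    Peak throughput assumes every DSP slice completes
    `ops_per_dsp_per_cycle` operations on every clock cycle.
    """
    if num_dsps <= 0 or clock_mhz <= 0:
        raise ValueError("num_dsps and clock_mhz must be positive")
    # MHz * ops/cycle * DSPs gives mega-ops per second; divide by 1000 for GOPS.
    peak_gops = num_dsps * ops_per_dsp_per_cycle * clock_mhz / 1000.0
    return achieved_gops / peak_gops


# Hypothetical accelerator: 250 DSPs at 200 MHz sustaining 90 GOPS.
# These figures are made up and do not describe the XC7Z035 design.
efficiency = dsp_efficiency(achieved_gops=90.0, num_dsps=250, clock_mhz=200.0)
print(f"DSP efficiency: {efficiency:.0%}")
```

Under this reading, a value near 100% would mean the multipliers are busy on almost every cycle, so the metric rewards keeping a small number of DSPs fully used rather than adding more of them.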

Acknowledgements

This work has been supported by the FWO SBO project OmniDrone under agreement S003817N, the Flemish Government under the AI Research Program and ISAAC project under the FOD Economie Belgium Energietransitiefonds (oproep II) with Magics Instruments.

Author information

Vikram Jain and Ninad Jadhav contributed equally to this work.

Authors and Affiliations

KU Leuven-MICAS, Kasteelpark Arenberg 10, 3001, Heverlee, Belgium
Vikram Jain & Marian Verhelst
MAGICS Instruments, Cipalstraat 3, 2440, Geel, Belgium
Ninad Jadhav

Corresponding author

Correspondence to Vikram Jain.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, V., Jadhav, N. & Verhelst, M. Enabling real-time object detection on low cost FPGAs. J Real-Time Image Proc 19, 217–229 (2022). https://doi.org/10.1007/s11554-021-01177-w

Received: 10 July 2021
Accepted: 13 October 2021
Published: 30 October 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11554-021-01177-w
