RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects

Amudhan, A. N.; Vrajesh, Shah Rutvik; Sudheer, A. P.; Lijiya, A.

doi:10.1007/s11554-021-01170-3

Original Research Paper
Published: 21 September 2021

Journal of Real-Time Image Processing, Volume 19, pages 133–146 (2022)

A. N. Amudhan¹, Shah Rutvik Vrajesh², A. P. Sudheer¹ (ORCID: orcid.org/0000-0003-0644-3702) & A. Lijiya²

Abstract

Small-size object detection (SOD) is a challenging problem in computer vision. SOD is highly useful in defense, military, surveillance, medical and industrial applications and in sports analysis. Various algorithms have been developed to solve the SOD problem; however, they are not suitable for real-time applications. In this work, a convolutional neural network architecture based on YOLO is proposed to enhance detection performance on small objects. The proposed network is inspired by the ideas of residual blocks, DenseNet, feature pyramid networks, cross stage partial connections, and 1 × 1 convolutions. The receptive field and the reuse of feature maps are the main factors in the design of the architecture, which is hence referred to as RFSOD. It is developed as a lightweight network to suit real-time applications and can run smoothly on single-board computers such as the Jetson Nano, TX2, Raspberry Pi and the like. The proposed model is evaluated on public datasets such as VHR10, the BCCD dataset and a few small-size objects from the MS COCO dataset. This work is motivated by the need to develop a vision system for a badminton-playing robot; the proposed model is therefore also tested on a custom-made shuttlecock dataset. The model's performance is compared with state-of-the-art deep learning models that are suitable for real-time applications. The proposed model was implemented in hardware on a Jetson Nano, a Raspberry Pi 4 and a laptop with an i5 processor. Improved detection accuracy was observed on small objects. A detection speed more than 2 × higher was obtained on the Raspberry Pi and the i5 processor, while a 30% improvement was observed on the Jetson Nano with real-time videos.
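The abstract names the receptive field as a main design factor. As a rough illustration only, the sketch below computes the theoretical receptive field of a plain stack of convolution layers using the standard recurrence (each layer adds `(kernel - 1)` times the accumulated stride). The function name `receptive_field` and the example layer stack are hypothetical and are not taken from the RFSOD architecture, which is not described on this page.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Theoretical receptive field of sequential conv layers.

    layers: (kernel_size, stride) pairs, ordered from input to output.
    Assumes no dilation and a purely sequential stack (no branches).
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth contributed by this layer
        jump *= stride             # strides compound multiplicatively
    return rf


# Hypothetical stack: 3x3/s1, 3x3/s2, 1x1/s1, 3x3/s2
example_stack = [(3, 1), (3, 2), (1, 1), (3, 2)]
print(receptive_field(example_stack))  # prints 9
```

In this recurrence a 1 × 1 convolution leaves the receptive field unchanged, since `(kernel - 1)` is zero. Such a layer can therefore mix or reduce channels without changing how much of the image each output sees.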

Availability of data and material

All the data used in this work are available from the corresponding author upon request.

Code availability

The codes used are available from the corresponding author upon request.

Acknowledgements

The authors would like to thank the Government of India for the Technical Education Quality Improvement Program (TEQIP III), and the coordinators of TEQIP III and the Dean (Research and Consultancy) at the National Institute of Technology Calicut for providing financial aid to procure a 2080 Ti GPU workstation and Baumer high-speed cameras.

Author information

Authors and Affiliations

Robotics Lab, Department of Mechanical Engineering, National Institute of Technology Calicut, Kozhikode, Kerala, India
A. N. Amudhan & A. P. Sudheer
Department of Computer Science Engineering, National Institute of Technology Calicut, Kozhikode, Kerala, India
Shah Rutvik Vrajesh & A. Lijiya

Contributions

AAN: Conceptualization, methodology, data acquisition, data curation, interpretation of data, software, visualization, writing (original draft). SRV: Data collection, software. SAP: Formal analysis, writing (review and editing), supervision, funding acquisition. LA: Writing (review and editing), supervision.

Corresponding author

Correspondence to A. P. Sudheer.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amudhan, A.N., Vrajesh, S.R., Sudheer, A.P. et al. RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects. J Real-Time Image Proc 19, 133–146 (2022). https://doi.org/10.1007/s11554-021-01170-3

Received: 26 April 2021
Accepted: 31 August 2021
Published: 21 September 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11554-021-01170-3
