Abstract
Small-size object detection (SOD) is a challenging problem in computer vision. SOD is highly useful in defense, military, surveillance, medical, industrial, and sports-analysis applications. Various algorithms have been developed to address SOD; however, these algorithms are not suitable for real-time applications. In this work, a convolutional neural network architecture based on YOLO is proposed to improve detection performance on small objects. The proposed network is inspired by residual blocks, DenseNet, the Feature Pyramid Network (FPN), cross-stage partial (CSP) connections, and 1 × 1 convolutions. Because the receptive field and the reuse of feature maps are the main factors in the design, the architecture is referred to as RFSOD. It is developed as a lightweight network suited to real-time applications and can run smoothly on single-board computers such as the Jetson Nano, Jetson TX2, and Raspberry Pi. The proposed model is evaluated on public datasets including VHR10, the BCCD dataset, and a few small-size object classes from the MS COCO dataset. This work is motivated by the need for a vision system for a badminton-playing robot; therefore, the proposed model is also tested on a custom-made shuttlecock dataset. The model's performance is compared with state-of-the-art deep learning models that are suitable for real-time applications. The proposed model was implemented in hardware on a Jetson Nano, a Raspberry Pi 4, and a laptop with an i5 processor. Improved detection accuracy was observed on small objects. On real-time videos, detection speed more than doubled on the Raspberry Pi and the i5 processor, and improved by 30% on the Jetson Nano.
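The abstract names the receptive field as a main design factor and relies on 1 × 1 convolutions to keep the network lightweight. The following minimal Python sketch illustrates both ideas in generic form. It is not the RFSOD architecture: the helper names `receptive_field` and `conv_weights`, the layer stacks, and the channel counts are all hypothetical values chosen only for illustration.

```python
"""Receptive-field and weight-count arithmetic for small conv stacks.

Illustrative sketch only: the layer stacks and channel counts are hypothetical,
not the RFSOD configuration.
"""
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive field in input pixels, cumulative stride).

    Each layer is given as (kernel_size, stride). Uses the standard recurrence
    r_out = r_in + (k - 1) * j_in and j_out = j_in * s, starting from one pixel
    (r = 1, j = 1). Padding does not change the receptive field size.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap reaches `jump` pixels further
        jump *= stride             # input-pixel distance between adjacent outputs
    return rf, jump


def conv_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a k x k convolution (bias terms ignored)."""
    return c_in * c_out * kernel * kernel


if __name__ == "__main__":
    # Hypothetical stack of (kernel, stride) pairs with a 1 x 1 conv in the middle.
    with_1x1 = [(3, 2), (3, 1), (1, 1), (3, 2), (3, 1)]
    without_1x1 = [(3, 2), (3, 1), (3, 2), (3, 1)]
    print(receptive_field(with_1x1))     # (19, 4)
    print(receptive_field(without_1x1))  # (19, 4): the 1 x 1 conv adds no spatial context

    # Direct 3 x 3 conv versus a 1 x 1 bottleneck followed by a 3 x 3 conv.
    direct = conv_weights(256, 256, 3)
    bottleneck = conv_weights(256, 64, 1) + conv_weights(64, 256, 3)
    print(direct, bottleneck)  # 589824 163840
```

In this toy case the 1 × 1 layer leaves the receptive field unchanged while mixing channels cheaply, so the bottleneck uses 3.6× fewer weights than the direct 3 × 3 convolution without shrinking the context each output sees. The cumulative stride of 4 also means a hypothetical 16 × 16 pixel object would span only 4 × 4 cells of that feature map, which is one common reason small objects benefit from higher-resolution feature maps.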
Availability of data and material
All the data used in this work are available from the corresponding author upon request.
Code availability
The code used in this work is available from the corresponding author upon request.
Acknowledgements
The authors would like to thank the Government of India for the Technical Education Quality Improvement Program (TEQIP III), the coordinators of TEQIP III, and the Dean (Research and Consultancy) at the National Institute of Technology Calicut for providing financial aid to procure an RTX 2080 Ti GPU workstation and Baumer high-speed cameras.
Author information
Contributions
AAN: conceptualization, methodology, data acquisition, data curation, interpretation of data, software, visualization, writing – original draft. SRV: data collection, software. SAP: formal analysis, writing – review and editing, supervision, funding acquisition. LA: writing – review and editing, supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Amudhan, A.N., Vrajesh, S.R., Sudheer, A.P. et al. RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects. J Real-Time Image Proc 19, 133–146 (2022). https://doi.org/10.1007/s11554-021-01170-3
DOI: https://doi.org/10.1007/s11554-021-01170-3