Improving small object detection via context-aware and feature-enhanced plug-and-play modules

Research article, Journal of Real-Time Image Processing

Abstract

Detecting small objects is a challenging task in computer vision because such objects occupy only a few pixels and often have blurred contours, which leaves few discriminative features with which to model them. In this paper, we propose three lightweight plug-and-play modules that can be integrated into object detection algorithms, particularly those in the YOLO series, to improve the accuracy of small object detection. The Spatially Enhanced Convolutional Block Attention Module (SE-CBAM) is integrated into the feature extraction layer of the network to strengthen its feature extraction capability. A Contextual Information Pooling Enhancement Module (CIE-Pool) is added at the multi-scale feature fusion stage to extract and enhance the background (contextual) information around objects, which improves the recognition rate of small objects. Finally, a new layer is added to the detection head; it takes the shallow feature map from the feature extraction network, after Adaptive Feature Processing (AFP), and thereby captures richer information about small objects. The method is evaluated on the VisDrone2021 and AI-TOD datasets. The experimental results show that it substantially improves small object detection accuracy while maintaining real-time performance, and that it remains accurate and fast against complex backgrounds and on heavily blurred small objects.
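
The abstract does not describe how SE-CBAM modifies its base module, so as a point of reference the sketch below shows only the standard CBAM of Woo et al. (reference 34), whose name SE-CBAM extends. CBAM first reweights channels using average- and max-pooled descriptors passed through a shared MLP, then reweights spatial positions using a convolution over the channel-wise mean and max maps. This is not the authors' SE-CBAM; the function names, the nested-list tensor layout (C x H x W), and the omission of bias terms are assumptions made to keep the example dependency-free. A real detector would implement the same logic as a PyTorch module on batched tensors.

```python
import math


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_attention(x, w1, w2):
    """CBAM channel attention (sketch, no bias terms).

    x:  feature map as nested lists, shape C x H x W
    w1: first shared-MLP layer, shape (C // r) x C for reduction ratio r
    w2: second shared-MLP layer, shape C x (C // r)
    """
    # Global average- and max-pooled descriptors, one value per channel.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def shared_mlp(v):
        hidden = [max(0.0, sum(w * vi for w, vi in zip(wr, v))) for wr in w1]  # ReLU
        return [sum(w * h for w, h in zip(wr, hidden)) for wr in w2]

    a, m = shared_mlp(avg), shared_mlp(mx)
    weights = [sigmoid(ac + mc) for ac, mc in zip(a, m)]
    # Rescale every channel by its attention weight in (0, 1).
    return [[[v * wc for v in row] for row in ch] for ch, wc in zip(x, weights)]


def spatial_attention(x, kernel):
    """CBAM spatial attention (sketch).

    kernel: shape 2 x k x k (k odd), applied to the stacked [mean, max]
    maps with zero padding so the attention map keeps the H x W size.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Pool across channels at every pixel.
    mean_map = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps = (mean_map, max_map)
    k = len(kernel[0])
    pad = k // 2
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for mi in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernel[mi][di][dj] * maps[mi][ii][jj]
            att[i][j] = sigmoid(s)
    # Rescale every spatial position by its attention weight in (0, 1).
    return [[[x[c][i][j] * att[i][j] for j in range(W)] for i in range(H)] for c in range(C)]


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


# Toy usage: C=4 channels, 3x3 spatial map, reduction ratio r=2, 7x7 kernel.
x = [[[float(c + i + j) for j in range(3)] for i in range(3)] for c in range(4)]
w1 = [[0.0] * 4 for _ in range(2)]
w2 = [[0.0] * 2 for _ in range(4)]
kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
y = cbam(x, w1, w2, kernel)
# With all-zero weights both attention factors equal sigmoid(0) = 0.5, so y == 0.25 * x.
```

Because both attention maps are sigmoid outputs in (0, 1), CBAM can only reweight features, never amplify them beyond their input magnitude; the zero-weight example makes this concrete, since each factor collapses to 0.5.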

Figs. 1–11 (images not included in this text)

Data availability

The datasets employed in this study were obtained from publicly available repositories, accessible via their corresponding references. The code used in this study is available at https://github.com/hzshonny/DetectAerialObjects

References

  1. Jiao, L., Zhang, F., Liu, F., Yang, S., Li, L., Feng, Z., Qu, R.: A survey of deep learning-based object detection. IEEE Access 7, 128837–128868 (2019). https://doi.org/10.1109/ACCESS.2019.2939201

  2. Zou, Z., Chen, K., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: a survey. arXiv:1905.05055 [cs.CV] (2023)

  3. Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., Xie, X., Han, J.: Towards large-scale small object detection: survey and benchmarks. IEEE Trans. Pattern Anal. Mach. Intell. (2023). https://doi.org/10.1109/tpami.2023.3290594

  4. Chen, G., Wang, H., Chen, K., Li, Z., Song, Z., Liu, Y., Chen, W., Knoll, A.: A survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans. Syst., Man., Cybern.: Syst. 52(2), 936–953 (2022). https://doi.org/10.1109/TSMC.2020.3005231

  5. Cha, Y., Choi, W., Suh, G., Mahmoudkhani, S., Büyüköztürk, O.: Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Aided Civ. Infrastruct. Eng. (2018). https://doi.org/10.1111/mice.12334

  6. Arnold, E., Al-Jarrah, O.Y., Dianati, M., Fallah, S., Oxtoby, D., Mouzakitis, A.: A survey on 3D object detection methods for autonomous driving applications. IEEE Trans. Intell. Transp. Syst. 20(10), 3782–3795 (2019). https://doi.org/10.1109/TITS.2019.2892405

  7. Wang, T., Chen, Y., Qiao, M., Snoussi, H.: A fast and robust convolutional neural network-based defect detection model in product quality control. Int. J. Adv. Manuf. Technol. 94, 3465–3471 (2018)

  8. Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., Ling, H.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. (2021). https://doi.org/10.1109/TPAMI.2021.3119563

  9. Wang, J., Yang, W., Guo, H., Zhang, R., Xia, G.-S.: Tiny object detection in aerial images. In: 25th International Conference on Pattern Recognition (ICPR), pp. 3791–3798 (2021). https://doi.org/10.1109/ICPR48806.2021.9413340

  10. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)

  11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014). https://doi.org/10.1109/CVPR.2014.81

  12. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). https://doi.org/10.1109/TPAMI.2015.2389824

  13. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169

  14. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031

  15. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91

  16. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690

  17. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. CoRR abs/1804.02767 (2018)

  18. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020)

  19. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106

  20. Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., Shen, C.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: IEEE International Conference on Computer Vision (ICCV), pp. 8439–8448 (2019). https://doi.org/10.1109/ICCV.2019.00853

  21. Jocher, G.: YOLOv5 by Ultralytics. https://doi.org/10.5281/zenodo.3908559 . https://github.com/ultralytics/yolov5

  22. Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics

  23. Ghodrati, A., Diba, A., Pedersoli, M., Tuytelaars, T., Van Gool, L.: DeepProposal: hunting objects by cascading deep convolutional layers. In: IEEE International Conference on Computer Vision (ICCV), pp. 2578–2586 (2015). https://doi.org/10.1109/ICCV.2015.296

  24. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6154–6162 (2018). https://doi.org/10.1109/CVPR.2018.00644

  25. Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Single-shot refinement neural network for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4203–4212 (2018). https://doi.org/10.1109/CVPR.2018.00442

  26. Van Etten, A.: You only look twice: rapid multi-scale object detection in satellite imagery. CoRR abs/1805.09512 (2018)

  27. Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-Outside Net: detecting objects in context with skip pooling and recurrent neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2874–2883 (2016). https://doi.org/10.1109/CVPR.2016.314

  28. Yuan, Y., Xiong, Z., Wang, Q.: VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans. Image Process. 28(7), 3423–3434 (2019). https://doi.org/10.1109/TIP.2019.2896952

  29. Müller, J., Dietmayer, K.: Detecting traffic lights by single shot detection. In: 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 266–273 (2018). https://doi.org/10.1109/ITSC.2018.8569683

  30. Yan, B., Li, J., Yang, Z., Zhang, X., Hao, X.: AIE-YOLO: auxiliary information enhanced YOLO for small object detection. Sensors 22(21), 8221 (2022)

  31. Wang, M., Yang, W., Wang, L., Chen, D., Wei, F., KeZiErBieKe, H., Liao, Y.: FE-YOLOv5: feature enhancement network based on YOLOv5 for small object detection. J. Vis. Commun. Image Represent. 90, 103752 (2023)

  32. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745

  33. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539 (2020). https://doi.org/10.1109/CVPR42600.2020.01155

  34. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

  35. Yang, R., Li, W., Shang, X., Zhu, D., Man, X.: KPE-YOLOv5: an improved small target detection algorithm based on YOLOv5. Electronics 12(4), 817 (2023)

  36. Zhou, W., Cai, C., Zheng, L., Li, C., Zeng, D.: ASSD-YOLO: a small object detection method based on improved YOLOv7 for airport surface surveillance. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-17628-4

  37. Lim, J.-S., Astrid, M., Yoon, H.-J., Lee, S.-I.: Small object detection using context and attention. In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 181–186 (2021). https://doi.org/10.1109/ICAIIC51459.2021.9415217

  38. Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: Neural Information Processing Systems (NeurIPS), vol. 29 (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/c8067ad1937f728f51288b3eb986afaa-Paper.pdf

  39. Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: keypoint triplets for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 6568–6577 (2019). https://doi.org/10.1109/ICCV.2019.00667

  40. Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 3500–3509 (2021). https://doi.org/10.1109/ICCV48922.2021.00350

  41. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017). https://doi.org/10.1109/ICCV.2017.324

  42. Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  43. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464–7475 (2023)

Author information

Contributions

Xiao He: validation, data curation, formal analysis, investigation, writing—original draft, writing—review, editing. Xiaolong Zheng: conceptualization, methodology, investigation, formal analysis, visualization, writing—review, editing. Xiyu Hao: data curation, investigation, formal analysis, writing—review, editing. Heng Jin: investigation, formal analysis, writing—review, editing. Xiangming Zhou: investigation, formal analysis, writing—review, editing. Lihuan Shao: conceptualization, investigation, formal analysis, supervision, writing—review, editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xiaolong Zheng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, X., Zheng, X., Hao, X. et al. Improving small object detection via context-aware and feature-enhanced plug-and-play modules. J Real-Time Image Proc 21, 44 (2024). https://doi.org/10.1007/s11554-024-01426-8

  • DOI: https://doi.org/10.1007/s11554-024-01426-8