
Designing lightweight small object detection models using attention and context

Published in: Multimedia Tools and Applications

Abstract

Existing deep learning models have made progress in improving detection accuracy for small objects, but much work remains in balancing the practical factors of real-world object detection: accuracy, running time, model parameters, and complexity. In this study, we designed multiple lightweight models and evaluated their detection performance by constructing a lightweight backbone network and embedding different combinations of attention and context modules. The results show that (1) introducing an attention mechanism and context information into the backbone network helps to improve the model's detection accuracy for small objects, though the degree of improvement depends on the model's structure; (2) introducing a context fusion network improves detection performance more than introducing an attention module; and (3) the joint use of an attention mechanism and contextual information requires careful consideration of the model structure. Moreover, the designed models are lightweight, so their parameter files can be read and written quickly and their feedforward time is short.
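To illustrate the kind of channel-attention module the abstract refers to, here is a minimal NumPy sketch of a squeeze-and-excitation style block. The array shapes, reduction ratio, and weight initialisation are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    feature_map: (C, H, W) array; w1: (C // r, C); w2: (C, C // r),
    where r is the bottleneck reduction ratio.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    squeezed = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck of two small fully connected layers
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU
    weights = sigmoid(w2 @ hidden)            # per-channel weights in (0, 1)
    # Reweight each channel of the original feature map
    return feature_map * weights[:, None, None]

# Hypothetical example: 8 channels, reduction ratio r = 4
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
out = se_attention(x, w1, w2)
```

Because the learned per-channel weights lie in (0, 1), the block can only attenuate channels, letting the network emphasise the channels most informative for small objects while adding very few parameters.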


Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.


Acknowledgements

This research is supported by the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2021-06-022), Fundamental Research Funds for the Central Universities of China University of Mining and Technology (No. 2021QN1058), Project of Nantong Science and Technology Bureau (No. JC2020174), Project of Taizhou Natural Resources and Planning Bureau (No. JSJWZBDL2020-62), and Nantong Key Laboratory Project (No. CP12016005). We thank Dr. B. Chen for providing the rain grate dataset. We thank Dr. J. Zhang for his technical support in data processing.

Author information


Corresponding author

Correspondence to Mingliang Che.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Che, M., Wu, Z., Wang, X. et al. Designing lightweight small object detection models using attention and context. Multimed Tools Appl 83, 9523–9546 (2024). https://doi.org/10.1007/s11042-023-15847-3
