Abstract
Existing deep learning models have made progress in improving detection accuracy for small objects, but much work remains in balancing the practical factors involved in real-world object detection: accuracy, running time, model parameters, and complexity. In this study, we designed multiple lightweight models and evaluated their detection performance by constructing a lightweight backbone network and embedding different combinations of attention and context modules. The results show that: 1) introducing an attention mechanism and contextual information into the backbone network helps to improve the model's detection accuracy for small objects, but the degree of improvement varies with the model structure; 2) introducing a context fusion network improves detection performance more than introducing an attention module; and 3) using an attention mechanism and contextual information jointly requires careful consideration of the model structure. Moreover, the designed models are lightweight, so their parameter files can be read and written quickly and their feed-forward inference time is short.
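To make the two module families concrete, the sketch below shows one common form of each: a squeeze-and-excitation style channel attention block and a simple context fusion block that merges local features with wider-context features from a dilated convolution. This is a minimal illustration, not the paper's exact designs; the PyTorch framework, module names, and the reduction/dilation settings are all assumptions for demonstration.

```python
# Illustrative sketch only: the paper's exact attention and context modules
# are not specified here. PyTorch, the class names, and the hyperparameters
# (reduction=4, dilation=2) are assumptions chosen for demonstration.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> bottleneck MLP -> channel gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Per-channel weights in [0, 1], broadcast over spatial dims.
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class ContextFusion(nn.Module):
    """Fuses local features with wider-context features from a dilated conv."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        # Dilation enlarges the receptive field without extra parameters.
        self.context = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return self.merge(torch.cat([self.local(x), self.context(x)], dim=1))

# Both blocks preserve the feature-map shape, so they can be embedded at
# arbitrary points in a backbone, alone or in combination.
x = torch.randn(1, 16, 32, 32)
y = ContextFusion(16)(ChannelAttention(16)(x))
print(tuple(y.shape))  # (1, 16, 32, 32)
```

Because both blocks are shape-preserving, different combinations (attention only, context only, or both in either order) can be dropped into the same backbone, which is what makes the kind of ablation described above straightforward.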
Data availability
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Acknowledgements
This research was supported by the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2021-06-022), the Fundamental Research Funds for the Central Universities of China University of Mining and Technology (No. 2021QN1058), a Project of the Nantong Science and Technology Bureau (No. JC2020174), a Project of the Taizhou Natural Resources and Planning Bureau (No. JSJWZBDL2020-62), and a Nantong Key Laboratory Project (No. CP12016005). We thank Dr. B. Chen for providing the rain grate dataset and Dr. J. Zhang for his technical support in data processing.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Che, M., Wu, Z., Wang, X. et al. Designing lightweight small object detection models using attention and context. Multimed Tools Appl 83, 9523–9546 (2024). https://doi.org/10.1007/s11042-023-15847-3