Abstract
Existing deep learning models have made progress in improving detection accuracy for small objects, but much work remains in balancing the practical factors involved in real-world object detection: accuracy, running time, model parameters, and complexity. In this study, we designed multiple lightweight models and evaluated their detection performance by constructing a lightweight backbone network and embedding different combinations of attention and context modules. The results show that: 1) introducing an attention mechanism and contextual information into the backbone network helps to improve the model's detection accuracy for small objects, but the degree of improvement varies with the model structure; 2) introducing a context fusion network improves detection performance more than introducing an attention module; and 3) using an attention mechanism and contextual information jointly requires careful consideration of the model structure. Moreover, the designed models are lightweight, so their parameter files can be read and written quickly and their feed-forward inference time is short.
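To make the two module families concrete, the sketch below shows one common form of each: a squeeze-and-excitation style channel attention block and a simple context fusion block that merges local features with wider-context features from a dilated convolution. This is a minimal illustration, not the paper's exact designs; the PyTorch framework, module names, and the reduction/dilation settings are all assumptions for demonstration.

```python
# Illustrative sketch only: the paper's exact attention and context modules
# are not specified here. PyTorch, the class names, and the hyperparameters
# (reduction=4, dilation=2) are assumptions chosen for demonstration.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling -> bottleneck MLP -> channel gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Per-channel weights in [0, 1], broadcast over spatial dims.
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class ContextFusion(nn.Module):
    """Fuses local features with wider-context features from a dilated conv."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        # Dilation enlarges the receptive field without extra parameters.
        self.context = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return self.merge(torch.cat([self.local(x), self.context(x)], dim=1))

# Both blocks preserve the feature-map shape, so they can be embedded at
# arbitrary points in a backbone, alone or in combination.
x = torch.randn(1, 16, 32, 32)
y = ContextFusion(16)(ChannelAttention(16)(x))
print(tuple(y.shape))  # (1, 16, 32, 32)
```

Because both blocks are shape-preserving, different combinations (attention only, context only, or both in either order) can be dropped into the same backbone, which is what makes the kind of ablation described above straightforward.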
Data availability
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Acknowledgements
This research was supported by the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2021-06-022), the Fundamental Research Funds for the Central Universities of China University of Mining and Technology (No. 2021QN1058), a Project of the Nantong Science and Technology Bureau (No. JC2020174), a Project of the Taizhou Natural Resources and Planning Bureau (No. JSJWZBDL2020-62), and a Nantong Key Laboratory Project (No. CP12016005). We thank Dr. B. Chen for providing the rain grate dataset and Dr. J. Zhang for his technical support in data processing.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Che, M., Wu, Z., Wang, X. et al. Designing lightweight small object detection models using attention and context. Multimed Tools Appl 83, 9523–9546 (2024). https://doi.org/10.1007/s11042-023-15847-3