Small object detection in UAV imagery based on channel-spatial fusion cross attention

Li, JianLong; Zheng, Chunhou; Chen, Peng; Zhang, Jun; Wang, Bing

doi:10.1007/s11760-025-03850-0

Original Paper
Published: 18 February 2025

Volume 19, article number 302, (2025)

Signal, Image and Video Processing

JianLong Li¹, Chunhou Zheng¹, Peng Chen¹,³, Jun Zhang¹ & Bing Wang²

Abstract

Object detection in unmanned aerial vehicle (UAV) images has become an important research area in computer vision due to its unique value and challenges. UAV images are characterized by densely distributed small targets, significant changes in target scale, and background noise, all of which reduce the accuracy and reliability of detection. To address these issues, we propose a small target detection network based on Enhanced Scale Sequence Fusion and a channel-spatial fusion cross-attention mechanism, called CSFCANet. To tackle the high proportion of small targets and the scale variation in UAV images, we employ Enhanced Scale Sequence Fusion, which integrates fine-grained information from shallow feature maps with semantic information from deep feature maps. Additionally, we incorporate a tiny target detection head to enhance the network's ability to extract fine-grained features for small targets. To address background noise, we propose a channel-spatial fusion cross-attention mechanism, which first computes attention over local patch blocks of the feature map and then computes attention over global patch blocks. This captures both long-range dependencies and detailed information. The attention computation combines spatial description information and channel description information. Extensive experiments were conducted to validate the effectiveness of the model on the VisDrone benchmark dataset, the UAVDT dataset, and our self-made UAV power inspection dataset, PIDrone. Compared with the YOLOv8s model, CSFCANet improved mAP by 7% on PIDrone, 2.4% on VisDrone, and 3.6% on UAVDT.
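The local-then-global ordering and the fusion of channel and spatial descriptions can be illustrated with a deliberately simplified sketch. It is not the CSFCANet implementation, whose details are not given in the abstract: the cross-attention is replaced here by a plain sigmoid gate, and the function names (`fused_gate`, `local_then_global`), the additive fusion rule, and the use of mean pooling for both descriptors are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_descriptor(fmap):
    # One scalar per channel: the mean over all spatial positions.
    return [sum(map(sum, plane)) / (len(plane) * len(plane[0])) for plane in fmap]

def spatial_descriptor(fmap):
    # One scalar per spatial position: the mean over channels.
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]

def fused_gate(fmap):
    # Assumed fusion rule (hypothetical): gate = sigmoid(channel + spatial).
    ch, sp = channel_descriptor(fmap), spatial_descriptor(fmap)
    return [[[v * sigmoid(ch[k] + sp[i][j]) for j, v in enumerate(row)]
             for i, row in enumerate(plane)]
            for k, plane in enumerate(fmap)]

def crop(fmap, top, left, size):
    # Extract a size x size window from every channel.
    return [[row[left:left + size] for row in plane[top:top + size]]
            for plane in fmap]

def local_then_global(fmap, patch):
    # fmap is a nested list shaped [channels][height][width].
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    assert h % patch == 0 and w % patch == 0, "patch must tile the map"
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    # Stage 1: gate each non-overlapping patch using only its own statistics.
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            gated = fused_gate(crop(fmap, top, left, patch))
            for k in range(c):
                for i in range(patch):
                    out[k][top + i][left:left + patch] = gated[k][i]
    # Stage 2: gate the whole map using statistics of the full feature map.
    return fused_gate(out)

# Toy input: 2 channels, 4 x 4 positions, split into 2 x 2 patches.
fmap = [[[float(i + j + k) for j in range(4)] for i in range(4)] for k in range(2)]
out = local_then_global(fmap, 2)
```

In this sketch the first stage only sees statistics inside each patch, which preserves local detail, while the second stage reweights every position using statistics of the full map, which is one simple way to inject long-range context.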

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62072002, 62172004, and 62273001), the Anhui Province Collaborative Innovation Project (Nos. GXXT-2022-050, GXXT-2022-053), the Outstanding Research and Innovation Team Project of Anhui Province (2022AH010005), and the Special Fund for Anhui Agriculture Research System (2021-2025).

Author information

Authors and Affiliations

Information Materials and Intelligent Sensing Laboratory of Anhui Province, National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Institutes of Physical Science and Information Technology & School of Internet, Anhui University, Hefei, 230601, China
JianLong Li, Chunhou Zheng, Peng Chen & Jun Zhang
School of Management Science and Engineering, Anhui University of Finance & Economics, Bengbu, 233030, China
Bing Wang
Anhui Rocvision Intelligent Technology Co., Ltd, Hefei, 230001, China
Peng Chen

Contributions

JLL and PC conceived the study; JLL, CHZ and BW participated in the methodology design; JLL and PC carried it out and drafted the manuscript. All authors revised the manuscript critically. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Peng Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Zheng, C., Chen, P. et al. Small object detection in UAV imagery based on channel-spatial fusion cross attention. SIViP 19, 302 (2025). https://doi.org/10.1007/s11760-025-03850-0

Received: 08 October 2024
Revised: 07 December 2024
Accepted: 11 January 2025
Published: 18 February 2025
DOI: https://doi.org/10.1007/s11760-025-03850-0
