Abstract
Remote sensing images (RSIs) have become integral to many sectors, including military operations, urban traffic planning, and natural resource management, which makes detecting targets in RSIs a critical research topic. Although RSI object detection has significant application value across these domains, it still faces several challenges: complex backgrounds in RSIs interfere with detection; the high resolution of RSIs complicates deployment on computationally constrained satellite platforms; and processing large volumes of complex data is inefficient. To overcome these obstacles, we introduce MTGS-Yolo, a novel task-balanced algorithm for object detection in remote sensing images based on an improved YOLO. The algorithm first constructs a Multi-Transformer model designed to address dense prediction problems. By strengthening the network's capacity to capture both local and global contextual information, this model reduces information loss and improves the network's adaptability to complex background scenes. Furthermore, we incorporate a Generalized Efficient Layer Aggregation Network (GELAN) structure, which is designed to move beyond the limitations of traditional architectures and devices. GELAN adapts to feature learning requirements ranging from complex to lightweight, yielding a model that is both lightweight and computationally efficient, with reduced computational cost. To address the low feature resolution of small objects in RSIs, which often causes them to be confused with the background, we propose a Spatial Context-Aware Module (SCAM). SCAM uses spatial contextual information to model cross-spatial relationships between pixels, suppressing irrelevant background elements and making targets easier to distinguish from their surroundings.
Experimental results on the public DIOR dataset show that MTGS-Yolo surpasses the baseline network in detection performance and robustness. In addition, transfer learning experiments on the NWPU VHR-10 dataset show that MTGS-Yolo outperforms other classic and improved algorithms in detection performance and exhibits superior generalization capability.
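The abstract describes SCAM only at a high level, so the following is a generic illustration and not the authors' module. It is a minimal pure-Python sketch, assuming a feature map stored as C channels of flattened H×W positions, of how softmax attention over all spatial positions can pool a global context vector and then gate each pixel so that background positions are suppressed relative to a salient target. The channel-mean scoring (standing in for a learned 1×1 convolution), the dot-product sigmoid gate, and the names `spatial_context_gate` and `FeatureMap` are hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

# Hypothetical layout (assumption): feat[c][p] is channel c at flattened position p = row * W + col.
FeatureMap = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spatial_context_gate(feat: FeatureMap) -> Tuple[FeatureMap, List[float]]:
    """Generic spatial-context reweighting; an illustration, NOT the paper's SCAM.

    Returns the gated feature map and the per-position gate values.
    """
    num_channels = len(feat)
    num_positions = len(feat[0])

    # 1. Score each position by its channel mean (a stand-in for a learned 1x1 conv).
    scores = [
        sum(feat[c][p] for c in range(num_channels)) / num_channels
        for p in range(num_positions)
    ]

    # 2. Softmax over ALL positions: each weight is normalized against every other pixel,
    #    which is what makes the attention cross-spatial.
    weights = softmax(scores)

    # 3. Pool one global context value per channel using those weights.
    context = [
        sum(weights[p] * feat[c][p] for p in range(num_positions))
        for c in range(num_channels)
    ]

    # 4. Gate each position by how strongly its feature vector aligns with the context.
    gates = [
        sigmoid(sum(feat[c][p] * context[c] for c in range(num_channels)))
        for p in range(num_positions)
    ]

    gated = [
        [feat[c][p] * gates[p] for p in range(num_positions)]
        for c in range(num_channels)
    ]
    return gated, gates


if __name__ == "__main__":
    # Toy 2-channel, 2x2 map: position 0 is a strong "target", the rest weak "background".
    toy = [[3.0, 0.1, 0.1, 0.1], [3.0, 0.1, 0.1, 0.1]]
    out, gates = spatial_context_gate(toy)
    print("gates:", [round(g, 3) for g in gates])
    print("target/background ratio before:", toy[0][0] / toy[0][1])
    print("target/background ratio after:", out[0][0] / out[0][1])
```

On the toy map, the target position receives a gate close to 1 while the background positions receive smaller gates, so the target-to-background ratio increases after gating. A real module would learn the scoring and gating with convolutions in a tensor framework; plain Python lists are used here only to keep the sketch dependency-free.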
Data availability
The datasets used or analyzed during the current study are available at the following addresses:
DIOR dataset: http://www.escience.cn/people/gongcheng/DIOR.html
NWPU VHR-10 dataset: https://hyper.ai/datasets/5422
Author information
Contributions
A. and B. wrote the main manuscript text; C., D., E., and F. prepared Figs. 1–7. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
All authors disclosed no relevant relationships.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jin, Z., Duan, J., Qiao, L. et al. MTGS-Yolo: a task-balanced algorithm for object detection in remote sensing images based on improved yolo. J Supercomput 81, 542 (2025). https://doi.org/10.1007/s11227-025-07003-5
DOI: https://doi.org/10.1007/s11227-025-07003-5