MTGS-Yolo: a task-balanced algorithm for object detection in remote sensing images based on improved yolo

Jin, Zhao; Duan, Jiang; Qiao, Liping; He, Tian; Shi, Xinyu; Yan, Bohan

doi:10.1007/s11227-025-07003-5

Published: 24 February 2025

Volume 81, article number 542, (2025)

The Journal of Supercomputing

Zhao Jin¹,
Jiang Duan¹,
Liping Qiao²,
Tian He¹,
Xinyu Shi¹ &
Bohan Yan¹

Abstract

Remote sensing images (RSIs) have become integral to many sectors, including military operations, urban traffic planning, and natural resource management, which makes detecting targets in RSIs a critical research problem. Although object detection in RSIs has significant application value across these domains, it still faces several challenges: interference from complex backgrounds, high image resolution that complicates deployment on computationally constrained satellite platforms, and the inefficiency of processing large volumes of complex data. To address these challenges, we propose MTGS-Yolo, a task-balanced algorithm for object detection in remote sensing images based on an improved YOLO. The algorithm first constructs a Multi-Transformer model designed for dense prediction problems. By significantly strengthening the network's capacity to capture both local and global contextual information, it minimizes information loss and improves the network's adaptability to more intricate background scenarios. We then incorporate a Generalized Efficient Layer Aggregation Network (GELAN) structure, which transcends traditional architectural and device limitations. GELAN adapts across the spectrum of feature learning, from complex to lightweight, yielding a model that is both lightweight and computationally efficient and that reduces computational cost. To address the low feature resolution of small objects in RSIs, which often causes them to be confused with the background, we propose a Spatial Context-Aware Module (SCAM). This module uses spatial contextual information to model cross-spatial relationships between pixels, suppressing irrelevant background elements and making targets easier to distinguish from their surroundings.
Experimental results on the public DIOR dataset show that MTGS-Yolo surpasses the baseline network in detection performance and robustness. In addition, transfer learning experiments on the NWPU VHR-10 dataset show that MTGS-Yolo outperforms other classic and improved algorithms in detection performance and generalizes better.
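
The abstract does not specify how SCAM is built, so the following is only a minimal sketch of the general idea of using global spatial context to suppress background responses. It is not the authors' module: the function name `spatial_context_gate`, the single-channel feature map, and the softmax-pooling-plus-sigmoid-gate design are all assumptions made for illustration.

```python
import math


def spatial_context_gate(feature_map):
    """Re-weight a 2-D single-channel feature map using global spatial context.

    Hypothetical illustration only; this is NOT the paper's SCAM.
    """
    pixels = [v for row in feature_map for v in row]

    # Softmax over all spatial positions: strong responses dominate the context.
    peak = max(pixels)
    exps = [math.exp(v - peak) for v in pixels]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Global context: attention-weighted average of all pixel values.
    context = sum(w * v for w, v in zip(weights, pixels))

    # Gate each pixel with a sigmoid of its agreement with the context,
    # so weak (background-like) responses are attenuated more than strong ones.
    return [
        [v * (1.0 / (1.0 + math.exp(-v * context))) for v in row]
        for row in feature_map
    ]


# Toy 2x2 map: three weak background responses and one strong target response.
toy_map = [[0.1, 0.2], [0.1, 4.0]]
gated = spatial_context_gate(toy_map)
```

In this toy case the strong response is left almost unchanged while the weak ones are roughly halved, so the target-to-background contrast increases. A real module would operate on multi-channel tensors with learned projections.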

Data availability

The datasets used or analyzed during the current study can be downloaded from the following addresses. DIOR dataset: http://www.escience.cn/people/gongcheng/DIOR.html. NWPU VHR-10 dataset: https://hyper.ai/datasets/5422

Author information

Authors and Affiliations

School of Information Engineering, Chang’an University, Xi’an, 710064, China
Zhao Jin, Jiang Duan, Tian He, Xinyu Shi & Bohan Yan
School of Information Engineering, Xizang Minzu University, Xianyang, 712082, China
Liping Qiao

Contributions

A. and B. wrote the main manuscript text; C., D., E., and F. prepared Figs. 1–7. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jiang Duan.

Ethics declarations

Conflict of interest

All authors disclosed no relevant relationships.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, Z., Duan, J., Qiao, L. et al. MTGS-Yolo: a task-balanced algorithm for object detection in remote sensing images based on improved yolo. J Supercomput 81, 542 (2025). https://doi.org/10.1007/s11227-025-07003-5

Accepted: 28 January 2025
Published: 24 February 2025
DOI: https://doi.org/10.1007/s11227-025-07003-5
