MTGS-Yolo: a task-balanced algorithm for object detection in remote sensing images based on improved yolo

The Journal of Supercomputing

Abstract

Remote sensing images (RSIs) have become integral to many sectors, including military operations, urban traffic planning, and natural resource management, which makes target detection in RSIs a critical research problem. Although RSI object detection has significant application value across these domains, it still faces several challenges: interference from complex backgrounds, high image resolution that complicates deployment on computationally constrained satellite platforms, and the inefficiency of processing large amounts of complex data. To address these challenges, we introduce MTGS-Yolo, a task-balanced algorithm for object detection in remote sensing images based on improved YOLO. The algorithm first constructs a Multi-Transformer model designed for dense prediction problems. By strengthening the network's capacity to capture both local and global contextual information, it minimizes information loss and improves the network's adaptability to more intricate backgrounds. We further incorporate a Generalized Efficient Layer Aggregation Network (GELAN) structure, which is intended to cover feature learning from complex to lightweight settings across architectures and devices, yielding a lightweight model with reduced computational cost and improved efficiency. To address the low feature resolution of small objects in RSIs, which often leads to confusion with the background, we propose a Spatial Context-Aware Module (SCAM). This module uses spatial contextual information to model cross-spatial relationships between pixels, suppressing irrelevant background elements and making targets easier to distinguish from their surroundings.
Experimental results on the public DIOR dataset demonstrate that MTGS-Yolo surpasses the baseline network in terms of detection performance and robustness. Additionally, transfer learning experiments conducted on the NWPU VHR-10 dataset reveal that MTGS-Yolo outperforms other classic and improved algorithms in terms of detection performance and exhibits superior generalization capabilities.
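
The idea of using spatial context to suppress background, as described for SCAM above, can be illustrated with a small sketch. This is not the authors' SCAM, whose internal design is not given in this preview; it is a generic global-context gating scheme written in NumPy under stated assumptions. The function name `spatial_context_gate` and the weights `w_key` and `w_val` are hypothetical and stand in for learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_context_gate(feat, w_key, w_val):
    """Gate each pixel by its similarity to a pooled global context vector.

    Illustrative only (hypothetical, not the paper's module).
    feat:  feature map of shape (C, H, W)
    w_key: shape (C,), projects each pixel to a scalar attention score
    w_val: shape (C, C), transforms the pooled context vector
    Returns the gated feature map (C, H, W) and the gate map (H, W).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)            # (C, N) with N = H * W pixels
    attn = softmax(w_key @ flat)             # (N,) attention over all pixels
    context = w_val @ (flat @ attn)          # (C,) attention-pooled context
    sim = (context @ flat) / np.sqrt(c)      # (N,) pixel-to-context similarity
    gate = 1.0 / (1.0 + np.exp(-sim))        # sigmoid, values in (0, 1)
    gated = flat * gate                      # down-weight dissimilar pixels
    return gated.reshape(c, h, w), gate.reshape(h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 4, 4))
    w_key = rng.normal(size=8) * 0.1
    w_val = rng.normal(size=(8, 8)) * 0.1
    gated, gate = spatial_context_gate(feat, w_key, w_val)
    print(gated.shape, gate.shape)
```

In this sketch, pixels whose features resemble the pooled context receive a gate close to 1, while dissimilar pixels are attenuated. A trained module would learn `w_key` and `w_val` so that the context reflects target regions rather than background.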

Data availability

The datasets used or analyzed during the current study can be downloaded from the following addresses. DIOR dataset: http://www.escience.cn/people/gongcheng/DIOR.html. NWPU VHR-10 dataset: https://hyper.ai/datasets/5422

Author information

Contributions

A. and B. wrote the main manuscript text; C., D., E., and F. prepared Figs. 1–7. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jiang Duan.

Ethics declarations

Conflict of interest

All authors disclosed no relevant relationships.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, Z., Duan, J., Qiao, L. et al. MTGS-Yolo: a task-balanced algorithm for object detection in remote sensing images based on improved yolo. J Supercomput 81, 542 (2025). https://doi.org/10.1007/s11227-025-07003-5

  • DOI: https://doi.org/10.1007/s11227-025-07003-5
