Skip to main content
Log in

Enhanced YOLOv8 framework for precision vehicle detection in high-resolution remote sensing images

  • Original Paper
  • Published:
Signal, Image and Video Processing Aims and scope Submit manuscript

Abstract

Vehicle detection in high-resolution remote sensing imagery faces challenges such as varying scales, complex backgrounds, and high intra-class variability. We propose an enhanced YOLOv8 framework, incorporating three key advancements: the Adaptive Feature Pyramid Network (AFPN), Omni-Dimensional Convolution (ODConv), and a Slim Neck with Generalized Shuffle Convolution (GSConv). These enhancements improve vehicle detection accuracy, computational efficiency, and visual AI capabilities for applications such as computer animation and virtual worlds. Our model achieves a Mean Average Precision (mAP) of 0.7153, representing a 4.99% improvement over the baseline YOLOv8. Precision and recall increase to 0.9233 and 0.9329, respectively, while box loss is reduced from 1.213 to 1.054. This framework supports real-time surveillance, traffic monitoring, and urban planning. The NEPU-OWOD V2.0 dataset, used for evaluation, includes high-resolution images from multiple regions and seasons, along with diverse annotations and augmentations. Our modular approach allows for separate assessments of each enhancement. The dataset and source code are available for future research and development at (https://doi.org/10.5281/zenodo.13075939).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Wang, K. et al.: Oriented object detection in optical remote sensing images using deep learning: a survey (2023). https://doi.org/10.48550/arXiv.2302.10473

  2. Zhang, J. et al.: Fair1m: a benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery (2021). https://doi.org/10.48550/arXiv.2103.05569

  3. Wang, X. et al.: A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas (2023). https://doi.org/10.48550/arXiv.2304.00501

  4. Cao, L., Shen, Z., Xu, S.: Efficient forest fire detection based on an improved yolo model. Vis. Intell. 2, 20 (2024). https://doi.org/10.1007/s44267-024-00053-y

    Article  MATH  Google Scholar 

  5. Cheng, G., et al.: Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 159, 296–307 (2020). https://doi.org/10.48550/arXiv.1909.00133

    Article  MATH  Google Scholar 

  6. Chen, G., Zhuang, P., Guo, J., Xu, J., Liu, H., Zhang, L.: Omni-dimensional dynamic convolution (2022). https://doi.org/10.48550/arXiv.2209.07947

  7. Chen, W. et al.: Castdet: toward open vocabulary aerial object detection with clip-activated student-teacher learning (2023). https://doi.org/10.48550/arXiv.2311.11646

  8. Liu, W., Zhang, Y., Hu, Y., Zhou, H.: Efficient meta-learning enabled lightweight multiscale few-shot object detection in remote sensing images (2023)

  9. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. 28 (2015). https://doi.org/10.48550/arXiv.1506.01497

  10. Dai, L., et al.: A deep learning system for predicting time to progression of diabetic retinopathy. Nat. Med. 30, 584–594 (2024). https://doi.org/10.1038/s41591-023-02702-z

    Article  MATH  Google Scholar 

  11. Qian, B. et al.: Drac 2022: a public benchmark for diabetic retinopathy analysis on ultra-wide optical coherence tomography angiography images. https://doi.org/10.1016/j.patter.2024.100929

  12. Qin, Y. et al.: Urbanevolver: function-aware urban layout regeneration (2024). https://doi.org/10.1007/s11263-024-02030-w

  13. Lin, X., et al.: Eapt: Efficient attention pyramid transformer for image processing. IEEE Trans. Multimed. 25, 50–61 (2023). https://doi.org/10.1109/TMM.2021.3120873

    Article  MATH  Google Scholar 

  14. Zhu, J. et al.: Clustering environment aware learning for active domain adaptation. https://doi.org/10.1109/TSMC.2024.3374068

  15. Huang, J. et al.: Speed/accuracy trade-offs for modern convolutional object detectors. pp. 7310–7311 (2017)

  16. Liu, S., Qi, X., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. pp. 8759–8768 (2018)

  17. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. pp. 764–773 (2017). https://doi.org/10.48550/arXiv.1703.06211

  18. Chen, L.-C., Papandreou, G., Kokkinos, K., Yuille, A.L.: Rethinking atrous convolution for semantic image segmentation (2017). https://doi.org/10.48550/arXiv.1706.05587

  19. Li, L., Ding, J., Cui, H., Chen, Z., Liao, G.: Litemsnet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes. pp. 1–15 (2024). https://doi.org/10.1007/s00371-024-03569-y

  20. Sheng, B., et al.: Improving video temporal consistency via broad learning system. IEEE Trans. Cybern. 52(7), 6662–6675 (2022). https://doi.org/10.1109/TCYB.2021.3079311

  21. Li, J., et al.: Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural network. IEEE Trans. Ind. Inform. 18(1), 163–173 (2022). https://doi.org/10.1109/TII.2021.3085669

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sheng Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shao, Z., He, K., Yuan, B. et al. Enhanced YOLOv8 framework for precision vehicle detection in high-resolution remote sensing images. SIViP 19, 218 (2025). https://doi.org/10.1007/s11760-024-03783-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11760-024-03783-0

Keywords