Tiny-YOLOv7: Tiny Object Detection Model for Drone Imagery

Cheng, Pengchao; Tang, Xu; Liang, Wenqi; Li, Yu; Cong, Wei; Zang, Chuanzhi

doi:10.1007/978-3-031-46311-2_5


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14357)

Included in the following conference series:

International Conference on Image and Graphics

Abstract

With the rapid development of drones, tiny object detection in drone-captured scenes has become a challenging task. Because the drone's altitude changes during flight, the scale of objects varies dramatically. In addition, fast drone motion causes motion blur on densely packed tiny objects. To address these two issues, we propose Tiny-YOLOv7. To detect multi-scale objects, we replace the original prediction heads with transformer prediction heads. For the motion blur issue, we propose the DBS module, which extracts more visual elements while maintaining the computational cost of the model. The DBS module consists of Dynamic Region-Aware Convolution (DRConv), Batch Normalization, and SiLU modules. For scenes with dense objects, we additionally incorporate the Convolutional Block Attention Module (CBAM) to find the attention regions of dense objects. Tiny-YOLOv7 is an effective and elegant method for handling tiny object detection. We validate our model through extensive experiments on the VisDrone2021 and DOTA-v1.0 datasets. The results show that our method obtains remarkable improvements over the other models. On the VisDrone2021 dataset, our method achieves an mAP of 39.22%, which is 1.07% higher than the SOTA method. Furthermore, experiments on the DOTA-v1.0 dataset demonstrate the generalization of the proposed model.
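The abstract describes the DBS module only as a sequence of convolution, batch normalization, and SiLU activation. The sketch below illustrates that ordering in plain Python on a 1-D signal. It is not the authors' implementation: DRConv is replaced here by an ordinary fixed-kernel convolution, the batch normalization has no learned scale or shift, and all function names (`conv1d`, `batch_norm`, `silu`, `dbs_block`) are hypothetical names chosen for illustration.

```python
import math


def conv1d(signal, kernel):
    """Plain 'valid' 1-D cross-correlation.

    Assumption: this stands in for DRConv, which instead selects
    region-specific kernels; that mechanism is not reproduced here.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def batch_norm(values, eps=1e-5):
    """Normalize activations to zero mean and (near) unit variance.

    Assumption: no learned scale/shift parameters, statistics taken
    over the given list only.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def silu(x):
    """SiLU activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


def dbs_block(signal, kernel):
    """Convolution -> batch normalization -> SiLU, in that order."""
    return [silu(v) for v in batch_norm(conv1d(signal, kernel))]


if __name__ == "__main__":
    # A difference kernel applied to a toy signal.
    print(dbs_block([1.0, 2.0, 4.0, 7.0, 11.0], [-1.0, 1.0]))
```

In a real detector the same three stages would operate on 2-D feature maps with learned parameters; the toy version only makes the order of operations concrete.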

Author information

Authors and Affiliations

School of Artificial Intelligence, Shenyang University of Technology, Shenyang, 10142, China
Pengchao Cheng, Yu Li & Chuanzhi Zang
State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
Pengchao Cheng, Xu Tang, Wenqi Liang, Yu Li & Wei Cong

Corresponding author

Correspondence to Xu Tang.

Editor information

Editors and Affiliations

Dalian University of Technology, Dalian, China
Huchuan Lu
University of Sydney, Sydney, NSW, Australia
Wanli Ouyang
Shenzhen University, Shenzhen, China
Hui Huang
Tsinghua University, Beijing, China
Jiwen Lu
Dalian University of Technology, Dalian, China
Risheng Liu
Institute of Automation, CAS, Beijing, China
Jing Dong
University of Technology Sydney, Sydney, NSW, Australia
Min Xu

About this paper

Cite this paper

Cheng, P., Tang, X., Liang, W., Li, Y., Cong, W., Zang, C. (2023). Tiny-YOLOv7: Tiny Object Detection Model for Drone Imagery. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14357. Springer, Cham. https://doi.org/10.1007/978-3-031-46311-2_5

DOI: https://doi.org/10.1007/978-3-031-46311-2_5
Published: 29 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46310-5
Online ISBN: 978-3-031-46311-2
eBook Packages: Computer Science; Computer Science (R0)
