Abstract
Object detection on unmanned aerial vehicles (UAVs) is a widely studied computer vision task in which algorithms are deployed on the UAV for real-time detection. However, YOLOv5 performs unsatisfactorily on UAV imagery because the objects to be detected are small and frequently occluded. To address these two issues, we propose the YOLOv5-LW model. Building upon YOLOv5, we replace the FPN-PAN network structure with the FPN-PANS structure. This modification mitigates the disappearance of small-object features during training while reducing the model's parameter count and computational complexity. Additionally, within the FPN-PANS structure, we use a multistage feature fusion approach in place of the original feature fusion module. This approach corrects the erroneous information that the upsampling stage generates for certain objects. Finally, we replace the SPPF module with the SPPF-W module to further enlarge the receptive field while keeping the parameter count almost unchanged. Multiple experiments on the VisDrone dataset demonstrate that YOLOv5-LW performs well in lightweight small-object detection. Compared to YOLOv5, YOLOv5-LW achieves a 4.7% improvement in mean average precision (mAP), reduces the model size by 40%, and decreases the number of parameters by 40%.
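The abstract does not describe the internals of SPPF-W, so the following is only a minimal sketch of the receptive-field reasoning behind the baseline it replaces. It assumes the standard YOLOv5 SPPF layout of three chained 5×5 max-pooling layers with stride 1; the helper names `stacked_receptive_fields` and `max_pool_1d` are hypothetical and exist only for illustration. The sketch computes how the receptive field grows along the chain and checks, on a 1-D toy signal, that two chained 5-wide pools match a single 9-wide pool.

```python
def stacked_receptive_fields(kernel_sizes, strides):
    """Return the receptive field after each layer of a sequential stack."""
    rf, jump = 1, 1  # receptive field and spacing between adjacent outputs
    fields = []
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= s             # stride scales the spacing for later layers
        fields.append(rf)
    return fields


def max_pool_1d(values, k):
    """Stride-1 max pool with 'same' output length; the window is clipped at the edges."""
    half = k // 2
    n = len(values)
    return [max(values[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


# Assumed baseline SPPF layout: three chained 5-wide pools, all with stride 1.
sppf_fields = stacked_receptive_fields([5, 5, 5], [1, 1, 1])
print(sppf_fields)  # [5, 9, 13]

# Toy 1-D signal (arbitrary values, not taken from the paper).
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
chained = max_pool_1d(max_pool_1d(signal, 5), 5)
direct = max_pool_1d(signal, 9)
print(chained == direct)  # True
```

Chaining small stride-1 pools therefore reaches the same spatial extent as one large pool while adding no learnable parameters, which is consistent with the abstract's claim that the receptive field can grow with the parameter count almost unchanged. How SPPF-W achieves its additional gain is not stated here.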
This work was supported by the Jiangxi Province Office of Education.
Copyright information
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Xiao, H., Zhao, K., Xie, X., Song, P., Dong, S., Yang, J. (2024). YOLOv5-LW: Lightweight UAV Object Detection Algorithm Based on YOLOv5. In: Wu, C., Chen, X., Feng, J., Wu, Z. (eds) Mobile Networks and Management. MONAMI 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 559. Springer, Cham. https://doi.org/10.1007/978-3-031-55471-1_2
DOI: https://doi.org/10.1007/978-3-031-55471-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55470-4
Online ISBN: 978-3-031-55471-1
eBook Packages: Computer Science, Computer Science (R0)