Abstract
In disaster scenarios, automatic, accurate, and fast detection of objects in images captured by unmanned aerial vehicles (UAVs) can significantly reduce the time required for manual search and rescue. Existing victim detection methods, however, are not sufficiently robust for identifying partially obscured and multiscale targets against diverse backgrounds. To overcome this problem, we propose a hybrid-domain attention algorithm based on YOLOv5 with multiscale feature reuse (YOLO-MSFR). First, because victim targets are easily masked by complex backgrounds, which complicates the representation of target attributes, a channel-spatial domain attention method was built to improve the network's ability to express target features. Second, to address the frequently missed multiscale characteristics of victim targets, a multiscale feature reuse (MSFR) module was designed to ensure that large-scale target features are expressed effectively while enhancing the expression of small-target features. The MSFR module is based on dilated convolution, which mitigates the loss of small-target feature information during downsampling in the backbone network; features are reused through cascaded residual connections to reduce the number of training parameters and to avoid vanishing gradients as the network deepens. Finally, the efficient intersection over union (EIOU) loss function was adopted to accelerate network convergence and improve detection performance, so that victims can be located accurately. The proposed algorithm was compared with five classical object detection algorithms on data from multiple disaster environments to verify its advantages. The experimental results show that the proposed algorithm accurately detects multiscale victim targets in complex natural disasters, with mAP reaching 91.0%, and achieves a detection speed of 42 fps at 640 × 640 resolution, indicating good real-time performance.
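As an illustration of the bounding-box regression objective named in the abstract, the following is a minimal pure-Python sketch of the EIOU loss for a single box pair, following the formulation of Zhang et al. (arXiv:2101.08158): one minus IoU, plus a normalized center-distance term, plus separate width and height discrepancy terms. The function name, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are illustrative assumptions, not details taken from the paper.

```python
def eiou_loss(box_p, box_g, eps=1e-9):
    """EIOU loss between a predicted box and a ground-truth box.

    Boxes are (x1, y1, x2, y2) corner coordinates (illustrative format).
    """
    x1p, y1p, x2p, y2p = box_p
    x1g, y1g, x2g, y2g = box_g

    # Intersection over union
    iw = max(0.0, min(x2p, x2g) - max(x1p, x1g))
    ih = max(0.0, min(y2p, y2g) - max(y1p, y1g))
    inter = iw * ih
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    iou = inter / (area_p + area_g - inter + eps)

    # Width, height, and squared diagonal of the smallest enclosing box
    cw = max(x2p, x2g) - min(x1p, x1g)
    ch = max(y2p, y2g) - min(y1p, y1g)
    c2 = cw * cw + ch * ch + eps

    # Squared distance between box centers, normalized by the diagonal
    dx = (x1p + x2p) / 2 - (x1g + x2g) / 2
    dy = (y1p + y2p) / 2 - (y1g + y2g) / 2
    rho2 = dx * dx + dy * dy

    # Width and height discrepancies, each normalized separately
    wp, hp = x2p - x1p, y2p - y1p
    wg, hg = x2g - x1g, y2g - y1g
    loss_w = (wp - wg) ** 2 / (cw * cw + eps)
    loss_h = (hp - hg) ** 2 / (ch * ch + eps)

    return 1.0 - iou + rho2 / c2 + loss_w + loss_h
```

Unlike plain IoU loss, the center-distance and width/height terms stay informative even when the two boxes do not overlap, which is what drives the faster convergence claimed for EIOU.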
Data availability
Not applicable.
Abbreviations
- MSFR: Multiscale feature reuse
- CNN: Convolutional neural network
- R-CNN: Region-CNN
- YOLO: You only look once
- RNN: Recurrent neural network
- CV: Computer vision
- SSD: Single shot detector
- CBAM: Convolutional block attention module
- FPN: Feature pyramid network
- MLP: Multilayer perceptron
- FPS: Frames per second
- IoU: Intersection over union
- TP: True positive
- TN: True negative
- FN: False negative
- FP: False positive
Funding
This research was funded by National Natural Science Foundation of China (51804250); China Postdoctoral Science Foundation (2019M653874XB, 2020M683522); Scientific Research Program of Shaanxi Provincial Department of Education (18JK0512); Natural Science Basic Research Program of Shaanxi (2021JQ-572, 2020JQ-757); Innovation Capability Support Program of Shaanxi (2020TD-021).
Author information
Authors and Affiliations
Contributions
Conceptualization, S.H.; methodology, Q.Z.; software, X.M.; validation, Y.W., S.G.; formal analysis, C.Y.; investigation, T.H.; resources, Q.Z.; data curation, Q.Z.; writing—original draft preparation, S.H. and Q.Z.; writing—review and editing, S.H. and Q.Z.; visualization, S.H. and Q.Z.; supervision, S.H.; project administration, S.H.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hao, S., Zhao, Q., Ma, X. et al. YOLO-MSFR: real-time natural disaster victim detection based on improved YOLOv5 network. J Real-Time Image Proc 21, 7 (2024). https://doi.org/10.1007/s11554-023-01383-8