Abstract
Camouflaged targets are nonsalient targets that blend strongly into the background and expose very little feature information, which makes them extremely difficult to recognize. Most detection algorithms for camouflaged targets use information from a single band only, resulting in low detection accuracy and a high missed detection rate. In this paper, we present a camouflaged target detection method based on multimodal image fusion (MIF-YOLOv5). First, a multimodal image input performs pixel-level fusion of the optical and infrared images of the camouflaged target to enrich its effective feature information. Second, a new loss function is designed, and the K-Means++ clustering algorithm is used to optimize the target anchor boxes for the dataset, improving the accuracy and robustness of camouflaged personnel detection. Finally, a comprehensive detection index for camouflaged targets is proposed to compare the overall effectiveness of different approaches. In addition, we build a multispectral camouflaged target dataset to test the proposed method. Experimental results show that the proposed method achieves the best comprehensive detection performance, with a detection accuracy of 96.5%, a recognition probability of 92.5%, an increase of 1×10⁴ in the number of parameters, an increase of 0.03 GFLOPs in theoretical computation, and a comprehensive detection index of 0.85. Comparisons with other target detection algorithms also show a clear advantage in detection accuracy.
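The anchor optimization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that anchors are clustered from the (width, height) pairs of the labeled boxes and that the distance is 1 − IoU, as is common for YOLO-style detectors; the abstract does not state the distance metric, and the function names and sample boxes below are hypothetical, not taken from the paper.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeanspp_init(boxes, k, rng):
    """K-Means++ seeding: each new centre is drawn with probability
    proportional to the squared distance (1 - IoU) to its nearest centre."""
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:  # all boxes coincide with existing centres
            centres.append(rng.choice(boxes))
        else:
            centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centres


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    centres = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assign every box to the centre it overlaps most.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centres[i]))
            clusters[best].append(b)
        # Move each centre to the mean width and height of its members;
        # an empty cluster keeps its previous centre.
        new = []
        for c, members in zip(centres, clusters):
            if members:
                n = len(members)
                new.append((sum(w for w, _ in members) / n,
                            sum(h for _, h in members) / n))
            else:
                new.append(c)
        if new == centres:
            break
        centres = new
    return sorted(centres, key=lambda a: a[0] * a[1])


# Hypothetical box sizes in pixels, for illustration only.
boxes = [(10, 20), (12, 22), (11, 19),
         (50, 100), (52, 98), (48, 104),
         (200, 150), (210, 160), (190, 155)]
anchors = kmeanspp_anchors(boxes, k=3)
print(anchors)
```

Using 1 − IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why it is the usual choice for anchor fitting; whether MIF-YOLOv5 uses this exact distance is an assumption here.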
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
Ruihui PENG and Jie LAI designed the research. Jie LAI devised the experimental method for acquiring camouflage target datasets. Xueting YANG, Yingjuan SONG, and Wei GUO collaborated to complete data collection. Jie LAI and Xueting YANG accomplished experimental verification. Jie LAI and Dianxing SUN drafted the paper. Ruihui PENG, Dianxing SUN, and Shuncheng TAN revised and finalized the paper.
Ethics declarations
All the authors declare that they have no conflict of interest.
Additional information
Project supported by the Shandong Provincial Natural Science Foundation of China (No. ZR2020MF015) and the Aerospace Science and Technology Innovation Institute Stabilization Support Project (No. ZY0110020009).
About this article
Cite this article
Peng, R., Lai, J., Yang, X. et al. Camouflaged target detection based on multimodal image input pixel-level fusion. Front Inform Technol Electron Eng 25, 1226–1239 (2024). https://doi.org/10.1631/FITEE.2300503
DOI: https://doi.org/10.1631/FITEE.2300503
Key words
- Camouflaged target detection
- Pixel-level fusion
- Anchor box optimization
- Loss function
- Multispectral dataset