Abstract
Feature fusion is an essential component of multimodal object detection, exploiting both the complementary and the common information between multi-source images. For visible-infrared image pairs, however, the visible images are sensitive to illumination and visibility conditions, so they may carry much interference and little useful information. To address this problem, we propose performing common feature enhancement and spatial cross attention sequentially. To this end, we design a novel Dual Attention Transformer Feature Fusion (DATFF) module for fusing intermediate feature maps. We integrate it into two-stream object detectors and achieve state-of-the-art performance on the DroneVehicle and FLIR visible-infrared object detection datasets. Our code is available at https://github.com/a21401624/DATFF.
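The DATFF module itself is not reproduced here; as a rough illustration of the second stage the abstract names, the following is a minimal sketch of spatial cross attention between two modalities, assuming single-head scaled dot-product attention over flattened spatial tokens (function names and the toy feature vectors are illustrative, not from the paper).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention: tokens of one modality (queries)
    attend to tokens of the other modality (keys/values). Each argument is
    a list of d-dimensional vectors, one per flattened spatial position."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query token to every key token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Output is a convex combination of the value tokens.
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(d)])
    return out

# Toy example: 2 visible-feature tokens attending to 3 infrared tokens (d = 2).
vis = [[1.0, 0.0], [0.0, 1.0]]
ir = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
fused = cross_attention(vis, ir, ir)
```

In a two-stream detector this would run in both directions (visible attending to infrared and vice versa), with the attended features combined with the originals, e.g. by residual addition.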
This research was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, and by the National Natural Science Foundation of China under Grants 91538201 and 62076242.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Hu, Y., Shi, L., Yao, L., Weng, L. (2023). Dual Attention Feature Fusion for Visible-Infrared Object Detection. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_5
Print ISBN: 978-3-031-44194-3
Online ISBN: 978-3-031-44195-0