Abstract
Matching infrared and visible images has applications in many fields. However, the large appearance difference between the two modalities makes accurate feature matching difficult. In this paper, we introduce MIVI, a novel feature matching method for infrared and visible images. Our multi-stage matching architecture enables the model to capture both fine local feature details and long-range dependencies, while our composite loss function optimizes the model at each stage and significantly improves matching accuracy. Qualitative and quantitative experiments demonstrate that MIVI outperforms the compared algorithms in accuracy. The code will be released at: https://github.com/LiaoYun0x0/MIVI.
Data availability
The datasets analysed during the current study are available from the following public domain resources: https://mediatum.ub.tum.de/1474000 and http://matthewalunbrown.com/nirscene/nirscene.html.
Acknowledgements
This work is supported in part by a grant from the Social and Science Foundation of Liaoning Province (No. L20BTQ008), in part by the National Natural Science Foundation of China under Grant 61976124, and in part by the Scientific Research Fund of Yunnan Provincial Education Department under Grant 2021J0007.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest or competing interests.
About this article
Cite this article
Di, Y., Liao, Y., Zhu, K. et al. MIVI: multi-stage feature matching for infrared and visible image. Vis Comput 40, 1839–1851 (2024). https://doi.org/10.1007/s00371-023-02889-9
DOI: https://doi.org/10.1007/s00371-023-02889-9