Skip to main content
Log in

MIVI: multi-stage feature matching for infrared and visible image

  • Original article
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

The matching of infrared and visible images has a wide range of applications across various fields. However, the large difference between these two types of images poses a significant challenge to achieving accurate feature matching. In this paper, we introduce a novel feature matching method for infrared and visible images, named MIVI. Our proposed multi-stage matching architecture enables the model to capture both fine local feature details and remote dependencies, while our novel composite loss function optimizes the model at each stage and significantly improves the matching accuracy. Qualitative and quantitative experiments demonstrate that MIVI outperforms other excellent algorithms in terms of accuracy. The code will be released at: https://github.com/LiaoYun0x0/MIVI.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Data availability

The datasets analysed during the current study are available from the following public domain resources: https://mediatum.ub.tum.de/1474000; http://matthewalunbrown.com/nirscene/nirscene.html;

References

  1. Cheng, D., Zhou, J., Wang, N., Gao, X.: Hybrid dynamic contrast and probability distillation for unsupervised person Re-Id. IEEE Trans. Image Process. 31, 3334–3346 (2022). https://doi.org/10.1109/TIP.2022.3169693

    Article  PubMed  ADS  Google Scholar 

  2. Taira, H., Okutomi, M., Sattler, T., Cimpoi, M., Pollefeys, M., Sivic, J., Pajdla, T., Torii, A.: Inloc: indoor visual localization with dense matching and view synthesis. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 7199–7209. Computer Vision Foundation / IEEE Computer Society (2018). https://doi.org/10.1109/CVPR.2018.00752. http://openaccess.thecvf.com/content_cvpr_2018/html/Taira_InLoc_Indoor_Visual_CVPR_2018_paper.html

  3. Yoon, S., Kim, A.: Line as a visual sentence: context-aware line descriptor for visual localization. IEEE Robot. Autom. Lett. 6(4), 8726–8733 (2021). https://doi.org/10.1109/LRA.2021.3111760

    Article  Google Scholar 

  4. Lindenberger, P., Sarlin, P., Larsson, V., Pollefeys, M.: Pixel-perfect structure-from-motion with featuremetric refinement. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10–17, 2021, pp. 5967–5977. IEEE (2021). https://doi.org/10.1109/ICCV48922.2021.00593

  5. Schönberger, J.L., Frahm, J.: Structure-from-motion revisited. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, pp. 4104–4113. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.445

  6. Chen, H., Hu, W., Yang, K., Bai, J., Wang, K.: Panoramic annular SLAM with loop closure and global optimization. CoRR abs/2102.13400 (2021) arXiv:2102.13400

  7. Son, J., Kim, S., Sohn, K.: A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments. Expert Syst. Appl. 42(22), 8830–8839 (2015). https://doi.org/10.1016/j.eswa.2015.07.035

    Article  Google Scholar 

  8. Liu, X., Li, J., Pan, J., Wang, S.: An advanced gradient texture feature descriptor based on phase information for infrared and visible image matching. Multim. Tools Appl. 80(11), 16491–16511 (2021). https://doi.org/10.1007/s11042-020-10213-z

    Article  Google Scholar 

  9. Cui, S., Ma, A., Wan, Y., Zhong, Y., Luo, B., Xu, M.: Cross-modality image matching network with modality-invariant feature representation for airborne-ground thermal infrared and visible datasets. IEEE Trans. Geosci. Remote. Sens. 60, 1–14 (2022). https://doi.org/10.1109/TGRS.2021.3099506

    Article  Google Scholar 

  10. Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: Loftr: Detector-free local feature matching with transformers. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, June 19–25, 2021, pp. 8922–8931. Computer Vision Foundation/IEEE (2021). https://doi.org/10.1109/CVPR46437.2021.00881. https://openaccess.thecvf.com/content/CVPR2021/html/Sun_LoFTR_Detector-Free_Local_Feature_Matching_With_Transformers_CVPR_2021_paper.html

  11. Bökman, G., Kahl, F.: A case for using rotation invariant features in state of the art feature matchers. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2022, New Orleans, LA, USA, June 19–20, 2022, pp. 5106–5115. IEEE (2022). https://doi.org/10.1109/CVPRW56347.2022.00559

  12. Tang, S., Zhang, J., Zhu, S., Tan, P.: Quadtree attention for vision transformers. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25–29, 2022. OpenReview.net (2022). https://openreview.net/forum?id=fR-EnKWL_Zb

  13. Wang, Q., Zhang, J., Yang, K., Peng, K., Stiefelhagen, R.: Matchformer: interleaving attention in transformers for feature matching. In: Wang, L., Gall, J., Chin, T., Sato, I., Chellappa, R. (eds.) Computer Vision—ACCV 2022—16th Asian Conference on Computer Vision, Macao, China, December 4–8, 2022, Proceedings, Part III. Lecture Notes in Computer Science, vol. 13843, pp. 256–273. Springer (2022). https://doi.org/10.1007/978-3-031-26313-2_16

  14. Bhattacharjee, D., Roy, H.: Pattern of local gravitational force (PLGF): a novel local image descriptor. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 595–607 (2021). https://doi.org/10.1109/TPAMI.2019.2930192

    Article  PubMed  Google Scholar 

  15. Ghannadi, M.A., Saadatseresht, M.: A modified local binary pattern descriptor for SAR image matching. IEEE Geosci. Remote. Sens. Lett. 16(4), 568–572 (2019). https://doi.org/10.1109/LGRS.2018.2876661

    Article  ADS  Google Scholar 

  16. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004). https://doi.org/10.1023/B:VISI.0000029664.99615.94

    Article  Google Scholar 

  17. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.R.: ORB: an efficient alternative to SIFT or SURF. In: Metaxas, D.N., Quan, L., Sanfeliu, A., Gool, L.V. (eds.) IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6–13, 2011, pp. 2564–2571. IEEE Computer Society (2011). https://doi.org/10.1109/ICCV.2011.6126544

  18. Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: LIFT: learned invariant feature transform. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016—14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VI. Lecture Notes in Computer Science, vol. 9910, pp. 467–483. Springer (2016). https://doi.org/10.1007/978-3-319-46466-4_28

  19. Luo, Z., Zhou, L., Bai, X., Chen, H., Zhang, J., Yao, Y., Li, S., Fang, T., Quan, L.: Aslfeat: learning local features of accurate shape and localization. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 6588–6597. Computer Vision Foundation/IEEE (2020). https://doi.org/10.1109/CVPR42600.2020.00662. https://openaccess.thecvf.com/content_CVPR_2020/html/Luo_ASLFeat_Learning_Local_Features_of_Accurate_Shape_and_Localization_CVPR_2020_paper.html

  20. DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: Self-supervised interest point detection and description. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 224–236. Computer Vision Foundation/IEEE Computer Society (2018). https://doi.org/10.1109/CVPRW.2018.00060. http://openaccess.thecvf.com/content_cvpr_2018_workshops/w9/html/DeTone_SuperPoint_Self-Supervised_Interest_CVPR_2018_paper.html

  21. Fang, Y., Wang, K., Cheng, R., Yang, K.: CFVL: A coarse-to-fine vehicle localizer with omnidirectional perception across severe appearance variations. In: IEEE Intelligent Vehicles Symposium, IV 2020, Las Vegas, NV, USA, October 19–November 13, 2020, pp. 1885–1891. IEEE (2020). https://doi.org/10.1109/IV47402.2020.9304612

  22. Di, Y., Zhu, X., Jin, X., Dou, Q., Zhou, W., Duan, Q.: Color-UNet++: a resolution for colorization of grayscale images using improved UNet++. Multimed. Tools Appl. 80(28–29), 35629–35648 (2021). https://doi.org/10.1007/s11042-021-10830-2

    Article  Google Scholar 

  23. Han, X., Leung, T., Jia, Y., Sukthankar, R., Berg, A.C.: Matchnet: unifying feature and metric learning for patch-based matching. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pp. 3279–3286. IEEE Computer Society (2015). https://doi.org/10.1109/CVPR.2015.7298948

  24. Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19–22, 2016. BMVA Press (2016). http://www.bmva.org/bmvc/2016/papers/paper119/index.html

  25. Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. In: Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach, CA, USA, pp. 4826–4837 (2017). https://proceedings.neurips.cc/paper/2017/hash/831caa1b600f852b7844499430ecac17-Abstract.html

  26. Liao, Y., Di, Y., Zhou, H., Li, A., Liu, J., Lu, M., Duan, Q.: Feature matching and position matching between optical and SAR with local deep feature descriptor. IEEE JIEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 15, 448–462 (2022). https://doi.org/10.1109/JSTARS.2021.3134676

    Article  ADS  Google Scholar 

  27. Giang, K.T., Song, S., Jo, S.: TopicFM: robust and interpretable feature matching with topic-assisted. CoRR abs/2207.00328 (2022). arXiv:2207.00328. https://doi.org/10.48550/arXiv.2207.00328

  28. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach, CA, USA, pp. 5998–6008 (2017). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

  29. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net (2021). https://openreview.net/forum?id=YicbFdNTTy

  30. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: hierarchical vision transformer using shifted windows. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10–17, 2021, pp. 9992–10002. IEEE (2021). https://doi.org/10.1109/ICCV48922.2021.00986

  31. Liu, X., Li, J., Pan, J.: Feature point matching based on distinct wavelength phase congruency and log-gabor filters in infrared and visible images. Sensors 19(19), 4244 (2019). https://doi.org/10.3390/s19194244

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  32. Wu, F., Wang, B., Yi, X., Li, M., Hao, J., Qin, H., Zhou, H.: Visible and infrared image registration based on visual salient features. J. Electron. Imaging 24(5), 053017 (2015). https://doi.org/10.1117/1.JEI.24.5.053017

    Article  ADS  Google Scholar 

  33. Min, C., Gu, Y., Yang, F., Li, Y., Lian, W.: Non-rigid registration for infrared and visible images via Gaussian weighted shape context and enhanced affine transformation. IEEE Access 8, 42562–42575 (2020). https://doi.org/10.1109/ACCESS.2020.2976767

    Article  Google Scholar 

  34. Wang, L., Gao, C., Zhao, Y., Song, T., Feng, Q.: Infrared and visible image registration using transformer adversarial network. In: 2018 IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, October 7–10, 2018, pp. 1248–1252. IEEE (2018). https://doi.org/10.1109/ICIP.2018.8451370

  35. Arar, M., Ginger, Y., Danon, D., Bermano, A.H., Cohen-Or, D.: Unsupervised multi-modal image registration via geometry preserving image-to-image translation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 13407–13416. Computer Vision Foundation/IEEE (2020). https://doi.org/10.1109/CVPR42600.2020.01342. https://openaccess.thecvf.com/content_CVPR_2020/html/Arar_Unsupervised_Multi-Modal_Image_Registration_via_Geometry_Preserving_Image-to-Image_Translation_CVPR_2020_paper.html

  36. Hrkac, T., Kalafatic, Z., Krapac, J.: Infrared-visual image registration based on corners and Hausdorff distance. In: Ersbøll, B.K., Pedersen, K.S. (eds.) Image Analysis, 15th Scandinavian Conference, SCIA 2007, Aalborg, Denmark, June 10–14, 2007, Proceedings. Lecture Notes in Computer Science, vol. 4522, pp. 383–392. Springer (2007). https://doi.org/10.1007/978-3-540-73040-8_39

  37. Ma, J., Zhao, J., Ma, Y., Tian, J.: Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognit. 48(3), 772–784 (2015). https://doi.org/10.1016/j.patcog.2014.09.005

    Article  ADS  Google Scholar 

  38. Min, C., Gu, Y., Li, Y., Yang, F.: Non-rigid infrared and visible image registration by enhanced affine transformation. Pattern Recognit. 106, 107377 (2020). https://doi.org/10.1016/j.patcog.2020.107377

    Article  Google Scholar 

  39. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015—18th International Conference Munich, Germany, October 5–9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  40. Sarlin, P., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 4937–4946. Computer Vision Foundation/IEEE (2020). https://doi.org/10.1109/CVPR42600.2020.00499. https://openaccess.thecvf.com/content_CVPR_2020/html/Sarlin_SuperGlue_Learning_Feature_Matching_With_Graph_Neural_Networks_CVPR_2020_paper.html

  41. Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., Yan, S.: Metaformer is actually what you need for vision. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, pp. 10809–10819. IEEE (2022). https://doi.org/10.1109/CVPR52688.2022.01055

  42. Tyszkiewicz, M.J., Fua, P., Trulls, E.: DISK: learning local features with policy gradient. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, Virtual (2020). https://proceedings.neurips.cc/paper/2020/hash/a42a596fc71e17828440030074d15e74-Abstract.html

  43. Rocco, I., Cimpoi, M., Arandjelovic, R., Torii, A., Pajdla, T., Sivic, J.: Neighbourhood consensus networks. In: Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, pp. 1658–1669 (2018). https://proceedings.neurips.cc/paper/2018/hash/8f7d807e1f53eff5f9efbe5cb81090fb-Abstract.html

  44. Wang, Q., Zhou, X., Hariharan, B., Snavely, N.: Learning feature descriptors using camera pose supervision. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J. (eds.) Computer Vision—ECCV 2020—16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12346, pp. 757–774. Springer (2020). https://doi.org/10.1007/978-3-030-58452-8_44

  45. Schmitt, M., Hughes, L.H., Zhu, X.X.: The SEN1-2 dataset for deep learning in SAR-optical data fusion. CoRR abs/1807.01569 (2018). arXiv:1807.01569

  46. Schmitt, M., Wu, Y.: Remote sensing image classification with the SEN12MS dataset. CoRR abs/2104.00704 (2021). arXiv:2104.00704

  47. Brown, M.A., Süsstrunk, S.: Multi-spectral SIFT for scene category recognition. In: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011, pp. 177–184. IEEE Computer Society (2011). https://doi.org/10.1109/CVPR.2011.5995637

  48. Li, J., Xu, W., Shi, P., Zhang, Y., Hu, Q.: LNIFT: locally normalized image for rotation invariant multimodal feature matching. IEEE Trans. Geosci. Remote. Sens. 60, 1–14 (2022). https://doi.org/10.1109/TGRS.2022.3165940

    Article  Google Scholar 

  49. Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., Sattler, T.: D2-net: a trainable CNN for joint description and detection of local features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 8092–8101. Computer Vision Foundation/IEEE (2019). https://doi.org/10.1109/CVPR.2019.00828. http://openaccess.thecvf.com/content_CVPR_2019/html/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.html

  50. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005). https://doi.org/10.1109/TPAMI.2005.188

    Article  PubMed  Google Scholar 

  51. Zhou, Q., Sattler, T., Leal-Taixé, L.: Patch2pix: epipolar-guided pixel-level correspondences. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, June 19–25, 2021, pp. 4669–4678. Computer Vision Foundation/IEEE (2021). https://doi.org/10.1109/CVPR46437.2021.00464. https://openaccess.thecvf.com/content/CVPR2021/html/Zhou_Patch2Pix_Epipolar-Guided_Pixel-Level_Correspondences_CVPR_2021_paper.html

Download references

Acknowledgements

This work is supported by a grant from the Social and Science Foundation of Liaoning Province (No. L20BTQ008), in part by the National Natural Science Foundation of China under Grant 61976124 and in part by the Scientific Research Fund of Yunnan Provincial Education Department under Grant 2021J0007.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mingyu Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interests or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Di, Y., Liao, Y., Zhu, K. et al. MIVI: multi-stage feature matching for infrared and visible image. Vis Comput 40, 1839–1851 (2024). https://doi.org/10.1007/s00371-023-02889-9

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-023-02889-9

Keywords

Navigation