Learning-based correspondence classifier with self-attention hierarchical network

Applied Intelligence

Abstract

Finding valid correspondences is central to image matching, which underpins numerous vision-based tasks. Current methods often perform poorly on correspondence sets with a high proportion of outliers. To address this problem, given a set of putative correspondences between two images, this paper proposes a novel end-to-end framework, named SAH-Net, that removes outliers and recovers the camera pose through the essential matrix. SAH-Net is hierarchical, with a multi-scale structure consisting of a correspondence level and a cluster level. First, the correspondence level exploits two-view geometry to learn correspondence features. Next, to integrate structural information about the scene, correspondences are pooled into clusters via a self-attention method. SAH-Net then applies a spatial correlation operation after the clustering, separating features into segments and learning the spatial characteristics of the clustered nodes. Finally, the clusters, now enriched with spatial information, are recovered to the original scale via a learned upsampling operation. Extensive experiments on remote sensing image registration, general image matching (on outdoor and indoor image datasets), and loop closure detection demonstrate that SAH-Net outperforms other state-of-the-art competitors in mismatch removal and relative pose estimation.
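The pooling and upsampling steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the SAH-Net implementation: it only shows one generic way to pool N correspondence features into M cluster features with attention weights and to map them back to the original scale. All names here (`attention_pool`, `attention_unpool`, `seeds`, `w_q`, `w_k`) are hypothetical, the cluster seeds and projection matrices stand in for parameters that would be learned, and random values are used purely for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(feats, seeds, w_q, w_k):
    """Pool N correspondence features (N, C) into M cluster features (M, C).

    `seeds` (M, C) are hypothetical cluster queries; in a trained network
    they would be learned rather than sampled at random.
    """
    q = seeds @ w_q                                         # (M, D)
    k = feats @ w_k                                         # (N, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # (M, N), rows sum to 1
    return attn @ feats, attn                               # weighted average per cluster

def attention_unpool(cluster_feats, feats, w_q, w_k):
    """Recover per-correspondence features (N, C) from M cluster features."""
    q = feats @ w_q                                         # (N, D)
    k = cluster_feats @ w_k                                 # (M, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # (N, M), rows sum to 1
    return attn @ cluster_feats                             # (N, C)

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
N, M, C, D = 500, 16, 32, 8
feats = rng.standard_normal((N, C))      # stand-in for learned correspondence features
seeds = rng.standard_normal((M, C))
w_q = rng.standard_normal((C, D)) / np.sqrt(C)
w_k = rng.standard_normal((C, D)) / np.sqrt(C)

clusters, attn = attention_pool(feats, seeds, w_q, w_k)
restored = attention_unpool(clusters, feats, w_q, w_k)
print(clusters.shape, restored.shape)    # (16, 32) (500, 32)
```

In this sketch each cluster feature is a convex combination of correspondence features, so the cluster level summarizes the whole set at a coarser scale; the unpooling step redistributes that summary to every correspondence, where it could be combined with the original features before classification.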

Data Availability

The data that support the findings of this study are available from the first author upon reasonable request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62061160370, 62075169, and 62003247; in part by the Hubei Province Key Research and Development Program under Grant 2021BBA235; and in part by the Zhuhai Basic and Applied Basic Research Foundation under Grant ZH22017003200010PWC.

Author information

Corresponding author

Correspondence to Mingfan Chu.

Ethics declarations

Conflicts of interest

The authors declare that they have no known conflicts of interest that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Chu, M., Ma, Y., Mei, X. et al. Learning-based correspondence classifier with self-attention hierarchical network. Appl Intell 53, 24360–24376 (2023). https://doi.org/10.1007/s10489-023-04789-w

  • DOI: https://doi.org/10.1007/s10489-023-04789-w
