Abstract
Finding valid correspondences is central to image matching, which underlies numerous vision-based tasks. Current methods tend to degrade when the putative set contains a high proportion of outliers. To address this problem, this paper proposes SAH-Net, an end-to-end network that takes a set of putative correspondences between two images, removes the outliers, and recovers the camera pose through the essential matrix. SAH-Net is hierarchical, with a multi-scale structure consisting of a correspondence level and a cluster level. First, the correspondence level exploits two-view geometry to learn correspondence features. Next, to integrate structural information about the scene, the correspondences are pooled into clusters via a self-attention method. After clustering, SAH-Net applies a spatial correlation operation that separates the features into segments and learns the spatial characteristics of the clustered nodes. Finally, the clusters, now carrying spatial information, are restored to the original scale via a learned upsampling operation. Extensive experiments on remote sensing image registration, general image matching (on both outdoor and indoor datasets), and loop closure detection show that SAH-Net compares favorably with state-of-the-art competitors in mismatch removal and relative pose estimation.
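To make the two-view geometry behind this task concrete, the following minimal sketch shows how an essential matrix separates inliers from outliers. It is not SAH-Net and does not reproduce the paper's network or loss; it only illustrates the standard epipolar constraint x2^T E x1 = 0 on normalized image coordinates. The function names, the Sampson-distance residual, the threshold of 1e-4, and the toy pose and points are all assumptions chosen for illustration.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that [t]_x v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def essential_from_pose(R, t):
    """E = [t]_x R for the convention X2 = R X1 + t."""
    return matmul(skew(t), R)


def sampson_distance(E, x1, x2):
    """First-order geometric error of the constraint x2^T E x1 = 0.

    x1 and x2 are normalized homogeneous image points (x, y, 1).
    """
    Ex1 = matvec(E, x1)
    Etx2 = matvec(transpose(E), x2)
    num = dot(x2, Ex1) ** 2
    den = Ex1[0] ** 2 + Ex1[1] ** 2 + Etx2[0] ** 2 + Etx2[1] ** 2
    return num / den


def classify(E, matches, threshold=1e-4):
    """Label each putative match True (inlier) or False (outlier).

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [sampson_distance(E, x1, x2) < threshold for x1, x2 in matches]


# Toy pose: no rotation, unit translation along the x-axis.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
E = essential_from_pose(R, t)

# The 3D point (0.5, 0.2, 4) projects to x1 in view 1 and to x2 in view 2.
x1 = (0.125, 0.05, 1.0)
x2_true = (0.375, 0.05, 1.0)   # consistent with the pose
x2_wrong = (0.375, 0.30, 1.0)  # shifted off the epipolar line

matches = [(x1, x2_true), (x1, x2_wrong)]
labels = classify(E, matches)
print(labels)  # [True, False]
```

In this toy setting the pose is known and the labels follow from it. A learned classifier such as the one described above works in the opposite direction: it predicts which correspondences are inliers, and the essential matrix is then estimated from them.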
Data Availability
The data that support the findings of this study are available from the first author upon reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 62061160370, 62075169, and 62003247; the Hubei Province Key Research and Development Program under Grant 2021BBA235; and the Zhuhai Basic and Applied Basic Research Foundation under Grant ZH22017003200010PWC.
Ethics declarations
Conflicts of interest
The authors declare that they have no known conflicts of interest that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Chu, M., Ma, Y., Mei, X. et al. Learning-based correspondence classifier with self-attention hierarchical network. Appl Intell 53, 24360–24376 (2023). https://doi.org/10.1007/s10489-023-04789-w