Learning-based correspondence classifier with self-attention hierarchical network

Chu, Mingfan; Ma, Yong; Mei, Xiaoguang; Huang, Jun; Fan, Fan

doi:10.1007/s10489-023-04789-w

Published: 24 July 2023

Applied Intelligence, Volume 53, pages 24360–24376 (2023)

Mingfan Chu¹ (ORCID: 0000-0002-0557-0366), Yong Ma¹, Xiaoguang Mei¹, Jun Huang¹ & Fan Fan¹

Abstract

Finding valid correspondences is central to image matching, which underpins numerous vision-based tasks. Current methods often perform poorly on correspondence sets with a high proportion of outliers. To address this problem, given a set of putative correspondences between two images, this paper proposes a novel end-to-end network (named SAH-Net) that removes outliers and recovers the camera pose through the essential matrix. SAH-Net is hierarchical, with a multi-scale structure consisting of a correspondence level and a cluster level. First, the correspondence level exploits two-view geometry to learn correspondence features. Next, to integrate structural information about the scene, correspondences are pooled via a self-attention method. After the clustering, SAH-Net applies a spatial correlation operation that separates features into segments and learns the spatial characteristics of the clustered nodes. Finally, the clusters, now integrated with spatial information, are restored to the original scale via a learned upsampling operation. Extensive experiments on remote sensing image registration, general image matching (on outdoor and indoor image datasets), and loop closure detection demonstrate the advantage of SAH-Net over other state-of-the-art competitors in mismatch removal and relative pose estimation.
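The abstract does not say how the essential matrix is computed from the classified correspondences. As a minimal sketch of one common choice in learned correspondence classifiers, the code below solves a weighted eight-point problem: each correspondence contributes one row of the epipolar constraint x2ᵀ E x1 = 0, scaled by an inlier weight. The function names, the use of hard 0/1 weights as a stand-in for network outputs, and the synthetic scene are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_eight_point(x1, x2, w):
    """Estimate an essential matrix from weighted correspondences.

    x1, x2: (N, 2) arrays of normalized image coordinates (intrinsics removed).
    w:      (N,) non-negative inlier weights, e.g. produced by a classifier.
    """
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    ones = np.ones_like(u1)
    # Each row encodes x2^T E x1 = 0 for one match, with E flattened row-major.
    A = np.stack([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, ones], axis=1)
    A = A * w[:, None]
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Enforce the essential-matrix structure: singular values (s, s, 0).
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt


def epipolar_residual(E, x1, x2):
    """Algebraic residual |x2^T E x1| per correspondence (small for inliers)."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    return np.abs(np.einsum("ni,ij,nj->n", h2, E, h1))


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scene: 3D points seen by two cameras related by X2 = R X1 + t.
    X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(60, 3))
    a = 0.1
    R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
    t = np.array([1.0, 0.0, 0.2])
    x1 = X[:, :2] / X[:, 2:]
    X2 = X @ R.T + t
    x2 = X2[:, :2] / X2[:, 2:]
    # Corrupt the last 20 matches so they act as outliers.
    x2[40:] = rng.uniform(-1, 1, size=(20, 2))
    w = np.r_[np.ones(40), np.zeros(20)]  # hypothetical stand-in for learned weights
    E = weighted_eight_point(x1, x2, w)
    r = epipolar_residual(E, x1, x2)
    print("median inlier residual:", np.median(r[:40]))
    print("median outlier residual:", np.median(r[40:]))
```

With reliable weights the outliers contribute nothing to the linear system, so the recovered matrix agrees with the true essential matrix up to scale and sign. In a learned pipeline the weights would be soft and the residual could feed back into the loss.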

Data Availability

The data that support the findings of this study are available from the first author upon reasonable request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62061160370, 62075169, and 62003247; the Hubei Province Key Research and Development Program under Grant 2021BBA235; and the Zhuhai Basic and Applied Basic Research Foundation under Grant ZH22017003200010PWC.

Author information

Authors and Affiliations

Electronic Information School, Wuhan University, No.299 Bayi Road, Wuhan, 430072, Hubei, China
Mingfan Chu, Yong Ma, Xiaoguang Mei, Jun Huang & Fan Fan

Corresponding author

Correspondence to Mingfan Chu.

Ethics declarations

Conflicts of interest

The authors declare that they have no known conflicts of interest that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chu, M., Ma, Y., Mei, X. et al. Learning-based correspondence classifier with self-attention hierarchical network. Appl Intell 53, 24360–24376 (2023). https://doi.org/10.1007/s10489-023-04789-w

Accepted: 11 June 2023
Published: 24 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04789-w
