DOI: 10.1145/3574131.3574433

Semantic-assisted Unified Network for Feature Point Extraction and Matching

Published: 13 January 2023

Abstract

Feature point matching between two images is an essential component of 3D reconstruction, augmented reality, panorama stitching, and related tasks. The quality of the initial feature point matching stage strongly affects the overall performance of such systems. We present a unified feature point extraction-matching method that uses semantic segmentation results to constrain feature point matching. To integrate high-level semantic information into feature points efficiently, we propose a unified feature point extraction and matching network, called SP-Net, which detects feature points and generates feature descriptors simultaneously and performs feature point matching with accurate outcomes. Compared with previous works, our method extracts multi-scale context from the image, including both shallow information and high-level semantic information of the local area, which makes it more stable under complex conditions such as changing illumination or large viewpoint changes. On the feature-matching benchmark, our method shows superior performance over state-of-the-art methods. As further validation, we propose SP-Net++ as an extension for 3D reconstruction. The experimental results show that our network obtains accurate feature point positioning and robust feature matching, recovering more cameras and producing a well-shaped point cloud. Our semantic-assisted method improves the stability of feature points as well as applicability to complex scenes.
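The abstract does not specify how the semantic constraint is implemented inside SP-Net. As a minimal sketch of the general idea of semantic-assisted matching (not the paper's actual method; all function and variable names here are hypothetical), one can filter candidate matches so that both endpoints fall on the same semantic class in their respective segmentation maps:

```python
import numpy as np

def filter_matches_by_semantics(matches, kpts_a, kpts_b, seg_a, seg_b):
    """Keep only candidate matches whose keypoints share a semantic label.

    matches        : (M, 2) int array of (index_in_a, index_in_b) pairs
    kpts_a, kpts_b : (N, 2) float arrays of (x, y) keypoint positions
    seg_a, seg_b   : (H, W) int arrays of per-pixel semantic class ids
    """
    kept = []
    for ia, ib in matches:
        # Look up the semantic class under each keypoint (nearest pixel).
        xa, ya = np.round(kpts_a[ia]).astype(int)
        xb, yb = np.round(kpts_b[ib]).astype(int)
        if seg_a[ya, xa] == seg_b[yb, xb]:
            kept.append((ia, ib))
    return np.asarray(kept, dtype=int)
```

Such a hard filter is only an illustration; a learned network like the one described can instead fuse the semantic features into the descriptors themselves, so the constraint is soft and differentiable.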



    Published In

    VRCAI '22: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
    December 2022
    284 pages
    ISBN:9798400700316
    DOI:10.1145/3574131
    Editors: Enhua Wu, Lionel Ming-Shuan Ni, Zhigeng Pan, Daniel Thalmann, Ping Li, Charlie C.L. Wang, Lei Zhu, Minghao Yang


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Feature point
    2. deep learning
    3. extraction
    4. matching
    5. semantic information

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China

    Conference

    VRCAI '22

    Acceptance Rates

    Overall Acceptance Rate 51 of 107 submissions, 48%
