DOI: 10.1145/3574131.3574433

Semantic-assisted Unified Network for Feature Point Extraction and Matching

Published: 13 January 2023

Abstract

Feature point matching between two images is an essential component of 3D reconstruction, augmented reality, panorama stitching, and related tasks. The quality of the initial feature point matching stage strongly affects the overall performance of such systems. We present a unified feature point extraction-matching method that uses semantic segmentation results to constrain feature point matching. To integrate high-level semantic information into feature points efficiently, we propose a unified feature point extraction and matching network, called SP-Net, which detects feature points and generates feature descriptors simultaneously and performs feature point matching with accurate outcomes. Compared with previous works, our method extracts multi-scale context from the image, including both shallow information and high-level semantic information of the local area, which makes it more stable under complex conditions such as changing illumination or large viewpoint changes. On the feature-matching benchmark, our method shows superior performance over state-of-the-art methods. As further validation, we propose SP-Net++ as an extension for 3D reconstruction. The experimental results show that our network obtains accurate feature point positioning and robust feature matching, recovering more cameras and producing a well-shaped point cloud. Our semantic-assisted method improves the stability of feature points as well as applicability to complex scenes.
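The abstract does not specify how the semantic constraint is implemented inside SP-Net. As a minimal sketch of the general idea of semantic-assisted matching (not the paper's actual method; all function and variable names here are hypothetical), one can filter candidate matches so that both endpoints fall on the same semantic class in their respective segmentation maps:

```python
import numpy as np

def filter_matches_by_semantics(matches, kpts_a, kpts_b, seg_a, seg_b):
    """Keep only candidate matches whose keypoints share a semantic label.

    matches        : (M, 2) int array of (index_in_a, index_in_b) pairs
    kpts_a, kpts_b : (N, 2) float arrays of (x, y) keypoint positions
    seg_a, seg_b   : (H, W) int arrays of per-pixel semantic class ids
    """
    kept = []
    for ia, ib in matches:
        # Look up the semantic class under each keypoint (nearest pixel).
        xa, ya = np.round(kpts_a[ia]).astype(int)
        xb, yb = np.round(kpts_b[ib]).astype(int)
        if seg_a[ya, xa] == seg_b[yb, xb]:
            kept.append((ia, ib))
    return np.asarray(kept, dtype=int)
```

Such a hard filter is only an illustration; a learned network like the one described can instead fuse the semantic features into the descriptors themselves, so the constraint is soft and differentiable.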



    Published In

    VRCAI '22: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
    December 2022
    284 pages
    ISBN:9798400700316
    DOI:10.1145/3574131
    Editors: Enhua Wu, Lionel Ming-Shuan Ni, Zhigeng Pan, Daniel Thalmann, Ping Li, Charlie C.L. Wang, Lei Zhu, Minghao Yang


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Feature point
    2. deep learning
    3. extraction
    4. matching
    5. semantic information

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China

    Conference

    VRCAI '22

    Acceptance Rates

    Overall Acceptance Rate 51 of 107 submissions, 48%
