
Neurocomputing

Volume 452, 10 September 2021, Pages 159-168

TSSN-Net: Two-step Sparse Switchable Normalization for learning correspondences with heavy outliers

https://doi.org/10.1016/j.neucom.2021.04.093

Abstract

In this paper, we solve the problem of feature matching by designing an end-to-end network (called TSSN-Net). Given putative correspondences of feature points in two views, existing deep learning based methods formulate the feature matching problem as a binary classification problem. In these methods, a normalizer plays an important role in the network. However, they adopt the same normalizer in all normalization layers of the entire network, which results in suboptimal performance. To address this problem, we propose a Two-step Sparse Switchable Normalization Block, which combines the advantage of adaptive normalization for different convolution layers from Sparse Switchable Normalization with the robust global context information from Context Normalization. Moreover, to capture local information of correspondences, we propose a Multi-Scale Correspondence Grouping algorithm that defines a multi-scale neighborhood representation to search for consistent neighbors of each correspondence. Finally, with a series of convolution layers, the end-to-end TSSN-Net is proposed to learn correspondences with heavy outliers for feature matching. Experimental results show that our network achieves state-of-the-art performance on benchmark datasets.

Introduction

Establishing reliable feature correspondences is a fundamental problem in computer vision [1], [2], [3], e.g., Structure from Motion (SfM) [4], [5], Simultaneous Localization and Mapping (SLAM) [6], [7], image fusion [8] and geometric model fitting [9], [10], [11], [12]. This problem is typically addressed in two steps, i.e., correspondence generation and correspondence selection [13]. The first step can be handled by matching local key-point features (e.g., SIFT [14], Hessian-Affine detector [15]). However, the initial correspondences are inevitably contaminated by outliers (see Fig. 1) due to various problems, e.g., local key-point localization errors and ambiguities of the local descriptors. Thus, many methods (e.g., [13], [16], [17], [18], [19], [20], [21], [22], [23]) have been proposed to remove outliers.
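For concreteness, a minimal sketch of the correspondence-generation step is given below, using OpenCV's SIFT detector with Lowe's ratio test. The 0.8 ratio threshold is an illustrative choice rather than a setting from the paper; such a loose test keeps many matches and typically leaves a heavy fraction of outliers for the subsequent selection step.

```python
# Minimal sketch of putative correspondence generation with SIFT (OpenCV).
# The 0.8 ratio threshold is an illustrative choice; a loose test like this
# keeps many matches and typically leaves a heavy fraction of outliers for
# the subsequent correspondence-selection step.
import cv2
import numpy as np

def putative_correspondences(img1, img2, ratio=0.8):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    pairs = []
    for match in matches:
        if len(match) < 2:
            continue
        m, n = match
        if m.distance < ratio * n.distance:      # Lowe's ratio test
            pairs.append(kp1[m.queryIdx].pt + kp2[m.trainIdx].pt)
    return np.asarray(pairs, dtype=np.float32)   # (N, 4): x1, y1, x2, y2
```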

Among existing feature matching methods, deep learning based methods are the most popular due to their effectiveness, e.g., LGC-Net [20] and NM-Net [13]. LGC-Net introduces Multi-Layer Perceptrons (MLPs) and Context Normalization (CN) for feature matching. NM-Net includes a compatibility-specific mining algorithm to successively extract and aggregate local features in a hierarchical deep learning network. Despite their great successes, these methods adopt the same normalizer in all normalization layers of the entire network, which results in suboptimal performance.

Recently, Sparse Switchable Normalization (SSN) [24], which reformulates the optimization problem as a differentiable feed-forward computation via a sparse learning algorithm, has been proposed to adaptively select normalizers for image classification, semantic segmentation and action recognition. Specifically, SSN employs a sparse learning algorithm to adaptively select the most suitable normalizer, from Batch Normalization (BN) [25], Instance Normalization (IN) [26], and Layer Normalization (LN) [27], for each normalization layer of a deep network. Different normalizers have their own advantages and disadvantages. For example, BN acts as a feature regularizer and improves the generalization of the network, but it is sensitive to the batch size (i.e., the number of samples per GPU), since its mean and standard deviation are computed over the batch. In contrast, IN and LN are not sensitive to the batch size, but they limit the generalization ability of the network for feature matching. Thus, by adaptively selecting normalizers, SSN can inherit the benefits of all normalizers (i.e., BN, IN, LN) while retaining good generalization ability.
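To make the idea concrete, the following sketch shows a switchable normalization layer for (B, C, N) correspondence features that mixes BN, IN and LN statistics with learned importance weights. It uses a plain softmax for brevity, whereas SSN replaces the softmax with a sparse variant (SparsestMax) so that exactly one normalizer is selected per layer; running statistics for inference are omitted.

```python
# Sketch of a switchable normalization layer for (B, C, N) correspondence
# features: BN, IN and LN statistics are mixed with learned importance
# weights. A plain softmax is used for brevity; SSN replaces it with a
# sparse variant (SparsestMax) so that exactly one normalizer is selected.
# Running statistics for inference are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm1d(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1))
        self.mean_logits = nn.Parameter(torch.zeros(3))   # importance over (BN, IN, LN)
        self.var_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):                                  # x: (B, C, N)
        mu_in = x.mean(dim=2, keepdim=True)                # per sample, per channel
        var_in = x.var(dim=2, keepdim=True, unbiased=False)
        mu_ln = x.mean(dim=(1, 2), keepdim=True)           # per sample
        var_ln = x.var(dim=(1, 2), keepdim=True, unbiased=False)
        mu_bn = x.mean(dim=(0, 2), keepdim=True)           # per channel, over the batch
        var_bn = x.var(dim=(0, 2), keepdim=True, unbiased=False)
        w_mu = F.softmax(self.mean_logits, dim=0)
        w_var = F.softmax(self.var_logits, dim=0)
        mu = w_mu[0] * mu_bn + w_mu[1] * mu_in + w_mu[2] * mu_ln
        var = w_var[0] * var_bn + w_var[1] * var_in + w_var[2] * var_ln
        return self.gamma * (x - mu) / torch.sqrt(var + self.eps) + self.beta
```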

In this paper, we propose a hierarchical deep learning network (called TSSN-Net) to learn correspondences with heavy outliers for feature matching. The architecture of the proposed TSSN-Net is shown in Fig. 2. Specifically, we propose to add SSN into the deep network to improve the performance of existing deep networks for feature matching. Note that the performance of SSN depends on the effectiveness of the sparse learning algorithm, which trains deep models with sparse constraints. However, the input data of the feature matching problem has no obvious sparse space. To address this issue, we propose a novel Two-step Sparse Switchable Normalization block (TSSN Block), which introduces CN to construct the sparse space for SSN. Thus, the TSSN Block is able to inherit the benefits of different normalizers for feature matching.
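A rough sketch of the two-step idea is given below, assuming the SwitchableNorm1d layer sketched above: Context Normalization first normalizes each feature channel across the correspondences of one image pair, and its output is then fed to the switchable normalizer. The channel width and the shared 1x1 convolution are illustrative choices, not the exact configuration of the paper.

```python
# Rough sketch of the two-step block, assuming the SwitchableNorm1d layer
# sketched above: Context Normalization (CN) first normalizes each channel
# across the correspondences of one image pair, then a switchable normalizer
# adaptively mixes BN/IN/LN statistics. Sizes are illustrative only.
import torch
import torch.nn as nn

def context_norm(x, eps=1e-5):
    # x: (B, C, N); normalize each channel over the N correspondences
    mu = x.mean(dim=2, keepdim=True)
    sigma = x.std(dim=2, keepdim=True)
    return (x - mu) / (sigma + eps)

class TSSNBlockSketch(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=1)  # shared MLP
        self.ssn = SwitchableNorm1d(channels)                     # stands in for SSN
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (B, C, N)
        x = self.conv(x)
        x = context_norm(x)               # step 1: global context
        x = self.ssn(x)                   # step 2: adaptive normalizer selection
        return self.relu(x)
```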

In addition, to extract local features, we search for consistent neighbors of each correspondence according to the compatibility-specific distance, which finds more reliable neighbors than the spatial distance for feature matching. To further improve the generalization ability of the proposed network, we propose a Multi-Scale Correspondence Grouping algorithm, which searches for the neighbors of each correspondence and integrates each correspondence and its neighbors into a graph through a novel multi-scale neighborhood representation.
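The following sketch illustrates the grouping idea only: for each putative correspondence (x1, y1, x2, y2), the k nearest neighbors are gathered at several scales and stacked per scale. A plain Euclidean distance in the 4-D correspondence space stands in for the compatibility-specific distance, and the scale list is illustrative.

```python
# Illustrative grouping only: for each putative correspondence (x1, y1, x2, y2),
# gather its k nearest neighbors at several scales and stack them per scale.
# Plain Euclidean distance in the 4-D correspondence space stands in for the
# compatibility-specific distance; the scale list (8, 16, 32) is illustrative.
import torch

def multi_scale_grouping(corr, scales=(8, 16, 32)):
    # corr: (B, N, 4) putative correspondences
    dist = torch.cdist(corr, corr)                        # (B, N, N) pairwise distances
    grouped = []
    for k in scales:
        idx = dist.topk(k, dim=2, largest=False).indices  # (B, N, k); includes the point itself
        neighbors = torch.gather(
            corr.unsqueeze(1).expand(-1, corr.size(1), -1, -1),       # (B, N, N, 4)
            2,
            idx.unsqueeze(-1).expand(-1, -1, -1, corr.size(2)))       # (B, N, k, 4)
        grouped.append(neighbors)
    return grouped                                        # list of (B, N, k, 4), one per scale
```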

The contributions are summarized as follows:

  • We develop an end-to-end network to handle the feature matching problem with heavy outliers. The proposed network achieves state-of-the-art performance on three benchmark datasets with various proportions of outliers.

  • We propose a novel normalization block, which effectively incorporates global context information and adaptively switches normalizers in different convolution layers, for feature matching. The proposed block significantly improves the performance of our network since it inherits the benefits of different normalizers.

  • We propose a correspondence grouping algorithm to capture local information of correspondences. The proposed algorithm searches for neighbors based on the compatibility-specific distance and integrates correspondences with their neighbors into a graph, which is permutation-equivariant to the initial correspondences. Therefore, our network is insensitive to the order of correspondences and has good generalization ability for feature matching.

The rest of the paper is organized as follows: We briefly review the related work and background material in Section 2. We present our network for feature matching in Section 3. In Section 4, we describe and analyze the experimental results on three benchmark datasets. Finally, we draw conclusions in Section 6.

Section snippets

Related work

In this section, we briefly review the related feature matching methods, including parametric methods, non-parametric methods and learning based methods, and we also review some normalization techniques that the proposed method is based on.

Overview

As shown in Fig. 2, our network consists of two important modules, i.e., the TSSN Block and the Multi-Scale Correspondence Grouping algorithm (called MSCG). We use a hierarchical network with TSSN Blocks, which can capture global context information and adaptively select normalizers in different convolution layers for optimal performance. In addition, MSCG exploits multi-scale local information.
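As a rough illustration of how such a hierarchical network can be assembled (reusing the TSSNBlockSketch from above, with illustrative depth and width, and not the exact architecture of the paper), raw 4-D correspondences are lifted to C channels, passed through a stack of blocks, and mapped to one inlier logit per correspondence:

```python
# Rough end-to-end composition (not the exact architecture), reusing the
# TSSNBlockSketch from above with illustrative depth and width: raw 4-D
# correspondences are lifted to C channels, passed through a stack of blocks,
# and mapped to one inlier logit per correspondence.
import torch
import torch.nn as nn

class TSSNNetSketch(nn.Module):
    def __init__(self, channels=128, depth=6):
        super().__init__()
        self.lift = nn.Conv1d(4, channels, kernel_size=1)
        self.blocks = nn.Sequential(*[TSSNBlockSketch(channels) for _ in range(depth)])
        self.head = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, corr):                   # corr: (B, N, 4)
        x = self.lift(corr.transpose(1, 2))    # (B, C, N)
        x = self.blocks(x)
        return self.head(x).squeeze(1)         # (B, N) inlier logits
```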

The details of our network are described as follows: The feature learning and aggregation layers are

Experimental results

In this section, we evaluate the performance of our TSSN-Net on three benchmark datasets: NARROW, WIDE [13] and COLMAP [45]. All experiments are performed on Linux 3.10.0 with NVIDIA TESLA P100 GPUs.

Discussion

Our network includes two main parts, i.e., the TSSN Block and MSCG. Recall that TSSN-Net1 includes the TSSN Block, which obtains global context information from the first normalization step and adaptively selects normalizers for different convolution layers, while NM-Net includes the CN Block. Thus, we can compare TSSN-Net1 and NM-Net to show the effectiveness of the TSSN Block. As shown in Table 2 and Fig. 6, the proposed TSSN Block (CN+SSN) can significantly improve the performance of the plain CN Block (CN

Conclusion

In this paper, we propose an end-to-end network (called TSSN-Net) to identify inliers and outliers for two-view feature matching problems. The proposed TSSN-Net includes two innovations, i.e., the TSSN Block and MSCG. Specifically, the proposed TSSN Block helps to extract global information and adaptively selects normalizers for different convolution layers, which improves the accuracy of the algorithm. Also, we propose MSCG to search for correspondence neighbors and integrate

CRediT authorship contribution statement

Zhen Zhong: Conceptualization, Methodology, Software, Writing - original draft. Guobao Xiao: Methodology, Writing - review & editing, Supervision. Kun Zeng: Visualization, Investigation. Shiping Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62072223, by the Natural Science Foundation of Fujian Province under Grants 2020J01829, and by the Project of Central Leading Local under Grants 2020L3024.

References (45)

  • C. Cadena et al., Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age, IEEE Transactions on Robotics (2016).
  • G. Xiao et al., Superpixel-guided two-view deterministic geometric model fitting, International Journal of Computer Vision (2019).
  • F. Kluger et al., CONSAC: Robust multi-model fitting by conditional sample consensus.
  • G. Xiao, H. Luo, K. Zeng, L. Wei, J. Ma, Robust feature matching for remote sensing image registration via guided...
  • C. Zhao et al., NM-Net: Mining reliable neighbors for robust feature correspondences.
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision (2004).
  • K. Mikolajczyk et al., Scale & affine invariant interest point detectors, International Journal of Computer Vision (2004).
  • M.A. Fischler et al., Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM (1981).
  • J. Ma et al., Locality preserving matching, International Journal of Computer Vision (2019).
  • K. Moo Yi et al., Learning to find good correspondences.
  • D. Barath et al., MAGSAC++, a fast, reliable and accurate robust estimator.
  • W. Sun et al., ACNe: Attentive context normalization for robust permutation-equivariant learning.

Zhen Zhong received the bachelor's degree in traditional Chinese medicine from the Hunan University of Chinese Medicine, Hunan, China, in 2018. He is currently pursuing the M.S. degree with the Department of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, where he is also attached to the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University. His research interests include computer vision, machine learning, and pattern recognition.

Guobao Xiao received the B.S. degree in information and computing science from Fujian Normal University, China, in 2013 and the Ph.D. degree in Computer Science and Technology from Xiamen University, China, in 2016. From 2016 to 2018, he was a Postdoctoral Fellow in the School of Aerospace Engineering at Xiamen University, China. He is currently a Professor at Minjiang University, China. He has published over 30 papers in international journals and conferences, including IEEE TPAMI/TIP/TITS/TIE, IJCV, PR, ICCV, ECCV, ACCV, AAAI, etc. His research interests include machine learning, computer vision and pattern recognition. He received the best Ph.D. thesis award in Fujian Province and the best Ph.D. thesis award from the China Society of Image and Graphics (ten winners in total across China). He has also served on the program committee (PC) of CVPR, ICCV, ECCV, AAAI, WACV, etc., and was the General Chair of IEEE BDCLOUD 2019.

Kun Zeng received the Ph.D. degree from the Department of Computer Science, Xiamen University, Xiamen, China, in 2015, where he was a Postdoctoral Fellow with the Department of Electronic Science from 2016 to 2019. He is currently a Lecturer with Minjiang University, China. His current research interests include image processing, machine learning and medical image reconstruction.

Shiping Wang received the Ph.D. degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2014.

He was a Research Fellow at Nanyang Technological University, Singapore, from 2015 to 2016. He is currently a Full Professor and Qishan Scholar with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China. His research interests include machine learning, computer vision, and granular computing.
