
Neurocomputing

Volume 452, 10 September 2021, Pages 159-168

TSSN-Net: Two-step Sparse Switchable Normalization for learning correspondences with heavy outliers

https://doi.org/10.1016/j.neucom.2021.04.093

Abstract

In this paper, we solve the problem of feature matching by designing an end-to-end network (called TSSN-Net). Given putative correspondences of feature points in two views, existing deep learning based methods formulate the feature matching problem as a binary classification problem. In these methods, a normalizer plays an important role in the network. However, they adopt the same normalizer in all normalization layers of the entire network, which results in suboptimal performance. To address this problem, we propose a Two-step Sparse Switchable Normalization Block, which combines the advantage of adaptive normalization for different convolution layers from Sparse Switchable Normalization with the robust global context information from Context Normalization. Moreover, to capture local information of correspondences, we propose a Multi-Scale Correspondence Grouping algorithm that defines a multi-scale neighborhood representation to search for consistent neighbors of each correspondence. Finally, with a series of convolution layers, the end-to-end TSSN-Net is proposed to learn correspondences with heavy outliers for feature matching. Experimental results show that our network achieves state-of-the-art performance on benchmark datasets.

Introduction

Establishing reliable feature correspondences is a fundamental problem in computer vision [1], [2], [3], e.g., Structure from Motion (SfM) [4], [5], Simultaneous Localization and Mapping (SLAM) [6], [7], image fusion [8] and geometric model fitting [9], [10], [11], [12]. This problem is typically addressed in two steps, i.e., correspondence generation and correspondence selection [13]. The first step can be handled by matching local key-point features (e.g., SIFT [14], Hessian-Affine detector [15]). However, the initial correspondences are inevitably contaminated by outliers (see Fig. 1) due to various problems, e.g., local key-point localization errors and ambiguities of the local descriptors. Thus, many methods (e.g., [13], [16], [17], [18], [19], [20], [21], [22], [23]) have been proposed to remove outliers.
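For concreteness, a minimal sketch of the correspondence-generation step is given below, using OpenCV's SIFT detector with Lowe's ratio test. The 0.8 ratio threshold is an illustrative choice rather than a setting from the paper; such a loose test keeps many matches and typically leaves a heavy fraction of outliers for the subsequent selection step.

```python
# Minimal sketch of putative correspondence generation with SIFT (OpenCV).
# The 0.8 ratio threshold is an illustrative choice; a loose test like this
# keeps many matches and typically leaves a heavy fraction of outliers for
# the subsequent correspondence-selection step.
import cv2
import numpy as np

def putative_correspondences(img1, img2, ratio=0.8):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    pairs = []
    for match in matches:
        if len(match) < 2:
            continue
        m, n = match
        if m.distance < ratio * n.distance:      # Lowe's ratio test
            pairs.append(kp1[m.queryIdx].pt + kp2[m.trainIdx].pt)
    return np.asarray(pairs, dtype=np.float32)   # (N, 4): x1, y1, x2, y2
```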

Among existing feature matching methods, deep learning based methods are the most popular due to their effectiveness, e.g., LGC-Net [20] and NM-Net [13]. LGC-Net introduces Multi-Layer Perceptrons (MLPs) and Context Normalization (CN) for feature matching. NM-Net includes a compatibility-specific mining algorithm to successively extract and aggregate local features in a hierarchical deep learning network. Despite their great successes, these methods adopt the same normalizer in all normalization layers of the entire network, which results in suboptimal performance.

Recently, Sparse Switchable Normalization (SSN) [24], which reformulates the optimization problem as a differentiable feed-forward computation via a sparse learning algorithm, has been proposed to adaptively select normalizers for image classification, semantic segmentation and action recognition. Specifically, SSN employs a sparse learning algorithm to adaptively select the most suitable normalizer, from Batch Normalization (BN) [25], Instance Normalization (IN) [26], and Layer Normalization (LN) [27], for each normalization layer of a deep network. Different normalizers have their own advantages and disadvantages. For example, BN acts as a feature regularizer and improves the generalization of the network, but it is sensitive to the batch size (i.e., the number of samples per GPU), since its mean and standard deviation are computed over the batch. In contrast, IN and LN are not sensitive to the batch size, but they limit the generalization ability of the network for feature matching. Thus, by adaptively selecting normalizers, SSN can inherit the benefits of all normalizers (i.e., BN, IN, LN) while retaining good generalization ability.
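To make the idea concrete, the following sketch shows a switchable normalization layer for (B, C, N) correspondence features that mixes BN, IN and LN statistics with learned importance weights. It uses a plain softmax for brevity, whereas SSN replaces the softmax with a sparse variant (SparsestMax) so that exactly one normalizer is selected per layer; running statistics for inference are omitted.

```python
# Sketch of a switchable normalization layer for (B, C, N) correspondence
# features: BN, IN and LN statistics are mixed with learned importance
# weights. A plain softmax is used for brevity; SSN replaces it with a
# sparse variant (SparsestMax) so that exactly one normalizer is selected.
# Running statistics for inference are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm1d(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1))
        self.mean_logits = nn.Parameter(torch.zeros(3))   # importance over (BN, IN, LN)
        self.var_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):                                  # x: (B, C, N)
        mu_in = x.mean(dim=2, keepdim=True)                # per sample, per channel
        var_in = x.var(dim=2, keepdim=True, unbiased=False)
        mu_ln = x.mean(dim=(1, 2), keepdim=True)           # per sample
        var_ln = x.var(dim=(1, 2), keepdim=True, unbiased=False)
        mu_bn = x.mean(dim=(0, 2), keepdim=True)           # per channel, over the batch
        var_bn = x.var(dim=(0, 2), keepdim=True, unbiased=False)
        w_mu = F.softmax(self.mean_logits, dim=0)
        w_var = F.softmax(self.var_logits, dim=0)
        mu = w_mu[0] * mu_bn + w_mu[1] * mu_in + w_mu[2] * mu_ln
        var = w_var[0] * var_bn + w_var[1] * var_in + w_var[2] * var_ln
        return self.gamma * (x - mu) / torch.sqrt(var + self.eps) + self.beta
```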

In this paper, we propose a hierarchical deep learning network (called TSSN-Net) to learn correspondences with heavy outliers for feature matching. The architecture of the proposed TSSN-Net is shown in Fig. 2. Specifically, we propose to add SSN into the deep network to improve the performance of existing deep networks for feature matching. Note that the performance of SSN depends on the effectiveness of the sparse learning algorithm, which trains deep models with sparse constraints. However, the input data of the feature matching problem has no obvious sparse space. To address this issue, we propose a novel Two-step Sparse Switchable Normalization block (TSSN Block), which introduces CN to construct the sparse space for SSN. Thus, the TSSN Block is able to inherit the benefits of different normalizers for feature matching.
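A rough sketch of the two-step idea is given below, assuming the SwitchableNorm1d layer sketched above: Context Normalization first normalizes each feature channel across the correspondences of one image pair, and its output is then fed to the switchable normalizer. The channel width and the shared 1x1 convolution are illustrative choices, not the exact configuration of the paper.

```python
# Rough sketch of the two-step block, assuming the SwitchableNorm1d layer
# sketched above: Context Normalization (CN) first normalizes each channel
# across the correspondences of one image pair, then a switchable normalizer
# adaptively mixes BN/IN/LN statistics. Sizes are illustrative only.
import torch
import torch.nn as nn

def context_norm(x, eps=1e-5):
    # x: (B, C, N); normalize each channel over the N correspondences
    mu = x.mean(dim=2, keepdim=True)
    sigma = x.std(dim=2, keepdim=True)
    return (x - mu) / (sigma + eps)

class TSSNBlockSketch(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=1)  # shared MLP
        self.ssn = SwitchableNorm1d(channels)                     # stands in for SSN
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (B, C, N)
        x = self.conv(x)
        x = context_norm(x)               # step 1: global context
        x = self.ssn(x)                   # step 2: adaptive normalizer selection
        return self.relu(x)
```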

In addition, to extract local features, we search for consistent neighbors of each correspondence according to the compatibility-specific distance, which finds more reliable neighbors than the spatial distance for feature matching. To further improve the generalization ability of the proposed network, we propose a Multi-Scale Correspondence Grouping algorithm, which searches for the neighbors of each correspondence and integrates each correspondence and its neighbors into a graph through a novel multi-scale neighborhood representation.
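The following sketch illustrates the grouping idea only: for each putative correspondence (x1, y1, x2, y2), the k nearest neighbors are gathered at several scales and stacked per scale. A plain Euclidean distance in the 4-D correspondence space stands in for the compatibility-specific distance, and the scale list is illustrative.

```python
# Illustrative grouping only: for each putative correspondence (x1, y1, x2, y2),
# gather its k nearest neighbors at several scales and stack them per scale.
# Plain Euclidean distance in the 4-D correspondence space stands in for the
# compatibility-specific distance; the scale list (8, 16, 32) is illustrative.
import torch

def multi_scale_grouping(corr, scales=(8, 16, 32)):
    # corr: (B, N, 4) putative correspondences
    dist = torch.cdist(corr, corr)                        # (B, N, N) pairwise distances
    grouped = []
    for k in scales:
        idx = dist.topk(k, dim=2, largest=False).indices  # (B, N, k); includes the point itself
        neighbors = torch.gather(
            corr.unsqueeze(1).expand(-1, corr.size(1), -1, -1),       # (B, N, N, 4)
            2,
            idx.unsqueeze(-1).expand(-1, -1, -1, corr.size(2)))       # (B, N, k, 4)
        grouped.append(neighbors)
    return grouped                                        # list of (B, N, k, 4), one per scale
```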

The contributions are summarized as follows:

  • We develop an end-to-end network to handle the feature matching problem with heavy outliers. The proposed network achieves state-of-the-art performance on three benchmark datasets with various proportions of outliers.

  • We propose a novel normalization block, which effectively incorporates global context information and adaptively switches normalizers in different convolution layers, for feature matching. The proposed block significantly improves the performance of our network since it inherits the benefits of different normalizers.

  • We propose a correspondence grouping algorithm to capture local information of correspondences. The proposed algorithm searches for neighbors based on the compatibility-specific distance and integrates correspondences with their neighbors into a graph, which is permutation-equivariant to the initial correspondences. Therefore, our network is insensitive to the order of correspondences and has good generalization ability for feature matching.

The rest of the paper is organized as follows: We briefly review the related work and background material in Section 2. We present our network for feature matching in Section 3. In Section 4, we describe and analyze the experimental results on three benchmark datasets. Finally, we draw conclusions in Section 6.

Section snippets

Related work

In this section, we briefly review the related feature matching methods, including parametric methods, non-parametric methods and learning based methods, and we also review some normalization techniques that the proposed method is based on.

Overview

As shown in Fig. 2, our network consists of two important modules, i.e., the TSSN Block and the Multi-Scale Correspondence Grouping algorithm (called MSCG). We use a hierarchical network with TSSN Blocks, which can capture global context information and adaptively select normalizers in different convolution layers for optimal performance. In addition, MSCG exploits multi-scale local information.
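As a rough illustration of how such a hierarchical network can be assembled (reusing the TSSNBlockSketch from above, with illustrative depth and width, and not the exact architecture of the paper), raw 4-D correspondences are lifted to C channels, passed through a stack of blocks, and mapped to one inlier logit per correspondence:

```python
# Rough end-to-end composition (not the exact architecture), reusing the
# TSSNBlockSketch from above with illustrative depth and width: raw 4-D
# correspondences are lifted to C channels, passed through a stack of blocks,
# and mapped to one inlier logit per correspondence.
import torch
import torch.nn as nn

class TSSNNetSketch(nn.Module):
    def __init__(self, channels=128, depth=6):
        super().__init__()
        self.lift = nn.Conv1d(4, channels, kernel_size=1)
        self.blocks = nn.Sequential(*[TSSNBlockSketch(channels) for _ in range(depth)])
        self.head = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, corr):                   # corr: (B, N, 4)
        x = self.lift(corr.transpose(1, 2))    # (B, C, N)
        x = self.blocks(x)
        return self.head(x).squeeze(1)         # (B, N) inlier logits
```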

The details of our network are described as follows: The feature learning and aggregation layers are

Experimental results

In this section, we evaluate the performance of our TSSN-Net on three benchmark datasets: NARROW, WIDE [13] and COLMAP [45]. All experiments are performed on Linux 3.10.0 with NVIDIA TESLA P100 GPUs.

Discussion

Our network includes two main parts, i.e., the TSSN Block and MSCG. Recall that TSSN-Net1 includes the TSSN Block, which obtains global context information from the first normalization step and adaptively selects normalizers for different convolution layers, while NM-Net includes the CN Block. Thus, we can compare TSSN-Net1 and NM-Net to show the effectiveness of the TSSN Block. As shown in Table 2 and Fig. 6, the proposed TSSN Block (CN+SSN) can significantly improve the performance of the plain CN Block (CN

Conclusion

In this paper, we propose an end-to-end network (called TSSN-Net) to identify inliers and outliers for two-view feature matching problems. The proposed TSSN-Net includes two innovations, i.e., the TSSN Block and MSCG. Specifically, the proposed TSSN Block helps to extract global information and adaptively selects normalizers for different convolution layers, which improves the accuracy of the algorithm. Also, we propose MSCG to search for correspondence neighbors and integrate

CRediT authorship contribution statement

Zhen Zhong: Conceptualization, Methodology, Software, Writing - original draft. Guobao Xiao: Methodology, Writing - review & editing, Supervision. Kun Zeng: Visualization, Investigation. Shiping Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62072223, by the Natural Science Foundation of Fujian Province under Grants 2020J01829, and by the Project of Central Leading Local under Grants 2020L3024.

References (45)

  • C. Cadena et al., Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age, IEEE Transactions on Robotics (2016).
  • G. Xiao et al., Superpixel-guided two-view deterministic geometric model fitting, International Journal of Computer Vision (2019).
  • F. Kluger et al., CONSAC: Robust multi-model fitting by conditional sample consensus.
  • G. Xiao, H. Luo, K. Zeng, L. Wei, J. Ma, Robust feature matching for remote sensing image registration via guided...
  • C. Zhao et al., NM-Net: Mining reliable neighbors for robust feature correspondences.
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision (2004).
  • K. Mikolajczyk et al., Scale & affine invariant interest point detectors, International Journal of Computer Vision (2004).
  • M.A. Fischler et al., Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM (1981).
  • J. Ma et al., Locality preserving matching, International Journal of Computer Vision (2019).
  • K. Moo Yi et al., Learning to find good correspondences.
  • D. Barath et al., MAGSAC++, a fast, reliable and accurate robust estimator.
  • W. Sun et al., ACNe: Attentive context normalization for robust permutation-equivariant learning.

Zhen Zhong received the bachelor's degree in traditional Chinese medicine from the Hunan University of Chinese Medicine, Hunan, China, in 2018. He is currently pursuing the M.S. degree with the Department of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, where he is also attached to the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University. His research interests include computer vision, machine learning, and pattern recognition.

Guobao Xiao received the B.S. degree in information and computing science from Fujian Normal University, China, in 2013 and the Ph.D. degree in Computer Science and Technology from Xiamen University, China, in 2016. From 2016 to 2018, he was a Postdoctoral Fellow in the School of Aerospace Engineering at Xiamen University, China. He is currently a Professor at Minjiang University, China. He has published over 30 papers in international journals and conferences, including IEEE TPAMI/TIP/TITS/TIE, IJCV, PR, ICCV, ECCV, ACCV, AAAI, etc. His research interests include machine learning, computer vision and pattern recognition. He received the best Ph.D. thesis award in Fujian Province and the best Ph.D. thesis award from the China Society of Image and Graphics (ten winners in total across China). He has also served on the program committee (PC) of CVPR, ICCV, ECCV, AAAI, WACV, etc., and was the General Chair of IEEE BDCLOUD 2019.

Kun Zeng received the Ph.D. degree from the Department of Computer Science, Xiamen University, Xiamen, China, in 2015, where he was a Postdoctoral Fellow with the Department of Electronic Science from 2016 to 2019. He is currently a Lecturer with Minjiang University, China. His current research interests include image processing, machine learning and medical image reconstruction.

Shiping Wang received the Ph.D. degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2014.

He was a Research Fellow at Nanyang Technological University, Singapore, from 2015 to 2016. He is currently a Full Professor and Qishan Scholar with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China. His research interests include machine learning, computer vision, and granular computing.
