Full length article
Spectral-invariant matching network
Introduction
Many researchers and industries utilize sensors from various domains to obtain richer information about targets, and proper sensor fusion methods are required to manage the information from each sensor. In the field of computer vision, cross-spectral (i.e., visible–near infrared (NIR)) and multi-spectral (i.e., visible–thermal) image matching are being actively studied because the different spectral domains can provide complementary information [1]. As an example, visible and thermal images can mutually compensate for rich color information and high textural structure in low-light conditions, making these images suitable for all-day vision systems [2]. All-day vision based on fused multi-spectral images has become an essential task for sensor fusion systems that conduct facial expression recognition [3], [4], material classification [5], [6], medical image analysis [7], pansharpening [8], and pedestrian detection [9], [10], [11].
Since cross- and multi-spectral images capture different wavelength ranges, the images differ significantly at both the intensity and pixel levels. Even well-known local feature descriptors [12], [13] cannot account for the relationship between images across spectral domains, which results in severe performance drops in matching tasks. Recently, convolutional neural networks (CNNs) have demonstrated some ability to address this issue by leveraging semantic information along with low-level features. Siamese structures overcome the challenging matching problems among various spectral domains [14], [15], [16], [17]. In most Siamese structures, the same deep neural network is applied to both multi-spectral image patches to extract their features, and the loss function pulls the features of matching (positive) patches close together while pushing those of non-matching patches apart. Encoder–decoder structures have also been utilized to extract common features between multi-spectral image patches [18], [19].
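The Siamese training objective described above can be sketched with a contrastive loss. The following is a minimal illustration with toy feature vectors and a hypothetical margin; it is not the architecture or the hyperparameters used by any of the cited networks:

```python
import numpy as np

def contrastive_loss(f1, f2, label, margin=1.0):
    """Contrastive loss for Siamese patch matching: positive pairs
    (label=1) are pulled together; negative pairs (label=0) are pushed
    at least `margin` apart in feature space."""
    d = np.linalg.norm(f1 - f2)                    # Euclidean feature distance
    pos = label * d ** 2                           # pull matching patches together
    neg = (1 - label) * max(margin - d, 0.0) ** 2  # push non-matches apart
    return 0.5 * (pos + neg)

# A matching pair whose features are already close ...
f_vis = np.array([0.10, 0.90])
f_nir = np.array([0.12, 0.88])
loss_pos = contrastive_loss(f_vis, f_nir, label=1)   # small: pair is close

# ... and a non-matching pair farther apart than the margin.
loss_neg = contrastive_loss(f_vis, np.array([0.9, 0.1]), label=0)  # zero
```

During training, gradients of this loss shape the shared encoder so that feature distance alone becomes a usable match score.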
Although these previous methods have been shown to work for fusing cross- and multi-spectral images, they extract features from each image separately and simply aggregate those features to fuse information or predict similarities. We have observed that such methods are ill-suited to handling intensity- and pixel-level differences: these differences are one reason why fusing multi-spectral images is hard, yet previous methods include no module to reduce them. In this paper, we present a SPectral-Invariant Matching Network (SPIMNet), an end-to-end CNN framework for robust image patch matching across different spectral domains. These are the primary contributions of this study:
- In contrast to previous methods that extract features directly from input patches, SPIMNet learns spectral translations of the input patches using a domain conversion network. This allows image patches from different spectral domains to be compared using similar features.
- SPIMNet utilizes a dual-Siamese network to extract features from each translated patch and predicts the matching label through a fully connected network.
- The proposed end-to-end method can be trained from scratch without any pre-trained backbone network, and we obtain competitive results on several standard datasets, including both visible–NIR and visible–thermal images.
- Ablation studies indicate that each of these technical contributions leads to appreciable improvements in matching accuracy, and we show that the proposed method can be applied to various applications such as stereo matching and image enhancement.
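The pipeline implied by these contributions (domain conversion, shared dual-Siamese feature extraction, fully connected match prediction) can be sketched as follows. This is a structural illustration only: the random matrices below stand in for trained CNN modules, and all shapes and names are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned components (the real SPIMNet
# modules are trained CNNs, not fixed random projections).
W_convert = rng.standard_normal((64, 64))   # domain conversion network
W_feat = rng.standard_normal((32, 64))      # shared (Siamese) feature encoder
W_fc = rng.standard_normal((1, 64))         # fully connected match predictor

def convert_domain(patch):
    # Translate a flattened patch toward the other spectral domain.
    return np.tanh(W_convert @ patch)

def extract_features(patch):
    # Shared encoder applied to both branches of the dual-Siamese network.
    return np.maximum(W_feat @ patch, 0.0)   # ReLU features

def match_probability(vis_patch, nir_patch):
    # 1) convert each patch, 2) extract features, 3) fuse and classify.
    f_vis = extract_features(convert_domain(vis_patch))
    f_nir = extract_features(convert_domain(nir_patch))
    logit = W_fc @ np.concatenate([f_vis, f_nir])
    return 1.0 / (1.0 + np.exp(-logit[0]))   # sigmoid -> match probability

p = match_probability(rng.standard_normal(64), rng.standard_normal(64))
```

The key design point the sketch captures is that conversion happens before feature extraction, so both branches compare patches in a shared, spectrally aligned representation.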
Section snippets
Related works
Hand-crafted Feature Descriptors. Hand-crafted features such as SIFT [12], SURF [13] and FAST [20] are based on measurements of texture similarity and have shown promise for finding correspondences between visible images, even under illumination and scale changes. A modification of hand-crafted features was used to handle the issue of dense correspondences in [21]. However, these methods often fail in cross- and multi-spectral imagery because their different spectral characteristics result…
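Why intensity-based similarity breaks down across spectra can be shown with a toy example: when material responses invert the intensity gradient between two spectral bands, a zero-mean normalized cross-correlation judges a true match as maximally dissimilar. The patches below are synthetic illustrations, not data from any of the cited benchmarks:

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation between two patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

# A toy 'visible' patch and a simulated cross-spectral counterpart in
# which the material response inverts the intensity ramp.
vis = np.linspace(0.0, 1.0, 16).reshape(4, 4)
nir = 1.0 - vis   # same scene structure, reversed intensities

score = ncc(vis, nir)   # close to -1: a true match scored as opposite
```

Descriptors built on such intensity statistics inherit this failure mode, which motivates learning a spectral translation before comparison.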
Spectral-invariant matching network
Previous works [14], [15] on cross-spectral matching have directly extracted discriminative features from input image patches. As shown in Fig. 1, matching image patches from different spectral domains is a challenging task because the objects and materials have totally different appearances. For this reason, performance has been limited in previous works [14], [15].
In this work, instead of learning discriminative features directly from cross-spectral image patches, we solve the matching…
Experimental results
To demonstrate the effectiveness of SPIMNet, we evaluate it on three publicly available datasets: the VIS–NIR patch dataset [16], the KAIST Multi-spectral pedestrian dataset [9], and the PittsStereo-RGBNIR dataset [45], as shown in Fig. 7. We compare SPIMNet with four hand-crafted feature matching methods (SIFT [12], GISIFT [55], EHD [56], LGHD [57]) and eight CNN-based methods, including a Siamese network [16], a Pseudo-Siamese network [16] (PSiamese), a 2-channel network [16], PNNet [58], Q-Net [59]…
Conclusion
We have developed an image patch matching network for cross- and multi-spectral domains, named SPIMNet. SPIMNet is formulated as an end-to-end network, using two domain conversion networks to adjust the pixel and intensity levels of the input cross-spectral images. A dual-Siamese network enables the automatic selection of the better matching domain for the two converted domain features. By incorporating these schemes in a deep learning framework, competitive matching accuracy is achieved on a variety…
CRediT authorship contribution statement
Yeongmin Ko: Methodology, Software, Validation, Writing – review & editing, Visualization. Yong-Jun Jang: Methodology, Software, Writing – review & editing. Vinh Quang Dinh: Conceptualization, Software, Writing – original draft. Hae-Gon Jeon: Formal analysis, Investigation, Writing – original draft. Moongu Jeon: Formal analysis, Investigation, Resources, Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2014-3-00077, Development of Global Multi-target Tracking and Event Prediction Techniques Based on Real-time Large-Scale Video Analysis).
References (62)

- et al., A review of multimodal image matching: Methods and applications, Inf. Fusion (2021).
- et al., A review of state-of-the-art in Face Presentation Attack Detection: From early development to advanced deep learning and multi-modal fusion methods, Inf. Fusion (2021).
- et al., DiCyc: GAN-based deformation invariant cross-domain information fusion for medical image synthesis, Inf. Fusion (2021).
- et al., A theoretical and practical survey of image fusion methods for multispectral pansharpening, Inf. Fusion (2022).
- et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst. (2008).
- et al., Multimodal and multicontrast image fusion via deep generative models, Inf. Fusion (2022).
- et al., Joint patch clustering-based dictionary learning for multimodal image fusion, Inf. Fusion (2016).
- et al., Unsupervised learning framework for interest point detection and description via properties optimization, Pattern Recognit. (2021).
- et al., FusionGAN: A generative adversarial network for infrared and visible image fusion, Inf. Fusion (2019).
- et al., KAIST multi-spectral day/night data set for autonomous and assisted driving, IEEE Trans. Intell. Transp. Syst. (2018).
- Survey on RGB, 3D, thermal, and multimodal approaches for facial expression recognition: History, trends, and affect-related applications, IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- Robust matching for SAR and optical images using multiscale convolutional gradient features, IEEE Geosci. Remote Sens. Lett.
- Commonality autoencoder: Learning common features for change detection from heterogeneous images, IEEE Trans. Neural Netw. Learn. Syst.
- SIFT flow: Dense correspondence across scenes and its applications, IEEE Trans. Pattern Anal. Mach. Intell.
- Multispectral stereo odometry, IEEE Trans. Intell. Transp. Syst.
- Robust stereo matching using adaptive normalized cross-correlation, IEEE Trans. Pattern Anal. Mach. Intell.
- Joint depth map and color consistency estimation for stereo images with different illuminations and cameras, IEEE Trans. Pattern Anal. Mach. Intell.
Yeongmin Ko received the B.S. degree in School of Electrical Engineering from Gwangju Institute of Science and Technology (GIST), Gwangju, South Korea, in 2017. He is currently pursuing the Ph.D. degree with the School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology. His current research interests include computer vision, self-driving, and deep learning.
Yong-Jun Chang received the B.S. degree in electronic engineering and avionics from Korea Aerospace University, Gyeonggi-do, Korea, in 2014, and the M.S. degree in information and communications and the Ph.D. degree in electrical engineering and computer science from Gwangju Institute of Science and Technology (GIST), Gwangju, Korea, in 2016 and 2021, respectively. In 2021, he was a researcher at the Korea Culture Technology Institute at GIST. He is currently a research engineer at Hyundai Rotem. His research interests are in computer vision and deep learning.
Vinh Quang Dinh received the B.S. degree in computer science from Nong Lam University, Ho Chi Minh City, Vietnam, in 2008, and the M.S. and Ph.D. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, South Korea, in 2013 and 2016, respectively. From 2016 to 2017, he was a Postgraduate Researcher with Sungkyunkwan University. From 2017 to 2020, he was a Postgraduate Researcher with the Gwangju Institute of Science and Technology. In 2020, he joined Vietnamese-German University, where he is currently a Lecturer with the School of Electrical Engineering and Computer Science. His current research interests include computer vision and deep learning.
Hae-Gon Jeon received the B.S. degree in the School of Electrical and Electronic Engineering from Yonsei University in 2011, and the M.S. and Ph.D. degrees in the School of Electrical Engineering from KAIST in 2013 and 2018, respectively. He was a postdoctoral researcher at the Robotics Institute at Carnegie Mellon University. He is currently affiliated with both the AI Graduate School and the School of Electrical Engineering and Computer Science at GIST as an assistant professor. He won the Best Ph.D. Thesis Award 2018 at KAIST. His research interests include computational imaging, 3D reconstruction and machine learning.
Moongu Jeon received the B.S. degree in architectural engineering from Korea University, Seoul, South Korea, in 1988, and the M.S. and Ph.D. degrees in computer science and scientific computation from the University of Minnesota, Minneapolis, MN, USA, in 1999 and 2001, respectively. He worked on optimal control problems at the University of California at Santa Barbara, Santa Barbara, CA, USA, from 2001 to 2003, and then moved to the National Research Council of Canada, where he worked on the sparse representation of high-dimensional data and image processing until July 2005. In 2005, he joined the Gwangju Institute of Science and Technology, Gwangju, South Korea, where he is currently a Full Professor with the School of Electrical Engineering and Computer Science. His current research interests include machine learning, computer vision, and artificial intelligence.