Hierarchical multi-view metric learning with HSIC regularization
Introduction
The increasing interest in feature extraction techniques has heightened the need for tools that can cope with multi-modality (multi-view) data [1]. Of particular importance are Multi-view Metric Learning (MvML) techniques, which attempt to concurrently learn multiple Mahalanobis metrics from multi-view features to reveal the semantic similarity of multi-view data [2]. With proper distance metrics, the performance of various similarity-based multi-view applications, such as image classification [3], face verification [4], and re-identification [5], [6], can be significantly improved.
Distance Metric Learning (DML) aims to learn a Mahalanobis metric matrix from the training data, such that samples from the same class are pulled close to each other while those from different classes are separated by a large margin [7]. Because DML takes the weights and correlations of features into consideration when evaluating the distance between samples, it can measure the semantic similarity between data points better than the Euclidean distance. The advantages of DML have been validated from various perspectives [8]. Although DML methods have been widely used in practical applications, most of them apply only to single-view scenarios. With that in mind, researchers have proposed extending the classical DML methods to multi-view scenarios by simultaneously learning multiple Mahalanobis matrices from side information [9]. A simple yet effective approach is to concatenate all the multi-view features into a single vector and then apply DML techniques directly. Nevertheless, the concatenated vector lacks physical meaning and is prone to the curse of dimensionality. Hence, McFee and Lanckriet [10] adopted multiple kernel functions for different views to project the multi-view features into a reproducing kernel Hilbert space (RKHS), and then concurrently learned multiple projections and the corresponding weights under the supervision of human perceptual similarity information. Afterwards, many approaches based on multiple kernel learning (MKL) have been developed [11], [12]. Although MKL-based methods can capture the structure of heterogeneous multi-view features, they may scale poorly.
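To make the role of the Mahalanobis metric matrix concrete, the following minimal sketch compares the squared Euclidean distance with a squared Mahalanobis distance on toy data. The vectors `x`, `y` and the factor `L` are hypothetical values chosen for illustration; the parameterization M = LᵀL is a common way to guarantee a positive semi-definite metric and is not necessarily the one used by any method cited above.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = x - y
    return float(d @ M @ d)

# Hypothetical toy data: two 3-dimensional samples.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

# Assumption: M is parameterized as L^T L, which makes it positive semi-definite.
L = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0]])
M = L.T @ L

# With the identity as metric, the Mahalanobis distance reduces to the Euclidean one.
d_euclid = mahalanobis_sq(x, y, np.eye(3))

# The learned metric reweights and correlates the features.
d_maha = mahalanobis_sq(x, y, M)

# Equivalent view: squared Euclidean distance after the linear projection L.
d_proj = float(np.sum((L @ x - L @ y) ** 2))
```

Here `d_maha` and `d_proj` coincide, which is why learning a Mahalanobis metric can also be read as learning a linear projection followed by the Euclidean distance.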
Besides MKL techniques, there are also late fusion approaches. These methods combine the outputs of single-view DML methods constructed from different views and can be regarded as naive extensions of classical single-view DML to the multi-view setting [13], [14], [15]. For instance, EMGMML [13] extends GMML [16] to the multi-view scenario by jointly learning the weight and the metric of each view. However, late fusion methods concentrate only on the closeness of samples from the same class and the separability of those from different classes inside each view, leaving the rich consensus information among multiple views unconsidered. Notably, most existing MvML methods optimize the model via the naive Euclidean gradient descent algorithm [17], [18]. Various studies have shown that directly optimizing loss functions constrained by low-rank matrices in Euclidean space is numerically unstable and easily gets trapped in poor local optima [19], [20].
In light of these limitations of existing MvML models, we put forward a novel MvML model entitled Hierarchical Multi-view Metric Learning with HSIC Regularization (HH), which simultaneously optimizes closeness and separability both within and across views to improve the overall discriminative ability. For clarity, the main flowchart of HH is shown in Fig. 1. HH adopts a hierarchical mechanism to build multiple metric matrices, where each metric is composed of a view-specific projection matrix and a Symmetric Positive-definite (PSD) metric shared by all the views. The view-specific projection matrix is used to extract the consensus information inside that view, and the shared PSD matrix characterizes the correlations of the features in the shared latent space. Importantly, a regularization term based on the Hilbert–Schmidt Independence Criterion (HSIC) [21] is further introduced to calibrate the large distribution difference among different views. An alternating direction method is then employed to solve the minimization problem. Instead of using Euclidean gradient descent to solve the sub-problem, we treat it as an unconstrained minimization problem on a Riemannian manifold and handle it efficiently with Riemannian optimization. Finally, extensive experiments on visual recognition datasets confirm the effectiveness of the proposed method. In summary, the major contributions of this paper are as follows:
- We propose a novel MvML method that simultaneously maximizes the discriminative ability within and across views, with the aim of fully exploring the consensus information among multiple views.
- An alternating direction strategy is developed to seek a feasible solution of HH, and the sub-problem is solved efficiently with the Riemannian optimization technique.
- Extensive visual recognition experiments on five datasets validate the effectiveness of the proposed method over the MvML methods it is compared with.
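The HSIC regularizer mentioned in the introduction builds on a dependence measure that is easy to estimate from samples. The sketch below computes the standard biased empirical HSIC, tr(KHLH)/(n-1)², on synthetic data. How HH inserts this quantity into its objective is not given in this excerpt, so the Gaussian kernel, the bandwidth `sigma`, and the toy arrays `Z1`, `Z2_dep`, `Z2_ind` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Hypothetical latent representations of two views for the same n samples.
rng = np.random.default_rng(0)
n = 200
Z1 = rng.normal(size=(n, 2))
Z2_dep = Z1 + 0.1 * rng.normal(size=(n, 2))  # strongly dependent on Z1
Z2_ind = rng.normal(size=(n, 2))             # independent of Z1

hsic_dep = hsic(rbf_kernel(Z1), rbf_kernel(Z2_dep))
hsic_ind = hsic(rbf_kernel(Z1), rbf_kernel(Z2_ind))
```

In this toy setting the dependent pair yields a clearly larger HSIC value than the independent pair, which is the property a view-consistency regularizer can exploit.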
The rest of the paper is organized as follows. In Section 2, we review the related work, and in Section 3 we describe the main motivation of HH. In Section 4, based on Riemannian optimization techniques, we develop an effective Riemannian gradient descent algorithm to solve the proposed model, followed by the visual recognition experimental results in Section 5. In Section 6, we conclude this paper.
Related Work
This section begins with the related work on DML, followed by work on multi-view learning, and closes with the geometry of Riemannian manifolds, which provides the groundwork for the techniques described in the optimization section.
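As a concrete illustration of the Riemannian geometry referred to above, the following sketch performs gradient descent on the manifold of symmetric positive-definite matrices under the affine-invariant metric. The Riemannian gradient is M·sym(G)·M for a Euclidean gradient G, and the exponential map gives the update M ← M^{1/2} exp(-η M^{1/2} G M^{1/2}) M^{1/2}. The choice of manifold and metric, the toy objective 0.5·‖M − T‖²_F, the target `T`, and the step size are assumptions for illustration; this excerpt does not state which manifold or update rule HH actually uses.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def riemannian_step(M, egrad, lr):
    """One gradient step on the SPD manifold (affine-invariant metric, exponential map)."""
    G = sym(egrad)
    M_half = sym_fun(M, np.sqrt)
    inner = sym(-lr * M_half @ G @ M_half)
    return sym(M_half @ sym_fun(inner, np.exp) @ M_half)

# Hypothetical SPD target with eigenvalues 0.5, 1.5 and 3.0.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
T = Q @ np.diag([0.5, 1.5, 3.0]) @ Q.T

def loss(M):
    return 0.5 * np.linalg.norm(M - T) ** 2

M = np.eye(3)
losses = [loss(M)]
for _ in range(200):
    egrad = M - T                      # Euclidean gradient of the toy loss
    M = riemannian_step(M, egrad, lr=0.05)
    losses.append(loss(M))

min_eig = float(np.linalg.eigvalsh(M).min())
```

Because every update is a congruence of a matrix exponential, the iterate stays positive definite without any explicit projection step, in contrast to a plain Euclidean gradient update.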
The Proposed Framework
This section begins with the notation and the MvML problem definition. The motivation of the proposed HH is then introduced in detail.
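The hierarchical construction of the metrics can be sketched as follows. The exact formulation is not included in this excerpt, so the code assumes that the metric of view v takes the form P_v M P_vᵀ, where P_v is the view-specific projection into a k-dimensional shared latent space and M is the shared matrix. The names `P`, `M`, `intra_view_dist`, and `inter_view_dist`, as well as the dimensions, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4            # dimension of the shared latent space (assumed)
dims = [10, 6]   # feature dimensions of two hypothetical views

# View-specific projection matrices P_v of shape (d_v, k).
P = [rng.normal(size=(d, k)) for d in dims]

# Shared metric matrix in the latent space, built to be positive definite.
A = rng.normal(size=(k, k))
M = A @ A.T + 1e-3 * np.eye(k)

def view_metric(P_v, M):
    """Assumed hierarchical metric of one view: P_v M P_v^T."""
    return P_v @ M @ P_v.T

def intra_view_dist(x_i, x_j, P_v, M):
    """Squared distance between two samples of the same view."""
    z = P_v.T @ (x_i - x_j)
    return float(z @ M @ z)

def inter_view_dist(x_v, x_w, P_v, P_w, M):
    """Squared distance between samples of different views in the shared space."""
    z = P_v.T @ x_v - P_w.T @ x_w
    return float(z @ M @ z)

x1, x2 = rng.normal(size=dims[0]), rng.normal(size=dims[0])
y1 = rng.normal(size=dims[1])

d_intra = intra_view_dist(x1, x2, P[0], M)
d_full = float((x1 - x2) @ view_metric(P[0], M) @ (x1 - x2))
d_inter = inter_view_dist(x1, y1, P[0], P[1], M)
```

Under this assumption, `d_intra` equals the distance computed with the full view-level metric, while the shared latent space is what makes distances between samples of different views well defined.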
Optimization
HH can be solved by alternating between the projection matrices and . Since there exist slack variables in the objective function, for notational convenience, we employ () to denote the active similar (dissimilar) sub-set of which violates the distance constraints when we optimize the matrix or .
Fix and solve the transformation matrix : The sub-problem can be stated as:
Experiments
In this section, we evaluate the effectiveness of HH on visual recognition datasets with diverse characteristics, covering face verification (Labeled Faces in the Wild (LFW) [46]), kinship verification (KinFaceW-I and KinFaceW-II [47]), and person re-identification (VIPeR [48] and CUHK01 [49]).
Conclusion
We claim that the discriminative ability in intra-view and inter-view, as well as view consistency, are indispensable parts of MvML. Targeting this goal, we develop a novel MvML approach entitled HH. HH hierarchically constructs multiple metric matrices and simultaneously optimizes the closeness of similar pairs and the separability of dissimilar pairs in intra-view and inter-view. Moreover, HSIC is adopted to calibrate the enormous distribution difference between multiple views to further
CRediT authorship contribution statement
Huiyuan Deng: Conceptualization, Methodology, Writing - original draft. Xiangzhu Meng: Software, Writing - review & editing. Huibing Wang: Writing - review & editing, Visualization. Lin Feng: Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The authors would like to thank the anonymous reviewers for their insightful comments and suggestions, which significantly improved the quality of this paper. This work was supported by the National Natural Science Foundation of PR China (61972064), the LiaoNing Revitalization Talents Program (XLYC1806006) and the Fundamental Research Funds for the Central Universities (DUT19RC(3)012).
Huiyuan Deng received his BS degree from South-Central University for Nationalities, in 2014. Now he is working towards the Ph.D. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include metric learning, computer vision, and data mining.
References (59)
- et al. Large margin metric learning for multi-view vehicle re-identification. Neurocomputing (2021)
- et al. Adaptive-weighting discriminative regression for multi-view classification. Pattern Recognition (2019)
- et al. Multi-model fusion metric learning for image set classification. Knowledge-Based Systems (2019)
- et al. Efficient multi-modal geometric mean metric learning. Pattern Recognition (2018)
- et al. Distance metric learning for radio fingerprinting localization. Expert Systems with Applications (2021)
- et al. A self-adaptive local metric learning method for classification. Pattern Recognition (2019)
- et al. Metric learning-based kernel transformer with triplets and label constraints for feature fusion. Pattern Recognition (2020)
- et al. Multi-view locality low-rank embedding for dimension reduction. Knowledge-Based Systems (2020)
- et al. Multi-dimensional clustering through fusion of high-order similarities. Pattern Recognition (2022)
- et al. Image retrieval based on micro-structure descriptor. Pattern Recognition (2011)
- Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
- Image classification with multi-view multi-instance metric learning. Expert Systems with Applications
- Progressive learning with multi-scale attention network for cross-domain vehicle re-identification. Science China Information Sciences
- Learning multi-modal similarity. Journal of Machine Learning Research
- Multi-view metric learning in vector-valued kernel spaces
- Weighted graph embedding-based metric learning for kinship verification. IEEE Transactions on Image Processing
- FISH-MML: Fisher-HSIC multi-view metric learning
- Geometric mean metric learning
- Hierarchical multimodal metric learning for multimodal classification
- Sharable and individual multi-view metric learning. IEEE Transactions on Pattern Analysis and Machine Intelligence
- Measuring statistical dependence with Hilbert-Schmidt norms
- Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research
- Local large-margin multi-metric learning for face and kinship verification. IEEE Transactions on Circuits and Systems for Video Technology
Xiangzhu Meng received his BS degree from Anhui University in 2015 and his Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, in 2021. He has authored and co-authored papers in journals including KBS, EAAI, and TNNLS. Furthermore, he serves as a reviewer for ACM Transactions on Multimedia Computing, Communications, and Applications. His research interests include multi-view learning, deep learning, data mining, and computer vision.
Huibing Wang received a Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, Dalian, in 2018. During 2016 and 2017, he was a visiting scholar at the University of Adelaide, Adelaide, Australia. He is currently an Associate Professor at Dalian Maritime University, Dalian, Liaoning, China. He has authored and co-authored more than 40 papers in journals and conferences. His research interests include computer vision and machine learning.
Lin Feng received his B.S. degree and M.S. degree in internal combustion engine, and a Ph.D. degree in mechanical design and theory from the Dalian University of Technology, China, in 1992, 1995, and 2004, respectively. He is currently a Professor and a Doctoral Supervisor with the School of Innovation Experiment, Dalian University of Technology. His research interests include intelligent image processing, robotics, data mining, and embedded systems.
1 These authors have contributed equally to this work.