Hierarchical multi-view metric learning with HSIC regularization

doi:10.1016/j.neucom.2022.09.073

Neurocomputing

Volume 510, 21 October 2022, Pages 135-148

https://doi.org/10.1016/j.neucom.2022.09.073

Abstract

With the rapid development of the information era, it is common to represent a single object with multiple features from different sources. Measuring the similarity between multi-view objects is a fundamental task in multi-view learning, and multi-view metric learning has recently gained extensive attention as a way to measure this similarity effectively. Nevertheless, most existing methods focus only on the closeness of similar pairs and the separability of dissimilar pairs within each view, so the rich consensus information in multi-view data may be partly ignored. To mitigate this issue, we propose a novel method named Hierarchical Multi-view Metric learning with HSIC regularization (HM²H). HM²H aims to simultaneously maintain the closeness of similar points and the separability of dissimilar ones both within each view (intra-view) and across views (inter-view). Since multiple views depict different perspectives of the same object, a shared metric is introduced to capture the consensus information among those views. Moreover, we use the Hilbert–Schmidt Independence Criterion (HSIC) to maximize the distributional agreement of the multi-view dataset. Correspondingly, an algorithm based on the alternating direction method is developed to solve HM²H. Finally, experimental results on five visual recognition datasets confirm the effectiveness and feasibility of the proposed method.

Introduction

The increasing interest in feature extraction techniques has heightened the need for tools that can cope with multi-modal (multi-view) data [1]. Of particular importance are Multi-view Metric Learning (MvML) techniques, which attempt to concurrently learn multiple Mahalanobis metrics from multi-view features to reveal the semantic similarity of multi-view data [2]. With proper distance metrics, the performance of various similarity-based multi-view applications, such as image classification [3], face verification [4], and re-identification [5], [6], can be significantly improved.

Distance Metric Learning (DML) aims to learn a Mahalanobis metric matrix from the training data such that samples from the same class are pulled close to each other while samples from different classes are separated by a large margin [7]. Since DML takes the weights and correlations of features into account when evaluating the distance between samples, it can measure the semantic similarity between data points better than the Euclidean distance. The advantages of DML have been discovered and validated from various perspectives [8]. Although DML methods have been widely used in practical applications, most of them can only be applied to single-view scenarios. With that in mind, researchers have extended classical DML methods to multi-view scenarios to simultaneously learn multiple Mahalanobis matrices from side information [9]. A simple yet effective approach is to concatenate all the multi-view features into a single vector and then apply DML techniques directly. Nevertheless, the concatenated vector lacks physical meaning and is prone to the curse of dimensionality. Hence, McFee and Lanckriet [10] adopted multiple kernel functions for different views to project the multi-view features into a reproducing kernel Hilbert space (RKHS), and then concurrently learned multiple projections and their corresponding weights under the supervision of human perceptual similarity information. Afterwards, many MvML approaches based on multiple kernel learning (MKL) have been developed [11], [12]. Although MKL-based methods can capture the structure of heterogeneous multi-view features, they may suffer from poor scalability.
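The Mahalanobis distance behind DML can be sketched in a few lines of Python. This toy example (illustrative names and random data, not the authors' code) computes $d_{M}^{2}(x, y) = (x - y)^{\top} M (x - y)$ and checks that, for a PSD matrix factored as $M = L^{\top} L$, it equals the squared Euclidean distance after the linear map $x \mapsto Lx$; with $M = I$ it reduces to the plain squared Euclidean distance.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    diff = x - y
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 5))   # illustrative linear map R^5 -> R^3
M = L.T @ L                       # PSD by construction (rank <= 3)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Mahalanobis distance under M = L^T L ...
d_metric = mahalanobis_sq(x, y, M)
# ... equals the squared Euclidean distance between the mapped points.
d_projected = float(np.sum((L @ x - L @ y) ** 2))
print(d_metric, d_projected)

# With M = I, the metric reduces to the squared Euclidean distance.
d_identity = mahalanobis_sq(x, y, np.eye(5))
```

This factorization is also why learning $M$ can be viewed as learning a linear projection, the view taken by projection-based multi-view methods.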

Besides MKL techniques, there are also late fusion approaches. These methods combine the outputs of single-view DML methods constructed on different views and can be regarded as naive extensions of classical single-view DML to the multi-view setting [13], [14], [15]. For instance, EMGMML [13] extends GMML [16] to the multi-view scenario by jointly learning the weight and the metric of each view. However, late fusion methods focus only on the closeness of samples from the same class and the separability of samples from different classes within each view, leaving the rich consensus information among multiple views unexploited. Notably, most existing MvML methods optimize the model via naive Euclidean gradient descent [17], [18]. Several studies have shown that directly optimizing loss functions constrained by low-rank matrices in Euclidean space is numerically unstable and easily trapped in poor local optima [19], [20].
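As a rough illustration of late fusion, the sketch below learns one metric per view with the closed-form GMML solution $A = S^{-1} \# D$ (the matrix geometric mean of the inverse similar-pair scatter $S$ and the dissimilar-pair scatter $D$) and then fuses per-view squared distances with fixed weights. This is not EMGMML [13] itself: EMGMML learns the view weights jointly, whereas the equal weights here are hypothetical, and the small ridge added to $S$ and $D$ is an assumption made to keep them invertible. The data are random toy values.

```python
import numpy as np

def spd_power(A, p):
    """A^p for a symmetric positive definite matrix, via eigendecomposition."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * w ** p) @ V.T

def scatter(X, pairs):
    """Sum of (x_i - x_j)(x_i - x_j)^T over the given index pairs."""
    diffs = np.array([X[i] - X[j] for i, j in pairs])
    return diffs.T @ diffs

def gmml(X, similar, dissimilar, ridge=1e-3):
    """Closed-form GMML metric A = S^{-1} # D for one view.

    The ridge term is an assumption of this sketch to keep S and D invertible."""
    dim = X.shape[1]
    S = scatter(X, similar) + ridge * np.eye(dim)
    D = scatter(X, dissimilar) + ridge * np.eye(dim)
    S_half, S_inv_half = spd_power(S, 0.5), spd_power(S, -0.5)
    A = S_inv_half @ spd_power(S_half @ D @ S_half, 0.5) @ S_inv_half
    return A, S, D

def fused_distance(xs_i, xs_j, metrics, weights):
    """Late fusion: weighted sum of per-view squared Mahalanobis distances."""
    return sum(w * float((xi - xj) @ A @ (xi - xj))
               for xi, xj, A, w in zip(xs_i, xs_j, metrics, weights))

rng = np.random.default_rng(0)
n = 20
views = [rng.standard_normal((n, 4)), rng.standard_normal((n, 6))]  # toy views
labels = np.repeat([0, 1], n // 2)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
similar = [p for p in pairs if labels[p[0]] == labels[p[1]]]
dissimilar = [p for p in pairs if labels[p[0]] != labels[p[1]]]

results = [gmml(X, similar, dissimilar) for X in views]  # one metric per view
metrics = [A for A, _, _ in results]
weights = [0.5, 0.5]  # hypothetical fixed weights (EMGMML learns them)
fused = fused_distance([X[0] for X in views], [X[1] for X in views], metrics, weights)
print(fused)
```

Each per-view metric satisfies $A S A = D$, the stationarity condition of the GMML objective $\operatorname{tr}(AS) + \operatorname{tr}(A^{-1}D)$. Nothing in the fusion step couples the views, which is exactly the limitation of late fusion noted above.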

In light of the aforementioned limitations of existing MvML models, we propose a novel MvML model named Hierarchical Multi-view Metric Learning with HSIC Regularization (HM²H), which aims to simultaneously optimize the closeness and the separability both intra-view and inter-view to improve the overall discriminative ability. Fig. 1 shows the main flowchart of HM²H. HM²H adopts a hierarchical mechanism to build multiple metric matrices, where each metric is composed of a view-specific projection matrix and a symmetric positive definite (SPD) metric shared by all the views. The view-specific projection matrix extracts the consensus information within its view, while the shared SPD matrix characterizes the correlations of the features in the shared latent space. Furthermore, a regularization term based on the Hilbert–Schmidt Independence Criterion (HSIC) [21] is introduced to calibrate the large distribution differences among views. Accordingly, an alternating direction method is employed to solve the minimization problem. Instead of using Euclidean gradient descent to solve each sub-problem, we treat the sub-problem as an unconstrained minimization problem on a Riemannian manifold and solve it efficiently by means of Riemannian optimization. Finally, extensive experiments on visual recognition datasets confirm the effectiveness of the proposed method. In summary, the major contributions of this paper are as follows:

• We propose a novel MvML method that simultaneously maximizes the discriminative ability intra-view and inter-view, and aims to fully explore the consensus information among multiple views.
• An alternating direction strategy is developed to find a feasible solution of HM²H, and each sub-problem is efficiently solved by Riemannian optimization.
• Extensive visual recognition experiments on five datasets validate the effectiveness of the proposed method against the compared MvML methods.
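To make the hierarchical construction concrete, the following Python sketch assumes (the snippet does not state this) that view $v$ is mapped into the shared latent space by $\tilde{x}^{v} = P_{v}^{\top} x^{v}$, so the effective metric of view $v$ is $P_{v} M_{0} P_{v}^{\top}$, and that inter-view distances compare projected samples from two different views under the shared $M_{0}$. The HSIC term uses the biased empirical estimator of Gretton et al. [21] with linear kernels; how HM²H weights and signs this term is not given here, so the negative sum below is a hypothetical regularizer that rewards agreement. All data and dimensions are toy values.

```python
import numpy as np
from itertools import permutations

def to_latent(X, P_v):
    """Hypothetical view-to-latent map: each row x becomes x_tilde = P_v^T x."""
    return X @ P_v

def dist_sq(a, b, M0):
    """Squared distance under the shared metric M0 in the latent space."""
    diff = a - b
    return float(diff @ M0 @ diff)

def hsic(Z1, Z2):
    """Biased empirical HSIC with linear kernels: tr(K1 H K2 H) / (n - 1)^2."""
    n = Z1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    K1, K2 = Z1 @ Z1.T, Z2 @ Z2.T           # linear kernel matrices
    return float(np.trace(K1 @ H @ K2 @ H)) / (n - 1) ** 2

rng = np.random.default_rng(1)
n, r = 8, 3                                  # samples, latent dimension (toy)
views = [rng.standard_normal((n, 5)), rng.standard_normal((n, 7))]
P = [rng.standard_normal((X.shape[1], r)) for X in views]  # view-specific projections
B = rng.standard_normal((r, r))
M0 = B @ B.T + np.eye(r)                     # shared SPD metric

Z = [to_latent(X, P_v) for X, P_v in zip(views, P)]
intra = dist_sq(Z[0][0], Z[0][1], M0)       # samples 0 and 1, both from view 1
inter = dist_sq(Z[0][0], Z[1][1], M0)       # sample 0 in view 1 vs sample 1 in view 2
# Hypothetical regularizer: negative HSIC summed over ordered view pairs,
# so minimizing it rewards statistical agreement between views.
reg = -sum(hsic(Z[a], Z[b]) for a, b in permutations(range(len(Z)), 2))
print(intra, inter, reg)
```

Because every view is compared in the same latent space under one $M_{0}$, intra-view and inter-view distances are measured on a common scale, which is what allows a single set of constraints to act across views.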

The rest of the paper is organized as follows. Section 2 reviews related work, and Section 3 describes the main motivation of HM²H. Section 4 develops an effective Riemannian gradient descent algorithm, based on Riemannian optimization techniques, to solve the proposed model. Section 5 reports the visual recognition experiments, and Section 6 concludes the paper.


Related Work

This section first reviews related work on DML, then multi-view learning, and finally introduces the geometry of Riemannian manifolds, which provides the groundwork for the techniques described in the optimization section.
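The snippet does not say which manifold, metric, or retraction HM²H uses, so the following is only a generic sketch of Riemannian optimization: gradient steps on the manifold of symmetric positive definite matrices (where a shared metric such as $M_{0}$ would live) under the affine-invariant metric, for which the Riemannian gradient is $M\,\mathrm{sym}(\nabla f)\,M$ and the exponential map keeps every iterate SPD. The test objective $f(M) = \operatorname{tr}(MS) + \operatorname{tr}(M^{-1}D)$ is borrowed from GMML purely because its minimizer $S^{-1} \# D$ is known in closed form; the matrices, step size, and iteration count are arbitrary choices.

```python
import numpy as np

def sym_fn(A, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * fn(w)) @ V.T

def riemannian_step(M, egrad, t):
    """One Riemannian gradient step on the SPD manifold (affine-invariant metric).

    The Riemannian gradient is M sym(egrad) M; following the exponential map
    from M gives M^{1/2} expm(-t M^{1/2} sym(egrad) M^{1/2}) M^{1/2}."""
    G = (egrad + egrad.T) / 2
    M_half = sym_fn(M, np.sqrt)
    return M_half @ sym_fn(-t * (M_half @ G @ M_half), np.exp) @ M_half

# Illustrative objective f(M) = tr(M S) + tr(M^{-1} D) with toy SPD matrices.
S = np.array([[2.0, 0.5, 0.0], [0.5, 1.5, 0.2], [0.0, 0.2, 1.0]])
D = np.array([[1.0, 0.3, 0.1], [0.3, 2.5, 0.4], [0.1, 0.4, 1.8]])

def f(M):
    return np.trace(M @ S) + np.trace(np.linalg.solve(M, D))

def euclidean_grad(M):
    M_inv = np.linalg.inv(M)
    return S - M_inv @ D @ M_inv

M = np.eye(3)
for _ in range(1000):
    M = riemannian_step(M, euclidean_grad(M), t=0.05)

# Closed-form minimizer S^{-1} # D, used only to check the iterates.
S_half, S_inv_half = sym_fn(S, np.sqrt), sym_fn(S, lambda w: w ** -0.5)
A_star = S_inv_half @ sym_fn(S_half @ D @ S_half, np.sqrt) @ S_inv_half
print(np.max(np.abs(M - A_star)))
```

Because each update passes through the matrix exponential of a symmetric matrix, positive definiteness is preserved without any projection or eigenvalue clipping, in contrast to plain Euclidean gradient descent on a constrained matrix.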

The Proposed Framework

This section begins with the notation and the MvML problem definition, and then introduces the motivation of the proposed HM²H in detail.

Optimization

HM²H can be solved by alternating between the projection matrices $P_{v}$ and $M_{0}$. Since the objective function contains slack variables, for notational convenience we use $P_{S}$ ($P_{D}$) to denote the active similar (dissimilar) subset of $P$ that violates the distance constraints when we optimize $P_{v}$ ($1 \le v \le V$) or $M_{0}$:

$$
\begin{aligned}
P_{S} &= \{(x_{i}^{v}, x_{j}^{v}) \in P \mid \sum_{v_{1} \neq v_{2}} d_{M_{0}}^{2}(\tilde{x}_{i}^{v_{1}}, \tilde{x}_{j}^{v_{2}}) > l,\ q_{ij} = 1\}, \\
P_{D} &= \{(x_{i}^{v}, x_{j}^{v}) \in P \mid \sum_{v_{1} \neq v_{2}} d_{M_{0}}^{2}(\tilde{x}_{i}^{v_{1}}, \tilde{x}_{j}^{v_{2}}) < u,\ q_{ij} = -1\}.
\end{aligned}
$$
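The active sets can be computed directly from this definition. The sketch below assumes that $\tilde{x}^{v}$ denotes the projected feature of view $v$ and interprets $\sum_{v_{1} \neq v_{2}}$ as a sum over ordered view pairs; both are assumptions, as is the toy one-dimensional data chosen so the result can be checked by hand. Pairs are given by sample indices $(i, j)$ with labels $q_{ij} \in \{1, -1\}$, and the thresholds $l$ and $u$ play the roles stated in the equations above.

```python
import numpy as np
from itertools import permutations

def active_sets(Z, pairs, q, M0, l, u):
    """Return the active similar set P_S and dissimilar set P_D.

    Z[v] stores the projected features x_tilde^v of view v (one row per sample).
    A similar pair (q_ij = 1) is active when its summed cross-view distance
    exceeds l; a dissimilar pair (q_ij = -1) is active when it falls below u."""
    P_S, P_D = [], []
    for (i, j), q_ij in zip(pairs, q):
        total = 0.0
        for v1, v2 in permutations(range(len(Z)), 2):   # ordered pairs v1 != v2
            diff = Z[v1][i] - Z[v2][j]
            total += float(diff @ M0 @ diff)
        if q_ij == 1 and total > l:
            P_S.append((i, j))
        elif q_ij == -1 and total < u:
            P_D.append((i, j))
    return P_S, P_D

# Toy example: two views, a 1-D latent space, and M0 = I.
Z = [np.array([[0.0], [0.1], [5.0]]), np.array([[0.0], [0.2], [5.0]])]
pairs = [(0, 1), (0, 2), (1, 2)]
q = [1, 1, -1]
P_S, P_D = active_sets(Z, pairs, q, M0=np.eye(1), l=1.0, u=60.0)
print(P_S, P_D)  # [(0, 2)] [(1, 2)]
```

Here pair (0, 1) has a summed distance of 0.05 and satisfies its constraint, so only the far-apart similar pair and the too-close dissimilar pair contribute to the next sub-problem.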

Fix $\Theta \setminus P_{v}$ and solve for the transformation matrix $P_{v}$. The sub-problem can be stated as: $\min_{P_{v}}$

Experiments

In this section, we evaluate the effectiveness of HM²H on visual recognition datasets with diverse characteristics covering three tasks: face verification (Labeled Faces in the Wild (LFW) [46]), kinship verification (KinFaceW-I and KinFaceW-II [47]), and person re-identification (VIPeR [48] and CUHK01 [49]).

Conclusion

We argue that intra-view and inter-view discriminative ability, as well as view consistency, are indispensable to MvML. To this end, we develop a novel MvML approach named HM²H, which hierarchically constructs multiple metric matrices and simultaneously optimizes the closeness of similar pairs and the separability of dissimilar pairs intra-view and inter-view. Moreover, HSIC is adopted to calibrate the large distribution differences between multiple views to further

CRediT authorship contribution statement

Huiyuan Deng: Conceptualization, Methodology, Writing - original draft. Xiangzhu Meng: Software, Writing - review & editing. Huibing Wang: Writing - review & editing, Visualization. Lin Feng: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions, which significantly improved the quality of this paper. This work was supported by National Natural Science Foundation of PR China (61972064), LiaoNing Revitalization Talents Program (XLYC1806006) and the Fundamental Research Funds for the Central Universities (DUT19RC(3)012).

Huiyuan Deng received his BS degree from South-Central University for Nationalities in 2014. He is currently working towards the Ph.D. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include metric learning, computer vision, and data mining.

References

S. Zhang et al. Large margin metric learning for multi-view vehicle re-identification. Neurocomputing (2021).
M. Yang et al. Adaptive-weighting discriminative regression for multi-view classification. Pattern Recognition (2019).
X. Gao et al. Multi-model fusion metric learning for image set classification. Knowledge-Based Systems (2019).
J. Liang et al. Efficient multi-modal geometric mean metric learning. Pattern Recognition (2018).
S. Bai et al. Distance metric learning for radio fingerprinting localization. Expert Systems with Applications (2021).
M. Taheri et al. A self-adaptive local metric learning method for classification. Pattern Recognition (2019).
S. Kan et al. Metric learning-based kernel transformer with triplets and label constraints for feature fusion. Pattern Recognition (2020).
L. Feng et al. Multi-view locality low-rank embedding for dimension reduction. Knowledge-Based Systems (2020).
H. Peng et al. Multi-dimensional clustering through fusion of high-order similarities. Pattern Recognition (2022).
G.-H. Liu et al. Image retrieval based on micro-structure descriptor. Pattern Recognition (2011).
Y. Wang. Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) (2021).
M. Zhang, C. Li, X. Wang, Multi-view metric learning for multi-label image classification, in: 2019 IEEE International...
J. Tang et al. Image classification with multi-view multi-instance metric learning. Expert Systems with Applications (2021).
O. Laiadi, A. Ouamane, A. Benakcha, A. Taleb-Ahmed, A. Hadid, Multi-view deep features for robust facial kinship...
Y. Wang et al. Progressive learning with multi-scale attention network for cross-domain vehicle re-identification. Science China Information Sciences (2022).
E.P. Xing, M.I. Jordan, S.J. Russell, A.Y. Ng, Distance metric learning with application to clustering with...
A. Bellet, A. Habrard, M. Sebban, A survey on metric learning for feature vectors and structured data, arXiv preprint...
B. McFee et al. Learning multi-modal similarity. Journal of Machine Learning Research (2011).
R. Huusari et al. Multi-view metric learning in vector-valued kernel spaces.
J. Liang et al. Weighted graph embedding-based metric learning for kinship verification. IEEE Transactions on Image Processing (2018).
C. Zhang et al. FISH-MML: Fisher-HSIC multi-view metric learning.
P. Zadeh et al. Geometric mean metric learning.
H. Zhang et al. Hierarchical multimodal metric learning for multimodal classification.
J. Hu et al. Sharable and individual multi-view metric learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
T. Dan, Z. Huang, H. Cai, P.J. Laurienti, G. Wu, Learning brain dynamics of evolving manifold functional mri data using...
M. Harandi, M. Salzmann, R. Hartley, Joint dimensionality reduction and metric learning: A geometric take, in:...
A. Gretton et al. Measuring statistical dependence with Hilbert-Schmidt norms.
K.Q. Weinberger et al. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research (2009).
J. Hu et al. Local large-margin multi-metric learning for face and kinship verification. IEEE Transactions on Circuits and Systems for Video Technology (2017).

Xiangzhu Meng received his BS degree from Anhui University in 2015 and his Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, in 2021. He has authored and co-authored papers in journals including KBS, EAAI, and TNNLS. He also serves as a reviewer for ACM Transactions on Multimedia Computing, Communications, and Applications. His research interests include multi-view learning, deep learning, data mining, and computer vision.

Huibing Wang received a Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, Dalian, in 2018. From 2016 to 2017, he was a visiting scholar at the University of Adelaide, Adelaide, Australia. He is currently an Associate Professor at Dalian Maritime University, Dalian, Liaoning, China. He has authored and co-authored more than 40 papers in journals and conferences. His research interests include computer vision and machine learning.

Lin Feng received his B.S. degree and M.S. degree in internal combustion engine, and a Ph.D. degree in mechanical design and theory from the Dalian University of Technology, China, in 1992, 1995, and 2004, respectively. He is currently a Professor and a Doctoral Supervisor with the School of Innovation Experiment, Dalian University of Technology. His research interests include intelligent image processing, robotics, data mining, and embedded systems.

¹: These authors have contributed equally to this work.
