Elsevier

Neurocomputing

Volume 510, 21 October 2022, Pages 135-148
Hierarchical multi-view metric learning with HSIC regularization

https://doi.org/10.1016/j.neucom.2022.09.073

Abstract

As the information era develops rapidly, it is common to use multiple features from different sources to represent one object. Measuring the similarity between multi-view objects is a fundamental task in multi-view learning, and multi-view metric learning has recently gained extensive attention as a way to measure it effectively. Nevertheless, most existing methods focus only on the closeness of similar pairs and the separability of dissimilar ones inside each view, so the rich consensus properties of multi-view data may be partly ignored. To mitigate this issue, we propose a novel method entitled Hierarchical Multi-view Metric learning with HSIC regularization (HM2H). HM2H aims to simultaneously maintain the closeness of similar points and the separability of dissimilar ones both within each view (intra-view) and across views (inter-view). Since multiple views depict different perspectives of the same object, a shared metric is introduced to capture the consensus information among those views. Moreover, we use the Hilbert–Schmidt Independence Criterion (HSIC) to seek the maximum distribution agreement across the views of the dataset. An algorithm based on the Alternating Direction Method is provided to solve the proposed HM2H. Finally, experimental results on five visual recognition datasets confirm the effectiveness and feasibility of the proposed method.

Introduction

The increasing interest in feature extraction techniques has heightened the need for tools that can cope with multi-modality (multi-view) data [1]. Of fundamental importance are Multi-view Metric Learning (MvML) techniques, which attempt to concurrently learn multiple Mahalanobis metrics from multi-view features to reveal the semantic similarity of multi-view data [2]. With proper distance metrics, the performance of various similarity-based multi-view applications can be significantly improved, such as image classification [3], face verification [4], and re-identification [5], [6].

Distance Metric Learning (DML) aims to learn a Mahalanobis metric matrix from the training data, such that samples from the same class are pulled close to each other while those from different classes are separated by a large margin [7]. Since DML takes the weights and correlations of features into account when evaluating the distance between samples, it can measure the semantic similarity between data points better than the Euclidean distance. The advantages of DML have been validated from various perspectives [8]. Although DML methods have been widely used in practical applications, most of them can only be applied to single-view scenarios. With that in mind, researchers have proposed to extend the classical DML methods to multi-view scenarios and to simultaneously learn multiple Mahalanobis matrices from side information [9]. A simple approach is to concatenate all the multi-view features into a single vector and then apply DML techniques directly. Nevertheless, the concatenated vector lacks physical meaning and is prone to the curse of dimensionality. Hence, McFee and Lanckriet [10] adopted multiple kernel functions for different views to project the multi-view features into a reproducing kernel Hilbert space (RKHS), and then learned multiple projections and the corresponding weights concurrently under the supervision of human perceptual similarity information. Afterwards, many MvML approaches based on multiple kernel learning (MKL) have been developed [11], [12]. Although MKL-based methods can capture the heterogeneous structure of multi-view features, they may scale poorly.
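To make the role of the Mahalanobis metric matrix concrete, the following minimal sketch computes the squared distance (x - y)^T M (x - y) and compares it with the squared Euclidean distance, which is the special case M = I. The function name and the toy values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    diff = x - y
    return float(diff @ M @ diff)

# Toy values (hypothetical): M weights the first feature four times as much as the second.
x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])
M = np.diag([4.0, 1.0])

# With the identity metric the result is the ordinary squared Euclidean distance.
d_euclid = mahalanobis_sq(x, y, np.eye(2))
# With a learned (here hand-picked) metric, features are re-weighted before comparison.
d_mahal = mahalanobis_sq(x, y, M)
print(d_euclid, d_mahal)
```

A non-diagonal M would additionally account for correlations between features, which is the property the paragraph above attributes to DML.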

Besides MKL techniques, there have also been some late fusion approaches. These methods combine the outputs of single-view DML methods constructed from different views and can be regarded as naive extensions of classical single-view DML to the multi-view setting [13], [14], [15]. For instance, EMGMML [13] extends GMML [16] to the multi-view scenario, jointly obtaining the weight and the metric of each view. However, late fusion methods concentrate only on the closeness of samples from the same class and the separability of those from different classes inside each view, leaving the rich consensus information among multiple views unconsidered. Notably, most existing MvML methods optimize the model via the naive Euclidean gradient descent algorithm [17], [18]. Various studies have shown that directly optimizing loss functions constrained by low-rank matrices in Euclidean space is numerically unstable and easily trapped in poor local optima [19], [20].

In light of the aforementioned limitations of existing MvML models, we put forward a novel MvML model entitled Hierarchical Multi-view Metric Learning with HSIC Regularization (HM2H), which aims to simultaneously optimize the closeness and the separability both within each view and across views to improve the overall discriminative ability. For comprehensibility, we plot the main flowchart of HM2H in Fig. 1. HM2H adopts a hierarchical mechanism to build multiple metric matrices, where each metric is composed of a view-specific projection matrix and a symmetric positive semi-definite (PSD) metric shared by all the views. The view-specific projection matrix is used to extract the consensus information inside that view, and the shared PSD matrix characterizes the correlations of the features in the shared latent space. Importantly, a regularization term based on the Hilbert–Schmidt Independence Criterion (HSIC) [21] is further introduced to calibrate the large distribution difference among different views. Accordingly, an alternating direction method is employed to solve the minimization problem. Instead of using the Euclidean gradient descent method to solve the sub-problem, we propose to view the sub-problem as an unconstrained minimization problem on a Riemannian manifold and handle it efficiently by means of Riemannian optimization. Finally, extensive experiments on visual recognition datasets confirm the effectiveness of our proposed method. In summary, the major contributions of this paper are as follows:

  • We propose a novel MvML method that simultaneously maximizes the discriminative ability within each view and across views, with the aim of fully exploring the consensus information among multiple views.

  • An alternating direction strategy is developed to seek a feasible solution of HM2H, and the sub-problem is tackled efficiently by the Riemannian optimization technique.

  • Extensive visual recognition experiments on five datasets validate the effectiveness of the proposed method over the compared MvML methods.
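As an illustration of the HSIC term mentioned above, the sketch below implements the standard biased empirical estimator tr(KHLH) / (n - 1)^2 from Gretton et al. on two kernel matrices. How HM2H builds its kernels and where the term enters the objective is not stated in this excerpt, so the RBF kernel, the `gamma` value, and the synthetic views are assumptions made only for the example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Synthetic two-view data (hypothetical): view 2 is a noisy copy of view 1,
# view 3 is drawn independently of view 1.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(100, 2))
view2 = view1 + 0.1 * rng.normal(size=(100, 2))
view3 = rng.normal(size=(100, 2))

K1, K2, K3 = rbf_kernel(view1), rbf_kernel(view2), rbf_kernel(view3)
print("dependent views:  ", hsic(K1, K2))
print("independent views:", hsic(K1, K3))
```

A larger value indicates stronger statistical dependence between the two views, which is why a criterion of this kind can be used to encourage agreement among view representations.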

The rest of the paper is organized as follows. Section 2 reviews the related work, and Section 3 describes the main motivation of HM2H. In Section 4, we develop an effective Riemannian gradient descent algorithm, based on Riemannian optimization techniques, to solve the proposed method. Section 5 reports the visual recognition experimental results, and Section 6 concludes the paper.

Section snippets

Related Work

This section begins with related work on DML, followed by work on multi-view learning. It closes with the geometry of Riemannian manifolds, which provides the groundwork for the techniques described in the optimization section.
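For readers unfamiliar with Riemannian optimization, the sketch below shows one gradient step on the manifold of symmetric positive-definite matrices under the affine-invariant metric, using the exponential map. This is one common textbook choice and is an assumption here: the excerpt does not say which manifold, metric, or retraction HM2H uses, and the function names are hypothetical.

```python
import numpy as np

def _sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def _eig_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def spd_riemannian_step(M, euclid_grad, step):
    """One descent step on the SPD manifold (affine-invariant metric, assumed).

    The Riemannian gradient is M G M with G the symmetrized Euclidean gradient;
    following the exponential map along -step * M G M gives
    M^{1/2} exp(-step * M^{1/2} G M^{1/2}) M^{1/2}.
    """
    G = _sym(euclid_grad)
    M_half = _eig_fun(M, np.sqrt)
    inner = _eig_fun(_sym(-step * (M_half @ G @ M_half)), np.exp)
    return _sym(M_half @ inner @ M_half)

# Toy comparison (hypothetical values): a large step that would push a plain
# Euclidean update outside the positive-definite cone.
M = np.eye(2)
G = np.diag([5.0, 0.0])

M_euclid = M - 1.0 * G                      # has a negative eigenvalue
M_riem = spd_riemannian_step(M, G, 1.0)     # stays positive definite
print(np.linalg.eigvalsh(M_euclid))
print(np.linalg.eigvalsh(M_riem))
```

The point of the comparison is that the manifold update keeps the iterate a valid metric matrix by construction, so no separate projection back onto the constraint set is needed.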

The Proposed Framework

This section begins with the notation and the MvML problem definition. Then the motivation of the proposed HM2H is introduced in detail.

Optimization

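HM2H can be solved by alternating between the projection matrices $P_v$ and $M_0$. Since there exist slack variables in the objective function, for notational convenience we use $\mathcal{P}_S$ ($\mathcal{P}_D$) to denote the active similar (dissimilar) subset of $\mathcal{P}$ that violates the distance constraints when we optimize the matrix $P_v$ ($1 \le v \le V$) or $M_0$:

$$\mathcal{P}_S = \left\{ (x_i^{v}, x_j^{v}) \in \mathcal{P} \;\middle|\; d_{M_0}^2(\tilde{x}_i^{v_1}, \tilde{x}_j^{v_2}) > l,\ q_{ij} = 1 \right\}$$

$$\mathcal{P}_D = \left\{ (x_i^{v}, x_j^{v}) \in \mathcal{P} \;\middle|\; d_{M_0}^2(\tilde{x}_i^{v_1}, \tilde{x}_j^{v_2}) < u,\ q_{ij} = -1 \right\}$$

where $v_1$ and $v_2$ index views.

Fix ΘPv and solve for the transformation matrix $P_v$: the sub-problem can be stated as: $\min_{P_v}$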
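The selection of active pairs described above can be sketched as follows. The sketch assumes that a projected sample is obtained as x̃ = P_v^T x and that the shared distance is (a - b)^T M_0 (a - b); neither formula appears in this excerpt. It keeps the thresholds as written in the text (a similar pair is active when its distance exceeds l, a dissimilar pair when its distance is below u). All function names and toy values are hypothetical.

```python
import numpy as np

def project(X, P):
    """Map the rows of X (samples of one view) into the shared space; assumes x_tilde = P^T x."""
    return X @ P

def shared_sq_dist(a, b, M0):
    """Squared distance under the shared metric M0 (assumed Mahalanobis form)."""
    diff = a - b
    return float(diff @ M0 @ diff)

def active_sets(Xt_v1, Xt_v2, pairs, q, M0, l_thr, u_thr):
    """Return the similar / dissimilar pairs that violate their distance constraint."""
    P_S, P_D = [], []
    for (i, j), label in zip(pairs, q):
        d = shared_sq_dist(Xt_v1[i], Xt_v2[j], M0)
        if label == 1 and d > l_thr:
            P_S.append((i, j))      # similar pair that is still too far apart
        elif label == -1 and d < u_thr:
            P_D.append((i, j))      # dissimilar pair that is still too close
    return P_S, P_D

# Toy data (hypothetical): view 1 has 2 features, view 2 has 3 features,
# and both are projected into a 2-dimensional shared space.
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
X2 = np.array([[0.0, 0.0, 9.0], [3.0, 0.0, 9.0], [0.0, 1.0, 9.0]])
P1 = np.eye(2)
P2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
M0 = np.eye(2)

pairs = [(0, 0), (1, 1), (0, 2), (2, 0)]
q = [1, 1, -1, -1]
P_S, P_D = active_sets(project(X1, P1), project(X2, P2), pairs, q, M0, l_thr=1.0, u_thr=2.0)
print(P_S, P_D)
```

Restricting each update to the violating pairs means that pairs already satisfying their constraint contribute nothing to the current step, which is the usual reason for introducing such active sets.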

Experiments

In this section, we evaluate the effectiveness of HM2H on visual recognition datasets with diverse characteristics, covering face verification (Labeled Faces in the Wild (LFW) [46]), kinship verification (KinFaceW-I and KinFaceW-II [47]), and person re-identification (VIPeR [48] and CUHK01 [49]).

Conclusion

We claim that the discriminative ability in intra-view and inter-view, as well as view consistency, are indispensable parts of MvML. Targeting this goal, we develop a novel MvML approach entitled HM2H. HM2H hierarchically constructs multiple metric matrices and simultaneously optimizes the closeness of similar pairs and the separability of dissimilar pairs in intra-view and inter-view. Moreover, HSIC is adopted to calibrate the enormous distribution difference between multiple views to further

CRediT authorship contribution statement

Huiyuan Deng: Conceptualization, Methodology, Writing - original draft. Xiangzhu Meng: Software, Writing - review & editing. Huibing Wang: Writing - review & editing, Visualization. Lin Feng: Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions, which significantly improved the quality of this paper. This work was supported by the National Natural Science Foundation of PR China (61972064), the LiaoNing Revitalization Talents Program (XLYC1806006), and the Fundamental Research Funds for the Central Universities (DUT19RC(3)012).

References (59)

  • Y. Wang, Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) (2021)
  • M. Zhang, C. Li, X. Wang, Multi-view metric learning for multi-label image classification, in: 2019 IEEE International...
  • J. Tang et al., Image classification with multi-view multi-instance metric learning, Expert Systems with Applications (2021)
  • O. Laiadi, A. Ouamane, A. Benakcha, A. Taleb-Ahmed, A. Hadid, Multi-view deep features for robust facial kinship...
  • Y. Wang et al., Progressive learning with multi-scale attention network for cross-domain vehicle re-identification, Science China Information Sciences (2022)
  • E.P. Xing, M.I. Jordan, S.J. Russell, A.Y. Ng, Distance metric learning with application to clustering with...
  • A. Bellet, A. Habrard, M. Sebban, A survey on metric learning for feature vectors and structured data, arXiv preprint...
  • B. McFee et al., Learning multi-modal similarity, Journal of Machine Learning Research (2011)
  • R. Huusari et al., Multi-view metric learning in vector-valued kernel spaces
  • J. Liang et al., Weighted graph embedding-based metric learning for kinship verification, IEEE Transactions on Image Processing (2018)
  • C. Zhang et al., FISH-MML: Fisher-HSIC multi-view metric learning
  • P. Zadeh et al., Geometric mean metric learning
  • H. Zhang et al., Hierarchical multimodal metric learning for multimodal classification
  • J. Hu et al., Sharable and individual multi-view metric learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
  • T. Dan, Z. Huang, H. Cai, P.J. Laurienti, G. Wu, Learning brain dynamics of evolving manifold functional MRI data using...
  • M. Harandi, M. Salzmann, R. Hartley, Joint dimensionality reduction and metric learning: A geometric take, in:...
  • A. Gretton et al., Measuring statistical dependence with Hilbert-Schmidt norms
  • K.Q. Weinberger et al., Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research (2009)
  • J. Hu et al., Local large-margin multi-metric learning for face and kinship verification, IEEE Transactions on Circuits and Systems for Video Technology (2017)

Huiyuan Deng received his BS degree from South-Central University for Nationalities in 2014. He is now working towards the Ph.D. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include metric learning, computer vision, and data mining.

Xiangzhu Meng received his BS degree from Anhui University in 2015 and his Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, in 2021. He has authored and co-authored papers in journals including KBS, EAAI, and TNNLS. He also serves as a reviewer for ACM Transactions on Multimedia Computing, Communications, and Applications. His research interests include multi-view learning, deep learning, data mining, and computer vision.

Huibing Wang received a Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, Dalian, in 2018. From 2016 to 2017, he was a visiting scholar at the University of Adelaide, Adelaide, Australia. He is currently an Associate Professor at Dalian Maritime University, Dalian, Liaoning, China. He has authored and co-authored more than 40 papers in journals and conferences. His research interests include computer vision and machine learning.

Lin Feng received his B.S. degree and M.S. degree in internal combustion engine, and a Ph.D. degree in mechanical design and theory from the Dalian University of Technology, China, in 1992, 1995, and 2004, respectively. He is currently a Professor and a Doctoral Supervisor with the School of Innovation Experiment, Dalian University of Technology. His research interests include intelligent image processing, robotics, data mining, and embedded systems.

1 These authors have contributed equally to this work.
