Elsevier

Pattern Recognition

Volume 88, April 2019, Pages 236-245

Adaptive-weighting discriminative regression for multi-view classification

https://doi.org/10.1016/j.patcog.2018.11.015

Highlights

  • We address the multi-view feature learning problem with a novel discriminative regression based framework, which maps the multi-view data to a unified low-dimensional discriminative subspace.

  • We introduce a set of learnable weight parameters that can be merged into the transformation matrix, such that the correlative and the complementary information of the original views can be preserved in the projected subspace simultaneously.

  • We design an efficient iterative optimization algorithm that updates the learnable parameters with closed-form solutions in each iteration and converges quickly in extensive experiments.

Abstract

Multi-view data represented by different features are involved in many machine learning applications. Efficiently exploiting and preserving the correlative yet complementary information in multiple views remains challenging in multi-view learning. In contrast to existing methods that handle each view separately, we propose a supervised multi-view feature learning framework that handles diverse views with a unified perception. Specifically, we fuse the multi-view data by mapping the concatenation of the original features to a discriminative low-dimensional subspace, where the features from different views are adaptively assigned learned weights. This strategy simultaneously preserves the correlative and the complementary information, which is further enhanced to be more discriminative for subsequent classification. An efficient iterative algorithm with closed-form solutions is devised to optimize the formulated framework. Comprehensive evaluations against several state-of-the-art competitors demonstrate the efficiency and the superiority of the proposed method.

Introduction

Due to the development of information technology, multi-view data have become increasingly widespread over the last few decades. For instance, an image can be represented by a variety of features, e.g., GIST [1], LBP [2], SIFT [3], HOG [4], and Color [5], [6]. Since these features describe the image from different views, the goal of multi-view learning is to exploit the correlative yet complementary information of these views. Efficiently fusing the information from diverse views has always been a challenging issue in multi-view learning. During this decade, much progress has been made in addressing this issue in unsupervised and semi-supervised settings. Some methods conduct multi-view learning based on classical co-training [7], [8], [9], [10] and co-regularization [11], [12], [13], [14] approaches, which achieve satisfactory fusion but ignore the problems induced by high-dimensional multi-view data, e.g., over-fitting and expensive computational cost. Considering the impact of high-dimensional multi-view data, some other methods [15], [16], [17] perform multi-view learning based on Canonical Correlation Analysis (CCA), which maximizes the correlations across different views in a latent subspace. Although these methods can effectively project high-dimensional features into a low-dimensional subspace, they focus solely on the correlations across different views while neglecting the complementary information of each view, which may result in inefficient fusion of the original multi-view data.

Benefiting from label information, supervised multi-view learning has received increasing attention in recent years. For example, Arora and Livescu [18] combined a series of CCA and Linear Discriminant Analysis (LDA) models to learn bottleneck features when the training set is phonetically labeled. Liu et al. [19] proposed a multi-view logistic regression framework that takes advantage of multiple Hessian regularization obtained from each particular view. Another method [20] learns multiple view-specific projections guided by the label information, which map the view-wise features into a group of low-dimensional subspaces. Essentially, these methods follow the spirit of Multiple Kernel Learning (MKL) methods [21] and conduct “decision-level” fusion (fusing after view-wise processing), which preserves the complementary information of each view but ignores the correlations across diverse views.

In this paper, we propose a novel supervised multi-view feature learning framework that is able to preserve both the correlative and the complementary information of the original views. Specifically, we formulate our framework with a regression-based structure and employ a new discriminative regression target to preserve and enhance the discrimination of the features in the projected subspace. Because different views often contribute unequally in various tasks, a set of learnable weight parameters is merged into the transformation matrix to fuse the diverse views efficiently. With the aid of the adaptive weights, the correlative and the complementary information can be simultaneously preserved in the projected discriminative subspace. In contrast to the work [20], which involves multiple projections, our method learns one adaptive-weighting discriminative projection, which simplifies the learning process and guarantees a unified perception. An efficient iterative algorithm is devised to optimize the formulated framework jointly, which ensures robust discrimination when learning the correlative and the complementary features. We evaluate our method on ten widely used benchmark datasets, each composed of up to six types of popular visual features. Comprehensive experimental results of the proposed method and several state-of-the-art competitors demonstrate the superiority of our proposed method.

To sum up, the contributions of this paper are three-fold:

  • (1) We address the multi-view feature learning problem with a novel discriminative regression based framework, which maps the multi-view data to a unified low-dimensional discriminative subspace.

  • (2) We introduce a set of learnable weight parameters that can be merged into the transformation matrix, such that the correlative and the complementary information of the original views can be preserved in the projected subspace simultaneously.

  • (3) We design an efficient iterative optimization algorithm that updates the learnable parameters with closed-form solutions in each iteration and converges quickly in extensive experiments.
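
The idea of merging view weights into the transformation matrix can be illustrated with a small identity. The following is a minimal sketch, not the paper's formulation: it assumes one scalar weight per view, and the names `merge_view_weights`, `view_dims`, and `alphas` are hypothetical. It shows that projecting the weighted concatenation of the views with a matrix `W` gives the same result as projecting the plain concatenation with a row-scaled version of `W`.

```python
import numpy as np

def merge_view_weights(W, view_dims, alphas):
    """Fold one scalar weight per view into the rows of W that act on that view."""
    # Repeat each view weight once per feature dimension of that view.
    row_scale = np.repeat(np.asarray(alphas, dtype=float), view_dims)
    return row_scale[:, None] * W

rng = np.random.default_rng(0)
view_dims = [3, 5, 2]        # hypothetical feature dimensions of three views
alphas = [0.5, 0.3, 0.2]     # hypothetical view weights (assumed scalars)
views = [rng.normal(size=(4, d)) for d in view_dims]   # 4 samples per view
W = rng.normal(size=(sum(view_dims), 2))               # shared projection to 2 dims

X_concat = np.hstack(views)                                       # plain concatenation
X_weighted = np.hstack([a * Xv for a, Xv in zip(alphas, views)])  # weighted views
W_merged = merge_view_weights(W, view_dims, alphas)

# Both routes give the same low-dimensional representation.
same = np.allclose(X_weighted @ W, X_concat @ W_merged)
```

Under this assumption, a single merged matrix acts on the concatenated features, so only one projection needs to be stored and applied at test time.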

The remainder of this paper is organized as follows. We provide the notation and background in Section 2. In Section 3, we present the proposed multi-view learning framework, followed in Section 4 by an iterative algorithm for optimizing it. Section 5 reports the experimental results, and conclusions are drawn in Section 6.

Background

In this section, we introduce the notation used throughout the paper, and then briefly review several representative multi-view learning methods.

The proposed method

Given an $n$-sample dataset $\{x_i, y_i\}_{i=1}^{n}$, where each sample $x_i \in \mathbb{R}^d$ has $d$-dimensional features and $y_i \in \{1, 2, \dots, c\}$ is the corresponding label of $x_i$ with $c$ ($\geq 2$) classes, the commonly used least squares regression for single-view classification can be formulated as the following optimization problem:

$$\min_{W,b} \sum_{i=1}^{n} \left\| W^T x_i + b - y_i \right\|_2^2 + \lambda \|W\|_F^2,$$

where $W \in \mathbb{R}^{d \times c}$ is a transformation matrix, $b \in \mathbb{R}^c$ is an intercept vector, and $\lambda$ is a trade-off parameter. In the multi-class scenario, $y_i \in \mathbb{R}^c$ is often coded as $[-1, \dots, -1, +1, -1, \dots,$
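
The single-view least squares problem above has a well-known closed-form solution, which can be sketched as follows. This is a minimal NumPy illustration of that baseline only, not of the proposed multi-view method. It assumes the usual target coding with +1 at the true class and -1 elsewhere, stores samples as rows of `X`, and leaves the intercept unpenalized; the function names `encode_labels`, `fit_ridge`, and `predict` are hypothetical.

```python
import numpy as np

def encode_labels(y, c):
    """Code integer labels 1..c as rows with +1 at the true class and -1 elsewhere."""
    y = np.asarray(y)
    Y = -np.ones((len(y), c))
    Y[np.arange(len(y)), y - 1] = 1.0
    return Y

def fit_ridge(X, Y, lam):
    """Minimize sum_i ||W^T x_i + b - y_i||_2^2 + lam * ||W||_F^2 in closed form.

    X: (n, d) samples as rows; Y: (n, c) coded targets; lam > 0.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean              # centering removes b from the problem
    Yc = Y - y_mean
    d = X.shape[1]
    # Normal equations of the centered problem: (Xc^T Xc + lam I) W = Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Yc)
    b = y_mean - W.T @ x_mean    # optimal intercept given W
    return W, b

def predict(X, W, b):
    """Assign each sample to the class with the largest regression output."""
    return np.argmax(X @ W + b, axis=1) + 1
```

At the returned `W` and `b`, the gradient of the objective with respect to both variables vanishes, which is a convenient numerical check on any implementation of this baseline.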

Optimization

According to convex optimization theory, it is easy to verify that both terms in Eq. (11) are convex (the Hessian matrix of the second term can be shown to be positive semidefinite), so problem (11) is convex. The convexity implies the existence of the optimal solutions, which are presented in the following three theorems. An iterative algorithm is thus exploited to guide the training process using the four derived optimal solutions during each iteration, with rigorous

Experiments

In this section, we first introduce the datasets on which the experiments are conducted, and then describe the experimental settings and the results of the proposed method, together with comparisons against several competitors and some important observations.

Conclusion

In this paper, we present a novel supervised multi-view feature learning framework, which is able to capture the correlative and the complementary information with a discriminative technique. The proposed framework is formulated in a regression-based structure with the introduction of a new discriminative regression target and a set of learnable weights. Applying the learned adaptive-weighting transformation on the concatenation of the original multi-view features, the correlative and the

Acknowledgments

Our work was supported in part by the National Natural Science Foundation of China under Grants 61572388 and 61703327, and in part by the Key R&D Program-The Key Industry Innovation Chain of Shaanxi under Grants 2017ZDCXL-GY-05-04-02 and 2017ZDCXL-GY-05-02.

Muli Yang is now a Ph.D. student at the School of Electronic Engineering, Xidian University, Xi'an, China. His research interests include pattern recognition, computer vision and machine learning.

References (56)

  • N. Dalal et al., Histograms of oriented gradients for human detection, CVPR (2005)
  • J. Van De Weijer et al., Learning color names from real-world images, CVPR (2007)
  • R. Khan et al., Discriminative color descriptors, CVPR (2013)
  • E. Eaton et al., Multi-view clustering with constraint propagation for learning with an incomplete mapping between views, CIKM (2010)
  • X. Cai et al., Heterogeneous image feature integration via multi-modal spectral clustering, CVPR (2011)
  • Y. Han et al., Compact and discriminative descriptor inference using multi-cues, IEEE Trans. Image Process. (2015)
  • Y. Han et al., Semisupervised feature selection via spline regression for video semantic recognition, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • A. Kumar et al., Co-regularized multi-view spectral clustering, NIPS (2011)
  • Y. Jiang et al., Co-regularized PLSA for multi-view clustering, ACCV (2012)
  • Y. Xu et al., New ℓ2,1-norm relaxation of multi-way graph cut for clustering, AAAI (2018)
  • Y. Luo et al., Tensor canonical correlation analysis for multi-view dimension reduction, IEEE Trans. Knowl. Data Eng. (2015)
  • R. Arora et al., Multi-view learning with supervision for transformed bottleneck features, ICASSP (2014)
  • J. Xu et al., Multi-view feature learning with discriminative regularization, IJCAI (2017)
  • A. Rakotomamonjy et al., SimpleMKL, J. Mach. Learn. Res. (2008)
  • H. Hotelling, Relations between two sets of variates, Biometrika (1936)
  • J.R. Kettenring, Canonical analysis of several sets of variables, Biometrika (1971)
  • D.R. Hardoon et al., Canonical correlation analysis: an overview with application to learning methods, Neural Comput. (2004)
  • A. Sharma et al., Generalized multiview analysis: a discriminative latent space, CVPR (2012)

    Cheng Deng (S’09) received the B.E., M.S., and Ph.D. degrees in signal and information processing from Xidian University, Xi'an, China. He is currently a Full Professor with the School of Electronic Engineering at Xidian University. His research interests include computer vision, multimedia processing and analysis, and information hiding. He is the author and coauthor of more than 50 scientific articles at top venues, including IEEE TNNLS, TMM, TCYB, TSMC, TIP, ICCV, CVPR, IJCAI, and AAAI.

    Feiping Nie received the Ph.D. degree in Computer Science from Tsinghua University, China in 2009, and is currently a Full Professor at Northwestern Polytechnical University, Xi'an, China. His research interests are machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing and information retrieval. He has published more than 100 papers in the following journals and conferences: TPAMI, IJCV, TIP, TNNLS, TKDE, ICML, NIPS, KDD, IJCAI, AAAI, ICCV, CVPR, ACM MM. His papers have been cited more than 10000 times and his H-index is 53. He is now serving as Associate Editor or PC member for several prestigious journals and conferences in the related fields.
