C2DMCP: View-consistent collaborative discriminative multiset correlation projection for data representation

https://doi.org/10.1016/j.jvcir.2016.06.012

Highlights

  • Consider both between-set cumulative correlations and structural information.

  • Guarantee structural consistency among different views.

  • Within-class (between-class) reconstructive distance is decreased (enlarged).

Abstract

Multiset canonical correlation analysis (MCCA) is a powerful technique for multi-view joint dimensionality reduction that maximizes the linear correlations among the projections. However, most existing MCCA-related methods fail to discover the intrinsic discriminating structure among data spaces and the correspondence between multiple views. To address these problems, we incorporate the collaborative representation structure of the data points in each view and construct a view-consistent collaborative multiset correlation projection (C2MCP) framework, in which the structures among different views are guaranteed to be consistent and are preserved in the low-dimensional subspaces. Furthermore, by taking within-class and between-class collaborative reconstruction into account to improve discriminative power in the supervised scenario, we propose a novel algorithm, called view-consistent collaborative discriminative multiset correlation projection (C2DMCP), which explicitly considers both between-set cumulative correlations and the discriminative structure of multiple-representation data. The feasibility and effectiveness of the proposed methods have been verified on three benchmark databases, i.e., ETH-80, AR and Extended Yale B, with promising results.

Introduction

In many real-world applications, data are described by heterogeneous features collected from diverse data sources or obtained from various feature extractors. For instance, in video surveillance, a series of cameras may be positioned at different locations to achieve comprehensive coverage, and an image can be represented by both image and text features for search. Such data are usually called multi-view data [1]. Since an individual view cannot comprehensively describe all examples in real-world applications, it is very meaningful to integrate these multiple views to accomplish learning tasks [70], [71], [72], [73], [74], [75], [76], [77], [78], [79]. For example, Liu and Tao [72] proposed multi-view Hessian regularization (mHR) to deal with the poor generalization of Laplacian-regularization-based image annotation, and Yu et al. [75] presented a high-order distance-based multi-view stochastic learning (HD-MSL) method for image classification, which effectively combines varied features into a unified representation and integrates the label information within a probabilistic framework.

There are three groups [5] of representative methods for multi-view learning, i.e., co-training, multiple kernel learning, and subspace learning. The first successful multi-view learning algorithm is co-training [6], whose main aim is to maximize the mutual agreement between two distinct views. Co-training based methods [7], [8] are usually used for semi-supervised classification, and the reasons for their success have been investigated in [9]. In multiple kernel learning (MKL), different kernels handle different views and can be elegantly integrated by algorithms such as semi-definite programming (SDP) [10], semi-infinite linear programming (SILP) [11], and SimpleMKL [12]. However, although multi-view data derived from the same objects reflect different characteristics and usually contain more useful information and knowledge than a single feature representation, they also raise several issues [13], [14], including the "curse of dimensionality", high storage requirements, expensive computational costs and redundant information among different views. Because efficient feature representation is the key engine for subsequent learning tasks and subspace representation correspondingly alleviates the "curse of dimensionality", in this paper we focus on subspace learning based approaches, which aim to find a latent subspace shared by multi-view features.
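To make the kernel-combination idea concrete, the minimal sketch below forms a convex combination of per-view kernels and feeds it to a precomputed-kernel SVM. The view features, weights and parameters are illustrative assumptions, not taken from the paper; the weights are fixed here, whereas SDP [10], SILP [11] and SimpleMKL [12] learn them from data.

```python
# Minimal sketch (not the paper's method): combining per-view RBF kernels
# with fixed convex weights and classifying with a precomputed-kernel SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two hypothetical views of the same 100 training samples.
X_view1 = rng.normal(size=(100, 64))   # e.g., color features
X_view2 = rng.normal(size=(100, 32))   # e.g., texture features
y = rng.integers(0, 2, size=100)       # binary labels

views = [X_view1, X_view2]
weights = [0.6, 0.4]                   # fixed here; MKL would learn these

# Convex combination of per-view kernels: K = sum_m mu_m * K_m.
K = sum(w * rbf_kernel(V, V) for w, V in zip(weights, views))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```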

Originally, the straightforward approach to multi-view subspace learning was to directly concatenate all the different feature vectors into one long vector and then apply a standard feature dimensionality reduction (DR) algorithm, e.g., Principal Component Analysis (PCA) [15], Linear Discriminant Analysis (LDA) [16], Locality Preserving Projection (LPP) [17], or Neighborhood Preserving Embedding (NPE) [18]. However, as pointed out in [1], such methods treat all representations in the same way and unavoidably ignore their diversity and complementary information. Obviously, this is not conducive to multi-view dimensionality reduction (MDR) and subsequent pattern classification tasks. Recently, a number of MDR methods have been proposed, including canonical correlation analysis (CCA) [24], kernel embedding [19], Markov networks [20], the Information Bottleneck [1], Multi-view Intact Space Learning (MISL) [2], Multi-task Multi-view Feature Embedding (MMFE) [21] and Multi-view Discriminant Analysis (MvDA) [22], [23]. Among them, CCA is the most typical and most extensively used approach; it has been employed with two views for web image search [25], action categorization [26], and fMRI data analysis [27].
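As a baseline illustration of this straightforward concatenation strategy, the sketch below stacks per-view feature vectors and applies PCA [15]; the arrays and dimensions are synthetic placeholders, not the paper's data.

```python
# Minimal sketch of the naive multi-view baseline: concatenate all view
# features into one long vector per sample, then apply a single-view DR
# method (PCA here).  Data are synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_view1 = rng.normal(size=(200, 64))      # view 1: 64-dim features
X_view2 = rng.normal(size=(200, 128))     # view 2: 128-dim features

X_concat = np.hstack([X_view1, X_view2])  # (200, 192) long vectors
Z = PCA(n_components=20).fit_transform(X_concat)
print(Z.shape)                            # (200, 20) shared low-dim codes
```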

Given two high-dimensional paired data sets $X=[x_1,x_2,\ldots,x_N]\in\mathbb{R}^{p\times N}$ and $Y=[y_1,y_2,\ldots,y_N]\in\mathbb{R}^{q\times N}$, CCA finds projection directions $\alpha\in\mathbb{R}^{p}$ and $\beta\in\mathbb{R}^{q}$ so that the linear correlation between the projected variates $\alpha^{T}X$ and $\beta^{T}Y$ is maximized. Recently, a stochastic optimization solution [28] and an equivalent simplified formulation for two sets of high-dimensional vectors [29] have also been proposed for linear CCA. To handle the nonlinear relationships that often exist among samples in practice, several nonlinear extensions, e.g., kernel CCA (KCCA) [30], locality preserving CCA (LPCCA) [31], sparse CCA (SCCA) [32], [33], [34], [35], [36], and deep CCA (DCCA) [37], [38], [39], have been proposed. KCCA [30] performs two nonlinear transformations on the input data by using the so-called "kernel trick" and then finds maximally correlated nonlinear projections, restricted to reproducing kernel Hilbert spaces with corresponding kernels. From the viewpoint of locality, LPCCA [31] discovers the local manifold structure of each view for data visualization and pose estimation. By requiring the canonical loadings to be sparse, SCCA improves the ability of CCA in high-dimensional data analysis; Chu et al. [36] presented a sparse CCA approach in which sparse projections are computed by solving two l1-minimization problems corresponding to the two sets of variables. DCCA [37] uses deep networks to learn flexible nonlinear representations in each view separately, such that the resulting representations are highly linearly correlated in the output layer. Since incorporating supervised information is very meaningful for pattern classification, several supervised variants [40], [41], [42], [43] have been presented. For instance, Sun et al. proposed a generalized CCA (GCCA) [40], which can not only guarantee maximal correlation but also minimize the within-class scatter; compared with CCA, it is more advantageous in keeping the cluster information of within-class samples.
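For reference, two-view linear CCA can be run with an off-the-shelf implementation as sketched below; note that scikit-learn expects samples in rows ($N\times p$), whereas the paper writes views as $p\times N$ matrices. The synthetic data, dimensions and component count are assumptions for illustration only.

```python
# Minimal two-view CCA example with scikit-learn's iterative solver;
# purely illustrative, not the paper's algorithm.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))                      # shared latent factors
X = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(300, 40))
Y = latent @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(300, 30))

cca = CCA(n_components=5).fit(X, Y)
Xc, Yc = cca.transform(X, Y)                            # canonical variates
# Each pair of canonical variates should be strongly correlated here.
corrs = [np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(5)]
print(np.round(corrs, 3))
```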

In spite of the profound theoretical foundation and practical success of CCA in multi-view learning, it can only handle data represented by two-view features. The features used in many real-world applications, however, are usually extracted from more than two views. For example, different kinds of color, texture and shape features are widely used in visual-analysis tasks such as image annotation and video retrieval. As generalized extensions of CCA, several multiset CCA (MCCA) models [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56] based on different criteria and constraints have been proposed to solve this problem. Nielsen [46] presented several formulations of MCCA by introducing four different constraints on the correlations and applied them successfully to remote sensing data analysis. Later, Yuan and Sun [47] proposed a fractional-order embedding multiset canonical correlations (FEMCC) method to reduce the deviation of the sample covariance matrices. To cope with high dimensionality, regularized MCCA (RMCCA) [48] was proposed to tackle overfitting and the singularity of within-set covariance matrices. From the point of view of the generalized correlation coefficient, the multiset integrated CCA (MICCA) [49] framework projects multiple high-dimensional representations in parallel into respective low-dimensional subspaces and then fuses the multiset features by given strategies to form discriminative feature vectors for recognition tasks. From the nonlinear viewpoint, kernel-based extensions of MCCA [50], [51] were proposed, using implicit nonlinear mappings, for cross-lingual information retrieval tasks. Since MCCA is essentially an unsupervised learning method, it cannot effectively reveal discriminant information in the multiple canonical subspaces. To solve this issue, multiple principal angles (MPA) [52] was presented, in which within-class subspaces possess the minimal principal angles and between-class subspaces the maximal ones. Moreover, several other methods [53], [54], [55], [56] have also been proposed.
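As background for these multiset extensions, one common relaxation of MCCA maximizes the sum of all between-set covariances subject to a within-set normalization, which reduces to a generalized eigenvalue problem. The sketch below follows that generic formulation (with a small ridge term, in the spirit of RMCCA [48]) rather than any specific method above; the synthetic views and parameters are assumptions for illustration.

```python
# Minimal sketch of a generic multiset CCA relaxation (SUMCOR-style):
#   max  sum_{i,j} w_i^T C_ij w_j   s.t.  sum_i w_i^T C_ii w_i = 1,
# solved as the generalized eigenproblem  C w = lambda * D w,
# where C stacks all covariance blocks and D is block-diagonal.
# Not the paper's C2MCP/C2DMCP objective; purely illustrative.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4))                      # shared latent factors
views = [latent @ rng.normal(size=(4, d)) + 0.1 * rng.normal(size=(200, d))
         for d in (20, 30, 25)]                         # three synthetic views
views = [V - V.mean(axis=0) for V in views]             # zero-mean each view

dims = [V.shape[1] for V in views]
offsets = np.cumsum([0] + dims)
n_total = offsets[-1]
C = np.zeros((n_total, n_total))                        # full covariance blocks
D = np.zeros((n_total, n_total))                        # block-diag within-set
ridge = 1e-3
for i, Vi in enumerate(views):
    for j, Vj in enumerate(views):
        block = Vi.T @ Vj / len(Vi)
        C[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]] = block
        if i == j:
            D[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]] = (
                block + ridge * np.eye(dims[i]))

# The leading generalized eigenvector gives one projection per view.
vals, vecs = eigh(C, D)                                 # ascending eigenvalues
w = vecs[:, -1]
w_per_view = [w[offsets[i]:offsets[i+1]] for i in range(len(views))]
print([wi.shape for wi in w_per_view])
```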

In recent years, sparse coding or sparse representation (SR) [57], [58], [59], [60] has been widely studied and used in pattern classification. Huang and Aviyente [58] sparsely coded a signal over a set of redundant bases and classified the signal based on its coding vector. In [57], Wright et al. reported a very interesting work using sparse representation for robust face recognition (FR). Sparsity preserving projections (SPP) [60] aims to preserve the sparse reconstructive relationship of the data in the subspace. In our previous work [54], we proposed an algorithm, called sparse discrimination based multiset canonical correlations (SDbMCCs), which utilizes both the correlation and the sparse reconstructive relationship in multiple-representation data. However, it has been shown that it is the collaborative representation (CR), rather than the l1-norm sparsity, that makes sparse representation based classification (SRC) powerful [61], [62], and the l1-minimization required by sparsity based pattern classification may be time-consuming. In addition, most MCCA-related methods, including SDbMCC, only consider the discrimination of data within each view rather than across views, which is the focus of MCCA. Since the multiple views actually correspond to the same objects, there should be some correspondence between them. Furthermore, conventional methods are designed for either the unsupervised or the supervised scenario only. Motivated by recent progress in correlation analysis and collaborative representation, and by the aforementioned considerations, in this paper we propose a view-consistent collaborative preserving projection mechanism and a novel algorithm, termed view-consistent collaborative discriminative multiset correlation projection (C2DMCP). Some aspects of the proposed C2DMCP method are worth highlighting:

  • (a)

    C2DMCP considers both the between-set cumulative correlations and the structural information of multiple-representation data, which makes the extracted features more discriminative and robust while keeping a relatively low computational complexity.

  • (b)

    C2DMCP is able to guarantee the structural consistency among different views, which is helpful to pattern classification.

  • (c)

    From the viewpoint of pattern classification, C2DMCP simultaneously decreases within-class collaborative reconstructive distances and enlarges between-class collaborative reconstructive distances.

Before introducing the proposed method, we first list the commonly employed notations and their meanings in Table 1.
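Since the collaborative representation mechanism referred to above underlies our method, the following sketch shows the basic CR coding step with its closed-form ridge solution and the class-wise residual rule used in CRC-style classification [61]. The dictionary, test sample and regularization value are hypothetical; this is not the C2DMCP objective itself.

```python
# Minimal sketch of collaborative representation coding (CRC-style [61]):
# code a test sample over ALL training samples with an l2 penalty (closed
# form), then assign the class with the smallest class-wise residual.
import numpy as np

rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 10, 50
labels = np.repeat(np.arange(n_classes), per_class)
D = rng.normal(size=(dim, n_classes * per_class))     # columns = training samples
D /= np.linalg.norm(D, axis=0)                        # unit-norm columns
y = D[:, 5] + 0.05 * rng.normal(size=dim)             # noisy sample from class 0

lam = 0.01
# Closed-form collaborative code: a = (D^T D + lam I)^{-1} D^T y.
P = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T)
a = P @ y

# Class-wise regularized residuals; the smallest one gives the label.
residuals = []
for c in range(n_classes):
    idx = labels == c
    residuals.append(np.linalg.norm(y - D[:, idx] @ a[idx]) / np.linalg.norm(a[idx]))
print("predicted class:", int(np.argmin(residuals)))  # expected: 0
```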


CCA: canonical correlation analysis

Given $N$ pairs of zero-mean random vectors $X=[x_1,x_2,\ldots,x_N]\in\mathbb{R}^{p\times N}$ and $Y=[y_1,y_2,\ldots,y_N]\in\mathbb{R}^{q\times N}$, CCA [24] computes a pair of projection directions $\alpha\in\mathbb{R}^{p}$ and $\beta\in\mathbb{R}^{q}$ such that the correlation coefficient of the canonical variates $Z_1=\alpha^{T}X$ and $Z_2=\beta^{T}Y$ is maximized by
$$\max J_{CCA}(\alpha,\beta)=\frac{E(\alpha^{T}XY^{T}\beta)}{\sqrt{E(\alpha^{T}XX^{T}\alpha)\cdot E(\beta^{T}YY^{T}\beta)}}=\frac{\alpha^{T}S_{xy}\beta}{\sqrt{\alpha^{T}S_{xx}\alpha}\,\sqrt{\beta^{T}S_{yy}\beta}},$$
where $E(\cdot)$ denotes the expectation, $S_{xx}$ and $S_{yy}$ are the within-set covariance matrices of $X$ and $Y$, respectively, and $S_{xy}$ is the between-set covariance matrix between $X$ and $Y$. Generally, we
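A direct way to solve this criterion numerically is through the standard generalized eigenvalue formulation; the short sketch below (with a small ridge to keep $S_{xx}$ and $S_{yy}$ invertible) is a generic illustration on synthetic data, not the solver used in the paper.

```python
# Minimal sketch: solving the CCA criterion above as the generalized
# eigenproblem  [[0, Sxy],[Syx, 0]] v = rho * [[Sxx, 0],[0, Syy]] v,
# with a small ridge for numerical stability.  Illustrative only.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N, p, q = 500, 30, 20
latent = rng.normal(size=(N, 3))
X = (latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(N, p))).T  # p x N
Y = (latent @ rng.normal(size=(3, q)) + 0.1 * rng.normal(size=(N, q))).T  # q x N
X -= X.mean(axis=1, keepdims=True)                     # zero-mean, as assumed
Y -= Y.mean(axis=1, keepdims=True)

Sxx = X @ X.T / N + 1e-4 * np.eye(p)
Syy = Y @ Y.T / N + 1e-4 * np.eye(q)
Sxy = X @ Y.T / N

A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
B = np.block([[Sxx, np.zeros((p, q))], [np.zeros((q, p)), Syy]])
vals, vecs = eigh(A, B)                                # ascending eigenvalues
alpha, beta = vecs[:p, -1], vecs[p:, -1]               # top canonical pair

rho = alpha @ Sxy @ beta / np.sqrt((alpha @ Sxx @ alpha) * (beta @ Syy @ beta))
print("leading canonical correlation:", round(float(rho), 3))
```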

Motivation

This work is motivated by the following three aspects. First of all, as mentioned before, MCCA is efficient for classification tasks based on multiple feature representations. However, it only considers the cumulative correlation information and ignores the intrinsic reconstructive relationship, i.e., the structural information, among multi-view data, whereas the main idea of representation-based methods is to explore the pairwise affinities, i.e., the representation structure, between data points. And we take for

Experimental results and analysis

In this section, we evaluate the proposed C2MCP and C2DMCP methods on three widely used benchmark datasets, i.e., AR [67], Extended Yale B [68], and ETH-80 [69], where the first two are face datasets and the last is an object dataset. The statistics of each database and the feature representation of the different views are described in Section 4.1. We also compare the performance of the proposed methods with several state-of-the-art MDR

Conclusions

In this paper, we have developed a new technique for joint dimensionality reduction, or subspace learning, of high-dimensional data, called view-consistent collaborative multiset correlation projection (C2MCP), together with its supervised extension, view-consistent collaborative discriminative multiset correlation projection (C2DMCP). C2DMCP can guarantee between-view structural consistency, and experiments on several benchmark databases demonstrate that C2DMCP has more discriminating abilities and

Acknowledgments

This work is supported in part by Graduate Research and Innovation Foundation of Jiangsu Province, China under Grant KYLX15_0379, in part by the National Natural Science Foundation of China under Grants 61273251, 61401209, and 61402203, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20140790, and in part by China Postdoctoral Science Foundation under Grants 2014T70525 and 2013M531364.

References (79)

  • L.S. Qiao et al.

    Sparsity preserving projections with applications to face recognition

    Pattern Recogn.

    (2010)
  • S. Warfield

    Fast k-NN classification for multichannel image data

    Pattern Recogn. Lett.

    (1996)
  • W. Liu et al.

    Multiview hessian discriminative sparse coding for image annotation

    Comput. Vis. Image Underst.

    (2014)
  • W. Liu et al.

    Multiview Hessian regularized logistic regression for action recognition

    Signal Process.

    (2015)
  • J. Yu et al.

    Pairwise constraints based multiview features fusion for scene classification

    Pattern Recogn.

    (2013)
  • C. Xu et al.

    Large-margin multi-view information bottleneck

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2014)
  • C. Xu et al.

    Multi-view intact space learning

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • C.H. Lampert et al.

    Weakly-paired maximum covariance analysis for multimodal dimensionality reduction and transfer learning

  • K. Kailing et al.

    Clustering multi-represented objects with noise

  • S. Sun

    A survey of multi-view machine learning

    Neural Comput. Appl.

    (2013)
  • A. Blum et al.

    Combining labeled and unlabeled data with co-training

  • A. Kumar et al.

    A co-training approach for multi-view spectral clustering

  • A. Kumar et al.

    Co-regularized multi-view spectral clustering

    Adv. Neural Inform. Process. Syst.

    (2011)
  • W. Wang et al.

    A new analysis of co-training

  • G. Lanckriet et al.

    Learning the kernel matrix with semi-definite programming

    J. Mach. Learn. Res.

    (2004)
  • S. Sonnenburg et al.

    Large scale multiple kernel learning

    J. Mach. Learn. Res.

    (2006)
  • A. Rakotomamonjy et al.

    SimpleMKL

    J. Mach. Learn. Res.

    (2008)
  • A.K. Jain et al.

    Statistical pattern recognition: a review

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • F. Camastra et al.

    Estimating the intrinsic dimension of data with a fractal-based method

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • I.T. Jolliffe

    Principal Component Analysis

    (1986)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugenic.

    (1936)
  • X. He et al.

    Face recognition using Laplacianfaces

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2005)
  • T. Zhang et al.

    Patch alignment for dimensionality reduction

    IEEE Trans. Knowl. Data Eng.

    (2009)
  • R. Memisevic et al.

    Shared kernel information embedding for discriminative inference

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • N. Chen et al.

    Large-margin predictive latent subspace learning for multiview data analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • Q. Zhang, L. Zhang, B. Du, W. Zheng, W. Bian, D. Tao, MMFE: Multitask multiview feature embedding, in: IEEE...
  • M. Kan et al.

    Multi-view discriminant analysis

  • M. Kan et al.

    Multi-view discriminant analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • H. Hotelling

    Relations between two sets of variates

    Biometrika

    (1936)

    This paper has been recommended for acceptance by Zicheng Liu.
