C2DMCP: View-consistent collaborative discriminative multiset correlation projection for data representation

https://doi.org/10.1016/j.jvcir.2016.06.012

Highlights

  • Consider both between-set cumulative correlations and structural information.

  • Guarantee structural consistency among different views.

  • Within-class (between-class) reconstructive distance is decreased (enlarged).

Abstract

Multiset canonical correlation analysis (MCCA) is a powerful technique for multi-view joint dimensionality reduction that maximizes the linear correlations among the projections. However, most existing MCCA-related methods fail to discover the intrinsic discriminating structure among data spaces and the correspondence between multiple views. To address these problems, we incorporate the collaborative representation structure of the data points in each view and construct a view-consistent collaborative multiset correlation projection (C2MCP) framework, in which the structures among different views are guaranteed to be consistent and are preserved in the low-dimensional subspaces. Furthermore, by taking within-class and between-class collaborative reconstruction into account to improve discriminative power in the supervised scenario, we propose a novel algorithm, called view-consistent collaborative discriminative multiset correlation projection (C2DMCP), which explicitly considers both between-set cumulative correlations and the discriminative structure of multiple-representation data. The feasibility and effectiveness of the proposed methods have been verified on three benchmark databases, i.e., ETH-80, AR and Extended Yale B, with promising results.

Introduction

In many real-world applications, data are described by heterogeneous features collected from diverse data sources or obtained from various feature extractors. For instance, in video surveillance, a series of cameras may be positioned at different locations to achieve comprehensive coverage, and an image can be represented by both image and text features for search. Such data are usually called multi-view data [1]. Since an individual view cannot comprehensively describe all examples in real-world applications, it is very meaningful to integrate these multiple views to accomplish learning tasks [70], [71], [72], [73], [74], [75], [76], [77], [78], [79]. For example, Liu and Tao [72] proposed multi-view Hessian regularization (mHR) to deal with the poor generalization of Laplacian-regularization-based image annotation, and Yu et al. [75] presented a high-order distance-based multi-view stochastic learning (HD-MSL) method for image classification, which effectively combines varied features into a unified representation and integrates the label information within a probabilistic framework.

There are three groups [5] of representative methods for multi-view learning, i.e., co-training, multiple kernel learning, and subspace learning. The first successful multi-view learning algorithm is co-training [6], whose main aim is to maximize the mutual agreement between two distinct views. Co-training based methods [7], [8] are usually used for semi-supervised classification, and the reasons for their success have been investigated in [9]. In multiple kernel learning (MKL), different kernels handle different views and can be elegantly integrated by algorithms such as semi-definite programming (SDP) [10], semi-infinite linear programming (SILP) [11], and SimpleMKL [12]. However, although multi-view data derived from the same objects reflect different characteristics and usually contain more useful information and knowledge than a single feature representation, they also raise several issues [13], [14], including the "curse of dimensionality", high storage requirements, expensive computational costs and redundant information among different views. Because efficient feature representation is the key engine for subsequent learning tasks and subspace representation correspondingly alleviates the "curse of dimensionality", in this paper we focus on subspace learning based approaches, which aim to find a latent subspace shared by multi-view features.
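To make the kernel-combination idea concrete, the minimal sketch below forms a convex combination of per-view kernels and feeds it to a precomputed-kernel SVM. The view features, weights and parameters are illustrative assumptions, not taken from the paper; the weights are fixed here, whereas SDP [10], SILP [11] and SimpleMKL [12] learn them from data.

```python
# Minimal sketch (not the paper's method): combining per-view RBF kernels
# with fixed convex weights and classifying with a precomputed-kernel SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two hypothetical views of the same 100 training samples.
X_view1 = rng.normal(size=(100, 64))   # e.g., color features
X_view2 = rng.normal(size=(100, 32))   # e.g., texture features
y = rng.integers(0, 2, size=100)       # binary labels

views = [X_view1, X_view2]
weights = [0.6, 0.4]                   # fixed here; MKL would learn these

# Convex combination of per-view kernels: K = sum_m mu_m * K_m.
K = sum(w * rbf_kernel(V, V) for w, V in zip(weights, views))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```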

Originally, the straightforward approach to multi-view subspace learning was to directly concatenate all the different feature vectors into one long vector and then apply a standard feature dimensionality reduction (DR) algorithm, e.g., Principal Component Analysis (PCA) [15], Linear Discriminant Analysis (LDA) [16], Locality Preserving Projection (LPP) [17], or Neighborhood Preserving Embedding (NPE) [18]. However, as pointed out in [1], such methods treat all representations in the same way and unavoidably ignore their diversity and complementary information. Obviously, this is not conducive to multi-view dimensionality reduction (MDR) and subsequent pattern classification tasks. Recently, a number of MDR methods have been proposed, including canonical correlation analysis (CCA) [24], kernel embedding [19], Markov networks [20], the Information Bottleneck [1], Multi-view Intact Space Learning (MISL) [2], Multi-task Multi-view Feature Embedding (MMFE) [21] and Multi-view Discriminant Analysis (MvDA) [22], [23]. Among them, CCA is the most typical and most extensively used approach; it has been employed with two views for web image search [25], action categorization [26], and fMRI data analysis [27].
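As a baseline illustration of this straightforward concatenation strategy, the sketch below stacks per-view feature vectors and applies PCA [15]; the arrays and dimensions are synthetic placeholders, not the paper's data.

```python
# Minimal sketch of the naive multi-view baseline: concatenate all view
# features into one long vector per sample, then apply a single-view DR
# method (PCA here).  Data are synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_view1 = rng.normal(size=(200, 64))      # view 1: 64-dim features
X_view2 = rng.normal(size=(200, 128))     # view 2: 128-dim features

X_concat = np.hstack([X_view1, X_view2])  # (200, 192) long vectors
Z = PCA(n_components=20).fit_transform(X_concat)
print(Z.shape)                            # (200, 20) shared low-dim codes
```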

Given two high-dimensional paired data sets $X=[x_1,x_2,\ldots,x_N]\in\mathbb{R}^{p\times N}$ and $Y=[y_1,y_2,\ldots,y_N]\in\mathbb{R}^{q\times N}$, CCA finds projection directions $\alpha\in\mathbb{R}^{p}$ and $\beta\in\mathbb{R}^{q}$ so that the linear correlation between the projected variates $\alpha^{T}X$ and $\beta^{T}Y$ is maximized. Recently, a stochastic optimization solution [28] and an equivalent simplified formulation for two sets of high-dimensional vectors [29] have also been proposed for linear CCA. To handle the nonlinear relationships that often exist among samples in practice, several nonlinear extensions, e.g., kernel CCA (KCCA) [30], locality preserving CCA (LPCCA) [31], sparse CCA (SCCA) [32], [33], [34], [35], [36], and deep CCA (DCCA) [37], [38], [39], have been proposed. KCCA [30] performs two nonlinear transformations on the input data by using the so-called "kernel trick" and then finds maximally correlated nonlinear projections, restricted to reproducing kernel Hilbert spaces with corresponding kernels. From the viewpoint of locality, LPCCA [31] discovers the local manifold structure of each view for data visualization and pose estimation. By requiring the canonical loadings to be sparse, SCCA improves the ability of CCA in high-dimensional data analysis; Chu et al. [36] presented a sparse CCA approach in which sparse projections are computed by solving two l1-minimization problems corresponding to the two sets of variables. DCCA [37] uses deep networks to learn flexible nonlinear representations in each view separately, such that the resulting representations are highly linearly correlated in the output layer. Since incorporating supervised information is very meaningful for pattern classification, several supervised variants [40], [41], [42], [43] have been presented. For instance, Sun et al. proposed a generalized CCA (GCCA) [40], which can not only guarantee maximal correlation but also minimize the within-class scatter; compared with CCA, it is more advantageous in keeping the cluster information of within-class samples.
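For reference, two-view linear CCA can be run with an off-the-shelf implementation as sketched below; note that scikit-learn expects samples in rows ($N\times p$), whereas the paper writes views as $p\times N$ matrices. The synthetic data, dimensions and component count are assumptions for illustration only.

```python
# Minimal two-view CCA example with scikit-learn's iterative solver;
# purely illustrative, not the paper's algorithm.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))                      # shared latent factors
X = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(300, 40))
Y = latent @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(300, 30))

cca = CCA(n_components=5).fit(X, Y)
Xc, Yc = cca.transform(X, Y)                            # canonical variates
# Each pair of canonical variates should be strongly correlated here.
corrs = [np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(5)]
print(np.round(corrs, 3))
```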

In spite of the profound theoretical foundation and practical success of CCA in multi-view learning, it can only handle data represented by two-view features. The features used in many real-world applications, however, are usually extracted from more than two views. For example, different kinds of color, texture and shape features are widely used in visual-analysis tasks such as image annotation and video retrieval. As generalized extensions of CCA, several multiset CCA (MCCA) models [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56] based on different criteria and constraints have been proposed to solve this problem. Nielsen [46] presented several formulations of MCCA by introducing four different constraints on the correlations and applied them successfully to remote sensing data analysis. Later, Yuan and Sun [47] proposed a fractional-order embedding multiset canonical correlations (FEMCC) method to reduce the deviation of the sample covariance matrices. To cope with high dimensionality, regularized MCCA (RMCCA) [48] was proposed to tackle overfitting and the singularity of within-set covariance matrices. From the point of view of the generalized correlation coefficient, the multiset integrated CCA (MICCA) [49] framework projects multiple high-dimensional representations in parallel into respective low-dimensional subspaces and then fuses the multiset features by given strategies to form discriminative feature vectors for recognition tasks. From the nonlinear viewpoint, kernel-based extensions of MCCA [50], [51] were proposed, using implicit nonlinear mappings, for cross-lingual information retrieval tasks. Since MCCA is essentially an unsupervised learning method, it cannot effectively reveal discriminant information in the multiple canonical subspaces. To solve this issue, multiple principal angles (MPA) [52] was presented, in which within-class subspaces possess the minimal principal angles and between-class subspaces the maximal ones. Moreover, several other methods [53], [54], [55], [56] have also been proposed.
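As background for these multiset extensions, one common relaxation of MCCA maximizes the sum of all between-set covariances subject to a within-set normalization, which reduces to a generalized eigenvalue problem. The sketch below follows that generic formulation (with a small ridge term, in the spirit of RMCCA [48]) rather than any specific method above; the synthetic views and parameters are assumptions for illustration.

```python
# Minimal sketch of a generic multiset CCA relaxation (SUMCOR-style):
#   max  sum_{i,j} w_i^T C_ij w_j   s.t.  sum_i w_i^T C_ii w_i = 1,
# solved as the generalized eigenproblem  C w = lambda * D w,
# where C stacks all covariance blocks and D is block-diagonal.
# Not the paper's C2MCP/C2DMCP objective; purely illustrative.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4))                      # shared latent factors
views = [latent @ rng.normal(size=(4, d)) + 0.1 * rng.normal(size=(200, d))
         for d in (20, 30, 25)]                         # three synthetic views
views = [V - V.mean(axis=0) for V in views]             # zero-mean each view

dims = [V.shape[1] for V in views]
offsets = np.cumsum([0] + dims)
n_total = offsets[-1]
C = np.zeros((n_total, n_total))                        # full covariance blocks
D = np.zeros((n_total, n_total))                        # block-diag within-set
ridge = 1e-3
for i, Vi in enumerate(views):
    for j, Vj in enumerate(views):
        block = Vi.T @ Vj / len(Vi)
        C[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]] = block
        if i == j:
            D[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]] = (
                block + ridge * np.eye(dims[i]))

# The leading generalized eigenvector gives one projection per view.
vals, vecs = eigh(C, D)                                 # ascending eigenvalues
w = vecs[:, -1]
w_per_view = [w[offsets[i]:offsets[i+1]] for i in range(len(views))]
print([wi.shape for wi in w_per_view])
```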

In recent years, sparse coding or sparse representation (SR) [57], [58], [59], [60] has been widely studied and used in pattern classification. Huang and Aviyente [58] sparsely coded a signal over a set of redundant bases and classified the signal based on its coding vector. In [57], Wright et al. reported a very interesting work using sparse representation for robust face recognition (FR). Sparsity preserving projections (SPP) [60] aims to preserve the sparse reconstructive relationship of the data in the subspace. In our previous work [54], we proposed an algorithm, called sparse discrimination based multiset canonical correlations (SDbMCCs), which utilizes both the correlation and the sparse reconstructive relationship in multiple-representation data. However, it has been shown that it is the collaborative representation (CR), rather than the l1-norm sparsity, that makes sparse representation based classification (SRC) powerful [61], [62], and the l1-minimization required by sparsity based pattern classification may be time-consuming. In addition, most MCCA-related methods, including SDbMCC, only consider the discrimination of data within each view rather than across views, which is the focus of MCCA. Since the multiple views actually correspond to the same objects, there should be some correspondence between them. Furthermore, conventional methods are designed for either the unsupervised or the supervised scenario only. Motivated by recent progress in correlation analysis and collaborative representation, and by the aforementioned considerations, in this paper we propose a view-consistent collaborative preserving projection mechanism and a novel algorithm, termed view-consistent collaborative discriminative multiset correlation projection (C2DMCP). Some aspects of the proposed C2DMCP method are worth highlighting:

  • (a)

    C2DMCP considers both the between-set cumulative correlations and the structural information of multiple-representation data, which makes the extracted features more discriminative and robust while keeping a relatively low computational complexity.

  • (b)

    C2DMCP is able to guarantee the structural consistency among different views, which is helpful to pattern classification.

  • (c)

    From the viewpoint of pattern classification, C2DMCP simultaneously decreases within-class collaborative reconstructive distances and enlarges between-class collaborative reconstructive distances.

Before introducing the proposed method, we first list the commonly employed notations and their meanings in Table 1.
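Since the collaborative representation mechanism referred to above underlies our method, the following sketch shows the basic CR coding step with its closed-form ridge solution and the class-wise residual rule used in CRC-style classification [61]. The dictionary, test sample and regularization value are hypothetical; this is not the C2DMCP objective itself.

```python
# Minimal sketch of collaborative representation coding (CRC-style [61]):
# code a test sample over ALL training samples with an l2 penalty (closed
# form), then assign the class with the smallest class-wise residual.
import numpy as np

rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 10, 50
labels = np.repeat(np.arange(n_classes), per_class)
D = rng.normal(size=(dim, n_classes * per_class))     # columns = training samples
D /= np.linalg.norm(D, axis=0)                        # unit-norm columns
y = D[:, 5] + 0.05 * rng.normal(size=dim)             # noisy sample from class 0

lam = 0.01
# Closed-form collaborative code: a = (D^T D + lam I)^{-1} D^T y.
P = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T)
a = P @ y

# Class-wise regularized residuals; the smallest one gives the label.
residuals = []
for c in range(n_classes):
    idx = labels == c
    residuals.append(np.linalg.norm(y - D[:, idx] @ a[idx]) / np.linalg.norm(a[idx]))
print("predicted class:", int(np.argmin(residuals)))  # expected: 0
```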


CCA: canonical correlation analysis

Given $N$ pairs of zero-mean random vectors $X=[x_1,x_2,\ldots,x_N]\in\mathbb{R}^{p\times N}$ and $Y=[y_1,y_2,\ldots,y_N]\in\mathbb{R}^{q\times N}$, CCA [24] computes a pair of projection directions $\alpha\in\mathbb{R}^{p}$ and $\beta\in\mathbb{R}^{q}$ such that the correlation coefficient of the canonical variates $Z_1=\alpha^{T}X$ and $Z_2=\beta^{T}Y$ is maximized by
$$\max J_{CCA}(\alpha,\beta)=\frac{E(\alpha^{T}XY^{T}\beta)}{\sqrt{E(\alpha^{T}XX^{T}\alpha)\cdot E(\beta^{T}YY^{T}\beta)}}=\frac{\alpha^{T}S_{xy}\beta}{\sqrt{\alpha^{T}S_{xx}\alpha}\,\sqrt{\beta^{T}S_{yy}\beta}},$$
where $E(\cdot)$ denotes the expectation, $S_{xx}$ and $S_{yy}$ are the within-set covariance matrices of $X$ and $Y$, respectively, and $S_{xy}$ is the between-set covariance matrix between $X$ and $Y$. Generally, we
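A direct way to solve this criterion numerically is through the standard generalized eigenvalue formulation; the short sketch below (with a small ridge to keep $S_{xx}$ and $S_{yy}$ invertible) is a generic illustration on synthetic data, not the solver used in the paper.

```python
# Minimal sketch: solving the CCA criterion above as the generalized
# eigenproblem  [[0, Sxy],[Syx, 0]] v = rho * [[Sxx, 0],[0, Syy]] v,
# with a small ridge for numerical stability.  Illustrative only.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N, p, q = 500, 30, 20
latent = rng.normal(size=(N, 3))
X = (latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(N, p))).T  # p x N
Y = (latent @ rng.normal(size=(3, q)) + 0.1 * rng.normal(size=(N, q))).T  # q x N
X -= X.mean(axis=1, keepdims=True)                     # zero-mean, as assumed
Y -= Y.mean(axis=1, keepdims=True)

Sxx = X @ X.T / N + 1e-4 * np.eye(p)
Syy = Y @ Y.T / N + 1e-4 * np.eye(q)
Sxy = X @ Y.T / N

A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
B = np.block([[Sxx, np.zeros((p, q))], [np.zeros((q, p)), Syy]])
vals, vecs = eigh(A, B)                                # ascending eigenvalues
alpha, beta = vecs[:p, -1], vecs[p:, -1]               # top canonical pair

rho = alpha @ Sxy @ beta / np.sqrt((alpha @ Sxx @ alpha) * (beta @ Syy @ beta))
print("leading canonical correlation:", round(float(rho), 3))
```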

Motivation

This work is motivated by the following three aspects. First of all, as mentioned before, MCCA is efficient for classification tasks based on multiple feature representations. However, it only considers the cumulative correlation information and ignores the intrinsic reconstructive relationship, i.e., the structural information, among multi-view data, whereas the main idea of representation-based methods is to explore the pairwise affinities, i.e., the representation structure, between data points. And we take for

Experimental results and analysis

In this section, we evaluate the proposed C2MCP and C2DMCP methods on three widely used benchmark datasets, i.e., AR [67], Extended Yale B [68], and ETH-80 [69], where the first two are face datasets and the last is an object dataset. The statistics of each database and the feature representation of the different views are described in Section 4.1. We also compare the performance of the proposed methods with several state-of-the-art MDR

Conclusions

In this paper, we have developed a new technique for joint dimensionality reduction, or subspace learning, of high-dimensional data, called view-consistent collaborative multiset correlation projection (C2MCP), together with its supervised extension, view-consistent collaborative discriminative multiset correlation projection (C2DMCP). C2DMCP can guarantee between-view structural consistency, and experiments on several benchmark databases demonstrate that C2DMCP has more discriminating abilities and

Acknowledgments

This work is supported in part by Graduate Research and Innovation Foundation of Jiangsu Province, China under Grant KYLX15_0379, in part by the National Natural Science Foundation of China under Grants 61273251, 61401209, and 61402203, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20140790, and in part by China Postdoctoral Science Foundation under Grants 2014T70525 and 2013M531364.

References (79)

  • L.S. Qiao et al.

    Sparsity preserving projections with applications to face recognition

    Pattern Recogn.

    (2010)
  • S. Warfield

    Fast k-NN classification for multichannel image data

    Pattern Recogn. Lett.

    (1996)
  • W. Liu et al.

    Multiview hessian discriminative sparse coding for image annotation

    Comput. Vis. Image Underst.

    (2014)
  • W. Liu et al.

    Multiview Hessian regularized logistic regression for action recognition

    Signal Process.

    (2015)
  • J. Yu et al.

    Pairwise constraints based multiview features fusion for scene classification

    Pattern Recogn.

    (2013)
  • C. Xu et al.

    Large-margin multi-view information bottleneck

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2014)
  • C. Xu et al.

    Multi-view intact space learning

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • C.H. Lampert et al.

    Weakly-paired maximum covariance analysis for multimodal dimensionality reduction and transfer learning

  • K. Kailing et al.

    Clustering multi-represented objects with noise

  • S. Sun

    A survey of multi-view machine learning

    Neural Comput. Appl.

    (2013)
  • A. Blum et al.

    Combining labeled and unlabeled data with co-training

  • A. Kumar et al.

    A co-training approach for multi-view spectral clustering

  • A. Kumar et al.

    Co-regularized multi-view spectral clustering

    Adv. Neural Inform. Process. Syst.

    (2011)
  • W. Wang et al.

    A new analysis of co-training

  • G. Lanckriet et al.

    Learning the kernel matrix with semi-definite programming

    J. Mach. Learn. Res.

    (2004)
  • S. Sonnenburg et al.

    Large scale multiple kernel learning

    J. Mach. Learn. Res.

    (2006)
  • A. Rakotomamonjy et al.

    SimpleMKL

    J. Mach. Learn. Res.

    (2008)
  • A.K. Jain et al.

    Statistical pattern recognition: a review

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • F. Camastra et al.

    Estimating the intrinsic dimension of data with a fractal-based method

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • I.T. Jolliffe

    Principal Component Analysis

    (1986)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugenic.

    (1936)
  • X. He et al.

    Face recognition using Laplacianfaces

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2005)
  • T. Zhang et al.

    Patch alignment for dimensionality reduction

    IEEE Trans. Knowl. Data Eng.

    (2009)
  • R. Memisevic et al.

    Shared kernel information embedding for discriminative inference

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • N. Chen et al.

    Large-margin predictive latent subspace learning for multiview data analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • Q. Zhang, L. Zhang, B. Du, W. Zheng, W. Bian, D. Tao, MMFE: Multitask multiview feature embedding, in: IEEE...
  • M. Kan et al.

    Multi-view discriminant analysis

  • M. Kan et al.

    Multi-view discriminant analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • H. Hotelling

    Relations between two sets of variates

    Biometrika

    (1936)

    This paper has been recommended for acceptance by Zicheng Liu.
