Journal of Visual Communication and Image Representation
C2DMCP: View-consistent collaborative discriminative multiset correlation projection for data representation☆
Introduction
In many real-world applications, scientific data are described by heterogeneous features collected from diverse data sources or obtained from various feature extractors. For instance, in video surveillance, a series of cameras may be positioned at different locations to achieve comprehensive coverage; an image can be represented by both image and text features for search. This kind of data is usually called multi-view data [1]. Since an individual view cannot comprehensively describe all examples in real-world applications, it is very meaningful to integrate these multiple views to accomplish learning tasks [70], [71], [72], [73], [74], [75], [76], [77], [78], [79]. For example, Liu and Tao [72] proposed multi-view Hessian regularization (mHR) to deal with the poor generalization of Laplacian regularization based image annotation, and Yu et al. [75] presented a high-order distance-based multi-view stochastic learning (HD-MSL) method for image classification, which effectively combines varied features into a unified representation and integrates the label information in a probabilistic framework.
There are three groups [5] of representative methods for multi-view learning: co-training, multiple kernel learning, and subspace learning. The first successful multi-view learning algorithm was co-training [6], whose main aim is to maximize the mutual agreement between two distinct views. Co-training based methods [7], [8] are usually used for semi-supervised classification, and the reasons for their success have been investigated in [9]. In multiple kernel learning (MKL), different kernels handle different views and can be elegantly integrated by algorithms such as semi-definite programming (SDP) [10], semi-infinite linear programming (SILP) [11], and SimpleMKL [12]. However, although multi-view data derived from the same objects reflect different characteristics and usually contain more useful information and knowledge than a single feature representation, they also raise several issues [13], [14], including the "curse of dimensionality", high storage requirements, expensive computational costs, and redundant information among different views. Since efficient feature representation is the key engine for subsequent learning tasks, and subspace representation correspondingly reduces the "curse of dimensionality", in this paper we focus on subspace learning based approaches, which aim to find a latent subspace shared by multi-view features.
The most straightforward approach to multi-view subspace learning is to concatenate all the different feature vectors into one long vector and then apply a standard dimensionality reduction (DR) algorithm, e.g., Principal Component Analysis (PCA) [15], Linear Discriminant Analysis (LDA) [16], Locality Preserving Projection (LPP) [17], or Neighborhood Preserving Embedding (NPE) [18]. However, as pointed out in [1], such methods treat all representations in the same way and unavoidably ignore their diversities and complementary information. Obviously, this is not conducive to multi-view dimensionality reduction (MDR) or to subsequent pattern classification tasks. Recently, a number of MDR methods have been proposed, including canonical correlation analysis (CCA) [24], kernel embedding [19], Markov networks [20], the Information Bottleneck [1], Multi-view Intact Space Learning (MISL) [2], Multi-task Multi-view Feature Embedding (MMFE) [21], and Multi-view Discriminant Analysis (MvDA) [22], [23]. Among them, CCA is the most typical and extensively used approach; it has been employed with two views for web image search [25], action categorization [26], and fMRI data analysis [27].
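As a concrete illustration, the concatenation baseline described above can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the paper; the function name `concat_pca` and the toy data are ours.

```python
import numpy as np

def concat_pca(views, k):
    """Naive multi-view baseline: concatenate the per-view feature
    matrices (each n_samples x d_v) and project onto the top-k
    principal components of the joint representation."""
    X = np.hstack(views)              # (n, d_1 + ... + d_m)
    X = X - X.mean(axis=0)            # center the joint features
    # principal directions via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T               # (n, k) low-dimensional embedding

# toy data: two "views" of the same 50 samples
rng = np.random.default_rng(0)
v1 = rng.normal(size=(50, 8))
v2 = rng.normal(size=(50, 5))
Z = concat_pca([v1, v2], k=3)
```

Because the views are simply stacked, PCA weighs every coordinate the same way, which is exactly the indifference to view diversity criticized in [1].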
By analyzing two high-dimensional paired datasets X and Y, CCA finds projection directions w_x and w_y so that the linear correlation between the projected variates X w_x and Y w_y is maximized. Recently, a stochastic optimization solution [28] and an equivalent simplified formulation for dealing with two sets of high-dimensional vectors [29] have also been proposed for linear CCA. To handle the nonlinear relationships that arise between samples in practice, several nonlinear extensions, e.g., kernel CCA (KCCA) [30], locality preserving CCA (LPCCA) [31], sparse CCA (SCCA) [32], [33], [34], [35], [36], and deep CCA (DCCA) [37], [38], [39], have been proposed. KCCA [30] performs two nonlinear transformations on the input data via the "kernel trick" and then finds maximally correlated nonlinear projections, restricted to reproducing kernel Hilbert spaces with the corresponding kernels. From the viewpoint of locality, LPCCA [31] discovers the local manifold structure of each view for data visualization and pose estimation. By requiring canonical loadings to be sparse, SCCA improves the ability of CCA in high-dimensional data analysis; Chu et al. [36] presented a sparse CCA approach in which sparse projections are computed by solving two l1-minimization problems corresponding to the two sets of variables. DCCA [37] utilizes deep networks to learn flexible nonlinear representations in each view separately, such that the resulting representations are highly linearly correlated in the output layer. Since incorporating supervised information is very meaningful for pattern classification, several supervised variants [40], [41], [42], [43] have been presented. For instance, Sun et al. proposed a generalized CCA (GCCA) [40], which not only guarantees maximal correlation but also minimizes within-class scatter; compared with CCA, it is more effective at preserving the cluster structure of within-class samples.
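The linear CCA problem above has a classical closed-form solution via whitening and an SVD. The sketch below implements that textbook formulation (not the paper's algorithm); the ridge term `reg` is our addition for numerical stability.

```python
import numpy as np

def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def linear_cca(X, Y, k=1, reg=1e-6):
    """Textbook linear CCA via whitening + SVD. X: (n, dx), Y: (n, dy);
    returns the first k pairs of canonical directions and the
    corresponding canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # within-set covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])   # (ridge keeps them invertible)
    Cxy = X.T @ Y / n                              # between-set covariance
    Sx, Sy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Sx @ Cxy @ Sy)        # correlations = singular values
    return Sx @ U[:, :k], Sy @ Vt[:k].T, s[:k]

# sanity check: Y is a near-deterministic linear function of X,
# so the leading canonical correlation should be close to 1
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
Y = X @ rng.normal(size=(6, 4)) + 0.01 * rng.normal(size=(500, 4))
Wx, Wy, rho = linear_cca(X, Y, k=2)
```

The whitened cross-covariance trick is also the starting point for KCCA, where the same derivation is carried out on kernel matrices instead of raw features.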
Despite the profound theoretical foundation and practical success of CCA in multi-view learning, it can only handle data represented by two-view features. The features used in many real-world applications, however, are usually extracted from more than two views. For example, various color, texture, and shape features are widely used in visual analysis tasks such as image annotation and video retrieval. As a generalized extension of CCA, several multiset CCA (MCCA) models [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56] based on different criteria and constraints have been proposed to solve this problem. Nielsen [46] presented several formulations of MCCA by introducing four different constraints on the correlations and applied them successfully to remote sensing data analysis. Later, Yuan and Sun [47] proposed a fractional-order embedding multiset canonical correlations (FEMCC) method to reduce the deviation of sample covariance matrices. To tackle the overfitting and singularity of within-set covariance matrices caused by high dimensionality, regularized MCCA (RMCCA) [48] was proposed. From the viewpoint of the generalized correlation coefficient, a multiset integrated CCA (MICCA) [49] framework was presented to project multiple high-dimensional representations in parallel into respective low-dimensional subspaces and then fuse the multiset features by given strategies to form discriminative feature vectors for recognition tasks. From the nonlinear viewpoint, kernel-based extensions of MCCA [50], [51] were proposed, using implicit nonlinear mappings, for cross-lingual information retrieval. Since MCCA is essentially an unsupervised learning method, it cannot effectively reveal discriminant information in the multiple canonical subspaces.
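To make the MCCA family concrete, the sketch below solves one common relaxation: maximize the sum of between-set covariances of the projected views subject to a joint within-set variance constraint, which reduces to a generalized eigenproblem. This is our illustrative relaxation, not FEMCC, MICCA, or the paper's C2DMCP objective.

```python
import numpy as np

def mcca_first_direction(views, reg=1e-6):
    """MCCA-style relaxation: solve C w = lam * D w, where C is the
    covariance of the stacked views and D its block-diagonal
    (within-set) part. Returns the top eigenvalue and one projection
    direction per view."""
    views = [V - V.mean(axis=0) for V in views]
    X = np.hstack(views)
    n = X.shape[0]
    C = X.T @ X / n                          # full joint covariance
    D = np.zeros_like(C)                     # block-diagonal part
    dims, o = [V.shape[1] for V in views], 0
    for d in dims:
        D[o:o+d, o:o+d] = C[o:o+d, o:o+d] + reg * np.eye(d)
        o += d
    # whiten with D^{-1/2} to obtain an ordinary symmetric eigenproblem
    w_d, V_d = np.linalg.eigh(D)
    Dis = V_d @ np.diag(1.0 / np.sqrt(w_d)) @ V_d.T
    lam, Z = np.linalg.eigh(Dis @ C @ Dis)
    w = Dis @ Z[:, -1]                       # top generalised eigenvector
    ws, o = [], 0                            # split back per view
    for d in dims:
        ws.append(w[o:o+d])
        o += d
    return lam[-1], ws

# three 1-D views of one latent signal: the top generalised eigenvalue
# approaches the number of views as the pairwise correlations approach 1
rng = np.random.default_rng(2)
t = rng.normal(size=(400, 1))
views = [t + 0.05 * rng.normal(size=(400, 1)) for _ in range(3)]
lam, ws = mcca_first_direction(views)
```

Note that nothing in this objective uses class labels, which is exactly the limitation of unsupervised MCCA noted above.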
To address this issue, the multiple principal angle (MPA) method [52] was presented, in which within-class subspaces possess minimal principal angles while between-class subspaces have maximal ones. Several other methods [53], [54], [55], [56] have also been proposed.
In recent years, sparse coding, or sparse representation (SR) [57], [58], [59], [60], has been widely studied and used in pattern classification. Huang and Aviyente [58] sparsely coded a signal over a set of redundant bases and classified the signal based on its coding vector. In [57], Wright et al. reported a very interesting application of sparse representation to robust face recognition (FR). Sparsity preserving projections (SPP) [60] aims to preserve the sparse reconstructive relationship of the data in the subspace. In our previous work [54], we proposed sparse discrimination based multiset canonical correlations (SDbMCC), which utilizes both the correlation and the sparse reconstructive relationship in multiple-representation data. However, it has been shown that it is the collaborative representation (CR), rather than the l1-norm sparsity, that makes sparse representation based classification (SRC) powerful [61], [62], and the l1-minimization required by sparsity based pattern classification can be time-consuming. In addition, most MCCA-related methods, including SDbMCC, only consider the discrimination of the data within each view rather than across views, which is the focus of MCCA. Since multiple views correspond to the same objects, there should be some correspondence between them. Furthermore, conventional methods are designed for either the unsupervised or the supervised scenario alone. Motivated by recent progress in correlation analysis and collaborative representation, and by the aforementioned issues, in this paper we propose a view-consistent collaborative preserving projection mechanism and a novel algorithm, termed view-consistent collaborative discriminative multiset correlation projection (C2DMCP). Several aspects of the proposed C2DMCP method are worth highlighting:
- (a) C2DMCP considers both the between-set cumulative correlations and the structural information of multiple-representation data, which makes the extracted features more discriminative and robust while keeping a relatively low computational complexity.
- (b) C2DMCP guarantees structural consistency among different views, which is helpful for pattern classification.
- (c) From the viewpoint of pattern classification, C2DMCP can simultaneously decrease within-class collaborative reconstructive distances and enlarge between-class collaborative reconstructive distances.
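The collaborative representation idea invoked above, i.e., that an l2-regularized code over all training samples can replace l1 sparsity [61], [62], can be sketched as a simple classifier. This is a generic CRC sketch in the spirit of [61], not the C2DMCP projection itself; the function name and toy data are ours.

```python
import numpy as np

def crc_predict(A, labels, y, lam=0.1):
    """Collaborative representation classification: code the query y
    over ALL training samples with a ridge (l2) penalty -- a single
    closed-form solve instead of iterative l1-minimisation -- then
    assign the class whose samples yield the smallest reconstruction
    residual. A: (d, n) training samples as columns; labels: (n,)."""
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    best_c, best_r = None, np.inf
    for c in np.unique(labels):
        m = labels == c
        r = np.linalg.norm(y - A[:, m] @ alpha[m])   # class-wise residual
        if r < best_r:
            best_c, best_r = c, r
    return best_c

# toy dictionary: class 0 clusters near e1, class 1 near e2
rng = np.random.default_rng(3)
c0 = rng.normal(scale=0.1, size=(5, 10)); c0[0] += 1.0
c1 = rng.normal(scale=0.1, size=(5, 10)); c1[1] += 1.0
A = np.hstack([c0, c1])
labels = np.array([0] * 10 + [1] * 10)
pred = crc_predict(A, labels, np.array([1.0, 0.05, 0.0, 0.0, 0.0]))
```

The closed-form ridge solve is what gives CR-based methods their speed advantage over l1-based SRC, which is the computational motivation cited above.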
Before introducing the proposed method, we first list the commonly used notations and their meanings in Table 1.
CCA: canonical correlation analysis
Given n pairs of zero-mean random vectors x ∈ R^p and y ∈ R^q, a pair of projection directions, w_x and w_y, is computed by CCA [24] such that the correlation coefficient of the canonical variates u = w_x^T x and v = w_y^T y is maximized:

ρ = (w_x^T C_xy w_y) / √((w_x^T C_xx w_x)(w_y^T C_yy w_y)),

where E(·) denotes the expectation, C_xx = E(xx^T) and C_yy = E(yy^T) are the within-set covariance matrices of x and y, respectively, and C_xy = E(xy^T) is the between-set covariance matrix between x and y. Generally, we
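The objective can be checked numerically: written out directly in sample form, ρ coincides with the ordinary Pearson correlation of the projected samples for any choice of directions. A small sketch (our helper, not code from the paper):

```python
import numpy as np

def canonical_corr(X, Y, wx, wy):
    """The CCA objective in sample form: the correlation coefficient of
    the canonical variates u = X wx and v = Y wy,
    rho = wx' Cxy wy / sqrt((wx' Cxx wx)(wy' Cyy wy))."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    return (wx @ Cxy @ wy) / np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))

# rho must agree with the Pearson correlation of the projections,
# whatever directions we plug in
rng = np.random.default_rng(4)
X, Y = rng.normal(size=(200, 5)), rng.normal(size=(200, 3))
wx, wy = rng.normal(size=5), rng.normal(size=3)
rho = canonical_corr(X, Y, wx, wy)
```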
Motivation
This work is motivated by the following three aspects. First, as mentioned above, MCCA is effective for classification tasks based on multiple feature representations. However, it only considers the cumulative correlation information and ignores the intrinsic reconstructive relationship, i.e., the structural information, among multi-view data, whereas the main idea of representation-based methods is to explore the pairwise affinities, i.e., the representation structure, between data points. And we take for
Experimental results and analysis
In this section, to evaluate the proposed C2MCP and C2DMCP methods, we test their performance on three widely used benchmark datasets: AR [67], Extended Yale-B [68], and ETH-80 [69]. The first two are face datasets and the last is an object dataset. The statistics of each database and the feature representations of the different views are described in Section 4.1. We also compare the performance of the proposed methods with several state-of-the-art MDR
Conclusions
In this paper, we have developed a new technique for joint dimensionality reduction, or subspace learning, of high-dimensional data, called view-consistent collaborative multiset correlation projection (C2MCP), together with its supervised extension, view-consistent collaborative discriminative multiset correlation projection (C2DMCP). C2DMCP guarantees between-view structural consistency. Experiments on several benchmark databases demonstrate that C2DMCP has more discriminating abilities and
Acknowledgments
This work is supported in part by Graduate Research and Innovation Foundation of Jiangsu Province, China under Grant KYLX15_0379, in part by the National Natural Science Foundation of China under Grants 61273251, 61401209, and 61402203, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20140790, and in part by China Postdoctoral Science Foundation under Grants 2014T70525 and 2013M531364.
References (79)
- et al., Characterizing nonlinear relationships in functional imaging data using eigenspace maximal information canonical correlation analysis (emiCCA), NeuroImage, 2015.
- et al., Complete canonical correlation analysis with application to multi-view gait recognition, Pattern Recogn., 2016.
- et al., Appearance models based on kernel canonical correlation analysis, Pattern Recogn., 2003.
- et al., Locality preserving CCA with applications to data visualization and pose estimation, Image Vis. Comput., 2007.
- et al., A theorem on the generalized canonical projective vectors, Pattern Recogn., 2005.
- et al., A new method of feature fusion and its application in image recognition, Pattern Recogn., 2005.
- et al., Fractional-order embedding multiset canonical correlations with applications to multi-feature fusion and recognition, Neurocomputing, 2013.
- et al., A novel multiset integrated canonical correlation analysis framework and its application in feature fusion, Pattern Recogn., 2011.
- et al., Graph regularized multiset canonical correlations with applications to joint feature extraction, Pattern Recogn., 2014.
- et al., A unified multiset canonical correlation analysis framework based on graph embedding for multiple feature extraction, Neurocomputing, 2015.
- Sparsity preserving projections with applications to face recognition, Pattern Recogn.
- Fast k-NN classification for multichannel image data, Pattern Recogn. Lett.
- Multiview Hessian discriminative sparse coding for image annotation, Comput. Vis. Image Underst.
- Multiview Hessian regularized logistic regression for action recognition, Signal Process.
- Pairwise constraints based multiview features fusion for scene classification, Pattern Recogn.
- Large-margin multi-view information bottleneck, IEEE Trans. Pattern Anal. Mach. Intell.
- Multi-view intact space learning, IEEE Trans. Pattern Anal. Mach. Intell.
- Weakly-paired maximum covariance analysis for multimodal dimensionality reduction and transfer learning.
- Clustering multi-represented objects with noise.
- A survey of multi-view machine learning, Neural Comput. Appl.
- Combining labeled and unlabeled data with co-training.
- A co-training approach for multi-view spectral clustering.
- Co-regularized multi-view spectral clustering, Adv. Neural Inform. Process. Syst.
- A new analysis of co-training.
- Learning the kernel matrix with semi-definite programming, J. Mach. Learn. Res.
- Large scale multiple kernel learning, J. Mach. Learn. Res.
- SimpleMKL, J. Mach. Learn. Res.
- Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell.
- Estimating the intrinsic dimension of data with a fractal-based method, IEEE Trans. Pattern Anal. Mach. Intell.
- Principal Components Analysis.
- The use of multiple measurements in taxonomic problems, Ann. Eugenic.
- Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell.
- Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng.
- Shared kernel information embedding for discriminative inference, IEEE Trans. Pattern Anal. Mach. Intell.
- Large-margin predictive latent subspace learning for multiview data analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Multi-view discriminant analysis.
- Multi-view discriminant analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Relations between two sets of variates, Biometrika.
☆ This paper has been recommended for acceptance by Zicheng Liu.