1 Introduction

Sparse representations of signals have received a great deal of attention in recent years [1], and many classification and dimensionality reduction approaches have been proposed to exploit the sparsity of data. For classification, Wright et al. [2] presented a sparse representation based classification (SRC) scheme, and Mi and Liu [3] suggested performing SRC on K selected classes. The L1-norm used in SRC has shown a certain robustness to noise in the data. For dimensionality reduction, Qiao et al. [4] proposed sparsity preserving projections (SPP) and Ly et al. [5] proposed sparse graph-based discriminant analysis (SGDA); both aim to preserve the sparse reconstruction relationships of the data. Besides sparse representation, sparse subspace learning methods, which seek sparse projection directions, also enjoy the benefits of sparsity: the well-known sparse PCA (SPCA) [6] and sparse linear discriminant analysis (SLDA) [7] both impose a sparseness constraint on the projection vectors. These methods have achieved outstanding performance in various tasks. However, solving the L1-norm minimization makes sparsity based classification schemes very expensive, especially for large-scale datasets.

While SRC emphasizes the role of sparsity in the representation coefficients, it has been shown in [8, 9] that the collaborative representation mechanism is more important to the success of SRC, and a collaborative representation based classification (CRC) scheme was then proposed. Following it, Yang et al. [10] proposed a collaborative representation based projection (CRP), which aims at preserving the CR based reconstruction relationships of the data. Like CRC, however, CRP is an unsupervised method. To utilize the label information, collaborative graph-based discriminant analysis (CGDA) [11] was proposed, in which each training sample is reconstructed by CR from all training samples of the same class; a discriminant subspace is then sought by minimizing the intra-class scatter. The motivation of all these methods is to emphasize the “collaborative” rather than the “competitive” nature of the relationships among samples. At the same time, these methods enjoy computational efficiency, because the L2-norm based optimization admits a closed-form solution when estimating the representation coefficients.

Nevertheless, CGDA constructs only the intra-class graph, so it takes into account local geometry while ignoring global geometry. How to construct a supervised graph and embed both local and global structures into subspace learning remains a challenging problem in the field of dimensionality reduction (DR). Classic graph-construction DR methods in the literature include linear discriminant analysis (LDA) [12], locality preserving projection (LPP) [13, 14], neighborhood preserving embedding (NPE) [15], neighborhood preserving discriminant embedding (NPDE) [16], and marginal Fisher analysis (MFA) [17]. For these methods, it is commonly believed that if two data points lie close to each other in the original space, their intrinsic geometric relationship should be preserved in the new subspace.

Building on the success of CR and graph-construction methods, we propose a novel DR method called collaborative representation based neighborhood preserving projection (CRNPP). It incorporates both intra-class and inter-class discriminant information into the graph construction over collaborative representation coefficients. The main merits of CRNPP are threefold:

  1. When deriving the representation coefficients of the input samples, we strengthen the collaborative relationships among data from the same class and inhibit the collaborative relationships among data from different classes. This step also avoids the difficulty of choosing neighborhood parameters in graph construction;

  2. By constructing intra-class and inter-class discriminant graphs from the reconstruction coefficients, both global and local geometry are preserved, so that samples of the same class are as compact as possible while samples of different classes are as separable as possible in the projection subspace;

  3. A closed-form solution can be obtained, and CRNPP does not involve the iterative computation required by sparse representation based methods. Experiments on the ORL, UMIST and BANCA face databases demonstrate the effectiveness and efficiency of the proposed approach.

The rest of the paper is organized as follows. In Sect. 2, we briefly review collaborative representation and CGDA. In Sect. 3, we describe the proposed CRNPP method in detail, and in Sect. 4, the experimental results and discussions are provided. Finally, conclusions are given in Sect. 5.

2 Collaborative Graph-Based Discriminant Analysis

2.1 Collaborative Representation

Suppose \(X=\{x_1,x_2,\dots ,x_n\}\) is a training set of n samples. The mechanism of CR [8, 9] is to represent every sample x by a linear combination of the remaining samples via L2-norm regularized least squares. The collaborative representation coefficient vector \(\alpha \) for x is found by solving the L2-norm optimization problem:

$$\begin{aligned} \begin{aligned}&\mathop {{{\mathrm{argmin}}}}_{\alpha }\,{\Vert \alpha \Vert }_2 \\&s.t.\quad x=X\alpha . \end{aligned} \end{aligned}$$
(1)

The constrained optimization problem in Eq. (1) can be reformulated as:

$$\begin{aligned} \mathop {{{\mathrm{argmin}}}}_{\alpha }\,{\Vert x-X\alpha \Vert }_2^2+\lambda \Vert \alpha \Vert _2^2, \end{aligned}$$
(2)

where \(\lambda \) is an adjusting parameter which controls the relative importance of L2-regularization term \(\Vert \alpha \Vert _2^2 \) and error term \(\Vert x-X\alpha \Vert _2^2\).

The role of the L2-regularization term \(\Vert \alpha \Vert _2^2 \) is two-fold. First, it makes the least squares solution stable, particularly when X is under-determined; second, it introduces a certain amount of sparsity to \(\alpha \), although this sparsity is generally weaker than that induced by L1-norm regularization. The solution to Eq. (2) can be obtained in closed form as:

$$\begin{aligned} \alpha =(X^TX+\lambda I)^{-1}X^Tx, \end{aligned}$$
(3)

where I is an identity matrix.
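To make Eq. (3) concrete, a minimal numpy sketch of the closed-form ridge solution is given below; the function name, variable names and the samples-as-columns layout are our own assumptions rather than notation from [8, 9].

```python
import numpy as np

def collaborative_coefficients(X, x, lam=0.1):
    """Minimal sketch of Eq. (3): alpha = (X^T X + lam*I)^{-1} X^T x.

    X   : (m, n) dictionary whose n columns are the training samples
    x   : (m,)   query sample to be represented
    lam : regularization parameter (lambda in Eq. (2))
    """
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ x)

# Hypothetical usage with random data
X = np.random.randn(100, 40)      # 40 samples of dimension 100
x = np.random.randn(100)
alpha = collaborative_coefficients(X, x, lam=0.1)
print(alpha.shape)                # (40,)
```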

2.2 Collaborative Graph-Based Discriminant Analysis

We briefly introduce CGDA [11]. Suppose that we have a training set \(X=\{X_1,X_2,\dots ,X_c\}\) of n samples, where the ith class contains \(N_i\ (i=1,2,\dots ,c)\) samples. In CGDA, the collaborative representation vector is calculated by solving an L2-norm optimization problem,

$$\begin{aligned} \begin{aligned}&\mathop {{{\mathrm{argmin}}}}_{\alpha _j^i}\,{\Vert \alpha _j^i\Vert }_2\\&s.t.\quad x_j^i=X_i\alpha _j^i. \end{aligned} \end{aligned}$$
(4)

In (4), \(x_j^i\) denotes the jth sample of the ith class and \(X_i\) denotes the samples of the ith class excluding \(x_j^i\). Note that Eq. (4) can be further written as:

$$\begin{aligned} \mathop {{{\mathrm{argmin}}}}_{\alpha _j^i}\,{\Vert x_j^i-X_i\alpha _j^i\Vert }_2^2+\lambda \Vert \alpha _j^i\Vert _2^2, \end{aligned}$$
(5)

where \(\lambda \) is an adjusting parameter. The solution of Eq. (5) is

$$\begin{aligned} \alpha _j^i=(X_i^TX_i+\lambda I)^{-1}X_i^Tx_j^i . \end{aligned}$$
(6)

We define \(W_i=[\alpha _1^i,\alpha _2^i,\dots ,\alpha _{N_i}^i]\) as the graph weight matrix of size \(N_i\times N_i\), whose jth column is the collaborative representation vector corresponding to \(x_j^i\); the diagonal elements of \(W_i\) are set to zero. The intra-class weight matrix \(W_s\) can then be expressed as:

$$\begin{aligned} W_s=\begin{bmatrix} W_1&0&\dots&0\\ 0&W_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\dots&W_c \end{bmatrix}. \end{aligned}$$
(7)

According to the graph-embedding based dimensionality reduction framework, the aim is to find an \(m\times d\) projection matrix P (with \(d\ll m\)) that yields the low-dimensional representation \(Y=P^TX\). The objective of CGDA is to preserve the collaborative representation relationships in the low-dimensional space, which can be formulated as:

$$\begin{aligned} \begin{aligned} P^*&=\mathop {{{\mathrm{argmin}}}}_{P^TXL_pX^TP=I}\sum _{i\ne j}{\Vert P^Tx_i-P^T\sum _j W_{ij}x_j\Vert ^2}\\&=\mathop {{{\mathrm{argmin}}}}_{P^TXL_pX^TP=I}{tr(P^TXLX^TP)}, \end{aligned} \end{aligned}$$
(8)

where \(L=(I-W_s)^T(I-W_s)\) and \(L_p=I\). The optimal projection matrix P can be obtained as:

$$\begin{aligned} P^*=\mathop {{{\mathrm{argmin}}}}_{P} \,{\frac{|P^TXLX^TP|}{|P^TXL_pX^TP|}}, \end{aligned}$$
(9)

which can be solved as a generalized eigenvalue decomposition problem,

$$\begin{aligned} XLX^Tp=\varLambda XL_pX^Tp. \end{aligned}$$
(10)

The \(m\times d\) projection matrix P is formed by the d eigenvectors corresponding to the d smallest nonzero eigenvalues.
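For illustration, the CGDA construction just described (the leave-one-out intra-class coefficients of Eq. (6), the block-diagonal weight matrix of Eq. (7), and the generalized eigenproblem of Eq. (10)) might be sketched as follows. The helper names, the samples-as-columns layout and the small ridge added to keep \(XX^T\) positive definite are our assumptions; in this sketch each sample's weights are stored in the corresponding row of \(W_s\) so that \(L=(I-W_s)^T(I-W_s)\) matches Eq. (8).

```python
import numpy as np
from scipy.linalg import eigh

def intra_class_weights(X, labels, lam=0.1):
    """Intra-class weight matrix W_s of Eq. (7).

    X      : (m, n) data matrix, columns are samples
    labels : (n,)   integer class labels
    Row j holds the CR coefficients reconstructing x_j from its own class.
    """
    n = X.shape[1]
    Ws = np.zeros((n, n))
    for j in range(n):
        same = np.where(labels == labels[j])[0]
        same = same[same != j]                    # same class, x_j excluded
        Xi = X[:, same]
        alpha = np.linalg.solve(Xi.T @ Xi + lam * np.eye(len(same)),
                                Xi.T @ X[:, j])   # Eq. (6)
        Ws[j, same] = alpha                       # diagonal stays zero
    return Ws

def cgda_projection(X, labels, d, lam=0.1):
    """d projection directions from the generalized eigenproblem of Eq. (10)."""
    n = X.shape[1]
    Ws = intra_class_weights(X, labels, lam)
    L = (np.eye(n) - Ws).T @ (np.eye(n) - Ws)
    A = X @ L @ X.T
    B = X @ X.T + 1e-8 * np.eye(X.shape[0])       # L_p = I; ridge guards a singular B
    _, evecs = eigh(A, B)                          # ascending eigenvalues
    return evecs[:, :d]                            # d smallest -> columns of P
```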

3 Collaborative Representation Based Neighborhood Preserving Projection

The aforementioned CGDA constructs only the intra-class graph and thus ignores the separability of samples from different classes. To cope with this, we propose to incorporate intra-class and inter-class discriminant information simultaneously into the graph construction over collaborative representation coefficients. We now state the proposed CRNPP method in detail.

3.1 Calculating Coefficients and Weight Matrix

We construct the local weight matrix as in CGDA: for the intra-class graph, every sample is represented as a linear combination of the remaining samples from the same class via the collaborative representation of Eq. (4), which yields the intra-class weight matrix of Eq. (7).

On the other hand, every sample can also be represented as a linear combination of the samples from all the other classes,

$$\begin{aligned} \begin{aligned}&\mathop {{{\mathrm{argmin}}}}_{\beta _j^i}\,{\Vert \beta _j^i\Vert }_2\\&s.t.\quad x_j^i=X^i\beta _j^i, \end{aligned} \end{aligned}$$
(11)

where \(x_j^i\) denotes the jth sample of the ith class and \(X^i\) denotes all training samples except those of the ith class. Note that Eq. (11) can be further written as:

$$\begin{aligned} \mathop {{{\mathrm{argmin}}}}_{\beta _j^i}\,{\Vert x_j^i-X^i\beta _j^i\Vert }_2^2+\lambda \Vert \beta _j^i\Vert _2^2, \end{aligned}$$
(12)

where \(\lambda \) is an adjusting parameter. The solution of Eq. (12) is:

$$\begin{aligned} \beta _j^i=((X^i)^TX^i+\lambda I)^{-1}(X^i)^Tx_j^i . \end{aligned}$$
(13)

We thus obtain the optimal inter-class collaborative representation vector \(\beta _j^i=(b_{j,1}^i,b_{j,2}^i,\dots ,b_{j,i-1}^i,0,b_{j,i+1}^i,\dots ,b_{j,c}^i)^T\), where \(b_{j,k}^i\) is the sub-vector of coefficients contributed by the kth class. Similar to the intra-class case, the inter-class weight block for the ith class is \(B_k^i=[(b_{1,k}^i)^T,(b_{2,k}^i)^T,\dots ,(b_{N_i,k}^i)^T]\), which collects the weights of the kth class used to reconstruct the samples of the ith class. The (i, j)th block of the inter-class weight matrix \(W_b\) can be expressed as:

$$\begin{aligned} W_b^{(ij)}= {\left\{ \begin{array}{ll} B_i^j, &{}if\quad i\ne j\\ 0, &{} if\quad i=j \end{array}\right. }. \end{aligned}$$
(14)

Finally, we can get the inter-class weight matrix as:

$$\begin{aligned} W_b=\begin{bmatrix} 0&B_1^2&B_1^3&\dots&B_1^c \\ B_2^1&0&B_2^3&\dots&B_2^c \\ \vdots&\vdots&\vdots&\ddots&\vdots \\ B_c^1&B_c^2&B_c^3&\dots&0 \end{bmatrix}. \end{aligned}$$
(15)
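A possible numpy sketch of the inter-class graph construction of Eqs. (11)-(15) is shown below; the names and the samples-as-columns layout are our assumptions, and, as in the intra-class sketch above, each sample's weights are stored in the corresponding row of \(W_b\), with the diagonal blocks of Eq. (14) left at zero.

```python
import numpy as np

def inter_class_weights(X, labels, lam=0.1):
    """Inter-class weight matrix W_b of Eq. (15).

    X      : (m, n) data matrix, columns are samples
    labels : (n,)   integer class labels
    """
    n = X.shape[1]
    Wb = np.zeros((n, n))
    for j in range(n):
        others = np.where(labels != labels[j])[0]   # samples of all other classes
        Xo = X[:, others]
        beta = np.linalg.solve(Xo.T @ Xo + lam * np.eye(len(others)),
                               Xo.T @ X[:, j])       # Eq. (13)
        Wb[j, others] = beta                         # same-class block stays zero
    return Wb
```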

3.2 Computing the Neighborhood Preserving Projection

To obtain the linear projection onto the low-dimensional subspace, we first minimize the intra-class scatter so that samples of the same class remain compact, which can be defined as:

$$\begin{aligned} \begin{aligned} \mathop {{{\mathrm{argmin}}}}_{P}\,J_1(P)&= \sum _i\Vert P^Tx_i-\sum _j{W_s^{(ij)}P^Tx_j}\Vert ^2\\&= tr(P^TXM_sX^TP), \end{aligned} \end{aligned}$$
(16)

where \(M_s=(I-W_s)^T(I-W_s)\) is the Laplacian matrix of the graph associated with the intra-class weight matrix.

At the same time, in the projection subspace, we also want to maximize the separability of samples from different classes. The objective function can be defined as:

$$\begin{aligned} \begin{aligned} \mathop {{{\mathrm{argmax}}}}_{P}\,J_2(P)&= \sum _i\Vert P^Tx_i-\sum _j{W_b^{(ij)}P^Tx_j}\Vert ^2\\&= tr(P^TXM_pX^TP), \end{aligned} \end{aligned}$$
(17)

where \(M_p=(I-W_b)^T(I-W_b)\) is the Laplacian matrix of the graph associated with the inter-class weight matrix.

Combining Eqs. (16) and (17) together, we can get the following optimization problem,

$$\begin{aligned} \begin{aligned} \mathop {{{\mathrm{argmax}}}}_{P}\,J(P)&=J_2(P)-\gamma J_1(P)\\&= tr(P^TXM_pX^TP)-\gamma tr(P^TXM_sX^TP)\\&= tr(P^TXMX^TP), \end{aligned} \end{aligned}$$
(18)

where \(M=M_p- \gamma M_s\), and \(\gamma \) is a parameter to balance the inter-class and intra-class information of the data.

Generally, the dimensionality of the samples is much larger than the number of training samples. We therefore replace the quotient criterion used in previous DR methods [12,13,14,15,16,17] with a difference criterion, which overcomes the small sample size problem. This criterion is similar to the generalized version of the conventional maximum margin criterion (MMC) [18]. In this way, the data are mapped into a subspace that maximizes the margin between different classes.

Equation (18) can be solved as a standard eigenvalue decomposition problem,

$$\begin{aligned} XMX^Tp=\varLambda p, \end{aligned}$$
(19)

where \(\varLambda \) is an eigenvalue and p is the corresponding eigenvector. Suppose the d largest eigenvalues are \(\varLambda _1,\varLambda _2,\dots ,\varLambda _d\) and their corresponding eigenvectors are \(p_1,p_2,\dots ,p_d\); then the \(m\times d\) projection matrix is \(P=[p_1,p_2,\dots ,p_d]\).
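Given \(W_s\) and \(W_b\), the projection step of Eqs. (16)-(19) reduces to a single symmetric eigendecomposition; a minimal sketch (names and data layout ours) is:

```python
import numpy as np

def crnpp_projection(X, Ws, Wb, d, gamma=15.0):
    """Solve Eq. (19) and return the m x d projection matrix P."""
    n = X.shape[1]
    I = np.eye(n)
    Ms = (I - Ws).T @ (I - Ws)        # intra-class Laplacian (Eq. (16))
    Mp = (I - Wb).T @ (I - Wb)        # inter-class Laplacian (Eq. (17))
    M = Mp - gamma * Ms               # difference criterion (Eq. (18))
    S = X @ M @ X.T
    S = (S + S.T) / 2                 # symmetrize for numerical stability
    _, evecs = np.linalg.eigh(S)      # ascending eigenvalues
    return evecs[:, ::-1][:, :d]      # eigenvectors of the d largest eigenvalues
```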

The proposed CRNPP algorithm is summarized as follows:

Algorithm 1. The proposed CRNPP algorithm: compute the intra-class coefficients by Eq. (6) and assemble \(W_s\) (Eq. (7)); compute the inter-class coefficients by Eq. (13) and assemble \(W_b\) (Eq. (15)); form \(M=M_p-\gamma M_s\); solve the eigendecomposition of Eq. (19) and take the eigenvectors of the d largest eigenvalues as the projection matrix P.

In summary, the key idea of the proposed CRNPP is to incorporate the reconstruction coefficients of collaborative representation into the subsequent graph construction. The optimal projection is then obtained by simultaneously minimizing the intra-class scatter and maximizing the inter-class (global) discriminant information. After projection, the nearest neighbor (NN) algorithm can be used to identify the class label of each test sample.
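Putting the pieces together, training and classification might proceed as in the short sketch below; it reuses the hypothetical helpers `intra_class_weights`, `inter_class_weights` and `crnpp_projection` from the earlier sketches and implements the final NN step in plain numpy.

```python
import numpy as np

def crnpp_classify(X_train, y_train, X_test, d, lam=0.1, gamma=15.0):
    """Train CRNPP on (X_train, y_train) and label the columns of X_test by 1-NN."""
    Ws = intra_class_weights(X_train, y_train, lam)   # Sect. 3.1, Eq. (7)
    Wb = inter_class_weights(X_train, y_train, lam)   # Sect. 3.1, Eq. (15)
    P = crnpp_projection(X_train, Ws, Wb, d, gamma)   # Sect. 3.2, Eq. (19)
    Y_train = P.T @ X_train                           # d-dimensional embeddings
    Y_test = P.T @ X_test
    preds = []
    for y in Y_test.T:                                # nearest neighbor in the subspace
        dists = np.linalg.norm(Y_train - y[:, None], axis=0)
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)
```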

4 Experimental Verification

We evaluate the proposed CRNPP algorithm on three well-known face databases (ORL, UMIST and BANCA) and compare its performance with CRC [8], NPE [15], SPP [4], MFA [17] and CGDA [11]. In the experiments, we keep 99% of the energy in the pre-processing step as in [13] and use the NN classifier for the final classification due to its simplicity. The average recognition accuracy over 10 random splits is reported.

Dataset preparation. The ORL face database contains 400 images of 40 individuals under various facial expressions and lighting conditions. In our experiments, each image is manually cropped and resized to \(32\times 32\) pixels. The UMIST database consists of 564 images of 20 people; each image is down-sampled to \(56\times 46\) pixels. The BANCA database contains standard face images of 208 people captured at different times and under varying conditions of status, quality, lighting and expression. We randomly selected 52 people of different ages and genders; this subset contains 520 pictures, each down-sampled to \(56\times 46\) pixels.

Performance Comparison. First, we compare the classification performance of the proposed CRNPP method with the related methods. For each of the ORL, UMIST and BANCA datasets, we randomly choose 4, 5 and 6 images per person for training, and the remaining images are used for testing. Tables 1, 2 and 3 show the top classification accuracy at the corresponding optimal dimension for all methods on the ORL, UMIST and BANCA datasets, respectively. From the results, it can be seen that the accuracy of all methods increases with the number of training samples, and that CRNPP consistently outperforms the other methods.

Table 1. The top recognition accuracy (%) of the six approaches and the corresponding optimal dimension on the ORL database
Table 2. The top recognition accuracy (%) of the six approaches and the corresponding optimal dimension on the UMIST database
Table 3. The top recognition accuracy (%) of the six approaches and the corresponding optimal dimension on the BANCA database

Further, we plot the average accuracy of the DR methods versus the reduced dimension in Fig. 1 for the case of 4 training samples per person. CRC, which is not a DR method, is also included as a baseline. We have the following observations. First, SPP is inferior to NPE and CGDA because SPP is an unsupervised method. Second, NPE and CGDA do not explicitly encode the discriminant information between data points with different labels, so MFA, which considers both global and local information, shows a slight performance improvement over them. Finally, by further preserving the collaborative relationships of the data, the proposed CRNPP performs the best.

Fig. 1.

The recognition accuracy versus the number of reduced dimensions on three databases, when 4 training samples are considered. (a) ORL database; (b) UMIST database; (c) BANCA database.

To further investigate the sensitivity of our method to the adjusting parameters, we also examine the recognition accuracy of CRNPP under variations of \(\lambda \) and \(\gamma \). Figure 2(a) shows the recognition accuracy of CRNPP on the UMIST dataset when \(\lambda \) is tuned over \(\{10^{-5},10^{-4},\dots ,10^0\}\) with \(\gamma \) fixed at 15. Figure 2(b) gives the recognition accuracy when \(\gamma \) is tuned over \(\{10,11,\dots ,20\}\) with \(\lambda \) set to 0.1. From Fig. 2, we can see that the performance of CRNPP is not very sensitive to the chosen parameters and is more stable with respect to \(\gamma \) than to \(\lambda \).

Fig. 2.

The performance of CRNPP on the UMIST database with different choices of adjusting parameters. (a) average accuracies versus \(\lambda \); (b) average accuracies versus \(\gamma \).

Table 4. The average time cost (in seconds) in training stage on the ORL, UMIST and BANCA databases

Finally, we compare the average computational time of the related methods in the training stage. Table 4 gives the average time cost (in seconds) of all the methods on the ORL, UMIST and BANCA datasets, where 4 training samples per individual are selected. It can be seen that CRNPP has the same level of computational complexity as CGDA, which is another collaborative representation based DR method. Moreover, CRNPP consumes much less time than SPP, because CRNPP has a closed-form solution while SPP requires iterative optimization. This experiment shows the advantage of L2-norm based representation over L1-norm based representation in terms of computational cost.

5 Conclusion

In this paper, we have presented a novel dimensionality reduction (DR) method for image classification, named collaborative representation based neighborhood preserving projection (CRNPP). On one hand, it incorporates intra-class and inter-class discriminant information into the graph construction, so that the global and local geometry of the data is preserved in the projection subspace. On the other hand, it takes advantage of collaborative representation to mine the collaborative relationships of the data, and thus avoids the difficulty of choosing the number of nearest neighbors in weight matrix construction, as in traditional DR methods. Moreover, by using L2-norm minimization-based optimization, our method provides a closed-form solution and is therefore as fast as CRC-like methods. Comparisons with widely used DR techniques show the superiority of the proposed algorithm in terms of both recognition accuracy and computational cost.

Kernel methods can map an input feature space into a high-dimensional feature space [19,20,21], where the problem of weak linear separability may be alleviated. The kernel trick has been widely used to do so without explicitly evaluating the nonlinear mapping function. It would be interesting to extend CRNPP to a kernel version capable of extracting nonlinear discriminant features in kernel-induced spaces. We leave this as future work.