
Neurocomputing

Volume 123, 10 January 2014, Pages 121-130

Incremental min–max projection analysis for classification

https://doi.org/10.1016/j.neucom.2013.06.010

Abstract

For data classification, standard implementations of projection algorithms do not scale well with dataset size, which makes computation on large sample sets infeasible. In this paper, we utilize a block optimization strategy to propose a new locally discriminant projection algorithm termed min–max projection analysis (MMPA). The algorithm takes into account both intra-class and interclass geometries and also possesses the orthogonality property. Furthermore, an incremental MMPA is proposed to learn the local discriminant subspace from newly inserted data by employing the idea of the singular value decomposition updating algorithm. Moreover, we extend MMPA to the semi-supervised and nonlinear cases, namely, semi-supervised MMPA and kernel MMPA. Experimental results on an image database, a handwritten digit database, and a face database demonstrate the effectiveness of the proposed algorithms.

Introduction

The problem of feature extraction is one of the core issues in data mining and classification. A more effective feature extraction method can improve classification results in the reduced subspace. The problem of dimensionality reduction can be described as follows. Consider a data set X consisting of n samples x_i (1 ≤ i ≤ n) in a high-dimensional space R^m. The objective of dimensionality reduction is to compute a faithful low-dimensional representation of X, i.e., Y = [y_1, …, y_n] ∈ R^{d×n}, where d ≪ m.

Over the past decades, numerous dimensionality reduction methods have been proposed to find low-dimensional feature representations. The two most popular techniques for this purpose are principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2]. PCA is an unsupervised algorithm that computes a low-dimensional representation of high-dimensional data by maximizing the total scatter. In contrast, LDA is a supervised feature extraction technique for pattern recognition; it seeks a set of projective directions that maximize the between-class scatter while simultaneously minimizing the within-class scatter. An intrinsic limitation of LDA is that it usually suffers from the small sample size (SSS) problem, which arises when the number of samples is much smaller than their dimensionality. Additionally, both PCA and LDA capture only the global Euclidean structure and cannot discover the embedding structure hidden in high-dimensional data.
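To make the contrast concrete, the following minimal sketch (our own illustration, not code from the paper) computes the principal directions maximizing the total scatter used by PCA and the within-/between-class scatter matrices used by LDA, for data stored with one sample per column.

```python
import numpy as np

def pca_directions(X, d):
    """Return the d directions maximizing total scatter (principal components)."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center the samples
    St = Xc @ Xc.T                              # total scatter matrix
    eigvals, eigvecs = np.linalg.eigh(St)       # eigenvalues in ascending order
    return eigvecs[:, -d:]                      # top-d eigenvectors

def lda_scatters(X, labels):
    """Return within-class and between-class scatter matrices S_w, S_b."""
    m = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mu_c) @ (Xc - mu_c).T       # intra-class spread
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T  # class-mean spread
    return Sw, Sb
```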

In order to exploit the local discriminative manifold structure, many subspace learning techniques have been proposed, such as locality preserving projections (LPP) [3], Maximal Similarity Embedding [4], Local Spline Discriminant Projection [5], and Neighborhood Preserving Embedding [6]. Recently, some researchers pointed out that enforcing an orthogonality relationship between projection directions can achieve competitive effectiveness, and therefore orthogonal neighborhood preserving projection (ONPP) was introduced [5], [7]. However, for classification problems, ONPP and LPP (even in a supervised setting) focus only on the intra-class geometrical information, while the interaction of samples from different classes is ignored.

More recently, numerous algorithms have been proposed that take both intra-class preservation and interclass discrimination into consideration [8], [9], [10], [11]. Among them, locality sensitive discriminant analysis (LSDA) [8] and its variation maximum margin projection (MMP) [9] are two typical examples, which achieve competitive results in image recognition applications. Yan et al. [12] explained most of these manifold learning techniques within a general framework defined in a graph-embedding way. Generally, a discriminative feature extraction algorithm is summarized as a graph-based constrained embedding by defining intrinsic and penalty graphs. In other words, it finds a set of projection directions in the linearly embedded subspace, i.e., J(U) = arg min_U (U^T X L X^T U)/(U^T X B X^T U), or J(U) = arg min_U U^T X L X^T U subject to U^T X B X^T U = c, where c is a constant, X is the data matrix, and L is the Laplacian matrix of the intrinsic graph, defined as L = D − W with D_ii = Σ_j W_ij. Here, W is the affinity matrix of the intrinsic graph. In addition, B can be the Laplacian matrix of the penalty graph, B = D^p − W^p, where W^p is the adjacency matrix of the penalty graph; it describes the similarity of interclass data, which should be suppressed for classification. D^p is the corresponding diagonal degree matrix defined in the graph-embedding framework. The general solution for the optimal U is to find the eigenvectors corresponding to the smallest eigenvalues of the generalized eigenvalue problem X L X^T U = λ X B X^T U, which carries a heavy computational burden because of the high data dimensionality, especially in image and video applications.
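As an illustration of this framework, the sketch below (an outline under our own assumptions, not the authors' implementation) builds a k-nearest-neighbor affinity matrix, forms the intrinsic and penalty Laplacians L = D − W and B = D^p − W^p, and solves the generalized eigenvalue problem X L X^T U = λ X B X^T U for the smallest eigenvalues; the small ridge term is added only to keep the penalty-side matrix positive definite.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def knn_affinity(X, k=5):
    """0/1 affinity matrix of the k-NN graph; X stores one sample per column."""
    D = cdist(X.T, X.T)                       # pairwise Euclidean distances
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbors (skip self)
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)                 # symmetrize the graph

def graph_embedding(X, W, Wp, d):
    """Projection U from an intrinsic graph W and a penalty graph Wp (both n x n)."""
    L = np.diag(W.sum(axis=1)) - W            # intrinsic Laplacian L = D - W
    B = np.diag(Wp.sum(axis=1)) - Wp          # penalty Laplacian B = D^p - W^p
    A = X @ L @ X.T
    C = X @ B @ X.T + 1e-6 * np.eye(X.shape[0])   # ridge for positive definiteness
    vals, vecs = eigh(A, C)                   # generalized EVD, ascending eigenvalues
    return vecs[:, :d]                        # directions for the smallest eigenvalues
```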

Incremental learning has attracted much attention as a result of the increasing demand for machine vision and intelligent systems. Numerous incremental learning algorithms have been proposed, especially in the data-mining and image-retrieval fields [13], [14], [15]. Most of these recent works are designed for incremental principal component analysis [16], [17] and incremental linear discriminant analysis [18], [19], both of which are global statistical feature extraction algorithms. To the best of our knowledge, few works focus on incremental local discriminant embedding, except for the ILDSE algorithm proposed by Miao et al. [20], which requires B to be a Laplacian matrix.

In this paper, we propose a new algorithm termed min–max projection analysis (MMPA) based on the perspective of block optimization [21]. MMPA offers three main benefits: (1) it takes into account both intra-class and interclass geometries, so it can achieve better classification performance; (2) it produces an orthogonal projection matrix; and (3) its combination matrix can be computed iteratively for newly inserted samples.

Furthermore, an incremental MMPA (IMMPA) is introduced to learn the discriminative sub-manifold structure incrementally. This paper also extends MMPA to the semi-supervised and nonlinear cases, termed semi-supervised MMPA (SMMPA) and kernel MMPA (KMMPA), respectively. SMMPA is obtained by incorporating additional unlabeled samples, and KMMPA performs MMPA in a reproducing kernel Hilbert space (RKHS); both extensions prove effective. For generality, the proposed algorithms are also based on the graph-embedding framework [12], which uses the graph adjacency to represent the discriminative weights of the data.
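The incremental step relies on SVD updating. The following sketch shows a generic column-append SVD update in the spirit of such algorithms; it is not the authors' exact IMMPA procedure, and in practice the updated factors would typically be truncated back to the working rank.

```python
import numpy as np

def svd_append_columns(U, s, Vt, C):
    """Update the thin SVD X ~= U @ diag(s) @ Vt when new columns C are appended."""
    r = s.size
    L = U.T @ C                        # component of C inside the current subspace
    H = C - U @ L                      # residual outside the subspace
    Q, R = np.linalg.qr(H)             # orthonormal basis of the residual
    # Small middle matrix whose SVD yields the update
    K = np.block([[np.diag(s), L],
                  [np.zeros((R.shape[0], r)), R]])
    Uk, sk, Vtk = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk
    # Extend V with identity rows for the new columns before rotating
    V_ext = np.block([[Vt.T, np.zeros((Vt.shape[1], C.shape[1]))],
                      [np.zeros((C.shape[1], Vt.shape[0])), np.eye(C.shape[1])]])
    Vt_new = (V_ext @ Vtk.T).T
    return U_new, sk, Vt_new            # factors of the enlarged data matrix [X, C]
```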

The rest of the paper is organized as follows. Section 2 introduces the MMPA algorithm as well as its incremental implementation. Subsequently, the semi-supervised MMPA algorithm is proposed in Section 3. In Section 4, the algorithm is extended to the nonlinear case, termed KMMPA. The experimental performance of the proposed algorithms is presented in Section 5. Finally, we conclude the paper in Section 6.

Section snippets

Min–max projection analysis (MMPA)

Consider a given training set X = [x_1, x_2, …, x_n] ∈ R^{m×n}, where m and n denote the dimension and the number of the original samples, respectively. The proposed MMPA algorithm aims at learning a linear transformation matrix U, used as Y = U^T X to project the original samples onto subspace data Y = [y_1, …, y_n] ∈ R^{d×n}, where d ≪ m.
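A small usage sketch (with hypothetical dimensions and a random orthogonal stand-in for U) illustrates the transformation and the orthogonality property claimed for MMPA.

```python
import numpy as np

m, n, d = 1024, 200, 30                        # assumed dimensions, for illustration only
X = np.random.randn(m, n)                      # training samples, one per column
U = np.linalg.qr(np.random.randn(m, d))[0]     # stand-in orthogonal projection matrix
Y = U.T @ X                                    # low-dimensional representation, d x n

assert Y.shape == (d, n)
assert np.allclose(U.T @ U, np.eye(d))         # orthogonality of the projection
```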

After the transformation, pairwise samples within the same class should be as close as possible, while those from different classes should be as far apart as possible. The whole algorithm

Semi-supervised MMPA

It has been found that unlabeled samples may help to improve classification performance [24], [25]. Therefore, this paper generalizes MMPA by introducing new block optimizations based on unlabeled samples and then incorporating them into the combination stage, yielding SMMPA.

Suppose X_u = [x_{n+1}, …, x_{n+n_u}] is the newly inserted unlabeled data matrix. For each unlabeled sample x_i (i = n+1, …, n+n_u), we search for its k_i^u nearest neighbors, x_{i1}, …, x_{i k_i^u}, among all training samples, including both labeled and
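The neighbor search over all training samples might be sketched as follows, assuming plain Euclidean k-nearest-neighbor search; the function and variable names are our own placeholders.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_indices(X_train, X_u, k):
    """Indices of the k nearest training samples for each unlabeled sample.

    X_train holds all training samples and X_u the newly inserted unlabeled
    samples, both stored with one sample per column.
    """
    D = cdist(X_u.T, X_train.T)              # pairwise distances, n_u x n
    return np.argsort(D, axis=1)[:, :k]      # k closest columns of X_train
```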

Kernel MMPA

MMPA is a linear algorithm, so it may fail to discover the intrinsic geometry when the data manifold is highly nonlinear. In this section, we discuss how to perform MMPA in a reproducing kernel Hilbert space (RKHS), which gives rise to kernel MMPA.

With the same training set X = [x_1, x_2, …, x_n] ∈ R^{m×n}, we consider the problem in a feature space F induced by some nonlinear mapping ϕ: X → F. For a properly chosen ϕ, an inner product ⟨·,·⟩ can be defined on F, which makes F a so-called reproducing kernel Hilbert
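As an illustration of the kernel computation such an extension relies on, the sketch below builds a Gaussian (RBF) kernel Gram matrix whose entries play the role of the inner products ⟨ϕ(x_i), ϕ(x_j)⟩; the choice of the RBF kernel and the bandwidth sigma are assumptions for the example, not prescribed by the text above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix of the RBF kernel for samples stored as columns of X."""
    D2 = cdist(X.T, X.T, metric="sqeuclidean")   # squared pairwise distances
    return np.exp(-D2 / (2.0 * sigma ** 2))      # K_ij = exp(-||x_i - x_j||^2 / 2sigma^2)
```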

Experiments

In this section, we evaluate the effectiveness of the presented discriminant embedding methods. Three publicly available databases, namely COIL, USPS, and ORL, are selected to evaluate the performance of the proposed methods in comparison with other classical algorithms. All experiments are performed on a PC with an Intel Core 2 CPU at 1.8 GHz and 2 GB of main memory. In the experiments, we follow the three steps commonly used in recognition problems. First, each algorithm is applied to the training samples to
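A common form of this protocol can be summarized by the following schematic sketch, in which learn_projection is a placeholder for any of the compared algorithms: a projection is learned on the training samples, both sets are projected, and test samples are classified with a nearest-neighbor rule. This is our own illustration of the pipeline, not the authors' experimental code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def evaluate(learn_projection, X_train, y_train, X_test, y_test, d):
    """Recognition accuracy of a projection algorithm under a 1-NN classifier."""
    U = learn_projection(X_train, y_train, d)        # step 1: learn U on training data
    Z_train, Z_test = U.T @ X_train, U.T @ X_test    # step 2: project both sample sets
    D = cdist(Z_test.T, Z_train.T)                   # step 3: 1-NN in the subspace
    pred = y_train[np.argmin(D, axis=1)]
    return np.mean(pred == y_test)                   # fraction correctly recognized
```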

Conclusion

Based on the block optimization scheme, we have proposed a new dimensionality reduction algorithm, namely MMPA. Furthermore, incremental MMPA is implemented via SVD updating. MMPA has the following advantages: it performs better than classical dimension reduction algorithms in classification because it preserves discriminative information over the constructed blocks, and its projection matrix is orthogonal. Experiments have demonstrated the effectiveness of the proposed algorithm,

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This project was supported in part by Provincial Science Foundation of Zhejiang (LQ12F03011), National Natural Science Foundation of China (613250211, 61173096), and National Science and Technology Support Plan (2012BAD10B0101).


References (26)

  • D. Cai, X. He, K. Zhou, Locality sensitive discriminant analysis, in: International Joint Conferences on Artificial...
  • X. He et al., Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowl. Data Eng. (2008)
  • H. Cai et al., Learning linear discriminant projections for dimensionality reduction of image descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
JianWei Zheng received the M.S. degree in Electrical & Information Engineering in 2005 and the Ph.D. degree in Control Theory and Control Engineering in 2010 from Zhejiang University of Technology, China. He is currently a Lecturer at Zhejiang University of Technology. His research interests cover machine learning and feature extraction.

Dan Yang received her B.E. degree from Zhejiang University of Technology in 2012 and is now pursuing her M.E. degree in the Artificial Intelligence Laboratory of Zhejiang University of Technology. Her research interests include pattern recognition and dispatching algorithms.

Shengyong Chen received the Ph.D. degree in computer vision from City University of Hong Kong, Hong Kong, in 2003. He joined Zhejiang University of Technology, China, in February 2004, where he is currently a Professor in the Department of Computer Science. He received a fellowship from the Alexander von Humboldt Foundation of Germany and worked at the University of Hamburg in 2006–2007. He was a visiting professor at Imperial College London in 2008–2009 and at the University of Cambridge, U.K., in 2012. His research interests include computer vision, robotics, 3D object modeling, and image analysis. Dr. Chen is a Fellow of IET, a senior member of IEEE, and a committee member of the IET Shanghai Branch. He has published over 100 scientific papers in international journals and conferences.

Wanliang Wang received the Ph.D. degree in Control Theory and Control Engineering in 2001. He has devoted nearly 20 years to education and is now the Dean of the School of Computer Science and Technology at Zhejiang University of Technology. As a researcher, he leads a large research group in the field of simulation for small hydropower projects, where many valuable achievements have been made. His research interests cover intelligent algorithms and network control.
