
Neurocomputing

Volume 123, 10 January 2014, Pages 121-130

Incremental min–max projection analysis for classification

https://doi.org/10.1016/j.neucom.2013.06.010

Abstract

For data classification, standard implementations of projection algorithms do not scale well with dataset size, which makes computation on large sample sets infeasible. In this paper, we utilize a block optimization strategy to propose a new locally discriminant projection algorithm termed min–max projection analysis (MMPA). The algorithm takes into account both intra-class and interclass geometries and also possesses the orthogonality property. Furthermore, an incremental MMPA is proposed to learn the local discriminant subspace from newly inserted data by employing the idea of the singular value decomposition updating algorithm. Moreover, we extend MMPA to the semi-supervised and nonlinear cases, namely, semi-supervised MMPA and kernel MMPA. Experimental results on an image database, a handwritten digit database, and a face database demonstrate the effectiveness of the proposed algorithms.

Introduction

The problem of feature extraction is one of the core issues in data mining and classification. A more effective feature extraction method can improve classification results in the reduced subspace. The problem of dimensionality reduction can be described as follows. Consider a data set X consisting of n samples x_i (1 ≤ i ≤ n) in a high-dimensional space R^m. The objective of dimensionality reduction is to compute a faithful low-dimensional representation of X, i.e., Y = [y_1, …, y_n] ∈ R^{d×n}, where d ≪ m.

Over the past decades, numerous dimensionality reduction methods have been proposed to find low-dimensional feature representations. The two most popular techniques for this purpose are principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2]. PCA is an unsupervised algorithm that computes a low-dimensional representation of high-dimensional data by maximizing the total scatter. In contrast, LDA is a supervised feature extraction technique for pattern recognition; it seeks a set of projective directions that maximize the between-class scatter while simultaneously minimizing the within-class scatter. An intrinsic limitation of LDA is that it usually suffers from the small sample size (SSS) problem, which arises when the number of samples is much smaller than their dimensionality. Additionally, both PCA and LDA capture only the global Euclidean structure and cannot discover the embedding structure hidden in high-dimensional data.
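To make the contrast concrete, the following minimal sketch (our own illustration, not code from the paper) computes the principal directions maximizing the total scatter used by PCA and the within-/between-class scatter matrices used by LDA, for data stored with one sample per column.

```python
import numpy as np

def pca_directions(X, d):
    """Return the d directions maximizing total scatter (principal components)."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center the samples
    St = Xc @ Xc.T                              # total scatter matrix
    eigvals, eigvecs = np.linalg.eigh(St)       # eigenvalues in ascending order
    return eigvecs[:, -d:]                      # top-d eigenvectors

def lda_scatters(X, labels):
    """Return within-class and between-class scatter matrices S_w, S_b."""
    m = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mu_c) @ (Xc - mu_c).T       # intra-class spread
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T  # class-mean spread
    return Sw, Sb
```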

In order to exploit the local discriminative manifold structure, many subspace learning techniques have been proposed, such as locality preserving projections (LPP) [3], Maximal Similarity Embedding [4], Local Spline Discriminant Projection [5], and Neighborhood Preserving Embedding [6]. Recently, some researchers pointed out that enforcing an orthogonality relationship between projection directions can achieve competitive effectiveness, and therefore orthogonal neighborhood preserving projection (ONPP) was introduced [5], [7]. However, for classification problems, ONPP and LPP (even in a supervised setting) focus only on the intra-class geometrical information, while the interaction of samples from different classes is ignored.

More recently, numerous algorithms have been proposed that take both intra-class preservation and interclass discrimination into consideration [8], [9], [10], [11]. Among them, locality sensitive discriminant analysis (LSDA) [8] and its variation maximum margin projection (MMP) [9] are two typical examples, which achieve competitive results in image recognition applications. Yan et al. [12] explained most of these manifold learning techniques within a general framework defined in a graph-embedding way. Generally, a discriminative feature extraction algorithm is summarized as a graph-based constrained embedding by defining intrinsic and penalty graphs. In other words, it finds a set of projection directions in the linearly embedded subspace, i.e., J(U) = arg min_U (U^T X L X^T U)/(U^T X B X^T U), or J(U) = arg min_U U^T X L X^T U subject to U^T X B X^T U = c, where c is a constant, X is the data matrix, and L is the Laplacian matrix of the intrinsic graph, defined as L = D − W with D_ii = Σ_j W_ij. Here, W is the affinity matrix of the intrinsic graph. In addition, B can be the Laplacian matrix of the penalty graph, B = D^p − W^p, where W^p is the adjacency matrix of the penalty graph; it describes the similarity of interclass data, which should be suppressed for classification. D^p is the corresponding diagonal degree matrix defined in the graph-embedding framework. The general solution for the optimal U is to find the eigenvectors corresponding to the smallest eigenvalues of the generalized eigenvalue problem X L X^T U = λ X B X^T U, which carries a heavy computational burden because of the high data dimensionality, especially in image and video applications.
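As an illustration of this framework, the sketch below (an outline under our own assumptions, not the authors' implementation) builds a k-nearest-neighbor affinity matrix, forms the intrinsic and penalty Laplacians L = D − W and B = D^p − W^p, and solves the generalized eigenvalue problem X L X^T U = λ X B X^T U for the smallest eigenvalues; the small ridge term is added only to keep the penalty-side matrix positive definite.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def knn_affinity(X, k=5):
    """0/1 affinity matrix of the k-NN graph; X stores one sample per column."""
    D = cdist(X.T, X.T)                       # pairwise Euclidean distances
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbors (skip self)
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)                 # symmetrize the graph

def graph_embedding(X, W, Wp, d):
    """Projection U from an intrinsic graph W and a penalty graph Wp (both n x n)."""
    L = np.diag(W.sum(axis=1)) - W            # intrinsic Laplacian L = D - W
    B = np.diag(Wp.sum(axis=1)) - Wp          # penalty Laplacian B = D^p - W^p
    A = X @ L @ X.T
    C = X @ B @ X.T + 1e-6 * np.eye(X.shape[0])   # ridge for positive definiteness
    vals, vecs = eigh(A, C)                   # generalized EVD, ascending eigenvalues
    return vecs[:, :d]                        # directions for the smallest eigenvalues
```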

Incremental learning has attracted much attention as a result of the increasing demand for machine vision and intelligent systems. Numerous incremental learning algorithms have been proposed, especially in the data-mining and image-retrieval fields [13], [14], [15]. Most of these recent works are designed for incremental principal component analysis [16], [17] and incremental linear discriminant analysis [18], [19], both of which are global statistical feature extraction algorithms. To the best of our knowledge, few works focus on incremental local discriminant embedding, except for the ILDSE algorithm proposed by Miao et al. [20], which requires B to be a Laplacian matrix.

In this paper, we propose a new algorithm termed min–max projection analysis (MMPA) based on the perspective of block optimization [21]. MMPA offers three main benefits: (1) it takes into account both intra-class and interclass geometries, so it can achieve better classification performance; (2) it produces an orthogonal projection matrix; and (3) its combination matrix can be computed iteratively for newly inserted samples.

Furthermore, an incremental MMPA (IMMPA) is introduced to learn the discriminative sub-manifold structure incrementally. This paper also extends MMPA to the semi-supervised and nonlinear cases, termed semi-supervised MMPA (SMMPA) and kernel MMPA (KMMPA), respectively. SMMPA is obtained by incorporating additional unlabeled samples, and KMMPA performs MMPA in a reproducing kernel Hilbert space (RKHS); both extensions prove effective. For generality, the proposed algorithms are also based on the graph-embedding framework [12], which uses the graph adjacency to represent the discriminative weights of the data.
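The incremental step relies on SVD updating. The following sketch shows a generic column-append SVD update in the spirit of such algorithms; it is not the authors' exact IMMPA procedure, and in practice the updated factors would typically be truncated back to the working rank.

```python
import numpy as np

def svd_append_columns(U, s, Vt, C):
    """Update the thin SVD X ~= U @ diag(s) @ Vt when new columns C are appended."""
    r = s.size
    L = U.T @ C                        # component of C inside the current subspace
    H = C - U @ L                      # residual outside the subspace
    Q, R = np.linalg.qr(H)             # orthonormal basis of the residual
    # Small middle matrix whose SVD yields the update
    K = np.block([[np.diag(s), L],
                  [np.zeros((R.shape[0], r)), R]])
    Uk, sk, Vtk = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk
    # Extend V with identity rows for the new columns before rotating
    V_ext = np.block([[Vt.T, np.zeros((Vt.shape[1], C.shape[1]))],
                      [np.zeros((C.shape[1], Vt.shape[0])), np.eye(C.shape[1])]])
    Vt_new = (V_ext @ Vtk.T).T
    return U_new, sk, Vt_new            # factors of the enlarged data matrix [X, C]
```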

The rest of the paper is organized as follows. Section 2 introduces the MMPA algorithm as well as its incremental implementation. Subsequently, the semi-supervised MMPA algorithm is proposed in Section 3. In Section 4, the algorithm is extended to the nonlinear case, termed KMMPA. The experimental performance of the proposed algorithms is presented in Section 5. Finally, we conclude the paper in Section 6.

Section snippets

Min–max projection analysis (MMPA)

Consider a given training set X = [x_1, x_2, …, x_n] ∈ R^{m×n}, where m and n denote the dimension and the number of the original samples, respectively. The proposed MMPA algorithm aims at learning a linear transformation matrix U, used as Y = U^T X to project the original samples onto subspace data Y = [y_1, …, y_n] ∈ R^{d×n}, where d ≪ m.
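A small usage sketch (with hypothetical dimensions and a random orthogonal stand-in for U) illustrates the transformation and the orthogonality property claimed for MMPA.

```python
import numpy as np

m, n, d = 1024, 200, 30                        # assumed dimensions, for illustration only
X = np.random.randn(m, n)                      # training samples, one per column
U = np.linalg.qr(np.random.randn(m, d))[0]     # stand-in orthogonal projection matrix
Y = U.T @ X                                    # low-dimensional representation, d x n

assert Y.shape == (d, n)
assert np.allclose(U.T @ U, np.eye(d))         # orthogonality of the projection
```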

After the transformation, pairwise samples within the same class should be as close as possible, while those from different classes should be as far apart as possible. The whole algorithm

Semi-supervised MMPA

It has been found that unlabeled samples may help to improve classification performance [24], [25]. Therefore, this paper generalizes MMPA by introducing new block optimizations based on unlabeled samples and then incorporating them into the combination stage, yielding SMMPA.

Suppose X_u = [x_{n+1}, …, x_{n+n_u}] is the newly inserted unlabeled data matrix. For each unlabeled sample x_i (i = n+1, …, n+n_u), we search for its k_i^u nearest neighbors, x_{i1}, …, x_{i k_i^u}, among all training samples, including both labeled and
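The neighbor search over all training samples might be sketched as follows, assuming plain Euclidean k-nearest-neighbor search; the function and variable names are our own placeholders.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_indices(X_train, X_u, k):
    """Indices of the k nearest training samples for each unlabeled sample.

    X_train holds all training samples and X_u the newly inserted unlabeled
    samples, both stored with one sample per column.
    """
    D = cdist(X_u.T, X_train.T)              # pairwise distances, n_u x n
    return np.argsort(D, axis=1)[:, :k]      # k closest columns of X_train
```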

Kernel MMPA

MMPA is a linear algorithm, so it may fail to discover the intrinsic geometry when the data manifold is highly nonlinear. In this section, we discuss how to perform MMPA in a reproducing kernel Hilbert space (RKHS), which gives rise to kernel MMPA.

With the same training set X = [x_1, x_2, …, x_n] ∈ R^{m×n}, we consider the problem in a feature space F induced by some nonlinear mapping ϕ: X → F. For a properly chosen ϕ, an inner product ⟨·,·⟩ can be defined on F, which makes F a so-called reproducing kernel Hilbert
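As an illustration of the kernel computation such an extension relies on, the sketch below builds a Gaussian (RBF) kernel Gram matrix whose entries play the role of the inner products ⟨ϕ(x_i), ϕ(x_j)⟩; the choice of the RBF kernel and the bandwidth sigma are assumptions for the example, not prescribed by the text above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix of the RBF kernel for samples stored as columns of X."""
    D2 = cdist(X.T, X.T, metric="sqeuclidean")   # squared pairwise distances
    return np.exp(-D2 / (2.0 * sigma ** 2))      # K_ij = exp(-||x_i - x_j||^2 / 2sigma^2)
```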

Experiments

In this section, we evaluate the effectiveness of the presented discriminant embedding methods. Three publicly available databases, namely COIL, USPS, and ORL, are selected to evaluate the performance of the proposed methods in comparison with other classical algorithms. All experiments are performed on a PC with an Intel Core 2 CPU at 1.8 GHz and 2 GB of main memory. In the experiments, we follow the three steps commonly used in recognition problems. First, each algorithm is applied to the training samples to
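A common form of this protocol can be summarized by the following schematic sketch, in which learn_projection is a placeholder for any of the compared algorithms: a projection is learned on the training samples, both sets are projected, and test samples are classified with a nearest-neighbor rule. This is our own illustration of the pipeline, not the authors' experimental code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def evaluate(learn_projection, X_train, y_train, X_test, y_test, d):
    """Recognition accuracy of a projection algorithm under a 1-NN classifier."""
    U = learn_projection(X_train, y_train, d)        # step 1: learn U on training data
    Z_train, Z_test = U.T @ X_train, U.T @ X_test    # step 2: project both sample sets
    D = cdist(Z_test.T, Z_train.T)                   # step 3: 1-NN in the subspace
    pred = y_train[np.argmin(D, axis=1)]
    return np.mean(pred == y_test)                   # fraction correctly recognized
```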

Conclusion

Based on the block optimization scheme, we have proposed a new dimensionality reduction algorithm, namely MMPA. Furthermore, incremental MMPA is implemented via SVD updating. MMPA has the following advantages: it performs better than classical dimension reduction algorithms in classification because it preserves discriminative information over the constructed blocks, and its projection matrix is orthogonal. Experiments have demonstrated the effectiveness of the proposed algorithm,

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This project was supported in part by Provincial Science Foundation of Zhejiang (LQ12F03011), National Natural Science Foundation of China (613250211, 61173096), and National Science and Technology Support Plan (2012BAD10B0101).


References (26)

  • D. Cai, X. He, K. Zhou, Locality sensitive discriminant analysis, in: International Joint Conferences on Artificial...
  • X. He et al., Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowl. Data Eng. (2008)
  • H. Cai et al., Learning linear discriminant projections for dimensionality reduction of image descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
JianWei Zheng received the M.S. degree in Electrical & Information Engineering in 2005 and the Ph.D. degree in Control Theory and Control Engineering in 2010 from Zhejiang University of Technology, China. He is currently a Lecturer at Zhejiang University of Technology. His research interests cover machine learning and feature extraction.

Dan Yang received her B.E. degree from Zhejiang University of Technology in 2012 and is now pursuing her M.E. degree in the Artificial Intelligence Laboratory of Zhejiang University of Technology. Her research interests include pattern recognition and dispatching algorithms.

Shengyong Chen received the Ph.D. degree in computer vision from City University of Hong Kong, Hong Kong, in 2003. He joined Zhejiang University of Technology, China, in February 2004, where he is currently a Professor in the Department of Computer Science. He received a fellowship from the Alexander von Humboldt Foundation of Germany and worked at the University of Hamburg in 2006–2007. He was a visiting professor at Imperial College London in 2008–2009 and at the University of Cambridge, U.K., in 2012. His research interests include computer vision, robotics, 3D object modeling, and image analysis. Dr. Chen is a Fellow of IET, a senior member of IEEE, and a committee member of the IET Shanghai Branch. He has published over 100 scientific papers in international journals and conferences.

Wanliang Wang received the Ph.D. degree in Control Theory and Control Engineering in 2001. He has devoted nearly 20 years to education and is now the Dean of the School of Computer Science and Technology at Zhejiang University of Technology. As a researcher, he leads a large research group in the field of simulation for small hydropower projects, where many valuable achievements have been made. His research interests cover intelligent algorithms and network control.
