Neurocomputing

Volume 171, 1 January 2016, Pages 664-672

Locality sensitive batch feature extraction for high-dimensional data

https://doi.org/10.1016/j.neucom.2015.07.076

Abstract

For feature extraction, the dimensionality of the feature space is usually much larger than the size of the training set. This is known as the under sample problem. In this situation, local structure is more important than global structure. In this paper, locality sensitive batch feature extraction (LSBFE) is derived from a new gradient optimization model by exploiting both the local and global discriminant structure of the data manifold. With the proposed LSBFE, a set of features can be extracted simultaneously. The recognition rate is improved compared with batch feature extraction (BFE), which considers only global information. It is shown that the proposed method achieves good performance on face databases, a handwritten digit database, an object database and the DBWorld data set.

Introduction

Discriminative dimension reduction is a hot research topic in the field of pattern recognition, since high-dimensional data in the feature space are unsuitable for subsequent classification due to the curse of dimensionality. By extracting features in a low-dimensional space, discriminative information is well preserved, so that different classes can be separated more accurately and efficiently. Dimension reduction has been widely applied in various areas, such as image and text retrieval [1], [2], bioinformatics [3], biometrics [4], and signal processing [5].

For feature extraction, the dimensionality of the feature space is usually much larger than the size of the training set. This is known as the under sample problem (USP) [6], [7]. A conventional classifier, for example linear discriminant analysis (LDA) [8], [9], often fails when faced with the USP. One solution is to reduce the dimensionality of the feature space using principal component analysis (PCA) [10], [11] or multilinear subspace analysis (MSA) [12]. Unfortunately, some discriminant information is discarded by PCA and MSA. Other algorithms, such as discriminant common vectors [13], regularized LDA [14], general tensor discriminant analysis (GTDA) [15] based on the differential scatter discriminant criterion (DSDC) [8], and the Fukunaga–Koontz transform (FKT) [16], consider only global discriminant information without exploiting the underlying structure of the data manifold.
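To make the USP concrete, here is a minimal NumPy sketch (ours, not from the paper) showing that the within-class scatter matrix is singular when the dimensionality exceeds the number of samples, and how a PCA preprocessing step restores invertibility at the cost of possibly discarding discriminant information:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per_class, n_classes = 100, 5, 2   # dimensionality far exceeds sample size
X = rng.normal(size=(n_classes * n_per_class, d))
y = np.repeat(np.arange(n_classes), n_per_class)

# Within-class scatter S_W: sum of outer products of centered samples.
S_W = np.zeros((d, d))
for c in range(n_classes):
    Xc = X[y == c] - X[y == c].mean(axis=0)
    S_W += Xc.T @ Xc

# rank(S_W) is at most n - n_classes (= 8 here), far below d, so S_W is
# singular and the classical LDA solution involving S_W^{-1} fails.
print("rank(S_W) =", np.linalg.matrix_rank(S_W), "of", d)

# PCA preprocessing: project onto the top principal components first.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:5].T   # keep 5 components; S_W becomes full rank there
```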

Recently there has been much interest in geometrically motivated approaches to data analysis in high-dimensional spaces, including Laplacian Eigenmaps [17], [18], locally linear embedding (LLE) [19], locality preserving projections (LPP) [20], local discriminant embedding (LDE) [21] and LDE-based algorithms [22]. These methods have been shown to be effective in discovering the geometrical structure of the underlying manifold. From this point of view, to mitigate the USP, we propose a new algorithm called locality sensitive batch feature extraction (LSBFE) that exploits the geometry of the data manifold and can extract a set of features simultaneously. First, we construct a nearest neighbor region for every sample to model the local geometrical structure of the underlying manifold and introduce the local differential scatter discriminant criterion (LDSDC). Based on the LDSDC, a constrained optimization problem with a new objective function is then formulated and transformed into an unconstrained optimization problem. Solving this unconstrained problem is still challenging, so we apply the gradient method to the two unknown matrices alternately and iteratively. This yields an algorithm that ensures the objective function converges and an optimal projection matrix is obtained.
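As an illustration of the neighborhood construction step, the following sketch builds locality-sensitive scatter matrices over k-nearest-neighbor regions. The neighbor rule and (unit) weighting are our assumptions for illustration; the paper's exact LDSDC definitions appear in Section 2, which is only excerpted here.

```python
import numpy as np

def local_scatter_matrices(X, y, k=5):
    """Illustrative locality-sensitive scatter matrices built over
    k-nearest-neighbor regions (our reading of the idea, not the
    paper's exact LDSDC formula)."""
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)         # exclude each sample from its own neighbors
    S_w_loc = np.zeros((d, d))
    S_b_loc = np.zeros((d, d))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:  # k nearest neighbors of sample i
            diff = (X[i] - X[j])[:, None]
            if y[i] == y[j]:
                S_w_loc += diff @ diff.T   # same-class neighbors: local within-class scatter
            else:
                S_b_loc += diff @ diff.T   # different-class neighbors: local between-class scatter
    return S_w_loc, S_b_loc
```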

Laplacian Eigenmaps and LLE can be considered proper baselines that incorporate the local structure of the data manifold. They are closely related, and the optimization problem of LLE can be reformulated as finding the eigenvectors of the iterated graph Laplacian $L^2$ [18]. Being nonlinear, these algorithms are more general than linear ones and yield impressive results. Owing to its locality-preserving character, Laplacian Eigenmaps is relatively robust to outliers and noise and exhibits stability with respect to the embedding. The optimization of LLE avoids the local minimum problem, and LLE can also be extended to classification [23]. However, experiments show that LLE is an effective method for visualization but may not be very useful for classification [24]. For small numbers of features, LLE outperforms PCA; as the number of dimensions increases, LLE starts overfitting and cannot extract further information, while PCA continues to improve since less and less information is discarded [23], [24]. Later, by taking label information into account, supervised LLE (SLLE) was proposed [25], [26]. It is worthwhile to highlight the differences between our proposed LSBFE and Laplacian Eigenmaps or LLE. First, in exploiting local information, Laplacian Eigenmaps and LLE construct a local graph, whereas LSBFE reconstructs the scatter matrices and thereby reduces the under sample problem. Second, Laplacian Eigenmaps and LLE seek to preserve the intrinsic geometry and local structure of the data after dimension reduction, whereas LSBFE preserves discriminant information by maximizing the between-class scatter and minimizing the within-class scatter for classification. Finally, LSBFE is a linear method for dimension reduction, whereas Laplacian Eigenmaps and LLE are nonlinear.
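For reference, a bare-bones version of the Laplacian Eigenmaps baseline looks as follows. Binary k-NN weights are used here instead of the heat kernel; this simplification is ours.

```python
import numpy as np

def laplacian_embedding(X, k=5, dim=2):
    """Minimal Laplacian Eigenmaps sketch: k-NN graph with binary weights;
    the embedding is given by the eigenvectors of L = D - W with the
    smallest nonzero eigenvalues (connected graph assumed)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:k]] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]              # skip the constant eigenvector
```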

The main advantage of our algorithm is that LSBFE can extract a set of features simultaneously and capture both the local and global structure of the data, yielding a clear improvement in recognition rate over batch feature extraction (BFE), which is based on the differential scatter discriminant criterion (DSDC) and takes only global information into account. Thus LSBFE is more efficient and convenient than BFE for feature extraction. The main contributions of this paper are summarized as follows:

1. By constructing a nearest neighbor region for every training sample, the class scatter matrices are redefined. Through this process, the LDSDC is proposed and the USP is reduced.

2. The problem is solved by exploring a new methodology from the viewpoint of gradient optimization. This enables us to propose a new algorithm that differs from conventional approaches such as LDA and PCA, which are based on eigenvalue decomposition of matrices.

3. An alternating optimization procedure is designed (see the sketch after this list). It is shown that the procedure makes the objective function converge, so that the feature space spanned by the corresponding projection matrix is optimal.
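Contribution 3 refers to the following style of alternating update. This is a generic sketch under our assumptions (gradient callbacks, fixed step size, fixed iteration count); the paper's actual sub-models and update rules are given in its Eqs. (11) and (14).

```python
import numpy as np

def alternating_gradient(grad_W, grad_V, W0, V0, lr=1e-2, iters=200):
    """Generic alternating gradient scheme: update each unknown matrix in
    turn while holding the other fixed. Illustrative only; whether each
    step ascends or descends depends on the objective being optimized."""
    W, V = W0.copy(), V0.copy()
    for _ in range(iters):
        W -= lr * grad_W(W, V)   # update W with V fixed
        V -= lr * grad_V(W, V)   # then update V with the new W fixed
    return W, V
```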

The rest of the paper is organized as follows. In Section 2 we briefly introduce the LDSDC model. The optimization procedure of the proposed algorithm is given in Section 3. The LSBFE algorithm and related analyses are presented in Section 4. Experimental results on different data sets are reported and analyzed in Section 5. The paper is concluded in Section 6.

Section snippets

Problem formulation

LDA, as one of the prototypical methods, has been widely applied to feature extraction and dimension reduction due to its effectiveness and simplicity. The aim of LDA is to find a projection matrix that separates different classes well in a low-dimensional subspace. The subspace is spanned by a set of vectors $w_i$, $1 \le i \le m$, which form the projection matrix $W$. An optimization problem is therefore formulated as determining a projection matrix such that the ratio between the trace of the between-class scatter matrix and that of the within-class scatter matrix is maximized.
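For reference, the classical LDA trace-ratio criterion that this snippet builds toward can be written as follows (standard textbook form, not quoted from the paper):

```latex
\max_{W}\; J(W) \;=\; \frac{\operatorname{tr}\!\left(W^{T} S_{B} W\right)}
                           {\operatorname{tr}\!\left(W^{T} S_{W} W\right)},
\qquad
S_{B} = \sum_{c=1}^{C} n_{c}\,(\mu_{c}-\mu)(\mu_{c}-\mu)^{T},
\quad
S_{W} = \sum_{c=1}^{C} \sum_{x_{i} \in \mathcal{C}_{c}} (x_{i}-\mu_{c})(x_{i}-\mu_{c})^{T},
```

where $\mu_c$ is the mean of class $c$, $\mu$ the global mean, and $n_c$ the number of samples in class $c$.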

Optimization procedure of LSBFE

In this section, we first formulate our problem and then transform it into two sub-models. Finally we propose a new iterative algorithm to obtain an optimal projection matrix.

LSBFE algorithm and related analysis

In this section, we propose an alternating iterative algorithm, the LSBFE algorithm, to solve the optimization problem (10) based on the two sub-models given in (11) and (14), as summarized in Table 1. After the objective function converges, we obtain a converged subspace matrix $W$. We then project the samples into a low-dimensional subspace by $Y = W^T X$. Clearly, the proposed algorithm has a two-layer structure, namely the input layer and the output layer. We now analyze the
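Assuming samples are stored as rows (a convention we adopt here for convenience; the paper writes $Y = W^T X$ with samples as columns), the projection-plus-1-NN step used in the experiments can be sketched as:

```python
import numpy as np

def project_and_classify(W, X_train, y_train, X_test):
    """Once W has converged, features are the linear projections of the
    samples; classification then uses 1-NN in the low-dimensional
    subspace, matching the protocol described in Section 5.
    X_train, X_test: samples as rows; y_train: np.ndarray of labels."""
    Y_train = X_train @ W                  # samples as rows, so Y = X W here
    Y_test = X_test @ W
    d = np.linalg.norm(Y_test[:, None] - Y_train[None, :], axis=2)
    return y_train[np.argmin(d, axis=1)]   # label of the nearest training sample
```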

Simulation and discussion

In this section, we use two face databases, one handwritten digit database, one object database and one email database to evaluate the performance of our algorithm LSBFE by comparing it with LDA, PCA, SLLE, GLDA-TRA [35] and BFE. For recognition, we employ 1-NN classification for all databases in the experiments. In addition, in our experiments $\tilde{S}_W$ is regularized by adding a very small number to the diagonal to make it full rank ($10^{-5}$ of the trace in our
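The regularization described above amounts to the following one-liner (a sketch; `eps` matches the $10^{-5}$ trace factor mentioned in the text):

```python
import numpy as np

def regularize_scatter(S_W, eps=1e-5):
    """Make the within-class scatter full rank by adding a small multiple
    of its trace to the diagonal, as described in the experiments."""
    return S_W + eps * np.trace(S_W) * np.eye(S_W.shape[0])
```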

Conclusion

In this paper, a feature extraction algorithm, locality sensitive batch feature extraction (LSBFE), has been proposed. By discovering the local geometrical structure of the data manifold, LSBFE can extract a set of features simultaneously and deal with the under sample problem of high-dimensional data. Moreover, LSBFE is based on the gradient method instead of the traditional eigenvalue decomposition. It is shown that the algorithm ensures the convergence of the iterative process. The new


References (37)

  • G. Li et al., Error tolerance based support vector machine for regression, Neurocomputing, 2011.
  • N. Qian, On the momentum term in gradient descent learning algorithms, Neural Netw., 1999.
  • W. Bian et al., Biased discriminant Euclidean embedding for content-based image retrieval, IEEE Trans. Image Process., 2010.
  • J. Ye et al., An optimization criterion for generalized discriminant analysis on undersampled problems, IEEE Trans. Pattern Anal. Mach. Intell., 2004.
  • S. Dudoit et al., Comparison of discrimination methods for the classification of tumors using gene expression data, J. Am. Stat. Assoc., 2002.
  • T.-K. Kim et al., Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image, IEEE Trans. Pattern Anal. Mach. Intell., 2005.
  • G. Potamianos, H.P. Graf, Linear discriminant analysis for speech reading, in: IEEE Second Workshop on Multimedia...
  • P.N. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell., 1997.
  • W. Krzanowski et al., Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data, Appl. Stat., 1995.
  • K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, USA,...
  • A.M. Martínez et al., PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell., 2001.
  • J. Han, B. Bhanu, Statistical feature fusion for gait-based human recognition, in: Proceedings of the 2004 IEEE...
  • C. Liu, H. Wechsler, Enhanced Fisher linear discriminant models for face recognition, in: Proceedings of the Fourteenth...
  • M.A.O. Vasilescu, D. Terzopoulos, Multilinear subspace analysis of image ensembles, in: Proceedings of the 2003 IEEE...
  • H. Cevikalp et al., Discriminative common vectors for face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 2005.
  • J.H. Friedman, Regularized discriminant analysis, J. Am. Stat. Assoc., 1989.
  • D. Tao et al., General tensor discriminant analysis and Gabor features for gait recognition, IEEE Trans. Pattern Anal. Mach. Intell., 2007.
  • S. Zhang et al., Discriminant subspace analysis: a Fukunaga–Koontz approach, IEEE Trans. Pattern Anal. Mach. Intell., 2007.

    Jie Ding received her B.Eng. degree from Harbin Engineering University, China, in July 2012. She is currently a Ph.D. candidate in School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. Her research interests include machine learning, pattern recognition and convex optimization.

Changyun Wen received the B.Eng. degree from Xi'an Jiaotong University, China in 1983 and the Ph.D. degree from the University of Newcastle, Australia in 1990. From August 1989 to August 1991, he was a Postdoctoral Fellow at the University of Adelaide. Since August 1991, he has been with the School of EEE, Nanyang Technological University, where he is currently a Full Professor. His main research activities are control systems and applications, intelligent power management systems, smart grids, model based online learning and system identification.

He is an Associate Editor of a number of journals, including Automatica, IEEE Transactions on Industrial Electronics and IEEE Control Systems Magazine, and the Executive Editor-in-Chief of the Journal of Control and Decision. He served as an Associate Editor of the IEEE Transactions on Automatic Control from January 2000 to December 2002. He has been actively involved in organizing international conferences, playing the roles of General Chair, General Co-Chair, Technical Program Committee Chair, Program Committee Member, General Advisor, Publicity Chair and so on. He received the IES Prestigious Engineering Achievement Award 2005 from the Institution of Engineers, Singapore (IES) in 2005.

    He is a Fellow of IEEE, a member of IEEE Fellow Committee from 2011 to 2013 and a Distinguished Lecturer of IEEE Control Systems Society from February 2010 to February 2013.

Guoqi Li received the B.Eng. degree and M.Eng. degree from Xi'an University of Technology and Xi'an Jiaotong University, PR China, in 2004 and 2007, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2011.

He was a Scientist with the Data Storage Institute and the Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore, from September 2011 to March 2014. Since March 2014, he has been an Assistant Professor with the Department of Precision Instrument, Tsinghua University, PR China. His current research interests include brain-inspired computing, complex systems, neuromorphic computing, machine learning and system identification.

Dr. Li has published more than 30 journal and conference papers. He serves as a reviewer for a number of international journals and has also been actively involved in professional services, such as serving as an International Technical Program Committee member and a Track Chair for international conferences.

Chin Seng Chua graduated with a B.Eng. (Honours) (1987–1990) from Nanyang Technological University and a Ph.D. (1992–1995) in Computer Vision from Monash University, Australia. His industrial experience includes working with Hewlett Packard Singapore (1986–1987, 1990–1992) and with the Defence Science Organisation (1995–1997). Two MINDEF projects were completed during his stay with DSO. Dr. Chua joined the University as an academic staff member in 1997 and has since been involved in several projects in Computer Vision. His research interests include video tracking and surveillance, activity/gait/face recognition and motion estimation.
