Pattern Recognition

Volume 42, Issue 9, September 2009, Pages 1941-1948

Tensor linear Laplacian discrimination (TLLD) for feature extraction

https://doi.org/10.1016/j.patcog.2009.01.010

Abstract

Discriminant feature extraction plays a central role in pattern recognition and classification. In this paper, we propose the tensor linear Laplacian discrimination (TLLD) algorithm for extracting discriminant features from tensor data. TLLD extends linear discriminant analysis (LDA) and linear Laplacian discrimination (LLD) in the directions of both nonlinear subspace learning and tensor representation. Based on contextual distances, the weights for the within-class scatters and the between-class scatter are determined so as to capture the principal structure of the data clusters. This frees TLLD from the metric of the sample space, which may not be known. Moreover, unlike LLD, the parameter tuning of TLLD is very easy. Experimental results on face recognition, texture classification and handwritten digit recognition show that TLLD is effective in extracting discriminative features.

Introduction

Discriminant feature extraction is an important topic in pattern recognition and classification. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two traditional algorithms for linear discriminant feature extraction. Both involve scatters computed under the Euclidean metric, i.e., the underlying assumption is that the sample space is Euclidean. Both PCA and LDA have found wide application in pattern recognition and computer vision; in face recognition, for example, they are known as the famous Eigenfaces and Fisherfaces methods [2], respectively, and many variants of LDA have shown good performance in various applications [9], [12], [20], [21], [22], [27]. As the data manifold may not be linear, some nonlinear discriminant feature extraction algorithms, e.g., locality preserving projections (LPP) [8] and linear Laplacian discrimination (LLD) [31], have recently been developed. In addition, the kernel trick [15] is widely applied to extend linear feature extraction algorithms to nonlinear ones by performing linear operations in a higher- or even infinite-dimensional space obtained via a kernel mapping function.
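To make the Euclidean assumption concrete, the following is a minimal numpy sketch (ours, not code from the paper) of the classical within-class and between-class scatters that LDA trades off; the weighted scatters introduced later by LLD and TLLD generalize exactly these quantities.

```python
import numpy as np

def lda_scatters(X, labels):
    # Classical LDA scatters under the Euclidean metric.
    # X: (N, D) matrix of vectorized samples; labels: length-N class ids.
    xbar = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))  # within-class scatter
    Sb = np.zeros((D, D))  # between-class scatter
    for s in np.unique(labels):
        Xs = X[labels == s]
        ms = Xs.mean(axis=0)
        Sw += (Xs - ms).T @ (Xs - ms)                    # spread around the class mean
        Sb += len(Xs) * np.outer(ms - xbar, ms - xbar)   # spread of class means
    return Sw, Sb
```

Both scatters are sums of squared Euclidean deviations, which is precisely where the implicit assumption of a Euclidean sample space enters.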

It is worth noting that most existing discriminant analysis methods are vector based, i.e., the input data are (re)arranged into vectors regardless of the inherent correlation among different dimensions. In practice, vector-based methods have been found to have some intrinsic problems [26]: singularity of the within-class scatter matrices, a limited number of available projection directions, and high computational cost. Much work has been done to address these problems [4], [5], [20], [21], [22]. Recently, several tensor-based methods have been proposed as alternatives that overcome these drawbacks. Tensor-based methods respect the dimensional structure of the data and hence can extract better discriminant features robustly. They perform particularly well when the number of samples is relatively small, a case in which vector-based methods often suffer from the singularity problem. Along this line, Ye et al.'s 2DLDA [29] and Yan et al.'s DATER [26] are tensor extensions of the popular vector-based LDA algorithm, and tensor LPP [6], [7] is an extension of LPP that also preserves the local neighbor structure of tensor samples. All of these methods work in tensor spaces with Euclidean metrics whenever metrics are used.

Despite the success of various subspace learning algorithms, we notice that almost all of them rely on the Euclidean assumption when computing distances between samples, unless an appropriate metric for the data space is known (e.g., the KL divergence or the χ2 distance for histogram-based data). Distance metric learning attempts to learn metrics from data; however, it has mainly focused on finding a linear distance metric that optimizes data compactness and separability in a global sense [23], [24], [28]. It is computationally expensive for high-dimensional data, and no current nonlinear dimensionality reduction approach can learn an explicit nonlinear metric [28]. The approximated geodesic distance [18], which estimates the distances among samples, can alleviate, but not resolve, the metric issue. For example, a slenderly distributed cluster can have large geodesic distances between its samples, which makes distance-based cluster analysis error-prone. What matters more is the structure of the data, rather than the absolute distances between the data samples.

Based on the above observations, we propose the tensor linear Laplacian discrimination (TLLD) method for nonlinear feature extraction from tensor data. TLLD can be viewed as an extension of both LDA and LLD [31] in the directions of nonlinearity and tensor representation. LLD has shown its superiority for feature extraction in nonlinear spaces [31], but it still has all of the above-mentioned drawbacks of vector-based methods, because it has the same number of available projection directions and the same null spaces of the within-class scatter matrices as LDA (the proof is in Appendix A). Moreover, although LLD aims to remove the metric assumption by introducing weights into the scatter matrices, the weights are still defined as functions of distances in the sample space, so LLD still requires an a priori metric on the sample space. To further reduce the dependence on this metric, TLLD instead computes the weights from contextual distances, which measure each sample's contribution to the structure of the data. This idea is inspired by recent work on the structural perception of data [13], [30]. To match the tensor nature of the data, we further extend the vector-based coding length [13], [30] to a tensor coding length that serves as the descriptor of a contextual set [30]. Another advantage of contextual-distance-based weights is that tuning the time variable in the weights becomes very easy via rescaling. In short, TLLD handles two kinds of structure in the sample data in a unified way: the tensor structure within each individual sample and the distributional structure across all samples.
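To make the contextual-distance idea concrete, below is a hedged numpy sketch of the vector-based coding length of Ma et al. [13] and of a leave-one-out contextual distance in the spirit of [30]. The function names, the distortion parameter eps, and the exact leave-one-out form are our illustrative assumptions; the descriptor actually used by TLLD is the tensor extension of this quantity.

```python
import numpy as np

def coding_length(X, eps=1.0):
    # Approximate number of bits needed to code the rows of X
    # (n samples x d dims) up to distortion eps under a Gaussian
    # model, following Ma et al. [13].
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / n                      # sample covariance
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / eps**2) * Sigma)
    return (n + d) / 2.0 * logdet / np.log(2.0) \
        + d / 2.0 * np.log2(1.0 + mu @ mu / eps**2)

def contextual_distance(X, i, eps=1.0):
    # Contribution of sample i to the structure of its contextual set:
    # the change in coding length when x_i is removed (cf. [30]).
    return abs(coding_length(X, eps) - coding_length(np.delete(X, i, axis=0), eps))
```

A sample that conforms to the cluster structure barely changes the coding length when removed, while an outlier changes it a lot; no metric on the sample space is needed, only the structural contribution.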

The rest of this paper is organized as follows. We first present TLLD in Section 2 and then discuss the choice of the weights for the scatter matrices in Section 3. Experimental results are presented in Section 4, and Section 5 concludes the paper.

Section snippets

Tensor linear Laplacian discrimination

In this section, we first give definitions of some basic tensor operations. Then we present the formulation of TLLD.
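Since the basic tensor operations are only summarized in this snippet, the following numpy sketch of the standard mode-k unfolding and mode-k product (standard multilinear algebra, cf. Lathauwer et al. [10]; not code from the paper) may help fix notation. Tensor methods such as TLLD project a tensor sample by mode-k multiplying it with one projection matrix per mode.

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding: the mode-k fibers of T become the columns of
    # a matrix of shape (T.shape[k], product of the remaining dims).
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_k_product(T, U, k):
    # Mode-k product T x_k U: applies U to every mode-k fiber of T.
    # If T is I1 x ... x Ik x ... x In and U is (J, Ik), the result
    # has shape I1 x ... x J x ... x In.
    rest = [T.shape[i] for i in range(T.ndim) if i != k]
    Y = (U @ unfold(T, k)).reshape([U.shape[0]] + rest)
    return np.moveaxis(Y, 0, k)

# Example: projecting a 32x32 image X onto a 5x4 feature tensor with
# hypothetical projection matrices U1 (5x32) and U2 (4x32):
#   Y = mode_k_product(mode_k_product(X, U1, 0), U2, 1)
```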

Definition of weights

In this section, we discuss how to choose the weights w_i and w^s so as to make the TLLD algorithm complete.

Motivated by LLD [31] and the Laplacian eigenmap [3], we define the weights as

$$w_i = \exp\left(-\frac{d^2(X_i,\,\Omega_{s_i})}{t}\right), \quad i = 1, 2, \ldots, N,$$
$$w^s = \exp\left(-\frac{d^2(\Omega_s,\,\Omega)}{t}\right), \quad s = 1, 2, \ldots, c,$$

where d(·,·) is some distance, t is the time variable, and s_i is the class label of X_i.
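Given precomputed contextual distances, these weights are immediate to evaluate; a minimal sketch (our own helper with hypothetical argument names) follows. Because the distances can be rescaled to a common range before exponentiation, a single choice of t transfers across data sets, which echoes the easy parameter tuning mentioned in the introduction (the max-rescaling here is one illustrative choice).

```python
import numpy as np

def tlld_weights(d_within, d_between, t=1.0):
    # d_within:  length-N array of contextual distances d(X_i, Omega_{s_i})
    # d_between: length-c array of contextual distances d(Omega_s, Omega)
    # Rescale so the largest distance is 1, making t easy to tune.
    d_within = d_within / d_within.max()
    d_between = d_between / d_between.max()
    w = np.exp(-d_within**2 / t)    # sample weights w_i
    ws = np.exp(-d_between**2 / t)  # class weights w^s
    return w, ws
```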

In LLD, the weights are simply functions of the distances to the class centroids, measured in the metric of the sample space S:

$$w_i = \exp\left(-\frac{\|X_i - \bar{X}_{s_i}\|_S^2}{t}\right), \quad i = 1, 2, \ldots, N,$$
$$w^s = \exp\left(-\frac{\|\bar{X}_s - \bar{X}\|_S^2}{t}\right), \quad s = 1, 2, \ldots, c,$$

where \bar{X}_s is the centroid of class s and \bar{X} is the centroid of all samples.
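For comparison, here is a sketch of the LLD weights above, instantiated with the Euclidean norm (our own illustrative code); it requires an explicit norm on the sample space, which is exactly the dependence TLLD removes.

```python
import numpy as np

def lld_weights(X, labels, t=1.0):
    # w_i = exp(-||x_i - mean of class s_i||^2 / t)
    # w^s = exp(-||mean of class s - global mean||^2 / t)
    # X: (N, D) vectorized samples; labels: length-N class ids.
    xbar = X.mean(axis=0)
    classes = np.unique(labels)
    cent = np.stack([X[labels == s].mean(axis=0) for s in classes])
    idx = np.searchsorted(classes, labels)   # class index of each sample
    w = np.exp(-np.sum((X - cent[idx])**2, axis=1) / t)
    ws = np.exp(-np.sum((cent - xbar)**2, axis=1) / t)
    return w, ws
```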

Experimental results

To evaluate our TLLD algorithm, we perform experiments on face databases (FRGC version 2 [17] and CMU PIE), a texture database (USC-SIPI, from the Brodatz album) and a handwritten digit database (MNIST). We compare TLLD with PCA, LDA, LLD, TLDA (DATER [26]) and tensor LPP [7].

Conclusions

In this paper, a novel algorithm named TLLD is proposed for extracting discriminative features from tensor data. Its contextual-distance-based weighting mechanism enables TLLD to work effectively without assuming an a priori metric on the tensor space. Experiments on different tasks have demonstrated the advantages of TLLD, including higher discriminative power, metric independence, and easy parameter tuning.

As the features extracted by TLLD are also tensors, we expect that the recognition results

Acknowledgment

The first author would like to thank Deli Zhao for valuable discussions.


References (31)

[1] S. Arivazhagan et al., Texture classification using Gabor wavelets based rotation invariant features, Pattern Recognition Letters (2006).

[2] P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997).

[3] M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation (2003).

[4] J. Friedman, Regularized discriminant analysis, Journal of the American Statistical Association (1989).

[5] T. Hastie et al., Penalized discriminant analysis, The Annals of Statistics (1995).

[6] X. He, D. Cai, H. Liu, J. Han, Image clustering with tensor representation, in: MULTIMEDIA '05: Proceedings of the 13th...

[7] X. He et al., Tensor subspace analysis.

[8] X. He et al., Locality preserving projections.

[9] P. Howland et al., Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004).

[10] L.D. Lathauwer et al., A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications (2000).

[11] X. Li et al., Discriminant locally linear embedding with high-order tensor data, IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics (2008).

[12] C. Liu, Capitalize on dimensionality increasing techniques for improving face recognition grand challenge performance, IEEE Transactions on Pattern Analysis and Machine Intelligence (2006).

[13] Y. Ma et al., Segmentation of multivariate mixed data via lossy data coding and compression, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007).

[14] B.S. Manjunath et al., Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996).

[15] K. Muller et al., An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks (2001).
Cited by (29)

    • Higher order spectral regression discriminant analysis (HOSRDA): A tensor feature reduction method for ERP detection

      2017, Pattern Recognition
Citation excerpt:

      Tensors are natural representations of data containing information in higher order modes. In the recent years, tensor-based signal processing [1,2] and dimensionality reduction methods [3–14] have achieved tremendous popularity for analyzing multidimensional data. Working with such data in the “flat-world” of matrices may prevent us from making full use of the information provided in each mode and also the interactions among them.

    • Hidden discriminative features extraction for supervised high-order time series modeling

      2016, Computers in Biology and Medicine
Citation excerpt:

      Compared to the conventional matrix-based methods, the Common Spatial Patterns (CSP) [30] and PCA approaches [41] obtained accuracies of only 75.25% and 72.56%, respectively. Further, the previous tensor algorithms of the tensor linear Laplacian discrimination analysis (TLLDA) [23] and local tensor discriminant analysis (LTDA) [24] that are based on discriminant analysis show accuracies of 81.25% and 87.27%, respectively. Compared to the original Tucker decomposition, the proposed method achieved better prediction with 4.81% improvement.

    • Spatial regularization in subspace learning for face recognition: Implicit vs. explicit

      2016, Neurocomputing
Citation excerpt:

      Among these methods, 2DPCA [15] and 2D-LDA [17] extract features only along the row (or the column) direction of image matrices, while GLRAM and 2DLDA can extract features along both the row and column directions of image matrices. Subsequently some researchers have further extended the subspace dimensionality reduction methods to higher order (HO) tensor data [19–21]. Until now, almost all existing vector-based subspace methods have been successively extended to their corresponding 2D or HO counterparts.

    • Orthogonal Tensor Neighborhood Preserving Embedding for facial expression recognition

      2011, Pattern Recognition
Citation excerpt:

      This model is reasonable and compatible for the reasons that: (1) obviously, in the case of the first-order tensor, this model degenerates to the traditional vector subspace model; (2) many existing datum-as-is based tensor algorithms like TNPE [19] conform to this generalized tensor subspace model, which can be explained as follows. Thus, we say the general datum-as-is based tensor algorithms [18,20–25,31,32,34,35] including the TNPE conforms to our generalized tensor subspace model. To sum up, the introduced generalized tensor subspace model explicitly defines the conceptions about bases, projection and reconstruction in the high-order tensor space, it is a natural extension of the vector subspace model, and it is compatible with the existing datum-as-is dimensionality reduction algorithms.


About the Author—WEI ZHANG received the Bachelor's degree from Tsinghua University in 2007. He is currently an MPhil-PhD stream student in the Department of Information Engineering, the Chinese University of Hong Kong. His research interests include pattern recognition and statistical learning.

About the Author—ZHOUCHEN LIN received the Ph.D. degree in applied mathematics from Peking University in 2000. He is currently a researcher in the Visual Computing Group, Microsoft Research Asia. His research interests include computer vision, computer graphics, pattern recognition, statistical learning, document processing, and human-computer interaction.

About the Author—XIAOOU TANG received the Ph.D. degree from the Massachusetts Institute of Technology, Cambridge, in 1996. He is a professor and the director of the Multimedia Lab in the Department of Information Engineering, the Chinese University of Hong Kong. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). His research interests include computer vision, pattern recognition, and video processing.
