Non-linear dimensionality reduction techniques for unsupervised feature extraction☆
Introduction
In many classification problems, high-dimensional data are involved, because large feature vectors are needed to describe complex objects and to distinguish between them. On the other hand, the number of available data points is limited in many practical situations. Estimating the class probability distributions in these sparsely sampled high-dimensional data spaces is troublesome for a classifier, and generally affects the reliability of the obtained classification results.
To avoid these problems, the dimension of the feature space is reduced. This can be done in several ways. The simplest is to select a limited subset of features out of the total set (Devijver and Kittler, 1982), using classification performance as the selection criterion. Well-known feature selection techniques are forward selection and branch-and-bound. Another way is feature extraction, where new features are computed as functions (linear or non-linear) of the original set of features. Unsupervised linear feature extraction techniques more or less all rely on Principal Component Analysis (PCA), which rotates the original feature space before projecting the feature vectors onto a limited number of axes. Supervised feature extraction techniques usually relate to discriminant analysis (Fukunaga, 1990), which uses the within-class and between-class scatter matrices.
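For concreteness, the PCA projection described above can be sketched in a few lines of numpy. This is an illustration only, not the paper's implementation; the function and variable names are our own.

```python
import numpy as np

def pca_project(Y, d):
    """Rotate the D-dimensional rows of Y and project onto the d principal axes."""
    Yc = Y - Y.mean(axis=0)                # centre the data
    cov = np.cov(Yc, rowvar=False)         # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov) # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort axes by explained variance
    W = eigvecs[:, order[:d]]              # keep the d strongest axes
    return Yc @ W                          # N x d projected features

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 10))             # 100 points in a 10-D input space
X = pca_project(Y, 2)
print(X.shape)                             # (100, 2)
```

The projection is linear: only a rotation followed by a truncation, which is exactly the baseline the non-linear techniques are compared against.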
Already in the early days of pattern recognition several non-linear mapping techniques were developed. For example multidimensional scaling (Shepard, 1962; Kruskal, 1964) and Sammon's mapping (Sammon, 1969) are such techniques which, according to some predefined error criterion, try to map the original data space into a lower-dimensional space, hereby preserving as much as possible the local structure of the original space.
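As a hedged illustration of such an error-criterion-based mapping, the sketch below minimises Sammon's stress by plain gradient descent in numpy. The function name, step size, and iteration count are our own choices, not those of the original algorithm descriptions.

```python
import numpy as np

def sammon(Y, d=2, n_iter=500, lr=0.3, seed=0):
    """Minimise Sammon's stress by gradient descent (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    # pairwise input-space distances D_ij; epsilon guards divisions by zero
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1) + 1e-12
    c = D[np.triu_indices(N, 1)].sum()              # normalising constant
    X = rng.normal(scale=1e-2, size=(N, d))         # random initial map
    for _ in range(n_iter):
        diff = X[:, None] - X[None, :]              # x_i - x_j
        dist = np.linalg.norm(diff, axis=-1) + 1e-12
        # gradient of E(X) = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij
        w = (dist - D) / (D * dist)
        np.fill_diagonal(w, 0.0)
        X -= lr * (2.0 / c) * (w[..., None] * diff).sum(axis=1)
    return X

Y = np.random.default_rng(1).normal(size=(30, 5))   # toy 5-D data
X = sammon(Y)                                        # 2-D map of the 30 points
```

The 1/D_ij weighting in the stress is what makes Sammon's mapping emphasise the preservation of small (local) distances.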
With the development of neural networks, new possibilities for non-linear mapping were created. Amongst them, Self-Organizing Maps are probably the best known (Kohonen, 1995). Other approaches include auto-associative feedforward networks (Baldi and Hornik, 1989) and a neural network version of Sammon's mapping (Mao and Jain, 1995). Although the mentioned techniques are theoretically capable of generating a non-linear mapping into a space of arbitrary dimension, most applications have been mappings to d = 2 for the purpose of visualizing the data (some recent applications are found in (Kraaijveld et al., 1995; Mao and Jain, 1995)). Recently, a supervised neural network approach to feature extraction for classification purposes was presented and compared to PCA (Lee and Landgrebe, 1997).
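A minimal numpy sketch of an auto-associative (bottleneck) feedforward network of the kind referenced above: the network is trained to reproduce its input, and the bottleneck activations serve as the extracted features. The architecture and learning-rate choices here are our own illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def train_autoencoder(Y, d=2, n_iter=500, lr=0.05, seed=0):
    """D -> d -> D auto-associative network; returns the d-dimensional codes."""
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    W1 = rng.normal(scale=0.1, size=(D, d))        # encoder weights
    W2 = rng.normal(scale=0.1, size=(d, D))        # decoder weights
    for _ in range(n_iter):
        H = np.tanh(Y @ W1)                        # bottleneck activations
        R = H @ W2                                 # linear reconstruction
        err = R - Y                                # gradient of 0.5*MSE w.r.t. R
        gW2 = H.T @ err / N                        # backprop through decoder
        gW1 = Y.T @ ((err @ W2.T) * (1.0 - H**2)) / N  # ... and through tanh
        W1 -= lr * gW1
        W2 -= lr * gW2
    return np.tanh(Y @ W1)                         # extracted feature codes

Y = np.random.default_rng(1).normal(size=(200, 10))
codes = train_autoencoder(Y, d=2)
```

With linear activations such a network recovers the PCA subspace (Baldi and Hornik, 1989); the non-linearity in the encoder is what allows it to go beyond a linear mapping.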
In this paper, a study is performed on unsupervised non-linear dimensionality reduction. Four techniques are evaluated: a multidimensional scaling algorithm, Sammon's mapping technique, Kohonen's self-organizing map and an auto-associative feedforward neural network. First we will present these four algorithms within the same formalism, based on a minimization of an error function, which will be performed by gradient-descent or higher order optimization techniques. Secondly the performance of the techniques for feature extraction is evaluated and compared to linear mapping (PCA). The performance is evaluated by examining the mapped feature space's ability to perform supervised classification tasks. Apart from an artificial data set, real world applications are studied in the field of texture analysis where high-dimensional wavelet-based feature sets from grey-level and color texture images are used. We will show that non-linear feature extraction leads to feature sets which improve classification performance compared to linear mappings.
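The evaluation protocol — judging a mapped feature space by how well it supports a supervised classification task — can be mimicked with a simple nearest-neighbour classifier. The toy data below are our own; the paper's actual classifiers and texture feature sets are not reproduced here.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test):
    """1-NN accuracy: a proxy for how well extracted features separate classes."""
    dists = np.linalg.norm(X_test[:, None] - X_train[None, :], axis=-1)
    pred = y_train[dists.argmin(axis=1)]           # label of nearest neighbour
    return (pred == y_test).mean()

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=(40, 2))              # class 0 samples
b = rng.normal(loc=8.0, size=(40, 2))              # class 1 samples
X_tr = np.vstack([a[:20], b[:20]]); y_tr = np.repeat([0, 1], 20)
X_te = np.vstack([a[20:], b[20:]]); y_te = np.repeat([0, 1], 20)
acc = knn_accuracy(X_tr, y_tr, X_te, y_te)         # near 1.0 for separable data
```

Applying the same classifier to features extracted by different mappings gives a directly comparable score per mapping.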
The outline of the paper is as follows. In Section 2 the four non-linear techniques are presented, together with an efficient optimization scheme for each of them. The experiments and a discussion of the results are given in Section 3.
Mapping algorithms
Let us first fix the notations used in this section:
N: number of objects in the feature space,
D: dimension of the feature space (called input space),
yi: D-dimensional feature vector representing point i in the input space,
Dij: Euclidean distance between points i and j in the input space,
d: dimension of the space of extracted features (called output space),
xi: d-dimensional vector representing point i in the output space,
dij: Euclidean distance between points i and j in the output space,
X: vector
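Within this notation, each of the four algorithms minimises an error function of the map coordinates. As a hedged illustration (Sammon's criterion, from Sammon, 1969; the other three techniques use different criteria of the same flavour):

E(X) = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}}

which is driven to a (local) minimum by gradient descent, or a higher-order method, on the xi.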
Experiments and discussion
In the previous section, the four non-linear mapping techniques were described as minimizations of an error function. All four error functions aim at preserving topology of the original space into the map in one way or another. Instead of trying to analyse the behaviour of the mappings directly from the form of the error function, we chose to use the maps for a specific task, and compare the behaviour of the mappings by the performances of the task. The topology preserving properties of the
Conclusion
In this paper, a study is performed on unsupervised non-linear feature extraction. Four techniques were studied: a multidimensional scaling approach (MDS), Sammon's mapping (SAM), Kohonen's self-organizing map (SOM) and an auto-associative feedforward neural network (AFN). All four yield better classification results than the optimal linear approach (PCA), and therefore can be utilized as a feature extraction step in a design for classification schemes. Because of the nature of the techniques,
Acknowledgements
The first author is supported by a grant from the Flemish Institute for the Promotion of Scientific and Technological Research in Industry (IWT).
References
- Baldi, P., Hornik, K., 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks.
- Barlow, R.E., et al., 1972. Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression.
- Cox, T.F., Cox, M.A.A., 1994. Multidimensional Scaling.
- Devijver, P.A., Kittler, J., 1982. Pattern Recognition: A Statistical Approach.
- Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis.
- Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition.
- Kohonen, T., 1990. The self-organizing map. Proc. IEEE.
- Kohonen, T., 1995. Self-Organizing Maps.
- Kraaijveld, M.A., Mao, J., Jain, A.K., 1995. A nonlinear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Networks.
- Kramer, M.A., 1991. Nonlinear principal component analysis using auto-associative neural networks. AIChE J.
- Kruskal, J.B., 1964. Nonmetric multidimensional scaling: A numerical method. Psychometrika.
- Pudil, P., Novovičová, J., Kittler, J., 1994. Floating search methods in feature selection. Pattern Recognition Letters.
☆ Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.