Non-linear dimensionality reduction techniques for unsupervised feature extraction☆
Introduction
In many classification problems, high-dimensional data are involved, because large feature vectors are needed to describe complex objects and to distinguish between them. On the other hand, the number of available data points is limited in many practical situations. Estimating the class probability distributions in these sparsely sampled high-dimensional data spaces is troublesome for a classifier, and generally affects the reliability of the obtained classification results.
To avoid these problems, the dimension of the feature space is reduced. This can be done in several ways. The simplest is to select a limited subset of features out of the total set (Devijver and Kittler, 1982), using classification performance as the selection criterion. Well-known feature selection techniques are forward selection and branch-and-bound. Another way is feature extraction, where new features are computed as functions (linear or non-linear) of the original set of features. Unsupervised linear feature extraction techniques more or less all rely on Principal Component Analysis (PCA), which rotates the original feature space before projecting the feature vectors onto a limited number of axes. Supervised feature extraction techniques usually relate to discriminant analysis (Fukunaga, 1990), which uses the within-class and between-class scatter matrices.
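For concreteness, the PCA projection described above can be sketched in a few lines of numpy. This is an illustration only, not the paper's implementation; the function and variable names are our own.

```python
import numpy as np

def pca_project(Y, d):
    """Rotate the D-dimensional rows of Y and project onto the d principal axes."""
    Yc = Y - Y.mean(axis=0)                # centre the data
    cov = np.cov(Yc, rowvar=False)         # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov) # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort axes by explained variance
    W = eigvecs[:, order[:d]]              # keep the d strongest axes
    return Yc @ W                          # N x d projected features

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 10))             # 100 points in a 10-D input space
X = pca_project(Y, 2)
print(X.shape)                             # (100, 2)
```

The projection is linear: only a rotation followed by a truncation, which is exactly the baseline the non-linear techniques are compared against.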
Already in the early days of pattern recognition several non-linear mapping techniques were developed. For example multidimensional scaling (Shepard, 1962; Kruskal, 1964) and Sammon's mapping (Sammon, 1969) are such techniques which, according to some predefined error criterion, try to map the original data space into a lower-dimensional space, hereby preserving as much as possible the local structure of the original space.
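As a hedged illustration of such an error-criterion-based mapping, the sketch below minimises Sammon's stress by plain gradient descent in numpy. The function name, step size, and iteration count are our own choices, not those of the original algorithm descriptions.

```python
import numpy as np

def sammon(Y, d=2, n_iter=500, lr=0.3, seed=0):
    """Minimise Sammon's stress by gradient descent (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    # pairwise input-space distances D_ij; epsilon guards divisions by zero
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1) + 1e-12
    c = D[np.triu_indices(N, 1)].sum()              # normalising constant
    X = rng.normal(scale=1e-2, size=(N, d))         # random initial map
    for _ in range(n_iter):
        diff = X[:, None] - X[None, :]              # x_i - x_j
        dist = np.linalg.norm(diff, axis=-1) + 1e-12
        # gradient of E(X) = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij
        w = (dist - D) / (D * dist)
        np.fill_diagonal(w, 0.0)
        X -= lr * (2.0 / c) * (w[..., None] * diff).sum(axis=1)
    return X

Y = np.random.default_rng(1).normal(size=(30, 5))   # toy 5-D data
X = sammon(Y)                                        # 2-D map of the 30 points
```

The 1/D_ij weighting in the stress is what makes Sammon's mapping emphasise the preservation of small (local) distances.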
With the development of neural networks, new possibilities for non-linear mapping were created. Amongst them, Self-Organizing Maps are probably the best known (Kohonen, 1995). Other approaches include auto-associative feedforward networks (Baldi and Hornik, 1989) and a neural network version of Sammon's mapping (Mao and Jain, 1995). Although the mentioned techniques are theoretically capable of generating a non-linear mapping into a space of arbitrary dimension, most applications have been mappings to d = 2 for the purpose of visualizing the data (some recent applications are found in (Kraaijveld et al., 1995; Mao and Jain, 1995)). Recently, a supervised neural network approach to feature extraction for classification purposes was presented and compared to PCA (Lee and Landgrebe, 1997).
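A minimal numpy sketch of an auto-associative (bottleneck) feedforward network of the kind referenced above: the network is trained to reproduce its input, and the bottleneck activations serve as the extracted features. The architecture and learning-rate choices here are our own illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def train_autoencoder(Y, d=2, n_iter=500, lr=0.05, seed=0):
    """D -> d -> D auto-associative network; returns the d-dimensional codes."""
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    W1 = rng.normal(scale=0.1, size=(D, d))        # encoder weights
    W2 = rng.normal(scale=0.1, size=(d, D))        # decoder weights
    for _ in range(n_iter):
        H = np.tanh(Y @ W1)                        # bottleneck activations
        R = H @ W2                                 # linear reconstruction
        err = R - Y                                # gradient of 0.5*MSE w.r.t. R
        gW2 = H.T @ err / N                        # backprop through decoder
        gW1 = Y.T @ ((err @ W2.T) * (1.0 - H**2)) / N  # ... and through tanh
        W1 -= lr * gW1
        W2 -= lr * gW2
    return np.tanh(Y @ W1)                         # extracted feature codes

Y = np.random.default_rng(1).normal(size=(200, 10))
codes = train_autoencoder(Y, d=2)
```

With linear activations such a network recovers the PCA subspace (Baldi and Hornik, 1989); the non-linearity in the encoder is what allows it to go beyond a linear mapping.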
In this paper, a study is performed on unsupervised non-linear dimensionality reduction. Four techniques are evaluated: a multidimensional scaling algorithm, Sammon's mapping technique, Kohonen's self-organizing map and an auto-associative feedforward neural network. First we will present these four algorithms within the same formalism, based on a minimization of an error function, which will be performed by gradient-descent or higher order optimization techniques. Secondly the performance of the techniques for feature extraction is evaluated and compared to linear mapping (PCA). The performance is evaluated by examining the mapped feature space's ability to perform supervised classification tasks. Apart from an artificial data set, real world applications are studied in the field of texture analysis where high-dimensional wavelet-based feature sets from grey-level and color texture images are used. We will show that non-linear feature extraction leads to feature sets which improve classification performance compared to linear mappings.
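The evaluation protocol — judging a mapped feature space by how well it supports a supervised classification task — can be mimicked with a simple nearest-neighbour classifier. The toy data below are our own; the paper's actual classifiers and texture feature sets are not reproduced here.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test):
    """1-NN accuracy: a proxy for how well extracted features separate classes."""
    dists = np.linalg.norm(X_test[:, None] - X_train[None, :], axis=-1)
    pred = y_train[dists.argmin(axis=1)]           # label of nearest neighbour
    return (pred == y_test).mean()

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=(40, 2))              # class 0 samples
b = rng.normal(loc=8.0, size=(40, 2))              # class 1 samples
X_tr = np.vstack([a[:20], b[:20]]); y_tr = np.repeat([0, 1], 20)
X_te = np.vstack([a[20:], b[20:]]); y_te = np.repeat([0, 1], 20)
acc = knn_accuracy(X_tr, y_tr, X_te, y_te)         # near 1.0 for separable data
```

Applying the same classifier to features extracted by different mappings gives a directly comparable score per mapping.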
The outline of the paper is as follows. In Section 2 the four non-linear techniques are presented, together with an efficient optimization scheme for each of them. The experiments and a discussion of the results are given in Section 3.
Mapping algorithms
Let us first fix the notations used in this section:
N: number of objects in the feature space,
D: dimension of the feature space (called input space),
yi: D-dimensional feature vector representing point i in the input space,
Dij: Euclidean distance between points i and j in the input space,
d: dimension of the space of extracted features (called output space),
xi: d-dimensional vector representing point i in the output space,
dij: Euclidean distance between points i and j in the output space,
X: vector
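Within this notation, each of the four algorithms minimises an error function of the map coordinates. As a hedged illustration (Sammon's criterion, from Sammon, 1969; the other three techniques use different criteria of the same flavour):

E(X) = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}}

which is driven to a (local) minimum by gradient descent, or a higher-order method, on the xi.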
Experiments and discussion
In the previous section, the four non-linear mapping techniques were described as minimizations of an error function. All four error functions aim at preserving topology of the original space into the map in one way or another. Instead of trying to analyse the behaviour of the mappings directly from the form of the error function, we chose to use the maps for a specific task, and compare the behaviour of the mappings by the performances of the task. The topology preserving properties of the
Conclusion
In this paper, a study is performed on unsupervised non-linear feature extraction. Four techniques were studied: a multidimensional scaling approach (MDS), Sammon's mapping (SAM), Kohonen's self-organizing map (SOM) and an auto-associative feedforward neural network (AFN). All four yield better classification results than the optimal linear approach (PCA), and therefore can be utilized as a feature extraction step in a design for classification schemes. Because of the nature of the techniques,
Acknowledgements
The first author is supported by a grant from the Flemish Institute for the Promotion of Scientific and Technological Research in Industry (IWT).
References
- Baldi, P., Hornik, K., 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks.
- Barlow, R.E., et al., 1972. Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression.
- Cox, T.F., Cox, M.A.A., 1994. Multidimensional Scaling.
- Devijver, P.A., Kittler, J., 1982. Pattern Recognition: A Statistical Approach.
- Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis.
- Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition.
- Kohonen, T., 1990. The self-organizing map. Proc. IEEE.
- Kohonen, T., 1995. Self-Organizing Maps.
- Kraaijveld, M.A., Mao, J., Jain, A.K., 1995. A nonlinear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Networks.
- Kramer, M.A., 1991. Nonlinear principal component analysis using auto-associative neural networks. AIChE J.
- Kruskal, J.B., 1964. Nonmetric multidimensional scaling: A numerical method. Psychometrika.
- Pudil, P., Novovičová, J., Kittler, J., 1994. Floating search methods in feature selection. Pattern Recognition Letters.
☆ Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.