Prototype reduction techniques: A comparison among different approaches
Highlights
- Survey of different methods for prototype reduction.
- "Supervised bagging": an ensemble of classifiers, each trained with a different reduced set of prototypes.
- Solutions to the problem of the sensitivity of nearest-neighbor-based classifiers to outliers.
Introduction
Many machine learning applications currently require managing extremely large data sets for data mining or classification purposes. In many problems a general purpose classifier based on the distance from a set of prototypes, i.e. the nearest neighbor (NN) classification rule, has been successfully used. The good behavior of nearest neighbor based classifiers depends on the number of prototypes, but in many practical pattern recognition applications only a small number of prototypes is available; typically, this limitation causes a strong degradation of the ideal asymptotic behavior of nearest neighbor based classifiers (Bezdek and Kuncheva, 2001, Dasarathy, 1991). Unfortunately, another strong limitation exists: the computational cost of a nearest neighbor based classifier increases with the number of prototypes. In fact, nearest neighbor based classifiers require the storage of the whole training set, which may be very large and may entail a high computation time in the classification stage.
One possible solution to this computational problem is to reduce the number of prototypes, while requiring that the performance on the reduced set be nearly as good as that obtained with the original whole set of prototypes. The idea of prototype reduction for classification purposes has been explored by many researchers and has resulted in the development of many algorithms (Bezdek and Kuncheva, 2001, Dasarathy, 1991), which are usually divided into two groups:
- Selective, i.e. methods for prototype selection (or extraction), which concern the identification of an optimal subset of representative objects from the original data.
- Creative, i.e. approaches for prototype generation, which involve the creation of an entirely new set of objects.
At a second level the selective approaches can be further classified as:
- Editing: processing the training set with the aim of increasing generalization capability, i.e. removing "outlier" prototypes that contribute to the misclassification rate, or patterns that are surrounded mostly by patterns of other classes (e.g. Devijver & Kittler, 1980).
- Condensing: selecting a subset of the training set without substantially changing the nearest neighbor decision boundary, i.e. keeping the patterns near the decision boundary and removing the ones far from it (e.g. Huang and Chow, 2006; Sánchez, 2004).
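The canonical condensing procedure is the condensed NN rule (see the reference list); a minimal sketch, assuming a 1-NN consistency criterion and Euclidean distance (function and variable names are ours, not from any of the cited papers):

```python
import numpy as np

def condense(X, y, seed=0):
    """Hart-style condensing: keep only the prototypes needed for a
    1-NN classifier to label the whole training set correctly."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [int(order[0])]                 # start from one arbitrary prototype
    changed = True
    while changed:                         # repeat until a full pass adds nothing
        changed = False
        for i in order:
            j = keep[np.argmin(np.linalg.norm(X[keep] - X[i], axis=1))]
            if y[j] != y[i]:               # misclassified -> store this pattern
                keep.append(int(i))
                changed = True
    return sorted(keep)

# Two well-separated (here, degenerate) classes: two prototypes suffice.
X = np.vstack([np.zeros((20, 2)), np.full((20, 2), 5.0)])
y = np.array([0] * 20 + [1] * 20)
kept = condense(X, y)
print(len(kept))  # → 2
```

Note how the retained patterns are exactly those whose removal would change some 1-NN decision, which is why condensing preserves the decision boundary while discarding redundant interior points.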
From another perspective, the existing approaches can be classified as either deterministic or non-deterministic, depending on whether or not the number of prototypes generated by the algorithm can be fixed a priori. An excellent survey of the field is given in Bezdek and Kuncheva (2001), where several methods for finding prototypes are discussed; Section 2 presents a brief review of some methods from both classes.
Since prototype reduction techniques can be used both for reducing the computational time and for improving the performance of a nearest-neighbor based classifier, in this work an extensive evaluation of different methods for creating a good set of prototypes is performed by coupling them with different classification approaches: the 1-nearest-neighbor classifier; the 5-nearest-neighbor classifier; the nearest feature line (NFL) classifier; and the nearest feature plane (NFP) classifier.
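As an illustration of how these classifiers differ, the NFL rule measures the distance from a query not to each prototype but to the line spanned by each pair of same-class prototypes; a minimal sketch of that point-to-line distance, assuming Euclidean geometry (function names are ours):

```python
import numpy as np

def feature_line_distance(q, a, b):
    """Distance from query q to the feature line through prototypes a, b:
    project q onto the line a + t * (b - a) and measure the residual."""
    d = b - a
    t = float(np.dot(q - a, d) / np.dot(d, d))   # position of the projection
    p = a + t * d                                # projected point on the line
    return float(np.linalg.norm(q - p))

# A query lying on the line through (0,0) and (1,1) is at distance 0.
print(feature_line_distance(np.array([0.5, 0.5]),
                            np.array([0.0, 0.0]),
                            np.array([1.0, 1.0])))  # → 0.0
```

The NFL classifier assigns the query to the class whose feature line is nearest; NFP generalizes the same idea from lines (pairs of prototypes) to planes (triplets).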
The prototype reduction techniques considered are: a creative approach based on particle swarm optimization (with two different initializations); a creative approach based on a genetic algorithm; and the well-known method named learning prototypes and distances (LPD) (with two different initializations).
Moreover, we suggest a new method for the generation of ensembles based on multiple prototype generation: an easy way to improve the classification performance of a nearest-neighbor based classifier is to repeat the prototype generation N times during the training phase. Each of the resulting N sets of prototypes is used to classify each test pattern separately; finally, the N scores are combined by the "vote rule".
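A minimal sketch of this "supervised bagging" scheme, assuming a 1-NN base classifier and using a plain random subsample as a stand-in for the actual prototype reduction techniques (all names here are illustrative):

```python
import numpy as np
from collections import Counter

def nn_predict(PX, Py, q):
    """1-NN prediction of query q against prototypes (PX, Py)."""
    return int(Py[np.argmin(np.linalg.norm(PX - q, axis=1))])

def vote_ensemble_predict(X, y, q, reduce_fn, n_runs=5):
    """Run the (stochastic) prototype reducer N times and combine the
    N single-classifier decisions by majority vote."""
    votes = [nn_predict(X[idx], y[idx], q)
             for idx in (reduce_fn(X, y, seed) for seed in range(n_runs))]
    return Counter(votes).most_common(1)[0][0]

# Stand-in reducer: a random subsample (any of the reduction
# techniques above could be plugged in here instead).
def random_reduce(X, y, seed, k=10):
    return np.random.default_rng(seed).choice(len(X), size=k, replace=False)

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(5.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(vote_ensemble_predict(X, y, np.array([5.0, 5.0]), random_reduce))  # → 1
```

Because each run of a non-deterministic reducer yields a different prototype set, the N base classifiers are naturally diverse, which is the usual precondition for an effective voting ensemble.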
The main findings of this work are:
- The creation of different edited training sets is an effective way of obtaining an ensemble of classifiers.
- LPD is well suited to standard nearest-neighbor classifiers, while the best prototype reduction method for the other nearest-neighbor based classifiers (NFP, NFL) is the genetic algorithm.
The paper is organized as follows: Section 2 reviews existing work on prototype reduction; Section 3 details the systems tested in this paper, discussing the different techniques for prototype generation and the classification systems; Section 4 presents the experimental results; finally, Section 5 gives some concluding remarks.
Creative methods
A recent approach to prototype reduction, called learning prototypes and distances (LPD) (Paredes & Vidal, 2006), is based on the search for a reduced set of prototypes and a suitable local metric for these prototypes. Starting from an initial random selection of a small number of prototypes, LPD iteratively adjusts their positions and their local metrics according to a rule that minimizes a suitable estimate of the classification error probability. Paredes and Vidal (2006) show that LPD
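The exact LPD update rule is given in the original paper; what follows is only an LVQ-style caricature of the position-adjustment step, under our own simplifications (it omits the local-metric learning entirely):

```python
import numpy as np

def adjust_prototypes(X, y, P, Py, lr=0.1, epochs=20):
    """LVQ-style caricature of iterative prototype adjustment: pull the
    nearest prototype toward a same-class sample and push it away from a
    different-class sample. (Not the actual LPD rule, which also learns
    a local metric per prototype and minimizes an error estimate.)"""
    P = P.astype(float).copy()
    for _ in range(epochs):
        for x, c in zip(X, y):
            j = int(np.argmin(np.linalg.norm(P - x, axis=1)))
            step = lr * (x - P[j])
            P[j] += step if Py[j] == c else -step
    return P

# Two classes at 0 and 5; badly placed prototypes drift to the clusters.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
y = np.array([0] * 10 + [1] * 10)
P = adjust_prototypes(X, y, np.array([[1.5, 1.5], [3.5, 3.5]]), np.array([0, 1]))
```

The point of the sketch is only the overall loop structure: a small prototype set is refined sample by sample rather than selected from the training data.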
Proposed system
In this section the proposed ensemble based on the perturbation of the training patterns is described: each classifier of the ensemble is trained on a different set of prototypes obtained from the original training set by iterating a prototype reduction technique. The scheme of the whole approach is shown in Figs. 1 and 2, where the training and testing phases are outlined. The components of the system are detailed in the following subsections.
Experiments
We perform experiments in order to: (i) compare the different classifiers; (ii) compare the different approaches for prototype reduction; and (iii) compare the classification performance of our best method with that of other nearest neighbor based classifiers.
The experiments have been conducted on nine benchmark datasets, eight of which are from the UCI Repository (Ionosphere (IO), Heart (HE), Pima Indians Diabetes (PI), Wisconsin Breast Cancer Databases (BR), Cardiac
Conclusion
The problem addressed in this paper is to compare several prototype reduction techniques for nearest-neighbor based classifiers. Several classification approaches and prototype reduction techniques are coupled and their behaviors are compared.
Our tests have shown that prototype reduction techniques are a useful means of building an ensemble of classifiers; moreover, even if the number of retained patterns of the "supervised bagging" ensemble is evidently higher than that of a standalone approach
References (36)
- et al. (2007). Center-based nearest neighbour classifier. Pattern Recognition.
- et al. (2008). A Memetic algorithm for evolutionary prototype selection: A scaling up approach. Pattern Recognition.
- (1995). Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognition Letters.
- et al. (2009). Genetic nearest feature plane. Expert Systems with Applications.
- et al. (2009). Particle swarm optimization for prototype reduction. Neurocomputing.
- et al. (2003). Finding representative patterns with ordered projections. Pattern Recognition.
- (2004). High training set size reduction by space partitioning and prototype abstraction. Pattern Recognition.
- et al. Locally nearest neighbor classifiers for pattern classification. Pattern Recognition.
- Barandela, R., & Gasca, E. (2000). Decontamination of training samples for supervised pattern recognition methods. In...
- (1981). Pattern recognition with fuzzy objective function algorithms.
- Nearest prototype classifier designs: An experimental study. International Journal of Intelligent Systems.
- Bagging predictors. Machine Learning.
- Discriminant wavelet faces and nearest feature classifiers for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Nearest neighbor (NN) norms: NN pattern classification techniques.
- Nearest neighbour editing and condensing tools – Synergy exploitation. Pattern Analysis and Applications.
- Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- The condensed NN rule. IEEE Transactions on Information Theory.