Prototype reduction techniques: A comparison among different approaches
Highlights
- Survey of different methods for prototype reduction.
- "Supervised bagging": an ensemble of classifiers, each trained with a different reduced set of prototypes.
- Solutions to the problem of the sensitivity of nearest-neighbor-based classifiers to outliers.
Introduction
Many machine learning applications currently require managing extremely large data sets for data mining or classification purposes. In many problems a general purpose classifier based on the distance from a set of prototypes, i.e. the nearest neighbor (NN) classification rule, has been successfully used. The good behavior of nearest neighbor based classifiers depends on the number of prototypes, but in many practical pattern recognition applications only a small number of prototypes is available; typically, this limitation causes a strong degradation of the ideal asymptotic behavior of nearest neighbor based classifiers (Bezdek and Kuncheva, 2001, Dasarathy, 1991). Unfortunately, another strong limitation exists: the computational cost of a nearest neighbor based classifier increases with the number of prototypes. In fact, nearest neighbor based classifiers require the storage of the whole training set, which may be very large and may entail a high computation time in the classification stage.
One possible solution to this computational problem is to reduce the number of prototypes, while requiring that the performance on the reduced set be nearly as good as that obtained with the original whole set of prototypes. The idea of prototype reduction for classification purposes has been explored by many researchers and has resulted in the development of many algorithms (Bezdek and Kuncheva, 2001, Dasarathy, 1991), which are usually divided into two groups:
- Selective, i.e. methods for prototype selection (or extraction), which concern the identification of an optimal subset of representative objects from the original data.
- Creative, i.e. approaches for prototype generation, which involve the creation of an entirely new set of objects.
At a second level the selective approaches can be further classified as:
- Editing: processing the training set with the aim of increasing generalization capability, i.e. removing "outlier" prototypes that contribute to the misclassification rate, or patterns that are surrounded mostly by patterns of other classes (e.g. Devijver & Kittler, 1980).
- Condensing: selecting a subset of the training set without substantially changing the nearest neighbor decision boundary, i.e. keeping the patterns near the decision boundary and removing the ones far from it (e.g. Huang and Chow, 2006; Sánchez, 2004).
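The canonical condensing procedure is the condensed NN rule (see the reference list); a minimal sketch, assuming a 1-NN consistency criterion and Euclidean distance (function and variable names are ours, not from any of the cited papers):

```python
import numpy as np

def condense(X, y, seed=0):
    """Hart-style condensing: keep only the prototypes needed for a
    1-NN classifier to label the whole training set correctly."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [int(order[0])]                 # start from one arbitrary prototype
    changed = True
    while changed:                         # repeat until a full pass adds nothing
        changed = False
        for i in order:
            j = keep[np.argmin(np.linalg.norm(X[keep] - X[i], axis=1))]
            if y[j] != y[i]:               # misclassified -> store this pattern
                keep.append(int(i))
                changed = True
    return sorted(keep)

# Two well-separated (here, degenerate) classes: two prototypes suffice.
X = np.vstack([np.zeros((20, 2)), np.full((20, 2), 5.0)])
y = np.array([0] * 20 + [1] * 20)
kept = condense(X, y)
print(len(kept))  # → 2
```

Note how the retained patterns are exactly those whose removal would change some 1-NN decision, which is why condensing preserves the decision boundary while discarding redundant interior points.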
From another perspective, the existing approaches can be classified as either deterministic or non-deterministic, depending on whether or not the number of prototypes generated by the algorithm can be fixed a priori. An excellent survey of the field is given in Bezdek and Kuncheva (2001), where several methods for finding prototypes are discussed; Section 2 presents a brief review of some methods from both classes.
Since prototype reduction techniques can be used both for reducing the computational time and for improving the performance of a nearest-neighbor based classifier, in this work an extensive evaluation of different methods for creating a good set of prototypes is performed by coupling them with different classification approaches: the 1-nearest-neighbor classifier; the 5-nearest-neighbor classifier; the nearest feature line (NFL) classifier; and the nearest feature plane (NFP) classifier.
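As an illustration of how these classifiers differ, the NFL rule measures the distance from a query not to each prototype but to the line spanned by each pair of same-class prototypes; a minimal sketch of that point-to-line distance, assuming Euclidean geometry (function names are ours):

```python
import numpy as np

def feature_line_distance(q, a, b):
    """Distance from query q to the feature line through prototypes a, b:
    project q onto the line a + t * (b - a) and measure the residual."""
    d = b - a
    t = float(np.dot(q - a, d) / np.dot(d, d))   # position of the projection
    p = a + t * d                                # projected point on the line
    return float(np.linalg.norm(q - p))

# A query lying on the line through (0,0) and (1,1) is at distance 0.
print(feature_line_distance(np.array([0.5, 0.5]),
                            np.array([0.0, 0.0]),
                            np.array([1.0, 1.0])))  # → 0.0
```

The NFL classifier assigns the query to the class whose feature line is nearest; NFP generalizes the same idea from lines (pairs of prototypes) to planes (triplets).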
The prototype reduction techniques considered are: a creative approach based on particle swarm optimization (with two different initializations); a creative approach based on a genetic algorithm; and the well-known method named learning prototypes and distances (LPD) (with two different initializations).
Moreover, we suggest a new method for the generation of ensembles based on multiple prototype generation: an easy way to improve the classification performance of a nearest-neighbor based classifier is to repeat the prototype generation N times during the training phase. Each of the resulting N sets of prototypes is used to classify each test pattern separately; finally, the N scores are combined by the "vote rule".
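A minimal sketch of this "supervised bagging" scheme, assuming a 1-NN base classifier and using a plain random subsample as a stand-in for the actual prototype reduction techniques (all names here are illustrative):

```python
import numpy as np
from collections import Counter

def nn_predict(PX, Py, q):
    """1-NN prediction of query q against prototypes (PX, Py)."""
    return int(Py[np.argmin(np.linalg.norm(PX - q, axis=1))])

def vote_ensemble_predict(X, y, q, reduce_fn, n_runs=5):
    """Run the (stochastic) prototype reducer N times and combine the
    N single-classifier decisions by majority vote."""
    votes = [nn_predict(X[idx], y[idx], q)
             for idx in (reduce_fn(X, y, seed) for seed in range(n_runs))]
    return Counter(votes).most_common(1)[0][0]

# Stand-in reducer: a random subsample (any of the reduction
# techniques above could be plugged in here instead).
def random_reduce(X, y, seed, k=10):
    return np.random.default_rng(seed).choice(len(X), size=k, replace=False)

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(5.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(vote_ensemble_predict(X, y, np.array([5.0, 5.0]), random_reduce))  # → 1
```

Because each run of a non-deterministic reducer yields a different prototype set, the N base classifiers are naturally diverse, which is the usual precondition for an effective voting ensemble.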
The main findings of this work are:
- The creation of different edited training sets is an effective way of obtaining an ensemble of classifiers.
- LPD is well suited to standard nearest-neighbor classifiers, while the best prototype reduction method for the other nearest-neighbor based classifiers (NFP, NFL) is the genetic algorithm.
The paper is organized as follows: Section 2 reviews existing work on prototype reduction; Section 3 details the systems tested in this paper, discussing the different techniques for prototype generation and the classification systems; Section 4 presents the experimental results; finally, Section 5 gives some concluding remarks.
Creative methods
A recent approach to prototype reduction, called learning prototypes and distances (LPD) (Paredes & Vidal, 2006), is based on the search for a reduced set of prototypes and a suitable local metric for these prototypes. Starting from an initial random selection of a small number of prototypes, LPD iteratively adjusts their positions and their local metrics according to a rule that minimizes a suitable estimate of the classification error probability. Paredes and Vidal (2006) show that LPD
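The exact LPD update rule is given in the original paper; what follows is only an LVQ-style caricature of the position-adjustment step, under our own simplifications (it omits the local-metric learning entirely):

```python
import numpy as np

def adjust_prototypes(X, y, P, Py, lr=0.1, epochs=20):
    """LVQ-style caricature of iterative prototype adjustment: pull the
    nearest prototype toward a same-class sample and push it away from a
    different-class sample. (Not the actual LPD rule, which also learns
    a local metric per prototype and minimizes an error estimate.)"""
    P = P.astype(float).copy()
    for _ in range(epochs):
        for x, c in zip(X, y):
            j = int(np.argmin(np.linalg.norm(P - x, axis=1)))
            step = lr * (x - P[j])
            P[j] += step if Py[j] == c else -step
    return P

# Two classes at 0 and 5; badly placed prototypes drift to the clusters.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
y = np.array([0] * 10 + [1] * 10)
P = adjust_prototypes(X, y, np.array([[1.5, 1.5], [3.5, 3.5]]), np.array([0, 1]))
```

The point of the sketch is only the overall loop structure: a small prototype set is refined sample by sample rather than selected from the training data.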
Proposed system
In this section the proposed ensemble based on the perturbation of the training patterns is described: each classifier of the ensemble is trained on a different set of prototypes obtained from the original training set by iterating a prototype reduction technique. The scheme of the whole approach is shown in Figs. 1 and 2, where the training and testing phases are outlined. The components of the system are detailed in the following subsections.
Experiments
We perform experiments in order to: (i) compare the different classifiers; (ii) compare the different approaches for prototype reduction; and (iii) compare the classification performance of our best method with that of other nearest neighbor based classifiers.
The experiments have been conducted on nine benchmark datasets, eight of which are from the UCI Repository (Ionosphere (IO), Heart (HE), Pima Indians Diabetes (PI), Wisconsin Breast Cancer Databases (BR), Cardiac
Conclusion
The problem addressed in this paper is to compare several prototype reduction techniques for nearest-neighbor based classifiers. Several classification approaches and prototype reduction techniques are coupled and their behaviors are compared.
Our tests have shown that prototype reduction techniques are a useful means of building an ensemble of classifiers; moreover, even if the number of retained patterns of the "supervised bagging" ensemble is evidently higher than that of a standalone approach
References (36)
- et al. (2007). Center-based nearest neighbour classifier. Pattern Recognition.
- et al. (2008). A Memetic algorithm for evolutionary prototype selection: A scaling up approach. Pattern Recognition.
- (1995). Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognition Letters.
- et al. (2009). Genetic nearest feature plane. Expert Systems with Applications.
- et al. (2009). Particle swarm optimization for prototype reduction. Neurocomputing.
- et al. (2003). Finding representative patterns with ordered projections. Pattern Recognition.
- (2004). High training set size reduction by space partitioning and prototype abstraction. Pattern Recognition.
- et al. Locally nearest neighbor classifiers for pattern classification. Pattern Recognition.
- Barandela, R., & Gasca, E. (2000). Decontamination of training samples for supervised pattern recognition methods. In...
- (1981). Pattern recognition with fuzzy objective function algorithms.
- Nearest prototype classifier designs: An experimental study. International Journal of Intelligent Systems.
- Bagging predictors. Machine Learning.
- Discriminant wavelet faces and nearest feature classifiers for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Nearest neighbor (NN) norms: NN pattern classification techniques.
- Nearest neighbour editing and condensing tools – Synergy exploitation. Pattern Analysis and Applications.
- Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- The condensed NN rule. IEEE Transactions on Information Theory.