Elsevier

Pattern Recognition

Volume 34, Issue 2, February 2001, Pages 523-525

Comparison of genetic algorithm based prototype selection schemes

https://doi.org/10.1016/S0031-3203(00)00094-7

Introduction

Prototype selection is the process of finding representative patterns from the data. Representative patterns help in reducing the data on which further operations, such as data mining, can be carried out. The current work discusses computation of prototypes using medoids [1], leaders [2] and distance-based thresholds. After finding the initial set of prototypes, the optimal set is found by means of genetic algorithms (GAs). A comparison of stochastic search algorithms is carried out by Susheela Devi and Narasimha Murty [3]; they conclude that genetic algorithms perform best among the search algorithms considered. Chang and Lippmann [4] suggest the use of genetic algorithms for pattern classification.

In the following sections, we discuss and compare the prototype selection methods under consideration. Comparisons of results are based on the nearest neighbor classifier (NNC). Subsequently, considering those prototype sets which provide good classification accuracy, GAs are used for optimal prototype selection. A number of GA-based experiments are carried out, guided by the characteristics of the data, and a summary of the results is presented.
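The NNC used for comparison assigns to a test pattern the class of its nearest training (or prototype) pattern. A minimal sketch of such a 1-NN classifier is given below; the function name and the toy data are illustrative, not from the paper.

```python
import numpy as np

def nnc_predict(prototypes, labels, x):
    """Nearest neighbor classifier: return the label of the prototype
    closest to x under Euclidean distance."""
    return labels[np.linalg.norm(prototypes - x, axis=1).argmin()]

# toy reference set: one prototype per class
protos = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])
print(nnc_predict(protos, labels, np.array([1.0, 2.0])))  # -> 0
```

With a reduced prototype set, classification cost drops from one distance computation per training pattern to one per prototype, which is the motivation for prototype selection.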

Section snippets

Description of data

Handwritten digit data [5] is used for the comparison exercises. The training data consists of 667 patterns for each of the digit classes 0–9, totalling 6670 patterns. The test data consists of 3333 patterns. While carrying out experiments using GAs, validation data is drawn from the training data itself.

Initial prototype selection

The prototypes are selected based on medoids, leaders and Euclidean distance-based thresholds.
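Of these, the leader method [2] is a single-pass scheme: the first pattern becomes a leader, and each subsequent pattern either joins an existing leader within a distance threshold or becomes a new leader itself. A minimal sketch, assuming Euclidean distance and an illustrative threshold:

```python
import numpy as np

def leaders(patterns, threshold):
    """Single-pass leader clustering: a pattern becomes a new leader
    (prototype) only if it is farther than `threshold` from every
    leader found so far; otherwise it is absorbed by an existing one."""
    selected = []
    for x in patterns:
        if all(np.linalg.norm(x - l) > threshold for l in selected):
            selected.append(x)
    return np.array(selected)

# toy data: two tight clusters, far apart
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
protos = leaders(data, threshold=1.0)
print(len(protos))  # -> 2
```

The number of prototypes is controlled indirectly by the threshold: a larger threshold absorbs more patterns per leader and yields fewer prototypes, which is why the distance statistics of each class matter for choosing it.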

Optimal prototype selection using GAs

Each of the handwritten digit classes is subjected to a preliminary statistical analysis, which yields a range of distances among all the patterns of each class. For example, the overall range of distances in the current data is between 1 and 12, and this range may vary for each class. During prototype selection, patterns should be chosen such that they capture all representative patterns with the least redundancy. But in each of the previously discussed methods, in the absence of …
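A common way to cast prototype selection as a GA search, consistent with the stochastic-search comparison in [3], is a binary chromosome whose bit i indicates whether training pattern i is kept, with fitness measured as 1-NN accuracy on held-out validation data. The sketch below is illustrative only (the paper does not specify its GA operators or parameters); it uses tournament selection, uniform crossover, bit-flip mutation and elitism.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_accuracy(protos, proto_labels, X_val, y_val):
    """Validation accuracy of a 1-NN classifier that uses only the
    selected prototypes as its reference set."""
    d = np.linalg.norm(X_val[:, None, :] - protos[None, :, :], axis=2)
    return (proto_labels[d.argmin(axis=1)] == y_val).mean()

def ga_prototype_selection(X, y, X_val, y_val,
                           pop_size=20, generations=30, p_mut=0.02):
    """Binary-chromosome GA for prototype subset selection (sketch only;
    operator choices and parameter values are illustrative)."""
    n = len(X)
    pop = rng.random((pop_size, n)) < 0.5          # random initial subsets

    def fitness(chrom):
        # empty subsets are invalid -> zero fitness
        return nn_accuracy(X[chrom], y[chrom], X_val, y_val) if chrom.any() else 0.0

    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        new_pop = [pop[scores.argmax()].copy()]    # elitism: keep the best
        while len(new_pop) < pop_size:
            # binary tournament selection of two parents
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            mask = rng.random(n) < 0.5             # uniform crossover
            child = np.where(mask, p1, p2)
            child ^= rng.random(n) < p_mut         # bit-flip mutation
            new_pop.append(child)
        pop = np.array(new_pop)

    scores = np.array([fitness(c) for c in pop])
    return pop[scores.argmax()]                    # best subset mask found
```

The fitness function can be extended with a penalty on the number of selected bits to trade classification accuracy against prototype-set size.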

Summary and conclusions

The current study examines three prototype selection methods and demonstrates experimentally their merits and demerits based on classification accuracy (CA) and the number of selected prototypes. Considering the set of prototypes obtained by each of the above methods, prototype reduction is carried out using GAs. The best CA of 92.65% is better than the previously reported result [5] on this data in the literature.


References (5)

  • L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, NY, ...
  • H. Spath, Cluster Analysis: Algorithms for Data Reduction and Classification of Objects, Ellis Horwood Limited, West...

Cited by (58)

  • A subregion division based multi-objective evolutionary algorithm for SVM training set selection

    2020, Neurocomputing
    Citation Excerpt :

    The first group of instance selection is prototype selection, whose aim is to obtain an instance subset that allows the KNN classifier to achieve the maximum classification rate [11]. Representative algorithms of this group includes Edited Nearest Neighbor (ENN) [34], Reduced Nearest Neighbor (RNN) [34], Decremental Reduction Optimization Procedure (DROP) [35], Multi-Class Instance Selection (MCIS) [36], Generational Genetic Algorithm (GGA) [37], Steady-State Genetic Algorithm (SSGA) [38] and so on. These prototype selection algorithms have shown the effectiveness on selecting instance subsets for KNN with high quality, and in recognizing their competitiveness, some researchers considered whether the PS techniques can be extended to improve the performance of other classification methods, which yields the second group of instance selection: training set selection (TSS).

  • Speeding-up the kernel k-means clustering method: A prototype based hybrid approach

    2013, Pattern Recognition Letters
    Citation Excerpt :

    Iris and Pendigits datasets are available at the UCI machine learning repository (Murphy, 1994). Letter Image Recognition (LIR), Shuttle and OCR datasets are also used in (Babu and Murty, 2001; Ananthanarayana et al., 2001). Banana, Rings, Concentric Circles, Desert, Handwritten symbols and Gaussian datasets are artificially generated.

  • Instance-reduction method based on ant colony optimization

    2018, ACM International Conference Proceeding Series
  • Address clustering for e-commerce applications

    2018, CEUR Workshop Proceedings