Abstract
In supervised learning, a training set of previously labeled instances is used to classify new instances. Training sets often contain instances that do not contribute to classification, so acceptable classification rates can be obtained while ignoring them; the process of discarding such instances is known as instance selection. Instance selection reduces the training set, which in turn reduces runtime in the training and/or classification stages of a classifier. This work surveys the main instance selection methods reported in the literature.
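To make the idea concrete, the following is a minimal sketch of one classic selection strategy, a condensed-nearest-neighbor style rule: an instance is kept only if the instances selected so far would misclassify it under 1-NN. The function names (`nearest_label`, `condense`), the Euclidean distance, and the toy data are assumptions made for illustration; the sketch is not presented as the survey's own method or as a recommended one.

```python
import math


def nearest_label(point, instances):
    """Return the label of the instance closest to `point` (1-NN, Euclidean)."""
    closest = min(instances, key=lambda inst: math.dist(point, inst[0]))
    return closest[1]


def condense(training_set):
    """Select a subset that still classifies the training set with 1-NN.

    training_set: list of (features, label) pairs.
    Illustrative sketch only; the result depends on the order of the instances.
    """
    if not training_set:
        return []
    chosen = [0]                      # seed the subset with the first instance
    changed = True
    while changed:                    # repeat until a full pass adds nothing
        changed = False
        for i, (features, label) in enumerate(training_set):
            if i in chosen:
                continue              # already kept, nothing to decide
            subset = [training_set[j] for j in chosen]
            if nearest_label(features, subset) != label:
                chosen.append(i)      # the subset misclassifies it, so keep it
                changed = True
    return [training_set[j] for j in chosen]


# Hypothetical toy data: two well-separated classes.
training = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b"),
]
reduced = condense(training)
print(len(training), "->", len(reduced))
```

On data like this, most instances are redundant for a nearest-neighbor classifier, so the reduced set is much smaller while every training instance is still classified correctly. Fewer stored instances means fewer distance computations per query, which is the runtime saving the abstract refers to.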
Cite this article
Olvera-López, J.A., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. et al. A review of instance selection methods. Artif Intell Rev 34, 133–143 (2010). https://doi.org/10.1007/s10462-010-9165-y
DOI: https://doi.org/10.1007/s10462-010-9165-y