Skip to main content
Log in

A review of instance selection methods

  • Published:
Artificial Intelligence Review Aims and scope Submit manuscript

Abstract

In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, several instances are stored in the training set but some of them are not useful for classifying therefore it is possible to get acceptable classification rates ignoring non useful cases; this process is known as instance selection. Through instance selection the training set is reduced which allows reducing runtimes in the classification and/or training stages of classifiers. This work is focused on presenting a survey of the main instance selection methods reported in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6: 37–66

    Google Scholar 

  • Bezdek JC, Kuncheva LI (2001) Nearest prototype classifier designs: an experimental study. Int J Hybrid Intell Syst 16(12): 1445–1473

    Article  MATH  Google Scholar 

  • Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2): 153–172

    Article  MATH  MathSciNet  Google Scholar 

  • Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97: 245–271

    Article  MATH  MathSciNet  Google Scholar 

  • Caises Y, González A, Leyva E, Pérez R (2009) SCIS: combining instance selection methods to increase their effectiveness over a wide range of domains. In: Corchado E, Yin H (eds) IDEAL 2009, LNCS 5788. Burgos, Spain, pp 17–24

    Google Scholar 

  • Cano JR, Herrera F, Lozano M (2005) Stratification for scaling up evolutionary prototype selection. Pattern Recognit Lett 26: 953–963

    Article  Google Scholar 

  • Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6): 561–575

    Article  Google Scholar 

  • Cerverón V, Ferri FJ (2001) Another move toward the minimum consistent subset: a tabu search approach to the condensed nearest neighbour rule. IEEE Trans Syst Man Cybern B 31(3): 408–413

    Article  Google Scholar 

  • Chien-Hsing C, Bo-Han K, Fu C (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition. IEEE Computer Society, Hong-Kong, pp 556–559

  • Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13: 21–27

    Article  MATH  Google Scholar 

  • De Haro-García A, García-Pedrajas N (2009) A divide-and-conquer approach for scaling up instance sele ction algorithm. Data Min Knowl Discov 18: 392–418

    Article  Google Scholar 

  • Devijver PA, Kittler J (1980) On the edited nearest neighbor rule. In: Proceedings of the 5th international conference on pattern recognition. Los Alamitos, CA, pp 72–80

  • Friedman JH, Bentley JL, Finkel RA (1997) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3): 209–226

    Article  Google Scholar 

  • Garain U (2008) Prototype reduction using an artificial immune model. Pattern Anal Appl 11: 353–363

    Article  Google Scholar 

  • García S, Cano JR, Herera F (2008) A memetic algorithm for evolutionary prototype selection: a scaling up approach. Pattern Recognit 41: 2693–2709

    Article  MATH  Google Scholar 

  • Glover F (1986) The general employee scheduling problem: an integration of management science and artificial intelligence. Comput Oper Res 13(4): 563–593

    Article  MathSciNet  Google Scholar 

  • Grochowski M, Jankowski N et al (2004) Comparison of instance selection algorithms II. In: Results , comments. Rutkowski L (eds) ICAISC 2004, LNAI. Zacopane, Poland, pp 580–585

    Google Scholar 

  • Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14: 515–516

    Article  Google Scholar 

  • Ke-Ping Z, Shui-Geng Z, Ji-Hong G, Ao-Ying A (2003) C-Pruner: An improved instance pruning algorithm. In: Proceedings of 2nd IEEE international conference on machine learning and cybernetics, vol 1. pp 94–99

  • Kittler J (1986) Feature selection and extraction. In: Young TY, Fu KS (eds) Handbook of pattern recognition and image processing. Academic Press, New York, pp 203–217

    Google Scholar 

  • Kuncheva LI (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognit Lett 16: 809–814

    Article  Google Scholar 

  • Kuncheva LI (1997) Fitness functions in editing k-NN referent set by genetic algorithms. Pattern Recognit 30: 1041–1049

    Article  Google Scholar 

  • Kuncheva LI, Bezdek JC (1998) Nearest prototype classification, clustering, genetic algorithms, or random search?. IEEE Trans Syst Man Cybern C 28(1): 160–164

    Article  Google Scholar 

  • Liu H, Motoda H (2002) On issues of instance selection. Data Min Knowl Discov 6: 115–130

    Article  MathSciNet  Google Scholar 

  • Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recognit 39: 495–497

    Article  MATH  Google Scholar 

  • Mollineda RA, Ferri FJ, Vidal E (2002) An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering. Pattern Recognit 35: 2771–2782

    Article  MATH  Google Scholar 

  • Narayan BL, Murthy CA, Pal SK (2006) Maxdiff kd-trees for data condensation. Pattern Recognit Lett 27: 187–200

    Article  Google Scholar 

  • Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2005) Sequential search for decremental edition. In: Gallagher M, Hogan J, Maire F (eds) LNCS 3578: IDEAL 2005. Queensland, Australia, pp 280–285

    Google Scholar 

  • Olvera-López JA, Martínez-Trinidad JF, Carrasco-Ochoa JA (2007a) Restricted sequential floating search applied to object selection. In: Perner P (eds) MLDM 2007:LNAI 4571. Leipzig, Germany, pp 694–702

    Google Scholar 

  • Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF et al (2007) Object selection based on clustering and border objects. In: Kurzynski M (eds) Computer recognition systems 2, ASC 45. Wroclaw, Poland, pp 27–34

    Chapter  Google Scholar 

  • Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2008) Prototype selection via prototype relevance. In: Ruiz-Shulcloper J, Kropatsch WG (eds) CIARP 2008, LNCS 5197. Habana, Cuba, pp 153–160

    Google Scholar 

  • Olvera-López JA, Martínez-Trinidad JF, Carrasco-Ochoa JA, Kittler J (2009) Prototype selection based on sequeintial search. Intell Data Anal 13(4): 599–631

    Google Scholar 

  • Paredes R, Vidal E (2000) Weighting prototypes. A new editing approach. In: Proceedings of the international conference on pattern recognition ICPR, vol. 2. pp 25–28

  • Pudil P, Ferri FJ, Novovicová J, Kittler J (1994) Floating search methods for feature selection with nonmonotonic criterion functions. In: Proceedings of the 12th international conference on pattern recognition. IEEE Computer Society Press, pp 279–283

  • Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recognit Lett 26(10): 1554–1567

    Article  Google Scholar 

  • Ritter GL, Woodruff HB, Lowry SR, Isenhour TL (1975) An algorithm for a selective nearest neighbor decision rule. IEEE Trans Inf Theory 21(6): 665–669

    Article  MATH  Google Scholar 

  • Riquelme JC, Aguilar-Ruíz JS, Toro M (2003) Finding representative patterns with ordered projections. Pattern Recognit 36: 1009–1018

    Article  Google Scholar 

  • Srisawat A, Phienthrakul T, Kijsirikul B (2006) SV-kNNC: an algorithm for improving the efficency of k-Nearest neighbr. In: Yang Q, Webb G (eds) PRICAI 2006:LNAI 4099. Guilin, China, pp 975–979

    Google Scholar 

  • Spillmann B, Neuhaus M, Bunke H, Pȩkalska E, Duin RPW (2006) Transforming strings to vector spaces using prototype selection. In: Yeung D-Y et al (eds) SSPR&SPR 2006, LNCS 4109. Hong-Kong, pp. 287–296

  • Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern 6-6: 448–452

    MathSciNet  Google Scholar 

  • Vapnik V (1995) The nature of statistical learning theory. Springer, New York

    MATH  Google Scholar 

  • Vázquez F, Sánchez S, Pla F et al (2005) A stochastic approach to Wilson’s editing algorithm. In: Marques JS (eds) IbPRIA 2005, LNCS 3523. Estoril, Portugal, pp 35–42

    Google Scholar 

  • Venmann CJ, Reinders MJT (2005) The nearest sub-class classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Anal Mach Intell 27(9): 1417–1429

    Article  Google Scholar 

  • Venmann CJ, Reinders MJT, Backer E (2002) A maximum variance clustering algorithm. IEEE Trans Pattern Anal Mach Intell 24(9): 1273–1280

    Article  Google Scholar 

  • Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2: 408–421

    Article  MATH  Google Scholar 

  • Wilson DR, Martínez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38: 257–286

    Article  MATH  Google Scholar 

  • Yuangui L, Zhonhui H, Yunze C, Weidong Z et al (2005) Support vector based prototype selection method for nearest neighbor rules. In: Wang L (eds) ICNC 2005, LNCS 3610. Changsha, China, pp 528–535

    Google Scholar 

  • Zhang H, Sun G (2002) Optimal reference subset selection for nearest neighbor classification by tabu search. Pattern Recognit 35: 1481–1490

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to J. Arturo Olvera-López.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Olvera-López, J.A., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. et al. A review of instance selection methods. Artif Intell Rev 34, 133–143 (2010). https://doi.org/10.1007/s10462-010-9165-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10462-010-9165-y

Keywords

Navigation