A review of instance selection methods

Olvera-López, J. Arturo; Carrasco-Ochoa, J. Ariel; Martínez-Trinidad, J. Francisco; Kittler, Josef

doi:10.1007/s10462-010-9165-y

Published: 27 May 2010

Artificial Intelligence Review, volume 34, pages 133–143 (2010)

J. Arturo Olvera-López¹,
J. Ariel Carrasco-Ochoa²,
J. Francisco Martínez-Trinidad² &
Josef Kittler³

Abstract

In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, several instances are stored in the training set, but some of them are not useful for classification; therefore, acceptable classification rates can be obtained while ignoring the non-useful cases. This process is known as instance selection. Instance selection reduces the training set, which in turn reduces runtimes in the classification and/or training stages of classifiers. This work presents a survey of the main instance selection methods reported in the literature.
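
As an illustration of the idea described in the abstract, the following is a minimal sketch of one well-known instance selection strategy, greedy condensation for the nearest neighbor rule. It is not taken from the article; the function names (`nearest_label`, `condensed_subset`), the Euclidean distance, and the toy data are assumptions made only for this example.

```python
import math

def nearest_label(point, selected):
    """Return the label of the instance in `selected` closest to `point`."""
    best = min(selected, key=lambda inst: math.dist(point, inst[0]))
    return best[1]

def condensed_subset(training_set):
    """Greedy condensation: add an instance only when the current
    subset misclassifies it with the 1-nearest-neighbor rule."""
    selected = [training_set[0]]  # seed with the first instance (arbitrary choice)
    changed = True
    while changed:  # repeat passes until no instance is added
        changed = False
        for features, label in training_set:
            if (features, label) in selected:
                continue
            if nearest_label(features, selected) != label:
                selected.append((features, label))
                changed = True
    return selected

# Hypothetical toy data: two well-separated classes.
training_set = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b"),
]
subset = condensed_subset(training_set)
```

On this toy data the selected subset is smaller than the training set yet still classifies every training instance correctly, which is the trade-off between reduction and accuracy that instance selection aims for.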

Author information

Authors and Affiliations

Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Ciudad Universitaria, 72570, Puebla, Mexico
J. Arturo Olvera-López
National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Luis Enrique Erro No. 1, Sta. María Tonantzintla, 72000, Puebla, Mexico
J. Ariel Carrasco-Ochoa & J. Francisco Martínez-Trinidad
University of Surrey, Centre for Vision, Speech and Signal Processing, Guildford, GU2 7XH, UK
Josef Kittler

Corresponding author

Correspondence to J. Arturo Olvera-López.

About this article

Cite this article

Olvera-López, J.A., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. et al. A review of instance selection methods. Artif Intell Rev 34, 133–143 (2010). https://doi.org/10.1007/s10462-010-9165-y

Published: 27 May 2010
Issue Date: August 2010
DOI: https://doi.org/10.1007/s10462-010-9165-y
