Large symmetric margin instance selection algorithm

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Instance-based classifiers need to store a large number of samples as a training set, and instance-based learning algorithms are therefore confronted with the problem of deciding which instances to keep for use at test time: storing too many instances results in large memory requirements and slow execution. In this paper, we propose a large symmetric margin instance selection algorithm, named LAMIS. LAMIS removes non-border (interior) instances and keeps border ones. The instance selection process is formulated as a constrained binary optimization problem and solved with a filled function algorithm. The core of the selection process is to preserve the hyperplane that separates two-class data, so as to provide large-margin separation. LAMIS selects the most representative instances, satisfying both objectives: high accuracy and a high reduction rate. Performance has been evaluated on real-world data sets from the UCI repository using ten-fold cross-validation, and the results have been compared with state-of-the-art methods; the overall results show the superiority of the proposed method in terms of classification accuracy and reduction percentage.
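The abstract outlines the idea at a high level: discard interior instances, keep those near the class boundary, and evaluate the reduced set with ten-fold cross-validation. As a rough illustration of that idea only, the sketch below uses a simple nearest-enemy border heuristic followed by a 1-NN evaluation; it is not the paper's LAMIS procedure, which instead formulates selection as a constrained binary optimization problem solved with a filled function algorithm. The breast-cancer data set, the ratio threshold, and the helper border_instance_mask are assumptions made for this example (NumPy and scikit-learn assumed available).

    # Illustrative sketch only, NOT the LAMIS algorithm from the paper.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import pairwise_distances
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def border_instance_mask(X, y, ratio=1.5):
        """Flag likely border instances: those whose nearest opposite-class
        neighbour is nearly as close as their nearest same-class neighbour."""
        d = pairwise_distances(X)
        np.fill_diagonal(d, np.inf)                 # ignore self-distances
        same = y[:, None] == y[None, :]
        nearest_friend = np.where(same, d, np.inf).min(axis=1)
        nearest_enemy = np.where(~same, d, np.inf).min(axis=1)
        return nearest_enemy <= ratio * nearest_friend

    X, y = load_breast_cancer(return_X_y=True)      # a two-class UCI data set
    keep = border_instance_mask(X, y)
    print(f"kept {keep.sum()} of {len(y)} instances "
          f"({100.0 * (1.0 - keep.mean()):.1f}% reduction)")

    # Ten-fold cross-validation of a 1-NN classifier on the reduced set,
    # loosely mirroring the evaluation protocol described in the abstract.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                             X[keep], y[keep], cv=cv)
    print(f"mean 10-fold 1-NN accuracy on reduced set: {scores.mean():.3f}")

For simplicity, the sketch applies selection once before cross-validation; a faithful reproduction of the paper's protocol would run the selection inside each training fold and test on unreduced held-out folds.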

Acknowledgments

The authors are grateful for the suggestions of the anonymous reviewers and the editor, which greatly improved the paper.

Author information

Corresponding author

Correspondence to Javad Hamidzadeh.

About this article

Cite this article

Hamidzadeh, J., Monsefi, R. & Sadoghi Yazdi, H. Large symmetric margin instance selection algorithm. Int. J. Mach. Learn. & Cyber. 7, 25–45 (2016). https://doi.org/10.1007/s13042-014-0239-z

