Abstract
Instance-based classifiers need to store a large number of samples as a training set. In this paper, we propose a large symmetric margin instance selection algorithm, named LAMIS. LAMIS removes non-border (interior) instances and keeps border ones. The instance selection process is formulated as a constrained binary optimization problem and solved by employing a filled function algorithm. Instance-based learning algorithms are often confronted with the problem of deciding which instances must be stored for use at test time. Storing too many instances can result in large memory requirements and slow execution. In LAMIS, the core of the instance selection process is based on keeping the hyperplane that separates two-class data, so as to provide large margin separation. LAMIS selects the most representative instances, satisfying both objectives: high accuracy and high reduction rates. Performance has been evaluated on real-world data sets from the UCI repository using ten-fold cross-validation. The experimental results have been compared with those of state-of-the-art methods, and the overall results show the superiority of the proposed method in terms of classification accuracy and reduction percentage.
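To make the two objectives concrete, the following minimal sketch treats a candidate selection as a binary mask over the training set (1 = keep, 0 = discard) and measures its reduction rate and the accuracy of a 1-nearest-neighbor classifier that stores only the kept instances. It is an illustration under stated assumptions, not the LAMIS algorithm: the abstract gives neither the objective function nor the filled function solver, so neither appears here. The function names, the toy data, and the hand-picked mask are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def reduction_rate(mask):
    """Fraction of training instances discarded by a binary mask."""
    return 1.0 - sum(mask) / len(mask)


def one_nn_accuracy(X, y, mask, X_test, y_test):
    """Accuracy of a 1-NN classifier that stores only the selected instances."""
    kept = [(x, label) for x, label, m in zip(X, y, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no instances")
    correct = 0
    for x_t, y_t in zip(X_test, y_test):
        # Predict the label of the nearest stored instance.
        _, predicted = min(kept, key=lambda pair: dist(pair[0], x_t))
        correct += predicted == y_t
    return correct / len(X_test)


# Hypothetical two-class toy data lying along a diagonal.
X_train = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (2.5, 2.5), (3.0, 3.0)]
y_train = [0, 0, 0, 1, 1, 1]

# Hand-picked mask (an assumption): keep only the two instances nearest the
# class boundary and drop the interior ones.
border_mask = [0, 0, 1, 1, 0, 0]

X_test = [(0.2, 0.2), (1.4, 1.4), (1.7, 1.7), (2.8, 2.8)]
y_test = [0, 0, 1, 1]

rate = reduction_rate(border_mask)
accuracy = one_nn_accuracy(X_train, y_train, border_mask, X_test, y_test)
```

On this toy set, keeping only the two border instances discards two thirds of the training data while the 1-NN predictions on the four test points remain correct. An optimizer over such masks would search for selections that score well on both measures at once.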
Acknowledgments
The authors are grateful for the suggestions of the anonymous reviewers and the editor, which greatly improved the paper.
Cite this article
Hamidzadeh, J., Monsefi, R. & Sadoghi Yazdi, H. Large symmetric margin instance selection algorithm. Int. J. Mach. Learn. & Cyber. 7, 25–45 (2016). https://doi.org/10.1007/s13042-014-0239-z
DOI: https://doi.org/10.1007/s13042-014-0239-z