Large symmetric margin instance selection algorithm

Hamidzadeh, Javad; Monsefi, Reza; Sadoghi Yazdi, Hadi

doi:10.1007/s13042-014-0239-z

Original Article
Published: 24 February 2014

Volume 7, pages 25–45, (2016)
International Journal of Machine Learning and Cybernetics

Javad Hamidzadeh¹,
Reza Monsefi² &
Hadi Sadoghi Yazdi²

322 Accesses
13 Citations

Abstract

In instance-based classifiers, a large number of samples must be stored as a training set. In this paper, we propose a large symmetric margin instance selection algorithm, namely LAMIS. LAMIS removes non-border (interior) instances and keeps border ones. This paper formulates the instance selection process as a constrained binary optimization problem and solves it by employing a filled function algorithm. Instance-based learning algorithms are often confronted with the problem of deciding which instances must be stored for use during an actual test. Storing too many instances can result in large memory requirements and slow execution. In LAMIS, the core of the instance selection process is based on keeping the hyperplane that separates two-class data, so as to provide large margin separation. LAMIS selects the most representative instances, satisfying both objectives: high accuracy and high reduction rates. The performance has been evaluated on real-world data sets from the UCI repository by the ten-fold cross-validation method. The experimental results have been compared with state-of-the-art methods, and the overall results show the superiority of the proposed method in terms of classification accuracy and reduction percentage.
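The idea of keeping border instances and discarding interior ones can be illustrated with a small sketch. It is not the LAMIS formulation: the constrained binary optimization and the filled function solver are not reproduced here. As an assumption made purely for illustration, an instance counts as a border instance when at least one of its k nearest neighbours carries a different label. The function names, the parameter k, and the toy data are all hypothetical.

```python
import math

def select_border_instances(X, y, k=3):
    """Return a binary mask: 1 keeps an instance, 0 drops it.

    Assumed border rule (NOT the LAMIS objective): keep an instance when
    at least one of its k nearest neighbours has a different class label.
    """
    mask = []
    for i, xi in enumerate(X):
        # All other instances, sorted by Euclidean distance to xi.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))
        neighbours = others[:k]
        mask.append(1 if any(y[j] != y[i] for j in neighbours) else 0)
    return mask

def predict_1nn(X_kept, y_kept, query):
    """Classify a query with the 1-nearest-neighbour rule on the kept set."""
    nearest = min(range(len(X_kept)),
                  key=lambda j: math.dist(query, X_kept[j]))
    return y_kept[nearest]

# Hypothetical two-class toy data on a line; the class boundary lies near 5.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,),
     (6.0,), (7.0,), (8.0,), (9.0,), (10.0,)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

mask = select_border_instances(X, y, k=3)
X_kept = [x for x, m in zip(X, mask) if m]
y_kept = [label for label, m in zip(y, mask) if m]

# Reduction rate: fraction of the training set that was discarded.
reduction = 1 - sum(mask) / len(mask)

# Accuracy of 1-NN on the full toy set when only kept instances are stored.
correct = sum(predict_1nn(X_kept, y_kept, x) == label for x, label in zip(X, y))
accuracy = correct / len(X)

print(mask)       # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(reduction)  # 0.8
print(accuracy)   # 1.0
```

On this toy set only the two instances adjacent to the class boundary are kept, so 80% of the data is discarded while the nearest-neighbour decisions on the original points are unchanged. This mirrors the two objectives named in the abstract, accuracy and reduction rate, though a simple neighbourhood rule like this one gives no margin guarantee.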

Acknowledgments

The authors are grateful for the suggestions of the anonymous reviewers and the editor, which greatly improved the paper.

Author information

Authors and Affiliations

Department of Computer Engineering, Sadjad Institute of Higher Education, Mashhad, Iran
Javad Hamidzadeh
Department of Computer Engineering, Ferdowsi University of Mashhad (FUM), Mashhad, Iran
Reza Monsefi & Hadi Sadoghi Yazdi

Corresponding author

Correspondence to Javad Hamidzadeh.

About this article

Cite this article

Hamidzadeh, J., Monsefi, R. & Sadoghi Yazdi, H. Large symmetric margin instance selection algorithm. Int. J. Mach. Learn. & Cyber. 7, 25–45 (2016). https://doi.org/10.1007/s13042-014-0239-z

Received: 31 January 2013
Accepted: 02 February 2014
Published: 24 February 2014
Issue Date: February 2016
DOI: https://doi.org/10.1007/s13042-014-0239-z
