
Creating diversity in ensembles using synthetic neighborhoods of training samples


Abstract

Diversity among base classifiers is known to be a key driver for the construction of an effective ensemble classifier. Several methods have been proposed to construct diverse base classifiers from artificially generated training samples, but in these methods diversity is often obtained at the expense of the accuracy of the base classifiers. Inspired by the localized generalization error model, a new sample generation method is proposed in this study. When preparing different training sets for the base classifiers, the proposed method generates samples located within limited neighborhoods of the corresponding training samples. The generated samples differ from the original training samples, and each generated set expands a different part of the original training data. Learning from these datasets yields a set of base classifiers that are accurate in different regions of the input space while maintaining appropriate diversity. Experiments on 26 benchmark datasets showed that: (1) the proposed method significantly outperformed several state-of-the-art ensemble methods in terms of classification accuracy; (2) the proposed method was significantly more efficient than other sample-generation-based ensemble methods.
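The full text is paywalled, so the following is a minimal sketch of the core idea stated in the abstract: train each base classifier on synthetic samples drawn from a small neighborhood of each original training point, so that every learner sees a differently expanded copy of the data. This is not the authors' implementation; the uniform perturbation, the neighborhood half-width q, the label inheritance, and the majority-vote combiner are all assumptions (the paper derives its neighborhoods from the localized generalization error model).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def synthetic_neighborhood(X, q, rng):
    # Draw one synthetic sample uniformly from the q-neighborhood
    # [x - q, x + q] of each training sample, per feature.
    # (Assumed perturbation scheme; the paper's neighborhoods come
    # from the localized generalization error model.)
    return X + rng.uniform(-q, q, size=X.shape)

def train_ensemble(X, y, n_estimators=10, q=0.1, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_estimators):
        # Each base learner gets a differently perturbed copy of the
        # data; synthetic points inherit the original labels.
        X_syn = synthetic_neighborhood(X, q, rng)
        clf = DecisionTreeClassifier(random_state=int(rng.integers(10**6)))
        clf.fit(np.vstack([X, X_syn]), np.concatenate([y, y]))
        ensemble.append(clf)
    return ensemble

def predict_majority(ensemble, X):
    # Shape (n_estimators, n_samples); assumes non-negative integer labels.
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Under these assumptions, `train_ensemble(X_train, y_train)` followed by `predict_majority(ensemble, X_test)` gives the ensemble prediction; the per-learner perturbations are the source of the diversity the abstract describes, while keeping each synthetic point close to a real one preserves base-classifier accuracy.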





Author information


Corresponding author

Correspondence to Tao Lin.


About this article


Cite this article

Chen, Z., Lin, T., Chen, R. et al. Creating diversity in ensembles using synthetic neighborhoods of training samples. Appl Intell 47, 570–583 (2017). https://doi.org/10.1007/s10489-017-0922-3

