Abstract
This paper presents a novel ensemble classifier generation method that integrates the ideas of bootstrap aggregation and Principal Component Analysis (PCA). To create each individual member of the ensemble, PCA is applied to the out-of-bag sample and the coefficients of all principal components are stored; the principal components of the corresponding bootstrap sample, computed with these coefficients, are then appended to the original feature set as additional features. A base classifier is trained on the bootstrap sample using features randomly selected from this enlarged feature set, and the final ensemble classifier is formed by majority voting of the trained base classifiers. Empirical experiments and statistical tests show that the proposed method performs better than, or as well as, several other ensemble methods on benchmark data sets publicly available from the UCI repository. Furthermore, the diversity-accuracy patterns of the ensemble classifiers are investigated with kappa-error diagrams.
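To make the construction procedure concrete, the sketch below outlines one possible implementation of this scheme with decision trees as base learners. It is only an illustrative reading of the abstract: the use of scikit-learn, the function names, and parameters such as `n_estimators` and `feature_fraction` are assumptions, not the authors' exact algorithm or notation.

```python
# Minimal sketch of the bootstrap + PCA ensemble described in the abstract.
# Assumptions: decision-tree base learners, integer-coded class labels,
# and illustrative parameter names (n_estimators, feature_fraction).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_ensemble(X, y, n_estimators=50, feature_fraction=0.5):
    n, d = X.shape
    members = []
    for _ in range(n_estimators):
        # Draw a bootstrap sample; the rows never drawn form the out-of-bag set.
        boot_idx = rng.integers(0, n, size=n)
        oob_mask = np.ones(n, dtype=bool)
        oob_mask[boot_idx] = False
        X_boot, y_boot = X[boot_idx], y[boot_idx]
        X_oob = X[oob_mask]

        # Fit PCA on the out-of-bag sample and keep its coefficients; project
        # the bootstrap sample onto these components and append the projections
        # to the original features.
        pca = PCA().fit(X_oob)
        X_aug = np.hstack([X_boot, pca.transform(X_boot)])

        # Train a base classifier on a random subset of the augmented features.
        n_feat = max(1, int(feature_fraction * X_aug.shape[1]))
        feat_idx = rng.choice(X_aug.shape[1], size=n_feat, replace=False)
        tree = DecisionTreeClassifier().fit(X_aug[:, feat_idx], y_boot)
        members.append((pca, feat_idx, tree))
    return members

def predict_ensemble(members, X):
    # Combine the base classifiers by majority voting.
    votes = []
    for pca, feat_idx, tree in members:
        X_aug = np.hstack([X, pca.transform(X)])
        votes.append(tree.predict(X_aug[:, feat_idx]))
    votes = np.stack(votes)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

Under these assumptions, each base learner sees a different bootstrap sample, a different PCA rotation (driven by its own out-of-bag data), and a different random feature subset, which is the source of the diversity the paper studies with kappa-error diagrams.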
Cite this article
Zhang, CX., Zhang, JS. A novel method for constructing ensemble classifiers. Stat Comput 19, 317–327 (2009). https://doi.org/10.1007/s11222-008-9094-7