Abstract
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown both experimentally and theoretically that, for an ensemble to be effective, its constituent classifiers should be diverse in their predictions. A number of ways to quantify diversity in ensembles are known, but little research has examined their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles of different sizes and with six different ensemble integration methods. Our experiments show that, on average, the accuracy improvement correlates most strongly with the disagreement, entropy, and ambiguity diversity measures and, surprisingly, least with the Q and double fault measures. Typically, the correlation decreases roughly linearly as the ensemble size increases. Considerably higher correlation values are observed with the dynamic integration methods, which are shown to make better use of ensemble diversity than their static analogues.
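To make the measures concrete, the sketch below shows how two of them, the pairwise disagreement and the Q statistic, are commonly computed from the "oracle" outputs of the base classifiers (a boolean matrix recording whether each classifier predicted the true label on each instance), averaging over all classifier pairs. This is a minimal illustration using the standard pairwise definitions of these two measures; the function names and array layout are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np
from itertools import combinations

def pairwise_counts(correct_i, correct_j):
    """Joint correct/incorrect counts for two classifiers.

    correct_i, correct_j: boolean arrays, True where the classifier
    predicted the true label of that instance.
    """
    n11 = np.sum(correct_i & correct_j)    # both correct
    n00 = np.sum(~correct_i & ~correct_j)  # both wrong
    n10 = np.sum(correct_i & ~correct_j)   # only the first correct
    n01 = np.sum(~correct_i & correct_j)   # only the second correct
    return n11, n00, n10, n01

def disagreement(correct):
    """Average pairwise disagreement: fraction of instances on which
    exactly one classifier of a pair is correct, averaged over pairs.

    correct: (n_classifiers, n_instances) boolean matrix.
    Higher values indicate a more diverse ensemble.
    """
    n_instances = correct.shape[1]
    vals = []
    for i, j in combinations(range(correct.shape[0]), 2):
        _, _, n10, n01 = pairwise_counts(correct[i], correct[j])
        vals.append((n10 + n01) / n_instances)
    return float(np.mean(vals))

def q_statistic(correct):
    """Average pairwise Q statistic.

    Values near +1 indicate classifiers that err on the same instances;
    values near 0 or below indicate more diverse behaviour.
    """
    vals = []
    for i, j in combinations(range(correct.shape[0]), 2):
        n11, n00, n10, n01 = pairwise_counts(correct[i], correct[j])
        denom = n11 * n00 + n01 * n10
        vals.append((n11 * n00 - n01 * n10) / denom if denom else 0.0)
    return float(np.mean(vals))
```

In a study such as this one, such measures would be computed for each generated ensemble and then correlated, across ensembles, with the accuracy gain of the combined classifier over its base members.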
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsymbal, A., Pechenizkiy, M., Cunningham, P. (2004). Diversity in Random Subspacing Ensembles. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2004. Lecture Notes in Computer Science, vol 3181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30076-2_31
DOI: https://doi.org/10.1007/978-3-540-30076-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22937-7
Online ISBN: 978-3-540-30076-2