Abstract
In some real-world machine learning problems, the solution must be comprehensible so that the correct decision can be made. In this context, this paper compares bagging, one of the most widely used multiple classifier systems, with the consolidated trees construction (CTC) algorithm for learning problems in which each classification must be accompanied by an explanation. Given the comprehensibility shortcomings of bagging, Domingos' proposal, called combining multiple models, has been used to address this problem. The two algorithms are compared from three main points of view: accuracy, the quality of the explanation that accompanies the classification, and computational cost. The results show that CTC is beneficial in situations where an explanation is required, because it has a greater discriminating capacity than the explanation extraction algorithm added to bagging; the explanation it provides is of higher quality, being simpler and more reliable; and it is computationally more efficient.
Communicated by R. Neruda.
Cite this article
Pérez, J.M., Albisua, I., Arbelaitz, O. et al. Consolidated trees versus bagging when explanation is required. Computing 89, 113–145 (2010). https://doi.org/10.1007/s00607-010-0094-z
DOI: https://doi.org/10.1007/s00607-010-0094-z