Abstract
When different subsamples of the same data set are used to induce classification trees, the structures of the resulting classifiers can differ markedly. Structural stability of the tree is of paramount importance in many domains, such as medical diagnosis, fraud detection, and customer behaviour analysis (marketing), where the comprehensibility of the classifier is essential. We have developed a methodology for building classification trees from multiple subsamples in which the final classifier is a single decision tree (Consolidated Trees). This paper presents an analysis of the structural stability of our algorithm compared with C4.5. The classification trees generated with our algorithm achieve smaller error rates and are structurally steadier than those built by C4.5 when resampling techniques are used. The main focus of this paper is to show how Consolidated Trees built from different sets of subsamples tend to converge to the same tree as the number of subsamples is increased.
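The abstract only states the idea at a high level: several subsamples are used jointly, but a single tree is returned. The following minimal Python sketch illustrates one way such a consolidation step can work, assuming that at each node every subsample proposes its best split and the split backed by a majority of subsamples is kept. The function names (best_split, consolidated_split), the information-gain criterion on numeric features, the median threshold, and the bootstrap usage are illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import Counter

def entropy(y):
    """Shannon entropy of a 1-D label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(X, y):
    """Best (feature, threshold) on a single subsample by information gain;
    returns (None, None) when no informative split exists."""
    best_feat, best_thr, best_gain = None, None, 0.0
    base, n = entropy(y), len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # candidate thresholds
            mask = X[:, j] <= t
            left, right = y[mask], y[~mask]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best_gain:
                best_feat, best_thr, best_gain = j, t, gain
    return best_feat, best_thr

def consolidated_split(subsamples):
    """Each subsample proposes its best split; the feature chosen by the
    majority of subsamples is kept, with the median of its proposed
    thresholds. Returns None when most subsamples propose no split."""
    proposals = [best_split(X, y) for X, y in subsamples]
    proposals = [(f, t) for f, t in proposals if f is not None]
    if len(proposals) <= len(subsamples) / 2:       # majority votes "leaf"
        return None
    feature, _ = Counter(f for f, _ in proposals).most_common(1)[0]
    thresholds = [t for f, t in proposals if f == feature]
    return feature, float(np.median(thresholds))

# Usage: consolidate a root split proposed by ten bootstrap subsamples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 2] > 0.0).astype(int)
subsamples = [(X[idx], y[idx])
              for idx in (rng.integers(0, 200, 200) for _ in range(10))]
print(consolidated_split(subsamples))

In the full Consolidated Tree Construction algorithm the winning split would presumably be applied to every subsample and the process repeated recursively, which is what would let trees built from different subsample sets converge structurally as the number of subsamples grows; the split criterion, categorical-attribute handling, and stopping rule used by the authors are not given on this page and are simplified away here.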
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I. (2006). Consolidated Trees: An Analysis of Structural Convergence. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science(), vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_4
DOI: https://doi.org/10.1007/11677437_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32547-5
Online ISBN: 978-3-540-32548-2