Abstract
The consolidation process, originally applied to the C4.5 tree induction algorithm, improved that algorithm's discriminating capacity and stability. Consolidation creates multiple samples and builds a single (non-multiple) classifier by applying the ensemble process during model construction. One benefit of consolidation is that the understandability of the base classifier is preserved. This work aims to show that consolidation can improve algorithms other than C4.5 by applying it to another algorithm, CHAID*. The consolidated version of CHAID*, CTCHAID, required solving the problem of consolidating the value groupings that each CHAID* tree proposes for discrete attributes. The experimentation is divided into three classification contexts covering a total of 96 datasets. Results show that consolidated algorithms perform robustly: they rank competitively in all contexts and never fall into the lower positions, unlike most of the other 23 rule induction algorithms considered in the study. In a global comparison, consolidated algorithms rank first.
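The idea of applying the ensemble process during model construction can be illustrated with a minimal sketch of a single voting step. This is not the authors' implementation: it assumes that each sample proposes the split attribute with the highest information gain and that the most voted proposal is used for the single tree. The helper names (`propose_split`, `consolidated_split`) and the gain criterion are hypothetical choices made for illustration, and CHAID* itself uses a different, chi-squared based criterion.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Gain obtained by partitioning the labels on a discrete attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def propose_split(rows, labels, attrs):
    """One sample's proposal: the attribute with the highest positive gain.

    Returns None when no attribute improves on the unsplit node.
    """
    best_attr, best_gain = None, 0.0
    for attr in attrs:
        gain = information_gain(rows, labels, attr)
        if gain > best_gain + 1e-12:
            best_attr, best_gain = attr, gain
    return best_attr


def consolidated_split(samples, attrs):
    """Assumed voting step: every sample proposes a split, the majority wins.

    `samples` is a list of (rows, labels) pairs, one per sample. A winning
    proposal of None would mean that most samples prefer not to split.
    """
    votes = Counter(propose_split(rows, labels, attrs) for rows, labels in samples)
    winner, _ = votes.most_common(1)[0]
    return winner
```

In a full algorithm this step would be repeated at every node, with all samples partitioned on the winning attribute, so that the result is one tree rather than a set of trees.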
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ibarguren, I., Pérez, J.M., Muguerza, J. (2015). CTCHAID: Extending the Application of the Consolidation Methodology. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds) Progress in Artificial Intelligence. EPIA 2015. Lecture Notes in Computer Science, vol 9273. Springer, Cham. https://doi.org/10.1007/978-3-319-23485-4_56
DOI: https://doi.org/10.1007/978-3-319-23485-4_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23484-7
Online ISBN: 978-3-319-23485-4
eBook Packages: Computer Science, Computer Science (R0)