Abstract
In the field of attribute mining, several feature selection methods have recently appeared, indicating that sets of decision trees learnt from a data set can be a useful tool for selecting variables that are relevant and informative with respect to a main class variable. With this aim, in this study we claim that a new split criterion for building decision trees outperforms other classic split criteria for variable selection purposes. We present an experimental study on a wide and varied set of databases, using a single decision tree with each split criterion to select variables for the Naive Bayes classifier.
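The selection scheme described above can be sketched as follows. This is a minimal illustration using scikit-learn stand-ins, not the paper's own method: the tree here uses the classic entropy (Info-Gain) criterion, whereas the paper's proposed criterion, based on imprecise probabilities, is not implemented in standard libraries. The idea is simply to build one decision tree and keep the variables that appear in some split, then train Naive Bayes on that subset.

```python
# Sketch: variable selection via a single decision tree, then Naive Bayes
# on the selected variables. Assumptions: scikit-learn's entropy criterion
# stands in for a classic split criterion; the paper's imprecise-probability
# criterion is not available here.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Build one decision tree with a classic split criterion.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Keep the variables actually used in some internal split
# (leaf nodes store a negative sentinel in tree_.feature).
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])

# Train Naive Bayes only on the selected variables.
nb = GaussianNB().fit(X[:, used], y)
print("selected variable indices:", used.tolist())
print("training accuracy:", round(nb.score(X[:, used], y), 3))
```

Swapping the tree's split criterion (e.g. Gini vs. entropy) changes which variables survive, which is exactly the effect the paper's experimental study measures.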
This work has been supported by the Spanish Ministry of Science and Technology under the projects TIN2005-02516 and TIN2004-06204-C03-02 and FPU scholarship programme (AP2004-4678).
© 2007 Springer-Verlag Berlin Heidelberg
Abellán, J., Masegosa, A.R. (2007). Split Criterions for Variable Selection Using Decision Trees. In: Mellouli, K. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2007. Lecture Notes in Computer Science(), vol 4724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75256-1_44
Print ISBN: 978-3-540-75255-4
Online ISBN: 978-3-540-75256-1