Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families

Published in: Information Systems Frontiers

Abstract

Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction algorithms is the splitting method, the most commonly used of which is based on the Conditional Entropy (CE) family. However, it is well known that no single splitting method gives the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, others are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets with nominal predictor attributes, and are competitive with the GR method for datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family be included in data mining toolsets.
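The abstract contrasts entropy-based splitting criteria, with Gain Ratio (GR) as the most common representative of the Conditional Entropy family. As a rough illustration only (not the paper's implementation; the toy dataset and helper names below are invented for the example), the following sketch computes GR for a nominal predictor attribute: information gain over the attribute's partition, normalized by the partition's split information.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    """Gain Ratio for a nominal attribute: information gain divided by
    the split information of the partition the attribute induces."""
    n = len(labels)
    # Partition the class labels by the attribute's values
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Conditional entropy H(class | attribute) over the partition
    cond_entropy = sum(len(part) / n * entropy(part)
                       for part in partitions.values())
    gain = entropy(labels) - cond_entropy
    split_info = entropy([row[attr_index] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Toy dataset: one nominal attribute that perfectly separates two classes,
# so gain and split information are both 1 bit and the ratio is 1.0.
rows = [("a",), ("a",), ("b",), ("b",)]
labels = ["yes", "yes", "no", "no"]
print(gain_ratio(rows, labels, 0))  # prints 1.0
```

A CAMI-family criterion would replace the `gain / split_info` scoring with a score derived from class-attribute mutual information; the tree-growing loop around either criterion is the same.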




Author information

Corresponding author

Correspondence to Kweku-Muata Osei-Bryson.

Additional information

Kweku-Muata Osei-Bryson is Professor of Information Systems at Virginia Commonwealth University, where he also served as the Coordinator of the Ph.D. program in Information Systems during 2001–2003. Previously he was Professor of Information Systems and Decision Analysis in the School of Business at Howard University, Washington, DC, U.S.A. He has also worked as an Information Systems practitioner in both industry and government. He holds a Ph.D. in Applied Mathematics (Management Science & Information Systems) from the University of Maryland at College Park, an M.S. in Systems Engineering from Howard University, and a B.Sc. in Natural Sciences from the University of the West Indies at Mona. He currently does research in various areas including: Data Mining, Expert Systems, Decision Support Systems, Group Support Systems, Information Systems Outsourcing, Multi-Criteria Decision Analysis. His papers have been published in various journals including: Information & Management, Information Systems Journal, Information Systems Frontiers, Business Process Management Journal, International Journal of Intelligent Systems, IEEE Transactions on Knowledge & Data Engineering, Data & Knowledge Engineering, Information & Software Technology, Decision Support Systems, Information Processing and Management, Computers & Operations Research, European Journal of Operational Research, Journal of the Operational Research Society, Journal of the Association for Information Systems, Journal of Multi-Criteria Decision Analysis, Applications of Management Science. Currently he serves as an Associate Editor of the INFORMS Journal on Computing, and is a member of the Editorial Board of the Computers & Operations Research journal.

Kendall E. Giles received the BS degree in Electrical Engineering from Virginia Tech in 1991, the MS degree in Electrical Engineering from Purdue University in 1993, the MS degree in Information Systems from Virginia Commonwealth University in 2002, and the MS degree in Computer Science from Johns Hopkins University in 2004. Currently he is a PhD student (ABD) in Computer Science at Johns Hopkins, and is a Research Assistant in the Applied Mathematics and Statistics department. He has over 15 years of work experience in industry, government, and academic institutions. His research interests can be partially summarized by the following keywords: network security, mathematical modeling, pattern classification, and high dimensional data analysis.


About this article

Cite this article

Osei-Bryson, KM., Giles, K. Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families. Inf Syst Front 8, 195–209 (2006). https://doi.org/10.1007/s10796-006-8779-8

