Abstract
Experts from different domains have turned to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have proven useful for classification in many application domains, as they express their decisions in a language close to that of domain experts. Many researchers have attempted to build better decision tree models by improving the components of the induction algorithm; one of the most studied components is the evaluation measure for candidate splits.
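To make "evaluation measure for candidate splits" concrete, the following minimal sketch computes the gain ratio, the measure used by the baseline C4.5 algorithm. The helper names (`entropy`, `gain_ratio`) and the toy labels are our own illustrative choices, not code from the article or its supplemental material.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, partitions):
    """C4.5-style gain ratio for a candidate split.

    `labels` holds the class labels at the current node; `partitions` is the
    list of label subsets induced by the candidate split.
    """
    n = len(labels)
    weights = [len(p) / n for p in partitions]
    info_gain = entropy(labels) - sum(w * entropy(p)
                                      for w, p in zip(weights, partitions))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: a binary split on a node holding classes A and B.
node = ["A", "A", "A", "B", "B", "B"]
left, right = ["A", "A", "A"], ["B", "B", "B"]
print(gain_ratio(node, [left, right]))  # 1.0: the split separates the classes perfectly
```

During induction, the algorithm scores every candidate split this way and grows the tree with the highest-scoring one; swapping `gain_ratio` for another function is exactly how the C4.5 variants studied in the article differ.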
In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework for assessing the performance of 21 evaluation measures that produce different C4.5 variants, considering 110 databases, two performance measures, and 10×10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by means of a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings of C4.5 variants in the literature. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine which group of evaluation measures should produce a C4.5 variant for a new database, and we discuss further opportunities for decision tree models.
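The evaluation protocol can be sketched as follows, under stated assumptions: scikit-learn's DecisionTreeClassifier (a CART-style learner) stands in for a C4.5 variant, since C4.5 itself is not part of scikit-learn, and the iris data stands in for one of the 110 databases.

```python
# Minimal sketch of the 10x10-fold cross-validation protocol: 10 repetitions
# of stratified 10-fold cross-validation, yielding 100 accuracy estimates.
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")  # over 100 folds
```

The resulting per-fold scores are the kind of paired samples that a Bayesian analysis (e.g., following Benavoli et al. 2017) compares across variants instead of relying on null-hypothesis significance tests.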
Supplemental Material
Supplemental movie, appendix, image, and software files for "A Practical Tutorial for Decision Tree Induction: Evaluation Measures for Candidate Splits and Opportunities" are available for download.