Abstract
Decision forests have attracted the academic community's interest mainly due to their simplicity and transparency. This paper proposes two novel decision-forest building techniques, called the Maximal Information Coefficient Forest (MICF) and the Pearson's Correlation Coefficient Forest (PCCF). The proposed algorithms use the Maximal Information Coefficient (MIC) and Pearson's Correlation Coefficient (PCC), respectively, as additional measures of each feature's classification capacity. These measures refine the selection of the most suitable feature at each splitting node, namely the feature with the greatest Gain Ratio. We conduct experiments on 12 datasets from the publicly accessible UCI Machine Learning Repository. Our experimental results indicate that the proposed methods achieve the best average ensemble accuracy ranks of 1.3 (for MICF) and 3.0 (for PCCF), compared to their closest competitor, Random Forest (RF), which has an average rank of 4.3. Additionally, the Friedman and Bonferroni-Dunn tests indicate that the improvement is statistically significant.
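The abstract describes combining a feature's Gain Ratio with a correlation-based score (PCC or MIC) when choosing the split feature. The paper's exact combination rule is not given in the abstract, so the sketch below illustrates one plausible, hypothetical variant of the PCCF idea: each feature's Gain Ratio for a candidate split is weighted by the absolute Pearson correlation between that feature and the class label. The function names and the multiplicative combination are assumptions for illustration only.

```python
import numpy as np

def gain_ratio(x, y, threshold):
    """Gain Ratio of the binary split x <= threshold for class labels y."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    left = x <= threshold
    n, n_left = len(y), left.sum()
    if n_left == 0 or n_left == n:      # degenerate split carries no information
        return 0.0
    w = n_left / n
    info_gain = entropy(y) - w * entropy(y[left]) - (1 - w) * entropy(y[~left])
    split_info = -(w * np.log2(w) + (1 - w) * np.log2(1 - w))
    return info_gain / split_info

def pcc_weighted_scores(X, y, thresholds):
    """Hypothetical PCCF-style score: Gain Ratio weighted by |PCC(feature, label)|."""
    scores = []
    for j in range(X.shape[1]):
        pcc = abs(np.corrcoef(X[:, j], y)[0, 1])
        scores.append(gain_ratio(X[:, j], y, thresholds[j]) * pcc)
    return np.array(scores)

# Toy data: feature 0 tracks the class label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([y + rng.normal(0, 0.1, 100), rng.normal(0, 1, 100)])
scores = pcc_weighted_scores(X, y, thresholds=np.median(X, axis=0))
best = int(np.argmax(scores))  # the informative feature (index 0) wins
```

A MICF-style variant would replace `np.corrcoef` with an MIC estimator (e.g. from the `minepy` package); the surrounding selection logic stays the same.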
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Drousiotis, E., Shi, L., Spirakis, P.G., Maskell, S. (2022). Novel Decision Forest Building Techniques by Utilising Correlation Coefficient Methods. In: Iliadis, L., Jayne, C., Tefas, A., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2022. Communications in Computer and Information Science, vol 1600. Springer, Cham. https://doi.org/10.1007/978-3-031-08223-8_8
Print ISBN: 978-3-031-08222-1
Online ISBN: 978-3-031-08223-8