Abstract
Decision trees are well-known classification algorithms that are also appreciated for their capacity for knowledge discovery. Two major shortcomings of decision trees have been pointed out in the literature: (1) instability and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike a decision tree, which searches the whole attribute space of a dataset to find the best test attribute for a node, Random Forest first selects a random subspace of attributes and then identifies the test attribute for the node within it. Because the subspace is drawn uniformly at random, it can consist largely or entirely of poor-quality attributes, resulting in an individual tree with low accuracy. In this paper we therefore propose a probabilistic selection of attributes (instead of a purely random selection) in which the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. While this paper uses mutual information as the measure of attribute quality, the earlier papers used information gain ratio and a t-test. The proposed technique has been evaluated on nine different datasets and shows stable performance in terms of accuracy (both ensemble accuracy and individual tree accuracy) and efficiency.
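To make the core idea concrete, the following is a minimal sketch of probability-proportional-to-quality attribute sub-spacing, not the authors' Rank Forest implementation. It scores each attribute by its mutual information with the class label (here using scikit-learn's mutual_info_classif as an assumed stand-in for the paper's estimator) and then draws the subspace for a node by weighted sampling without replacement, so that high-quality attributes are favoured while poor ones can still appear.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

def probabilistic_subspace(X, y, subspace_size, seed=None):
    # Estimate each attribute's quality as its mutual information
    # with the class variable (any quality measure could be swapped in).
    quality = mutual_info_classif(X, y, random_state=0)
    total = quality.sum()
    # Fall back to uniform selection if every score is zero.
    probs = quality / total if total > 0 else np.full(quality.size, 1.0 / quality.size)
    rng = np.random.default_rng(seed)
    # Weighted sampling without replacement: good attributes are
    # more likely to enter the subspace, unlike Random Forest's
    # uniform random selection.
    return rng.choice(quality.size, size=subspace_size, replace=False, p=probs)

X, y = load_iris(return_X_y=True)
print(probabilistic_subspace(X, y, subspace_size=2, seed=42))

In a full forest, a fresh subspace would be drawn this way at every node of every tree, with the usual best-split search then restricted to the sampled attributes.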
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Babar, Z., Islam, M.Z., Mansha, S. (2018). Rank Forest: Systematic Attribute Sub-spacing in Decision Forest. In: Boo, Y., Stirling, D., Chi, L., Liu, L., Ong, K.L., Williams, G. (eds.) Data Mining. AusDM 2017. Communications in Computer and Information Science, vol. 845. Springer, Singapore. https://doi.org/10.1007/978-981-13-0292-3_2
Print ISBN: 978-981-13-0291-6
Online ISBN: 978-981-13-0292-3