
Effects of Dynamic Subspacing in Random Forest

  • Conference paper
Advanced Data Mining and Applications (ADMA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10604)


Abstract

Owing to its simplicity and strong performance, Random Forest has attracted considerable interest from the research community. The splitting attribute at each node of a decision tree in Random Forest is determined from a predefined number of randomly selected attributes (a subset of the entire attribute set). The size of this attribute subset (subspace) is one of the most influential factors in the performance of Random Forest. In this paper, we propose a new technique that dynamically determines the size of the subspace based on the relative size of the current data segment with respect to the entire data set. To assess the effects of the proposed technique, we conduct experiments on five widely used data sets from the UCI Machine Learning Repository. The experimental results indicate that the proposed technique can improve the ensemble accuracy of Random Forest.
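The abstract states only that the subspace size at each node is derived from the ratio of the current data segment to the entire training set; the exact mapping is not given in this preview. The sketch below (Python with NumPy) illustrates one plausible reading: each node draws a random subspace whose size interpolates between sqrt(m) and m as a function of that ratio. The mapping in dynamic_subspace_size, and the helper names build_tree and predict_one, are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch of per-node dynamic subspacing in a random decision tree.
# ASSUMPTION: the linear interpolation between sqrt(m) and m below is a
# hypothetical mapping; the paper specifies only that the subspace size
# depends on |segment| / |training set|.
import numpy as np

def dynamic_subspace_size(n_segment, n_total, n_features):
    """Hypothetical mapping: shrink the subspace from n_features toward
    sqrt(n_features) as the node's data segment gets smaller."""
    ratio = n_segment / n_total
    low = max(1, int(np.sqrt(n_features)))  # common Random Forest default
    return min(n_features, max(low, int(round(low + ratio * (n_features - low)))))

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, n_total, rng, min_leaf=5):
    """Grow one tree; n_total stays fixed at the full training-set size."""
    n, m = X.shape
    if n < 2 * min_leaf or len(np.unique(y)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    k = dynamic_subspace_size(n, n_total, m)        # per-node subspace size
    candidates = rng.choice(m, size=k, replace=False)
    best = None
    for f in candidates:
        for t in np.unique(X[:, f])[:-1]:           # candidate thresholds
            left = X[:, f] <= t
            nl = left.sum()
            if nl < min_leaf or n - nl < min_leaf:
                continue
            score = (nl * gini(y[left]) + (n - nl) * gini(y[~left])) / n
            if best is None or score < best[0]:
                best = (score, f, t, left)
    if best is None:                                # no admissible split
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    _, f, t, left = best
    return ("split", int(f), float(t),
            build_tree(X[left], y[left], n_total, rng, min_leaf),
            build_tree(X[~left], y[~left], n_total, rng, min_leaf))

def predict_one(tree, x):
    """Route one record down the tree to its leaf label."""
    while tree[0] == "split":
        _, f, t, left_sub, right_sub = tree
        tree = left_sub if x[f] <= t else right_sub
    return tree[1]

# Forest construction on synthetic data: each tree gets a bootstrap sample,
# while n_total always refers to the full training set, as the abstract
# describes. Majority vote shown for the binary case.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
forest = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample
    forest.append(build_tree(X[idx], y[idx], n_total=len(X), rng=rng))
votes = np.array([[predict_one(t, x) for t in forest] for x in X])
pred = (votes.mean(axis=1) > 0.5).astype(int)       # ensemble majority vote
```

Under this reading, nodes near the root (large segments) search wide subspaces, which favors strong splits, while nodes deep in the tree (small segments) search narrow subspaces, which preserves diversity among trees.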



Author information


Correspondence to Md Nasim Adnan.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Adnan, M.N., Islam, M.Z. (2017). Effects of Dynamic Subspacing in Random Forest. In: Cong, G., Peng, W.-C., Zhang, W., Li, C., Sun, A. (eds.) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_21


  • DOI: https://doi.org/10.1007/978-3-319-69179-4_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69178-7

  • Online ISBN: 978-3-319-69179-4

  • eBook Packages: Computer Science, Computer Science (R0)
