
Effects of Dynamic Subspacing in Random Forest

  • Conference paper
Advanced Data Mining and Applications (ADMA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10604)


Abstract

Owing to its simplicity and strong performance, Random Forest has attracted considerable interest from the research community. The splitting attribute at each node of a decision tree in Random Forest is determined from a predefined number of randomly selected attributes (a subset of the entire attribute set). The size of this attribute subset (subspace) is one of the most influential factors in the performance of Random Forest. In this paper, we propose a new technique that dynamically determines the size of the subspace based on the relative size of the current data segment with respect to the entire data set. To assess the effects of the proposed technique, we conduct experiments on five widely used data sets from the UCI Machine Learning Repository. The experimental results indicate that the proposed technique can improve the ensemble accuracy of Random Forest.
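The abstract states only that the subspace size at each node is derived from the ratio of the current data segment to the entire training set; the exact mapping is not given in this preview. The sketch below (Python with NumPy) illustrates one plausible reading: each node draws a random subspace whose size interpolates between sqrt(m) and m as a function of that ratio. The mapping in dynamic_subspace_size, and the helper names build_tree and predict_one, are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch of per-node dynamic subspacing in a random decision tree.
# ASSUMPTION: the linear interpolation between sqrt(m) and m below is a
# hypothetical mapping; the paper specifies only that the subspace size
# depends on |segment| / |training set|.
import numpy as np

def dynamic_subspace_size(n_segment, n_total, n_features):
    """Hypothetical mapping: shrink the subspace from n_features toward
    sqrt(n_features) as the node's data segment gets smaller."""
    ratio = n_segment / n_total
    low = max(1, int(np.sqrt(n_features)))  # common Random Forest default
    return min(n_features, max(low, int(round(low + ratio * (n_features - low)))))

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, n_total, rng, min_leaf=5):
    """Grow one tree; n_total stays fixed at the full training-set size."""
    n, m = X.shape
    if n < 2 * min_leaf or len(np.unique(y)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    k = dynamic_subspace_size(n, n_total, m)        # per-node subspace size
    candidates = rng.choice(m, size=k, replace=False)
    best = None
    for f in candidates:
        for t in np.unique(X[:, f])[:-1]:           # candidate thresholds
            left = X[:, f] <= t
            nl = left.sum()
            if nl < min_leaf or n - nl < min_leaf:
                continue
            score = (nl * gini(y[left]) + (n - nl) * gini(y[~left])) / n
            if best is None or score < best[0]:
                best = (score, f, t, left)
    if best is None:                                # no admissible split
        values, counts = np.unique(y, return_counts=True)
        return ("leaf", values[np.argmax(counts)])
    _, f, t, left = best
    return ("split", int(f), float(t),
            build_tree(X[left], y[left], n_total, rng, min_leaf),
            build_tree(X[~left], y[~left], n_total, rng, min_leaf))

def predict_one(tree, x):
    """Route one record down the tree to its leaf label."""
    while tree[0] == "split":
        _, f, t, left_sub, right_sub = tree
        tree = left_sub if x[f] <= t else right_sub
    return tree[1]

# Forest construction on synthetic data: each tree gets a bootstrap sample,
# while n_total always refers to the full training set, as the abstract
# describes. Majority vote shown for the binary case.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
forest = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample
    forest.append(build_tree(X[idx], y[idx], n_total=len(X), rng=rng))
votes = np.array([[predict_one(t, x) for t in forest] for x in X])
pred = (votes.mean(axis=1) > 0.5).astype(int)       # ensemble majority vote
```

Under this reading, nodes near the root (large segments) search wide subspaces, which favors strong splits, while nodes deep in the tree (small segments) search narrow subspaces, which preserves diversity among trees.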



Author information


Correspondence to Md Nasim Adnan.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Adnan, M.N., Islam, M.Z. (2017). Effects of Dynamic Subspacing in Random Forest. In: Cong, G., Peng, W.-C., Zhang, W., Li, C., Sun, A. (eds.) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_21


  • DOI: https://doi.org/10.1007/978-3-319-69179-4_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69178-7

  • Online ISBN: 978-3-319-69179-4

  • eBook Packages: Computer Science, Computer Science (R0)
