An Algorithm for Reducing the Number of Distinct Branching Conditions in a Decision Forest

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

Given a decision forest, we study the problem of reducing the number of its distinct branching conditions without changing each tree’s structure, while maintaining classification performance. A decision forest with fewer distinct branching conditions not only has a smaller description length but can also be implemented more efficiently in hardware. To force the modified decision forest to maintain classification performance, we impose the condition that the decision paths at each branching node do not change for \(100\sigma \)% of the given feature vectors passing through the node, for a given \(0\le \sigma <1\). Under this condition, we propose an algorithm that minimizes the number of distinct branching conditions by sharing the same condition among multiple branching nodes. In our experiments using 13 datasets from the UCI Machine Learning Repository, our algorithm reduced the number of distinct branching conditions by more than 90% for random forests learned from 3 datasets without degrading classification performance. A 90% reduction was also observed for 7 other datasets, with prediction accuracy degrading by at most 0.17 from an original prediction accuracy of at least 0.673.
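The idea of sharing one condition among several branching nodes can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: all nodes test the same numeric feature with conditions of the form x <= theta, `keep_frac` is read as the minimum fraction of vectors passing through a node whose path must stay unchanged, and the function names (`allowed_interval`, `pick_point`, `share_thresholds`) are hypothetical. Under these assumptions, each node admits an interval of acceptable thresholds, and choosing the fewest shared thresholds becomes the classic problem of stabbing intervals with the fewest points, which a greedy sweep solves.

```python
import math

def allowed_interval(theta, values, keep_frac):
    """Half-open interval [lo, hi) of thresholds t for which the test
    x <= t agrees with x <= theta on at least keep_frac of `values`."""
    n = len(values)
    k = n - math.ceil(keep_frac * n)  # number of paths allowed to change
    below = sorted((v for v in values if v <= theta), reverse=True)
    above = sorted(v for v in values if v > theta)
    # Lowering t below the (k+1)-th largest value <= theta flips k+1 paths.
    lo = below[k] if k < len(below) else -math.inf
    # Raising t to the (k+1)-th smallest value > theta flips k+1 paths.
    hi = above[k] if k < len(above) else math.inf
    return lo, hi

def pick_point(lo, hi):
    """Any point of the non-empty interval [lo, hi)."""
    if lo > -math.inf:
        return lo
    return hi - 1.0 if hi < math.inf else 0.0

def share_thresholds(nodes, keep_frac):
    """nodes: list of (theta, values passing through the node).
    Returns one new threshold per node, using as few distinct
    thresholds as possible (greedy interval stabbing)."""
    intervals = [allowed_interval(t, v, keep_frac) for t, v in nodes]
    order = sorted(range(len(nodes)), key=lambda i: intervals[i][0])
    result = [None] * len(nodes)
    group, min_hi = [], math.inf

    def flush():
        # The last interval added has the largest lo in the group.
        shared = pick_point(intervals[group[-1]][0], min_hi)
        for j in group:
            result[j] = shared

    for i in order:
        lo, hi = intervals[i]
        if group and lo >= min_hi:  # no common threshold is left
            flush()
            group, min_hi = [], math.inf
        group.append(i)
        min_hi = min(min_hi, hi)
    if group:
        flush()
    return result

nodes = [
    (5.0, [1.0, 4.0, 6.0, 9.0]),
    (7.0, [2.0, 3.0, 8.0, 10.0]),
    (20.0, [15.0, 18.0, 22.0, 30.0]),
]
print(share_thresholds(nodes, keep_frac=0.75))
```

In this toy input the first two nodes end up sharing one threshold, so three distinct conditions become two while each node changes the path of at most one of its four vectors.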

Notes

  1. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Acknowledgments

We thank Assoc. Prof. Ichigaku Takigawa and Assoc. Prof. Shinya Takamaeda-Yamazaki of Hokkaido University for helpful comments to improve this research. We also thank Prof. Hiroki Arimura of Hokkaido University and Prof. Masato Motomura of Tokyo Institute of Technology for their support and encouragement. This work was supported by JST CREST Grant Number JPMJCR18K3, Japan.

Author information

Corresponding author

Correspondence to Atsuyoshi Nakamura.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Nakamura, A., Sakurada, K. (2020). An Algorithm for Reducing the Number of Distinct Branching Conditions in a Decision Forest. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_34

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
