Tuning ForestDisc Hyperparameters: A Sensitivity Analysis

  • Conference paper
Optimization and Learning (OLA 2022)

Abstract

This paper presents and analyzes ForestDisc, a discretization method based on tree ensembles and moment-matching optimization. ForestDisc is a supervised, multivariate discretizer that transforms continuous attributes into categorical ones in two steps. First, for each continuous attribute, it extracts the set of split points learned while constructing a Random Forest model. It then reduces this set to a smaller set of split points through moment-matching optimization. Previous work showed that ForestDisc performs very well compared with 22 popular discretizers. This work analyzes the sensitivity of ForestDisc's performance to its tuning parameters and provides guidelines for users of the ForestDisc package.
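
The second step can be illustrated with a minimal Python sketch. It is not the ForestDisc package's implementation: the function names, the sample split points, and the choice of matching the first four raw moments are all assumptions made for illustration, and an exhaustive search over candidate subsets stands in for the nonlinear optimizer. The split points are assumed to have already been collected from the trees of a forest for one attribute.

```python
from itertools import combinations


def raw_moments(values, orders):
    """Return the raw moments E[x^p] of `values` for each order p in `orders`."""
    n = len(values)
    return [sum(v ** p for v in values) / n for p in orders]


def reduce_split_points(splits, k, max_order=4):
    """Pick k distinct split values whose moments best match the full set.

    Hypothetical helper: exhaustive search replaces the nonlinear
    optimizer, so it is only practical for small candidate sets.
    """
    candidates = sorted(set(splits))
    if k >= len(candidates):
        # Nothing to reduce: every distinct split value is kept.
        return candidates
    orders = range(1, max_order + 1)
    target = raw_moments(splits, orders)
    best, best_err = None, float("inf")
    for subset in combinations(candidates, k):
        moments = raw_moments(subset, orders)
        # Relative squared error, so high-order moments do not dominate.
        err = sum(
            ((m - t) / t) ** 2 if t else (m - t) ** 2
            for m, t in zip(moments, target)
        )
        if err < best_err:
            best, best_err = subset, err
    return list(best)


# Made-up split points for one attribute (assumption, not data from the paper).
splits = [1.9, 2.0, 2.1, 2.0, 4.8, 5.0, 5.1, 5.0, 8.1, 7.9, 8.0]
cuts = reduce_split_points(splits, k=3)
print(cuts)
```

Here `k` plays the role of a tuning parameter: it caps the number of cut points kept per attribute, which is the kind of setting a sensitivity analysis would vary.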

Author information

Corresponding author

Correspondence to Maissae Haddouchi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Haddouchi, M., Berrado, A. (2022). Tuning ForestDisc Hyperparameters: A Sensitivity Analysis. In: Dorronsoro, B., Pavone, M., Nakib, A., Talbi, EG. (eds) Optimization and Learning. OLA 2022. Communications in Computer and Information Science, vol 1684. Springer, Cham. https://doi.org/10.1007/978-3-031-22039-5_3

  • DOI: https://doi.org/10.1007/978-3-031-22039-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22038-8

  • Online ISBN: 978-3-031-22039-5

  • eBook Packages: Computer Science, Computer Science (R0)