Abstract
This paper considers the problem of tuning the hyperparameters of a random forest (RF) algorithm, which can be formulated as a discrete black-box optimization problem. Although the default settings of RF hyperparameters in software packages work well in many cases, tuning these hyperparameters can improve the predictive performance of the RF. On large data sets, however, tuning RF hyperparameters becomes a computationally expensive black-box optimization problem. A suitable approach is a surrogate-based method, in which surrogates approximate the functional relationship between the hyperparameters and the overall out-of-bag (OOB) prediction error of the RF. This paper develops such a surrogate-based method for discrete black-box optimization and applies it to RF hyperparameter tuning. Global and local variants of the proposed method, using radial basis function (RBF) surrogates, are applied to tune the RF hyperparameters on seven regression data sets involving up to 81 predictors and over 21,000 data points. Given a limited budget on the number of hyperparameter settings considered, the RBF algorithms obtained better overall OOB RMSE than discrete global random search, a discrete local random search algorithm, and a Bayesian optimization approach.
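The surrogate-based loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the choice of scipy's `RBFInterpolator` as the surrogate, the two-hyperparameter grid (`max_features`, `min_samples_leaf`), the synthetic data, and the greedy "evaluate the surrogate minimizer" candidate rule are all assumptions made for the sketch.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Discrete hyperparameter grid: (max_features, min_samples_leaf). Illustrative values.
grid = np.array([(m, n) for m in range(1, 11) for n in (1, 2, 5, 10, 20)])

def oob_rmse(params):
    """Black-box objective: OOB RMSE of an RF at the given hyperparameter setting."""
    m, n = int(params[0]), int(params[1])
    rf = RandomForestRegressor(n_estimators=100, max_features=m,
                               min_samples_leaf=n, oob_score=True,
                               bootstrap=True, random_state=0, n_jobs=-1)
    rf.fit(X, y)
    return float(np.sqrt(np.mean((rf.oob_prediction_ - y) ** 2)))

# Initial design: a few randomly chosen settings, evaluated exactly.
idx = rng.choice(len(grid), size=5, replace=False)
evaluated = {tuple(grid[i]): oob_rmse(grid[i]) for i in idx}

budget = 15  # limited budget on expensive evaluations
while len(evaluated) < budget:
    pts = np.array(list(evaluated.keys()), dtype=float)
    vals = np.array(list(evaluated.values()))
    # Fit an RBF surrogate to all (setting, OOB RMSE) pairs seen so far.
    surrogate = RBFInterpolator(pts, vals, kernel="cubic", degree=1)
    # Evaluate the expensive objective at the unevaluated grid point
    # the surrogate predicts to be best.
    cands = np.array([g for g in grid if tuple(g) not in evaluated], dtype=float)
    best = cands[np.argmin(surrogate(cands))]
    evaluated[tuple(best.astype(int))] = oob_rmse(best)

best_setting = min(evaluated, key=evaluated.get)
print("best (max_features, min_samples_leaf):", best_setting,
      "OOB RMSE:", round(evaluated[best_setting], 3))
```

A practical implementation would add the global/local search structure and candidate-sampling rules the paper studies; the point here is only the interplay between the cheap surrogate and the expensive OOB-error evaluations.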
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Regis, R.G. (2023). Hyperparameter Tuning of Random Forests Using Radial Basis Function Models. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13810. Springer, Cham. https://doi.org/10.1007/978-3-031-25599-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25598-4
Online ISBN: 978-3-031-25599-1