An evolutionary estimation procedure for generalized semilinear regression trees

Vannucci, Giulia; Gottard, Anna

doi:10.1007/s00180-022-01302-8

Original paper
Published: 27 November 2022

Volume 38, pages 1927–1946, (2023)
Computational Statistics

Abstract

In many applications, the presence of interactions or even mild non-linearities can affect inference and predictions. For that reason, we suggest the use of a class of models lying between statistics and machine learning, and we propose a learning procedure for them. The models combine a linear part and a tree component that is selected via an evolutionary algorithm, and they can be adopted for any kind of response, such as continuous, categorical, and ordinal responses and survival times. They are inherently interpretable but more flexible than standard regression models, as they easily capture non-linear and interaction effects. The proposed genetic-like learning algorithm avoids a greedy search for the tree component. In a simulation study, we show that the proposed approach performs comparably to other machine learning algorithms, with a substantial gain in interpretability and transparency, and we illustrate the method on a real data set.
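
As a rough illustration of the idea described in the abstract (a linear part plus a tree component chosen by a non-greedy, evolutionary search), the following minimal sketch may help. It is not the authors' procedure: it assumes a Gaussian response fitted by least squares, a fixed depth-2 tree, mutation-only variation with elitist selection, and the residual sum of squares as fitness. All function names (`leaf_ids`, `fit_rss`, `random_split`, `mutate`, `evolve`) and parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def leaf_ids(X, tree):
    """Assign each row to one of the 4 leaves of a depth-2 tree.
    `tree` holds three (feature, threshold) splits: root, left child, right child."""
    (j0, t0), (j1, t1), (j2, t2) = tree
    go_right = X[:, j0] > t0
    left_leaf = (X[:, j1] > t1).astype(int)        # leaves 0 and 1
    right_leaf = 2 + (X[:, j2] > t2).astype(int)   # leaves 2 and 3
    return np.where(go_right, right_leaf, left_leaf)

def fit_rss(X, y, tree):
    """Least-squares fit of the linear part plus one effect per leaf.
    Returns the residual sum of squares (lower is better)."""
    D = np.eye(4)[leaf_ids(X, tree)]   # one-hot leaf indicators; they also act as the intercept
    Z = np.hstack([X, D])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def random_split(X, rng):
    """Draw a random feature and use one of its observed values as threshold."""
    j = int(rng.integers(X.shape[1]))
    return (j, float(rng.choice(X[:, j])))

def mutate(tree, X, rng):
    """Copy the tree and replace one of its three splits at random."""
    child = list(tree)
    child[int(rng.integers(3))] = random_split(X, rng)
    return child

def evolve(X, y, pop_size=20, generations=30, seed=0):
    """Elitist evolutionary search over depth-2 trees (assumed scheme, not the paper's)."""
    rng = np.random.default_rng(seed)
    pop = [[random_split(X, rng) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(t, X, rng) for t in pop]
        pool = pop + children
        pool.sort(key=lambda t: fit_rss(X, y, t))   # rank parents and children together
        pop = pool[:pop_size]                        # keep the fittest individuals
    best = pop[0]
    return best, fit_rss(X, y, best)
```

Calling `evolve(X, y)` on a numeric feature matrix `X` (without an intercept column) and a response vector `y` returns the best tree found and its residual sum of squares. Because whole trees are mutated and compared, no split is fixed greedily; extending the sketch to other response types would require replacing the least-squares fit with a suitable likelihood-based fit and fitness.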

Author information

Authors and Affiliations

Department of Statistics, Computer Science, Applications, University of Florence, Florence, Italy
Giulia Vannucci & Anna Gottard

Corresponding author

Correspondence to Giulia Vannucci.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Vannucci, G., Gottard, A. An evolutionary estimation procedure for generalized semilinear regression trees. Comput Stat 38, 1927–1946 (2023). https://doi.org/10.1007/s00180-022-01302-8

Received: 23 November 2021
Accepted: 07 November 2022
Published: 27 November 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00180-022-01302-8
