An evolutionary estimation procedure for generalized semilinear regression trees

  • Original paper
Computational Statistics

Abstract

In many applications, interactions or even mild non-linearities can affect inference and prediction. We therefore suggest a class of models lying between statistics and machine learning, and we propose a learning procedure for them. The models combine a linear part with a tree component selected via an evolutionary algorithm, and they can be adopted for any kind of response, for instance continuous, categorical, or ordinal responses and survival times. They are inherently interpretable yet more flexible than standard regression models, as they easily capture non-linear and interaction effects. The proposed genetic-like learning algorithm avoids a greedy search for the tree component. In a simulation study, we show that the proposed approach performs comparably to other machine learning algorithms, with a substantial gain in interpretability and transparency, and we illustrate the method on a real data set.
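
To make the idea of a linear part plus an evolutionarily selected tree component concrete, here is a toy sketch. It is not the authors' procedure, whose details are not given in the abstract. It assumes a continuous response, a tree reduced to a single split (a stump), least-squares fitting, and a mutation-only search with elitism. The function names `fit_rss` and `evolve_stump` and all tuning values are hypothetical.

```python
import numpy as np


def fit_rss(X, y, var, thr):
    """Fit y on an intercept, the linear terms X, and a stump indicator.

    The stump indicator 1[X[:, var] > thr] plays the role of the tree
    component in this simplified setting. Returns the residual sum of squares.
    """
    leaf = (X[:, var] > thr).astype(float)
    design = np.column_stack([np.ones(len(y)), X, leaf])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def evolve_stump(X, y, pop_size=20, generations=30, seed=0):
    """Search for a (variable, threshold) split without a greedy scan.

    Assumed scheme: keep the better half of the population, then mutate
    each survivor either by jittering its threshold or by redrawing the
    split at random.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]

    def random_split():
        var = int(rng.integers(p))
        return var, float(rng.choice(X[:, var]))

    pop = [random_split() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fit_rss(X, y, *c))
        survivors = ranked[: pop_size // 2]
        children = []
        for var, thr in survivors:
            if rng.random() < 0.2:
                # Occasionally restart from a fresh random split.
                children.append(random_split())
            else:
                # Otherwise perturb the threshold on the same variable.
                step = rng.normal(scale=0.1 * X[:, var].std())
                children.append((var, float(thr + step)))
        pop = survivors + children  # survivors are retained (elitism)
    best = min(pop, key=lambda c: fit_rss(X, y, *c))
    return best, fit_rss(X, y, *best)


if __name__ == "__main__":
    # Simulated data: a linear effect of x0 plus a threshold effect of x1.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(200, 3))
    y = 1.0 + 2.0 * X[:, 0] + 3.0 * (X[:, 1] > 0.5) + rng.normal(scale=0.1, size=200)
    (var, thr), rss = evolve_stump(X, y)
    print(f"selected split: x{var} > {thr:.3f}, RSS = {rss:.3f}")
```

Because survivors are carried over unchanged, the best fit in the population never gets worse from one generation to the next. Other response types would replace the least-squares criterion with a suitable likelihood-based one, which this sketch does not attempt.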

Author information

Corresponding author

Correspondence to Giulia Vannucci.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vannucci, G., Gottard, A. An evolutionary estimation procedure for generalized semilinear regression trees. Comput Stat 38, 1927–1946 (2023). https://doi.org/10.1007/s00180-022-01302-8

  • DOI: https://doi.org/10.1007/s00180-022-01302-8
