Abstract
The traditional approach to regression trees involves partitioning the space of predictor variables into subsets that optimise a function of the response variable(s), and then predicting future response values by a single-valued summary statistic in each subset. We believe that a prediction interval is of greater practical use than a single predicted value, and that the criterion for the partitioning should therefore be based on such intervals rather than on single values. We define four potential criteria in the case of a single response variable, discuss computational aspects of producing the partition, evaluate the criteria on both real and simulated data, and draw some tentative conclusions about their relative efficacies. The methodology is extended to the case of multiple response variables, and its viability is demonstrated by application to some further real data. The possibility of fitting distributions to within-subset data is discussed, and some potential extensions are briefly outlined.
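To make the splitting mechanics concrete, the following is a minimal Python sketch of one plausible interval-based criterion: choosing the binary split that minimises the sample-weighted average width of empirical 95% prediction intervals in the two child nodes. The criterion, function names, and parameters here are illustrative assumptions for exposition only; they are not the four criteria defined in the paper, nor the authors' implementation.

```python
# Illustrative sketch only: pick a single binary split minimising the
# weighted average width of empirical 95% prediction intervals in the
# two child subsets. Criterion and names are hypothetical, not the paper's.
import numpy as np

def interval_width(y, lo=2.5, hi=97.5):
    """Width of the empirical [lo, hi] percentile interval of y."""
    a, b = np.percentile(y, [lo, hi])
    return b - a

def best_split(X, y, min_leaf=10):
    """Search all (variable, threshold) pairs; return the split with the
    smallest sample-weighted average child interval width."""
    n, p = X.shape
    best = (np.inf, None, None)  # (criterion value, variable, threshold)
    for j in range(p):
        for t in np.unique(X[:, j])[:-1]:  # drop max so right child is nonempty
            left = X[:, j] <= t
            nl = left.sum()
            nr = n - nl
            if nl < min_leaf or nr < min_leaf:
                continue
            crit = (nl * interval_width(y[left]) +
                    nr * interval_width(y[~left])) / n
            if crit < best[0]:
                best = (crit, j, t)
    return best

# Example: each resulting subset supplies an empirical prediction interval,
# rather than a point prediction, for future responses falling in it.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.where(X[:, 0] > 0.5, rng.normal(5, 1, 200), rng.normal(0, 3, 200))
crit, var, thr = best_split(X, y)
print(f"split on x{var} at {thr:.3f}, avg interval width {crit:.2f}")
```

Applied recursively to each child subset, a search of this kind yields a tree whose leaves carry prediction intervals; the paper's four criteria differ in how the interval quality of a candidate split is scored.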
Cite this article
Krzanowski, W.J., Hand, D.J. A recursive partitioning tool for interval prediction. ADAC 1, 241–254 (2007). https://doi.org/10.1007/s11634-007-0015-y
Keywords
- Binary splitting
- CART
- Clustering
- Criterion optimisation
- Multivariate response variable
- Prediction intervals
- Regression trees