Summary
Many kinds of data can be represented as interval curves, where each interval reflects within-class variability. This representation is particularly well suited to load profiles, which depict the electricity consumption of a class of customers. Electricity load profiling consists of assigning a daily load curve to a customer based on characteristics such as energy requirement. Within the scope of load profiling, this paper investigates the extension of multivariate regression trees to the case of interval dependent (or response) variables. The tree method aims to build the load profiles and their assignment rules simultaneously, the rules being based on the independent variables. The extension of multivariate regression trees to interval responses is detailed and a global approach is defined. Its first stage is a dimension reduction of the interval response variables. The extended tree method is then applied to the first principal interval components. The outputs are classes of interval curves, where each class is characterized both by an interval load profile (i.e. the class prototype) and by an assignment rule based on the independent variables.
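To make the idea of a tree split on interval responses concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that each curve is a list of (lower, upper) pairs, that the class prototype is the mean of the bounds at each time step, and that split quality is the within-class sum of squared deviations of both bounds from that prototype. The names `prototype`, `inertia` and `best_split`, the criterion itself and the toy data are all illustrative assumptions, and the dimension reduction stage is omitted.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (lower, upper) bound at one time step
Curve = List[Interval]           # one interval per time step


def prototype(curves: List[Curve]) -> Curve:
    """Mean interval curve: average the lower and upper bounds at each step."""
    n = len(curves)
    steps = len(curves[0])
    return [
        (sum(c[t][0] for c in curves) / n, sum(c[t][1] for c in curves) / n)
        for t in range(steps)
    ]


def inertia(curves: List[Curve]) -> float:
    """Sum of squared deviations of both bounds from the class prototype.

    Assumed criterion for illustration only.
    """
    if not curves:
        return 0.0
    proto = prototype(curves)
    return sum(
        (c[t][0] - proto[t][0]) ** 2 + (c[t][1] - proto[t][1]) ** 2
        for c in curves
        for t in range(len(proto))
    )


def best_split(x: List[float], curves: List[Curve]) -> Tuple[float, float]:
    """Return (threshold, within_inertia) of the best binary split x <= threshold."""
    values = sorted(set(x))
    best = (values[0], float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [c for xi, c in zip(x, curves) if xi <= threshold]
        right = [c for xi, c in zip(x, curves) if xi > threshold]
        total = inertia(left) + inertia(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: one independent variable (energy requirement) and
# two-step interval load curves for four customers.
energy = [1.0, 2.0, 10.0, 11.0]
curves = [
    [(0.0, 1.0), (1.0, 2.0)],
    [(0.0, 1.0), (1.0, 2.0)],
    [(5.0, 7.0), (6.0, 9.0)],
    [(5.0, 7.0), (6.0, 9.0)],
]
threshold, within = best_split(energy, curves)
```

On this toy data the chosen split separates the two low-consumption customers from the two high-consumption ones. Each resulting class would then be described by its prototype interval curve and by the rule on the energy requirement, mirroring the two outputs named in the summary.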
Cite this article
Cariou, V. Extension of multivariate regression trees to interval data. Application to electricity load profiling. Computational Statistics 21, 325–341 (2006). https://doi.org/10.1007/s00180-006-0266-7