ABSTRACT
As in batch learning, one may identify a class of streaming real-world problems that require modeling several targets simultaneously. Because of the dependencies among the targets, simultaneous modeling can be more successful and informative than building an independent model for each target. As a result, one may obtain a smaller model that simultaneously explains the relations between the input attributes and the targets. This problem has not been addressed previously in the streaming setting. We propose an algorithm with low computational complexity for inducing multi-target model trees, based on the principles of predictive clustering trees and on probability bounds that support the splitting decisions. Linear models are computed for each target separately, by incremental training of perceptrons in the leaves of the tree. Experiments are performed on synthetic and real-world datasets. Compared to a set of independent regression trees built separately for each target attribute, the multi-target regression tree algorithm produces equally accurate and smaller models for simultaneous prediction of all the target attributes. When the regression surface is smooth, the linear models computed in the leaves significantly improve the accuracy for all of the targets.
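The two mechanisms named in the abstract, a probability bound that supports the splitting decision and one incrementally trained perceptron per target in each leaf, can be illustrated with a minimal sketch. The abstract does not say which bound or which update rule the algorithm uses, so everything below is an assumption made for illustration: the Hoeffding bound (a common choice in stream mining), a ratio test between the two best candidate splits, the delta-rule update, and all names (`hoeffding_bound`, `should_split`, `MultiTargetPerceptronLeaf`, `learning_rate`).

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation epsilon such that, with probability 1 - delta, the mean of n
    observations of a variable with the given range lies within epsilon of
    its expected value. Assumed here; the abstract only says 'probability bounds'."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_score, second_score, n, delta=1e-6):
    """Hypothetical split test: accept the best candidate split when the ratio
    second_score / best_score (assumed to lie in [0, 1]) is below 1 - epsilon."""
    if best_score <= 0.0:
        return False
    ratio = second_score / best_score
    return ratio < 1.0 - hoeffding_bound(1.0, delta, n)


class MultiTargetPerceptronLeaf:
    """One linear model per target, each trained incrementally with the delta rule."""

    def __init__(self, n_inputs, n_targets, learning_rate=0.1):
        # One weight vector per target; the last weight is the bias term.
        self.weights = [[0.0] * (n_inputs + 1) for _ in range(n_targets)]
        self.learning_rate = learning_rate

    def predict(self, x):
        xb = list(x) + [1.0]
        return [sum(w * v for w, v in zip(ws, xb)) for ws in self.weights]

    def update(self, x, y):
        """Process a single example (x, y) and adjust each target's weights separately."""
        xb = list(x) + [1.0]
        predictions = self.predict(x)
        for ws, predicted, target in zip(self.weights, predictions, y):
            error = target - predicted
            for i, v in enumerate(xb):
                ws[i] += self.learning_rate * error * v


# Usage: a leaf learning two synthetic targets, y1 = 2x and y2 = 1 - x, one example at a time.
leaf = MultiTargetPerceptronLeaf(n_inputs=1, n_targets=2)
for _ in range(500):
    for k in range(11):
        x = k / 10.0
        leaf.update([x], [2.0 * x, 1.0 - x])
```

In this sketch the leaf shares the incoming examples across targets but keeps a separate weight vector for each one, mirroring the statement that linear models are computed for each target separately. The bound shrinks as more examples arrive, so a split between two close candidates is deferred until enough evidence has accumulated.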
Index Terms
- Incremental multi-target model trees for data streams