Abstract
Methods for multi-target regression on data streams are underrepresented in the current literature. We present several approaches, all based on the Hoeffding bound, to learning trees and ensembles of trees for multi-target regression. First, we introduce a local method, which learns one single-target tree per target and aggregates their predictions into a multi-target prediction. We follow with a tree-based method (iSOUP-Tree), which learns trees that predict all of the targets at once. We then introduce iSOUP-OptionTree, which extends iSOUP-Tree through the use of option nodes. We continue with ensemble methods, and describe the use of iSOUP-Tree as a base learner in the online bagging and online random forest ensemble approaches. We describe an evaluation scenario, and present and discuss the results of the described methods, most notably in terms of predictive performance and the use of computational resources. Finally, we present two case studies in which we evaluate the introduced methods in terms of their efficiency and their viability for application to real-world domains.
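As a rough illustration of how the Hoeffding bound can drive split decisions in a streaming tree learner, the following minimal Python sketch computes the bound and applies it to a ratio of two candidate split scores. The function names, the ratio-based test, and the assumption that the ratio lies in [0, 1] are illustrative choices made here, not details taken from the abstract; the split criterion used by iSOUP-Tree itself may differ.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the true
    mean of a variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best: float, second_best: float, delta: float, n: int) -> bool:
    """Hypothetical split test: split once the ratio of the second-best to the
    best candidate score is confidently below 1.

    Assumes both scores are non-negative with second_best <= best, so the
    ratio has range 1.
    """
    if best <= 0.0:
        return False  # no candidate reduces the error, so keep observing
    ratio = second_best / best
    return ratio < 1.0 - hoeffding_bound(1.0, delta, n)


if __name__ == "__main__":
    # The bound shrinks as more examples are observed at a leaf.
    for n in (50, 200, 1000):
        print(n, round(hoeffding_bound(1.0, 0.05, n), 4))
```

Under these assumptions, a leaf keeps accumulating examples until the best candidate split is separated from the runner-up by more than the bound, which is what lets the learner commit to a split without storing the stream.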






Acknowledgments
The authors are supported by the Slovenian Research Agency (Grant P2-0103 and a young researcher grant) and the European Commission (Grants ICT-2013-612944 MAESTRA and 720270 HBP SGA1).
Cite this article
Osojnik, A., Panov, P. & Džeroski, S. Tree-based methods for online multi-target regression. J Intell Inf Syst 50, 315–339 (2018). https://doi.org/10.1007/s10844-017-0462-7
DOI: https://doi.org/10.1007/s10844-017-0462-7