Tree-based methods for online multi-target regression

Journal of Intelligent Information Systems

Abstract

Methods that address the task of multi-target regression on data streams are underrepresented in the current literature. We present several approaches to learning trees and ensembles of trees for multi-target regression based on the Hoeffding bound. First, we introduce a local method, which learns multiple single-target trees whose individual predictions are then aggregated into a multi-target prediction. We follow with a tree-based method (iSOUP-Tree), which learns trees that predict all of the targets at once. We then introduce iSOUP-OptionTree, which extends iSOUP-Tree through the use of option nodes. We continue with ensemble methods, and describe the use of iSOUP-Tree as a base learner in the online bagging and online random forest ensemble approaches. We describe an evaluation scenario, and present and discuss the results of the described methods, most notably in terms of predictive performance and the use of computational resources. Finally, we present two case studies in which we evaluate the introduced methods in terms of their efficiency and the viability of their application to real-world domains.
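The Hoeffding bound mentioned above can be illustrated with a minimal sketch. It uses the standard form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and a generic "best minus second best" split test. The function names, the parameter values, and the split test itself are assumptions made for illustration; the abstract does not specify the heuristic or the exact test used by iSOUP-Tree.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the true
    mean of a random variable with the given range lies within epsilon of the
    mean observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best: float, second_best: float,
              value_range: float, delta: float, n: int) -> bool:
    """Hypothetical split test: accept the best candidate split once its
    observed advantage over the runner-up exceeds the Hoeffding bound."""
    return (best - second_best) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # The bound shrinks as more examples are observed at a leaf.
    for n in (100, 1000, 10000):
        print(n, hoeffding_bound(value_range=1.0, delta=1e-7, n=n))
```

Because the bound decreases with the number of observed examples, a leaf that cannot yet separate its two best candidates simply waits for more data before deciding, which is what makes this kind of test suitable for streams.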

Acknowledgments

The authors are supported by The Slovenian Research Agency (Grant P2-0103 and a young researcher grant) and the European Commission (Grants ICT-2013-612944 MAESTRA and 720270 HBP SGA1).

Author information

Corresponding author

Correspondence to Aljaž Osojnik.

About this article

Cite this article

Osojnik, A., Panov, P. & Džeroski, S. Tree-based methods for online multi-target regression. J Intell Inf Syst 50, 315–339 (2018). https://doi.org/10.1007/s10844-017-0462-7

  • DOI: https://doi.org/10.1007/s10844-017-0462-7
