Abstract
In this work, we address the task of learning ensembles of predictive models for predicting multiple continuous variables, i.e., multi-target regression (MTR). In contrast to standard regression, where the output is a single scalar value, in MTR the output is a data structure: a tuple/vector of continuous variables. The task of MTR has recently been gaining interest in the research community due to its applicability in practically relevant domains. More specifically, we consider Extra-Tree ensembles, the overall top performer in the DREAM4 and DREAM5 challenges for gene network reconstruction. We extend this method to the task of multi-target regression and call the extension Extra-PCTs ensembles. As base predictive models, we propose to use predictive clustering trees (PCTs), a generalization of decision trees for predicting structured outputs, including multiple continuous variables. We consider both global and local prediction of the multiple variables: the former is based on a single model that predicts all of the target variables simultaneously, while the latter is based on a collection of models, each predicting a single target variable. We conduct an experimental evaluation of the proposed method on a collection of 10 benchmark datasets with multiple continuous targets and compare its performance to that of random forests of PCTs. The results reveal that a multi-target Extra-PCTs ensemble performs statistically significantly better than a single multi-target or single-target PCT. Furthermore, although the differences in performance among the ensemble learning methods are not statistically significant, multi-target Extra-PCTs ensembles are the best-performing method. Finally, in terms of efficiency (running times and model complexity), both multi-target variants of the ensemble methods are more efficient and produce smaller models than the single-target ensembles.
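The global vs. local distinction described above can be sketched with scikit-learn's ExtraTreesRegressor, which supports multi-output regression natively. This is only an illustration on synthetic data: scikit-learn's trees stand in for the paper's Extra-PCTs, which are built on predictive clustering trees, and the data and hyperparameters below are assumptions, not the paper's experimental setup.

```python
# Global vs. local multi-target regression with extremely randomized
# tree ensembles (a sketch; scikit-learn trees stand in for Extra-PCTs).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
# Two correlated continuous targets (synthetic data for illustration).
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.randn(200),
    X[:, 0] * X[:, 1] + 0.1 * rng.randn(200),
])

# Global: a single ensemble predicts all targets simultaneously.
global_model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, Y)

# Local: one ensemble per target variable.
local_models = [
    ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, Y[:, j])
    for j in range(Y.shape[1])
]

pred_global = global_model.predict(X)   # one model, all targets at once
pred_local = np.column_stack([m.predict(X) for m in local_models])
print(pred_global.shape, pred_local.shape)
```

The global model amortizes tree construction over all targets, which is consistent with the abstract's observation that multi-target variants run faster and produce smaller models than collections of single-target ensembles.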
Notes
- 1.
For more information, visit http://dreamchallenges.org/.
Acknowledgments
We acknowledge the financial support of the European Commission through the grants ICT-2013-612944 MAESTRA and ICT-2013-604102 HBP.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Kocev, D., Ceci, M. (2015). Ensembles of Extremely Randomized Trees for Multi-target Regression. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science(), vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8