Extensions to Quantile Regression Forests for Very High-Dimensional Data

Tung, Nguyen Thanh; Huang, Joshua Zhexue; Khan, Imran; Li, Mark Junjie; Williams, Graham

doi:10.1007/978-3-319-06605-9_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This paper describes new extensions to Quantile Regression Forests (QRF), a state-of-the-art regression random forest method, for application to high-dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other containing less important features. The two sets partition the input features according to their importance measures. The partition is generated by first using feature permutation to produce raw feature importance scores and then applying a p-value assessment to separate the important features from the less important ones. The new subspace sampling method enables the generation of trees from bagged sample data with smaller regression errors. For point regression, we choose the predicted value of Y from the range between the two quantiles Q_0.05 and Q_0.95, instead of the conditional mean used in regression random forests. Our experimental results show that random forests with these extensions outperformed regression random forests and quantile regression forests in reducing root mean square residuals.
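Raw permutation importance measures how much a fitted model's error grows when the values of one feature are shuffled, which breaks that feature's link to Y. The Python sketch below assumes a generic `predict` callable and a held-out evaluation set; inside a random forest the scores would more typically be computed on each tree's out-of-bag samples. The names `mse` and `permutation_importance` are hypothetical and are not taken from the paper or from the randomForest package.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float],
                           X: List[Row], y: Sequence[float],
                           rng: random.Random) -> List[float]:
    """Raw importance of feature j = MSE after shuffling column j minus baseline MSE."""
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the association between feature j and y
        X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
        scores.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return scores


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [3.0 * row[0] for row in X]  # y depends only on feature 0
    scores = permutation_importance(lambda r: 3.0 * r[0], X, y, rng)
    print(scores)  # scores[1] is exactly 0.0 because the predictor ignores feature 1
```

A feature the model never uses keeps its importance at zero, while shuffling a feature the model relies on inflates the error.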
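The abstract states only that a p-value assessment separates important from less important features. One common way to obtain such values, used by the Boruta package, is to append permuted "shadow" copies of the features, repeat the importance computation R times, and count how often each real feature fails to beat the best shadow feature. The sketch below assumes that scheme, uses a plain empirical frequency rather than a formal statistical test, and then draws a feature subspace from both sets. The names `empirical_p_values`, `partition_features`, `sample_subspace`, `alpha`, and `strong_fraction` are hypothetical, as is the default 50/50 split between the two sets.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def empirical_p_values(real_scores: Sequence[Sequence[float]],
                       shadow_max: Sequence[float]) -> List[float]:
    """real_scores[r][j]: raw importance of feature j in replicate r.
    shadow_max[r]: largest importance among shadow features in replicate r.
    Returns, per feature, the fraction of replicates in which it did not beat
    the best shadow feature (a small value means consistently important)."""
    n_reps = len(shadow_max)
    n_features = len(real_scores[0])
    return [sum(real_scores[r][j] <= shadow_max[r] for r in range(n_reps)) / n_reps
            for j in range(n_features)]


def partition_features(p_values: Sequence[float],
                       alpha: float = 0.05) -> Tuple[List[int], List[int]]:
    """Split feature indices into an important set and a less important set."""
    strong = [j for j, p in enumerate(p_values) if p < alpha]
    weak = [j for j, p in enumerate(p_values) if p >= alpha]
    return strong, weak


def sample_subspace(strong: List[int], weak: List[int], mtry: int,
                    strong_fraction: float = 0.5,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Draw mtry distinct features: about strong_fraction of them from the
    important set and the rest from the less important set."""
    if rng is None:
        rng = random.Random()
    # Cap the important share at the size of the important set.
    n_strong = min(len(strong), math.ceil(mtry * strong_fraction))
    n_weak = min(len(weak), mtry - n_strong)
    # If the less important set was too small, fill the gap from the important set.
    n_strong = min(len(strong), mtry - n_weak)
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


if __name__ == "__main__":
    real = [[0.9, 0.1, 0.5], [0.8, 0.0, 0.2], [0.95, 0.05, 0.6],
            [0.85, 0.1, 0.1], [0.9, 0.0, 0.7]]
    shadow = [0.3] * 5
    p = empirical_p_values(real, shadow)
    print(p)  # [0.0, 1.0, 0.4]
    strong, weak = partition_features(p)
    print(strong, weak, sample_subspace(strong, weak, mtry=2, rng=random.Random(1)))
```

Sampling from both sets guarantees that every tree sees some important features while still giving the less important ones a chance, which is the intent of the two-set subspace method described above.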
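In QRF (Meinshausen), each training response Y_i receives a weight w_i(x) equal to the average over trees of 1/|leaf| when X_i falls in the same leaf as x, and 0 otherwise; the weighted empirical distribution of the Y_i then yields conditional quantiles Q_alpha(x) = inf{y : F(y | x) >= alpha}. The abstract does not say exactly how the point prediction is chosen inside [Q_0.05, Q_0.95]. Purely as an illustration, the sketch below assumes the weighted mean of the training responses that fall inside that range, which discards extreme responses that would otherwise pull the conditional mean. Leaf memberships are passed in directly so that no tree-growing code is needed; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def qrf_weights(leaf_members: Sequence[Sequence[int]], n_train: int) -> List[float]:
    """leaf_members[t]: indices of training samples in the leaf of tree t that
    contains x. Each tree spreads weight 1/k uniformly over its leaf, so the
    weights sum to 1."""
    k = len(leaf_members)
    w = [0.0] * n_train
    for members in leaf_members:
        for i in members:
            w[i] += 1.0 / (k * len(members))
    return w


def weighted_quantile(y: Sequence[float], w: Sequence[float], alpha: float) -> float:
    """Q_alpha = smallest y whose weighted empirical CDF F(y | x) reaches alpha."""
    cum = 0.0
    for i in sorted(range(len(y)), key=lambda j: y[j]):
        cum += w[i]
        if cum >= alpha - 1e-12:  # tolerance for floating-point round-off
            return y[i]
    return max(y)


def trimmed_point_prediction(y: Sequence[float], w: Sequence[float],
                             lo: float = 0.05,
                             hi: float = 0.95) -> Tuple[float, float, float]:
    """Weighted mean of the responses inside [Q_lo, Q_hi] (an illustrative choice)."""
    q_lo, q_hi = weighted_quantile(y, w, lo), weighted_quantile(y, w, hi)
    inside = [i for i in range(len(y)) if q_lo <= y[i] <= q_hi and w[i] > 0]
    total = sum(w[i] for i in inside)
    return sum(w[i] * y[i] for i in inside) / total, q_lo, q_hi


if __name__ == "__main__":
    y = [-100.0] + [float(v) for v in range(1, 20)] + [1000.0]  # 21 responses
    w = qrf_weights([list(range(21))], n_train=21)  # one tree, one leaf: uniform weights
    pred, q_lo, q_hi = trimmed_point_prediction(y, w)
    print(q_lo, q_hi, pred)  # the two outliers fall outside [q_lo, q_hi]
```

With these toy responses the plain weighted mean is about 51.9, dominated by the outlier 1000, whereas the trimmed prediction stays near 10.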

Author information

Authors and Affiliations

Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Nguyen Thanh Tung, Imran Khan & Graham Williams
College of Computer Science and Software Engineering, Shenzhen University, China
Joshua Zhexue Huang & Mark Junjie Li

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Tung, N.T., Huang, J.Z., Khan, I., Li, M.J., Williams, G. (2014). Extensions to Quantile Regression Forests for Very High-Dimensional Data. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_21

DOI: https://doi.org/10.1007/978-3-319-06605-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)
