Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, Yunming Ye
Copyright: © 2012 | Volume: 8 | Issue: 2 | Pages: 20
ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781466610422 | DOI: 10.4018/jdwm.2012040103
Cite Article

MLA

Xu, Baoxun, et al. "Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces." IJDWM, vol. 8, no. 2, 2012, pp. 44-63. http://doi.org/10.4018/jdwm.2012040103

APA

Xu, B., Huang, J. Z., Williams, G., Wang, Q., & Ye, Y. (2012). Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces. International Journal of Data Warehousing and Mining (IJDWM), 8(2), 44-63. http://doi.org/10.4018/jdwm.2012040103

Chicago

Xu, Baoxun, et al. "Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces," International Journal of Data Warehousing and Mining (IJDWM) 8, no. 2 (2012): 44-63. http://doi.org/10.4018/jdwm.2012040103


Abstract

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for each subspace is not suitable for high-dimensional data with thousands of features, because such data often contain many features that are uninformative for classification, and random sampling frequently produces subspaces with few or no informative features. Consequently, the classification performance of the random forest model suffers significantly. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that, using a small subspace size (given in the paper as a function of M, the total number of features in the dataset), the proposed random forest model significantly outperforms existing random forest models.
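The core idea described in the abstract — score each feature's informativeness, then bias subspace sampling toward informative features rather than sampling uniformly — can be sketched roughly as follows. This is only an illustrative sketch: the mean-difference score, the `log2(M)+1` subspace size, and all function names here are assumptions for demonstration, not the paper's exact weighting method.

```python
import math
import random

def feature_weights(X, y):
    """Score each feature by the spread of its class-conditional means
    (a simple stand-in for an informativeness measure), normalized so the
    scores form a sampling distribution over features."""
    n, m = len(X), len(X[0])
    classes = sorted(set(y))
    raw = []
    for j in range(m):
        means = []
        for c in classes:
            vals = [X[i][j] for i in range(n) if y[i] == c]
            means.append(sum(vals) / len(vals))
        raw.append(max(means) - min(means) + 1e-12)  # avoid zero weights
    total = sum(raw)
    return [w / total for w in raw]

def weighted_subspace(probs, size, rng=random):
    """Draw `size` distinct feature indices, each round picking a feature
    with probability proportional to its weight (weighted sampling
    without replacement)."""
    remaining = list(range(len(probs)))
    p = list(probs)
    chosen = []
    for _ in range(min(size, len(remaining))):
        r = rng.random() * sum(p)
        acc = 0.0
        for k in range(len(remaining)):
            acc += p[k]
            if r <= acc:
                chosen.append(remaining.pop(k))
                p.pop(k)
                break
        else:  # floating-point guard: fall back to the last candidate
            chosen.append(remaining.pop())
            p.pop()
    return chosen

# Toy data: feature 0 separates the two classes, features 1-4 are noise.
random.seed(0)
X = [[float(c)] + [random.random() for _ in range(4)]
     for c in (0, 1) for _ in range(10)]
y = [c for c in (0, 1) for _ in range(10)]

w = feature_weights(X, y)
size = int(math.log2(len(X[0]))) + 1   # an assumed subspace size, log2(M)+1
subspace = weighted_subspace(w, size)
```

In a full random forest, each tree (or each node split) would draw its own weighted subspace this way, so informative features appear in far more subspaces than they would under uniform sampling.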
