
Calculating feature importance in data streams with concept drift using Online Random Forest

Abstract:

Large-volume data streams with concept drift have garnered a great deal of attention in the machine learning community. Numerous researchers have proposed online learning algorithms that train iteratively from new observations and provide continuously relevant predictions. Compared to previous offline or sliding-window approaches, these algorithms have shown better predictive performance, faster detection of and adaptation to concept drift, and greater scalability to high-volume or high-velocity data. Online Random Forest (ORF) is one such approach to streaming classification problems. We adapted the feature importance metrics of Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini Impurity (MDG), both originally designed for offline Random Forest, to Online Random Forest so that they evolve with time and concept drift. Our work is novel in that previous streaming models have not provided any measures of feature importance. We experimentally tested our Online Random Forest versions of feature importance against their offline counterparts, and concluded that our approach to tracking the underlying drifting concepts in a simulated data stream is valid.
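The abstract does not spell out how the online importance measures are computed, so the following is only a rough illustration of the general idea that an MDA-style score can evolve with a stream. It is not the procedure from the paper. The class `StreamingMDA`, the `decay` forgetting factor, the donor buffer, and the oracle `predict` function standing in for an already-adapted online model are all assumptions made for this sketch.

```python
import random


class StreamingMDA:
    """Permutation-style Mean Decrease in Accuracy with exponential forgetting.

    Hypothetical helper for illustration; not the paper's algorithm.
    """

    def __init__(self, n_features, decay=0.99, buffer_size=50, seed=0):
        self.n_features = n_features
        self.decay = decay              # assumed forgetting factor in (0, 1)
        self.buffer_size = buffer_size  # number of recent rows kept as donors
        self.importance = [0.0] * n_features
        self._buffer = []
        self._rng = random.Random(seed)

    def update(self, predict, x, y):
        """Score one labelled observation, then store it as a future donor."""
        if self._buffer:
            base_correct = 1.0 if predict(x) == y else 0.0
            for j in range(self.n_features):
                # Break the link between feature j and the label by swapping in
                # the value of feature j from a randomly chosen recent row.
                donor = self._rng.choice(self._buffer)
                x_perm = list(x)
                x_perm[j] = donor[j]
                perm_correct = 1.0 if predict(x_perm) == y else 0.0
                drop = base_correct - perm_correct
                # Exponentially weighted mean: old concepts fade out over time.
                self.importance[j] = (
                    self.decay * self.importance[j] + (1.0 - self.decay) * drop
                )
        self._buffer.append(list(x))
        if len(self._buffer) > self.buffer_size:
            self._buffer.pop(0)


def run_demo(n_per_concept=1000, seed=1):
    """Simulate a stream whose label depends on feature 0, then on feature 1."""
    rng = random.Random(seed)
    tracker = StreamingMDA(n_features=3)
    snapshots = []
    for relevant in (0, 1):  # abrupt concept drift between the two phases
        # Oracle stand-in for an online model that has already adapted.
        def predict(x, k=relevant):
            return int(x[k] > 0.5)

        for _ in range(n_per_concept):
            x = [rng.random() for _ in range(3)]
            y = int(x[relevant] > 0.5)
            tracker.update(predict, x, y)
        snapshots.append(list(tracker.importance))
    return snapshots
```

Calling `run_demo()` returns the importance vector at the end of each concept. In this toy stream the score of feature 0 should dominate before the drift and the score of feature 1 after it, while the irrelevant feature 2 stays at zero. The exponential weighting is one simple way to let older concepts fade; a sliding window would be another.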

Published in: 2014 IEEE International Conference on Big Data (Big Data)

Date of Conference: 27-30 October 2014

Date Added to IEEE Xplore: 08 January 2015

Electronic ISBN: 978-1-4799-5666-1

DOI: 10.1109/BigData.2014.7004352

Conference Location: Washington, DC, USA