Calculating feature importance in data streams with concept drift using Online Random Forest | IEEE Conference Publication | IEEE Xplore

Abstract:

Large-volume data streams with concept drift have garnered a great deal of attention in the machine learning community. Numerous researchers have proposed online learning algorithms that train iteratively on new observations and provide continuously relevant predictions. Compared with earlier offline or sliding-window approaches, these algorithms have shown better predictive performance, rapid detection of, and adaptation to, concept drift, and greater scalability to high-volume or high-velocity data. Online Random Forest (ORF) is one such approach to streaming classification problems. We adapted two feature importance metrics originally designed for offline Random Forest, Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini Impurity (MDG), to Online Random Forest so that they evolve with time and concept drift. Our work is novel in that previous streaming models have not provided any measures of feature importance. We experimentally tested our Online Random Forest versions of these feature importance measures against their offline counterparts and concluded that our approach to tracking the underlying drifting concepts in a simulated data stream is valid.
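
The abstract does not spell out how MDG is updated online. For reference, offline MDG credits each feature with the weighted Gini impurity decrease p(t)·ΔG(t) of every node t that splits on it, summed over nodes and averaged over trees. The Python sketch below shows one way such a score could be made to follow concept drift, assuming an exponential forgetting factor applied once per time step. The class `DecayedMDG`, its `decay` parameter, and the `split_events` input format are hypothetical illustrations, not the authors' ORF implementation.

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 of a {class: count} dictionary."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity decrease of one split:
    G(parent) - (n_L / n) * G(left) - (n_R / n) * G(right)."""
    n = sum(parent.values())
    if n == 0:
        return 0.0
    n_l, n_r = sum(left.values()), sum(right.values())
    return gini(parent) - (n_l / n) * gini(left) - (n_r / n) * gini(right)


class DecayedMDG:
    """Hypothetical streaming MDG tracker (an assumption, not the paper's
    method): per-feature sums of weighted impurity decreases, multiplied
    by `decay` every time step so credit from old concepts fades."""

    def __init__(self, decay=0.99):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.scores = defaultdict(float)

    def update(self, split_events):
        """split_events: iterable of (feature, node_weight, delta_gini)
        tuples observed at the current step, pooled across all trees."""
        # Forget: shrink every existing score (keys are not added here).
        for feature in self.scores:
            self.scores[feature] *= self.decay
        # Credit: add p(t) * delta_G(t) for each split seen this step.
        for feature, weight, delta in split_events:
            self.scores[feature] += weight * delta

    def importances(self):
        """Scores normalized to sum to 1 (empty dict if nothing seen)."""
        total = sum(self.scores.values())
        if total == 0:
            return {}
        return {f: s / total for f, s in self.scores.items()}


# A perfectly separating split of a balanced two-class node.
print(gini_decrease({"a": 5, "b": 5}, {"a": 5}, {"b": 5}))  # 0.5

# Simulated drift: x1 is informative first, then x2 takes over.
tracker = DecayedMDG(decay=0.5)
tracker.update([("x1", 1.0, 0.5)])
tracker.update([("x2", 1.0, 0.5)])
print(tracker.importances())  # x1 about 1/3, x2 about 2/3
```

With `decay=0.5`, a feature that stops producing splits loses half of its accumulated credit per step, so the normalized scores shift toward `x2` after a single step. Values of `decay` closer to 1 forget more slowly, giving smoother but slower-reacting importance estimates.
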
Date of Conference: 27-30 October 2014
Date Added to IEEE Xplore: 08 January 2015
Electronic ISBN: 978-1-4799-5666-1
Conference Location: Washington, DC, USA
