Abstract
The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, imperfections in data streams, such as noise and imbalanced class distributions, are common in real-world applications and degrade the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for an unbounded data stream. To deal with the adverse effects of imperfect data streams, we propose an incremental optimization model that can be integrated into the decision tree model for data stream classification. It is called the Incrementally Optimized Very Fast Decision Tree (I-OVFDT). It balances prediction accuracy, tree size, and learning time while dynamically reducing error and tree size. Furthermore, two new Functional Tree Leaf strategies are introduced for I-OVFDT, and they yield superior performance compared to VFDT and its variants. The new model works especially well for imperfect data streams. I-OVFDT is an anytime algorithm that can be integrated into existing VFDT extensions that use the Hoeffding bound for node splitting. The experimental results show that I-OVFDT achieves higher accuracy and a more compact tree than other existing data stream classification methods.
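The Hoeffding bound that the abstract mentions for node splitting can be sketched as follows. This is a minimal illustration of the standard VFDT-style split test, not of the I-OVFDT optimization itself. The function names, the default `delta`, and the tie threshold `tau` are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the split heuristic (e.g. 1.0 for
                     information gain on a two-class problem).
    delta:           allowed probability that the bound does not hold.
    n:               number of examples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 value_range: float = 1.0, delta: float = 1e-7,
                 tau: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the best attribute beats the runner-up by more than
    epsilon, or when epsilon has fallen below the tie threshold tau
    (hypothetical default), meaning the two candidates are too close
    to be worth waiting on.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tau


# Usage: a clear winner splits early; a near tie waits for more data.
clear_winner = should_split(best_gain=0.30, second_gain=0.10, n=1000)
near_tie_early = should_split(best_gain=0.30, second_gain=0.29, n=1000)
near_tie_late = should_split(best_gain=0.30, second_gain=0.29, n=10000)
```

Because epsilon shrinks as `n` grows, a leaf with a near tie eventually splits through the `tau` branch instead of waiting indefinitely.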
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, H., Fong, S. (2012). Incrementally Optimized Decision Tree for Mining Imperfect Data Streams. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_25
DOI: https://doi.org/10.1007/978-3-642-30507-8_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30506-1
Online ISBN: 978-3-642-30507-8
eBook Packages: Computer Science, Computer Science (R0)