Incrementally Optimized Decision Tree for Mining Imperfect Data Streams

  • Conference paper
Networked Digital Technologies (NDT 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 293)

Abstract

The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, real-world data streams contain imperfections, such as noise and imbalanced class distributions, that degrade the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for a non-stopping data stream. To deal with the adverse effects of imperfect data streams, we propose an incremental optimization model that can be integrated into the decision tree model for data stream classification. The resulting algorithm, the Incrementally Optimized Very Fast Decision Tree (I-OVFDT), balances performance in terms of prediction accuracy, tree size and learning time, and dynamically reduces both error and tree size. Furthermore, two new Functional Tree Leaf strategies are extended for I-OVFDT, resulting in superior performance compared to VFDT and its variant algorithms. The new model works especially well for imperfect data streams. I-OVFDT is an anytime algorithm that can be integrated into existing VFDT-extended algorithms that use the Hoeffding bound for node splitting. The experimental results show that I-OVFDT achieves higher accuracy and a more compact tree than other existing data stream classification methods.
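
For readers unfamiliar with the Hoeffding bound mentioned above, the following is a minimal sketch of the standard VFDT split test, in which a leaf is split once the gain difference between the two best attributes exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)). It does not reproduce the I-OVFDT optimization model, which the abstract does not specify. The function names, the tie-threshold default and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the split heuristic, e.g. log2(num_classes)
                     for information gain.
    delta:           allowed probability that the bound does not hold.
    n:               number of examples observed at the leaf.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Standard VFDT-style split decision (hypothetical helper).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie threshold, meaning the two
    candidates are too close for more examples to matter.
    The default tie_threshold of 0.05 is an assumed illustrative value.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Illustrative numbers only: a two-class problem (R = 1) with delta = 1e-7.
    for n in (200, 5000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.20, 1.0, 1e-7, n))
```

In this sketch the bound shrinks as more examples reach a leaf, so a split that is rejected early can be accepted later, either because the gain difference finally exceeds epsilon or because the tie threshold takes over.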

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, H., Fong, S. (2012). Incrementally Optimized Decision Tree for Mining Imperfect Data Streams. In: Benlamri, R. (eds) Networked Digital Technologies. NDT 2012. Communications in Computer and Information Science, vol 293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30507-8_25

  • DOI: https://doi.org/10.1007/978-3-642-30507-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30506-1

  • Online ISBN: 978-3-642-30507-8

  • eBook Packages: Computer Science, Computer Science (R0)
