Random Ensemble Decision Trees for Learning Concept-Drifting Data Streams

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6634)

Abstract

Few online classification algorithms based on traditional inductive ensembling handle concept-drifting data streams while also performing well on noisy data. Motivated by this gap, this paper proposes an incremental algorithm based on random Ensemble Decision Trees for Concept-drifting data streams (EDTC). Three variants of random feature selection are developed to implement split-tests. To better track concept drifts in noisy data streams, an improved two-threshold-based drift detection mechanism is introduced. Extensive studies demonstrate that the algorithm performs very well compared to several known online algorithms based on single models and ensemble models. We therefore conclude that the approach provides multiple solutions for learning from concept-drifting data streams with noise.
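
The abstract does not spell out the improved two-threshold mechanism, so the following is only a minimal sketch of the general idea, modeled on the well-known DDM-style scheme (a warning level and a drift level on the streaming error rate) rather than on EDTC itself. The class name `TwoThresholdDriftDetector`, the factors 2 and 3, and the 30-sample warm-up are assumptions chosen for illustration.

```python
import math


class TwoThresholdDriftDetector:
    """DDM-style sketch (an assumption, not the EDTC mechanism).

    Tracks the running error rate p and its standard deviation s, remembers
    the lowest p + s seen so far, and compares the current p + s against a
    lower (warning) and a higher (drift) threshold.
    """

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warning_factor = warning_factor  # hypothetical default
        self.drift_factor = drift_factor      # hypothetical default
        self.min_samples = min_samples        # warm-up before any signal
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, mistake):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(bool(mistake))
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        level = p + s
        if level > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if level > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"
```

In use, a learner would call `update(prediction != label)` once per labelled example. A "warning" could trigger buffering of recent examples, and a "drift" could trigger rebuilding or replacing part of the model. Separating the two levels is what gives some tolerance to noise: a short burst of errors may cross the lower threshold without reaching the higher one.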




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, P., Wu, X., Liang, Q., Hu, X., Zhang, Y. (2011). Random Ensemble Decision Trees for Learning Concept-Drifting Data Streams. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science (LNAI), vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_26

  • DOI: https://doi.org/10.1007/978-3-642-20841-6_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20840-9

  • Online ISBN: 978-3-642-20841-6

  • eBook Packages: Computer Science, Computer Science (R0)
