Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Classification of concept-drifting data streams has found wide application. However, many classification algorithms for streaming data are designed for fixed features of concept drift and cannot deal with the impact of noise on concept-drift detection. This paper presents an incremental algorithm with Multiple Semi-Random decision Trees (MSRT) for concept-drifting data streams. MSRT uses two sliding windows for training and testing, applies the Hoeffding bound inequality to determine the thresholds that distinguish true drift from noise, and chooses the classification function to estimate the error rate for periodic concept-drift detection. Our extensive empirical study shows that MSRT improves on CVFDT, a state-of-the-art decision-tree algorithm for classifying concept-drifting data streams, in time, accuracy, and robustness.
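The abstract does not give the exact thresholds MSRT uses, so the following is only a minimal sketch of how a Hoeffding bound can separate a genuine rise in error rate from random fluctuation. It uses the standard form of the bound, ε = sqrt(R² ln(1/δ) / (2n)), for n observations of a variable with range R at confidence 1 − δ. The function names, the default δ, and the single-threshold decision rule are assumptions made for illustration and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the mean
    of n independent observations lies within epsilon of the true mean."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify_change(prev_error: float, curr_error: float,
                    window_size: int, delta: float = 0.05) -> str:
    """Hypothetical rule (not the paper's): flag drift only when the error
    rate on the current test window rises by more than the Hoeffding bound."""
    # Error rates lie in [0, 1], so the range R of the variable is 1.
    epsilon = hoeffding_bound(1.0, delta, window_size)
    gap = curr_error - prev_error
    return "drift" if gap > epsilon else "noise"


if __name__ == "__main__":
    # Compare error rates measured on two consecutive test windows.
    print(classify_change(prev_error=0.10, curr_error=0.20, window_size=1000))
    print(classify_change(prev_error=0.10, curr_error=0.12, window_size=1000))
```

Under these assumptions, a larger window shrinks ε, so smaller changes in error rate are treated as drift, while a smaller δ widens ε and makes the check more conservative.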

This research is supported by the National Natural Science Foundation of China (No. 60573174).



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, P., Hu, X., Wu, X. (2008). Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_78

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_78

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
