Abstract
Classification of concept-drifting data streams has found wide application. However, many stream classification algorithms are designed for concept drift with fixed characteristics and cannot cope with the impact of noise on concept-drift detection. This paper presents MSRT, an incremental algorithm with Multiple Semi-Random decision Trees for concept-drifting data streams. MSRT uses two sliding windows, one for training and one for testing; applies the Hoeffding bound inequality to set thresholds that distinguish true drift from noise; and chooses a classification function to estimate the error rate for periodic concept-drift detection. Our extensive empirical study shows that MSRT improves on CVFDT, a state-of-the-art decision-tree algorithm for classifying concept-drifting data streams, in running time, accuracy and robustness.
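The role of the Hoeffding bound in separating drift from noise can be illustrated with its standard form: for a variable with range R, after n independent observations the true mean lies within ε = √(R² ln(1/δ) / (2n)) of the observed mean with probability at least 1 − δ. The sketch below is a minimal illustration under our own assumptions, not the MSRT procedure from the paper. It treats the error rate as the mean of 0/1 losses (so R = 1) and reports drift only when the error on a current window exceeds that of an earlier reference window by more than the sum of both windows' ε. Smaller increases are attributed to noise. The function names, the reference/current pairing of windows and the default δ = 0.05 are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability at least 1 - delta, the true
    mean of a variable with range `value_range` lies within epsilon of the
    mean of n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of misclassified examples, i.e. the mean of 0/1 losses (R = 1)."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def is_true_drift(ref_error: float, n_ref: int,
                  cur_error: float, n_cur: int,
                  delta: float = 0.05) -> bool:
    """Hypothetical rule (an assumption, not the paper's exact thresholds):
    flag drift only if the current window's error rate exceeds the reference
    window's error rate by more than the combined Hoeffding epsilons."""
    threshold = (hoeffding_epsilon(1.0, delta, n_ref)
                 + hoeffding_epsilon(1.0, delta, n_cur))
    return cur_error - ref_error > threshold


if __name__ == "__main__":
    print(round(hoeffding_epsilon(1.0, 0.05, 1000), 4))  # 0.0387
    print(is_true_drift(0.10, 1000, 0.15, 1000))  # False: within noise
    print(is_true_drift(0.10, 1000, 0.25, 1000))  # True: exceeds threshold
```

With two windows of 1000 examples and δ = 0.05, the combined threshold is about 0.077, so a rise in error from 0.10 to 0.15 is treated as noise while a rise to 0.25 is reported as drift. Because ε shrinks as 1/√n, larger windows make the test more sensitive to small but genuine changes at the cost of slower reaction.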
This research is supported by the National Natural Science Foundation of China (No. 60573174).
References
Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Machine Learning 23, 69–101 (1996)
Hulten, G., Spencer, L., Domingos, P.: Mining Time-Changing Data Streams. In: 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, pp. 97–106 (2001)
Fan, W.: StreamMiner: A Classifier Ensemble-based Engine to Mine Concept-drifting Data Streams. In: 30th VLDB Conference, Toronto, Canada (2004)
Zhang, Y., Jin, X.: An automatic construction and organization strategy for ensemble learning on data streams. In: ACM SIGMOD Record, pp. 28–33 (2006)
Hu, X., Li, P., Wu, X., Wu, G.: A Semi-Random Multiple Decision-Tree Algorithm for Mining Data Streams. Journal of Computer Science and Technology 22, 711–724 (2007)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13–30 (1963)
The UCI KDD Archive, http://kdd.ics.uci.edu//databases/kddcup99/kddcup99.html
Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
Yang, Y., Zhu, X., Wu, X.: Combining Proactive and Reactive Predictions for Data Streams. In: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, pp. 710–715 (2005)
Stanley, K.O.: Learning concept drift with a committee of decision trees. Technical Report AI-03-302, Department of Computer Sciences, University of Texas at Austin (2003)
Salganicoff, M.: Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artificial Intelligence Review 11, 133–155 (1997)
Kolter, J.Z., Maloof, M.A.: Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift. In: 3rd IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 123–130 (2003)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, P., Hu, X., Wu, X. (2008). Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_78
DOI: https://doi.org/10.1007/978-3-540-88192-6_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)