Abstract
We consider the problem of data-stream classification and introduce Dynamic Streaming Random Forests, a stream-classification algorithm that handles evolving data streams by means of an entropy-based drift-detection technique. The algorithm adjusts its parameters automatically, based on the data seen so far. Experimental results show that it handles multi-class problems whose underlying class boundaries drift, without losing accuracy.
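The abstract does not say what the entropy is computed over, how large the windows are, or what threshold triggers a drift signal. The sketch below therefore illustrates only the general idea of entropy-based drift detection, under stated assumptions. It compares a fixed reference window with a sliding current window of discrete observations (for example, class labels) using the Jensen-Shannon divergence, an entropy-based distance. It flags drift when the divergence exceeds a threshold. The names `EntropyDriftDetector`, `js_divergence`, `window` and `threshold`, and the default values, are hypothetical and are not taken from the paper. This is not the authors' detector.

```python
import math
from collections import Counter, deque
from typing import Hashable


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def js_divergence(ref: Counter, cur: Counter) -> float:
    """Jensen-Shannon divergence (bits) between two empirical distributions.

    Computed from entropies only: JS = H(M) - (H(P) + H(Q)) / 2, where M is the
    equal-weight mixture of P and Q. With base-2 logs it lies in [0, 1].
    """
    n_ref, n_cur = sum(ref.values()), sum(cur.values())
    if n_ref == 0 or n_cur == 0:
        return 0.0
    keys = set(ref) | set(cur)
    mix = [0.5 * ref[k] / n_ref + 0.5 * cur[k] / n_cur for k in keys]
    h_mix = -sum(p * math.log2(p) for p in mix if p > 0)
    return h_mix - 0.5 * (shannon_entropy(ref) + shannon_entropy(cur))


class EntropyDriftDetector:
    """Hypothetical detector: reference window vs. sliding current window.

    Assumptions (not from the paper): equal window sizes, JS divergence as the
    entropy-based score, and a fixed threshold.
    """

    def __init__(self, window: int = 200, threshold: float = 0.1):
        self.window = window
        self.threshold = threshold
        self.reference: Counter = Counter()
        self.ref_size = 0
        self.current: deque = deque(maxlen=window)

    def update(self, symbol: Hashable) -> bool:
        """Feed one observation; return True if drift is detected."""
        # Phase 1: fill the reference window.
        if self.ref_size < self.window:
            self.reference[symbol] += 1
            self.ref_size += 1
            return False
        # Phase 2: slide the current window; compare only once it is full.
        self.current.append(symbol)
        if len(self.current) < self.window:
            return False
        if js_divergence(self.reference, Counter(self.current)) > self.threshold:
            # After a detection, the current window becomes the new reference.
            self.reference = Counter(self.current)
            self.current.clear()
            return True
        return False


# Usage: the label distribution shifts abruptly from {a, b} to {c} at index 400.
stream = ["a", "b"] * 200 + ["c"] * 400
detector = EntropyDriftDetector(window=200, threshold=0.1)
hits = [i for i, s in enumerate(stream) if detector.update(s)]
print(hits)
```

Monitoring only the class-label distribution catches changes in class priors, but it can miss drift in which the class boundaries move while the priors stay fixed. A fuller detector would also monitor (discretized) attribute values or per-class attribute distributions.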
The work reported in this paper has been supported by Kuwait University.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abdulsalam, H., Skillicorn, D.B., Martin, P. (2008). Classifying Evolving Data Streams Using Dynamic Streaming Random Forests. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_54
DOI: https://doi.org/10.1007/978-3-540-85654-2_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85653-5
Online ISBN: 978-3-540-85654-2
eBook Packages: Computer Science, Computer Science (R0)