Abstract
Correctly recognizing changes in data streams, called concept drifts, plays a crucial role in constructing an appropriate model learning strategy. This paper focuses on unsupervised learning for non-stationary data streams and presents two significant modifications of the ClusTree algorithm. They allow the clustering model to adapt to the changes caused by a concept drift. An experimental study conducted on a set of benchmark data streams demonstrates the usefulness of the proposed solutions.
Notes
- 2. Source code of the proposed algorithms can be found at https://github.com/jagub2/mgr/tree/master/MyClusTree/src/moa.
Acknowledgments
This work was supported by the Statutory Fund of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zgraja, J., Woźniak, M. (2018). Drifted Data Stream Clustering Based on ClusTree Algorithm. In: de Cos Juez, F., et al. Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham. https://doi.org/10.1007/978-3-319-92639-1_28
DOI: https://doi.org/10.1007/978-3-319-92639-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92638-4
Online ISBN: 978-3-319-92639-1
eBook Packages: Computer Science, Computer Science (R0)