Drifted Data Stream Clustering Based on ClusTree Algorithm

Zgraja, Jakub; Woźniak, Michał

doi:10.1007/978-3-319-92639-1_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10870)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

Correct recognition of possible changes in data streams, called concept drifts, plays a crucial role in constructing an appropriate model learning strategy. This paper focuses on an unsupervised learning model for non-stationary data streams and presents two significant modifications of the ClusTree algorithm. They allow the clustering model to adapt to the changes caused by a concept drift. An experimental study conducted on a set of benchmark data streams demonstrates the usefulness of the proposed solutions.

Notes

1. https://aws.amazon.com/
2. Source code of the proposed algorithms can be found at https://github.com/jagub2/mgr/tree/master/MyClusTree/src/moa.
3. https://github.com/jagub2/mgr/tree/master/plots

Acknowledgments

This work was supported by the Statutory Fund of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Author information

Authors and Affiliations

Faculty of Electronics, Department of Systems and Computer Networks, Wroclaw University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Jakub Zgraja & Michał Woźniak

Corresponding author

Correspondence to Jakub Zgraja.

Editor information

Editors and Affiliations

Department of Mine Operating and Prospection, University of Oviedo, Oviedo, Spain
Francisco Javier de Cos Juez
Department of Computer Science, University of Oviedo, Oviedo, Spain
José Ramón Villar
Department of Computer Science, University of Oviedo, Oviedo, Spain
Enrique A. de la Cal
Department of Civil Engineering, University of Burgos, Burgos, Spain
Álvaro Herrero
University of A Coruña, A Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
José António Sáez
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Zgraja, J., Woźniak, M. (2018). Drifted Data Stream Clustering Based on ClusTree Algorithm. In: de Cos Juez, F., et al. Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham. https://doi.org/10.1007/978-3-319-92639-1_28

DOI: https://doi.org/10.1007/978-3-319-92639-1_28
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92638-4
Online ISBN: 978-3-319-92639-1
eBook Packages: Computer Science, Computer Science (R0)
