Drifted Data Stream Clustering Based on ClusTree Algorithm

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10870)

Abstract

Correct recognition of possible changes in data streams, called concept drifts, plays a crucial role in constructing an appropriate model learning strategy. This paper focuses on unsupervised learning for non-stationary data streams and presents two significant modifications of the ClusTree algorithm. They allow the clustering model to adapt to the changes caused by concept drift. An experimental study conducted on a set of benchmark data streams demonstrates the usefulness of the proposed solutions.
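For readers unfamiliar with stream clustering, the following minimal sketch illustrates one general way a micro-cluster summary can follow a drifting stream: a cluster feature (count, linear sum, squared sum) whose contents fade exponentially with time, in the spirit of the summaries kept by ClusTree. It is not the authors' proposed modifications, which are described in the full paper. The class name `DecayedClusterFeature`, the parameter `decay_rate`, and the base-2 decay are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DecayedClusterFeature:
    """Illustrative micro-cluster summary (n, LS, SS) with exponential decay.

    All names here are hypothetical; this is a sketch, not the paper's code.
    """
    dim: int
    decay_rate: float = 0.1   # assumed lambda: larger values forget faster
    n: float = 0.0            # decayed number of absorbed points
    ls: List[float] = field(default_factory=list)  # decayed linear sum
    ss: List[float] = field(default_factory=list)  # decayed squared sum
    last_update: float = 0.0  # timestamp of the most recent update

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def _decay_to(self, now: float) -> None:
        # Down-weight the stored statistics by 2^(-lambda * elapsed time).
        elapsed = max(0.0, now - self.last_update)
        weight = 2.0 ** (-self.decay_rate * elapsed)
        self.n *= weight
        self.ls = [v * weight for v in self.ls]
        self.ss = [v * weight for v in self.ss]
        self.last_update = max(now, self.last_update)

    def insert(self, point: Sequence[float], now: float) -> None:
        # Fade old content first, then absorb the new point with weight 1.
        self._decay_to(now)
        self.n += 1.0
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def center(self) -> List[float]:
        # Decayed mean; recent points contribute more than old ones.
        return [v / self.n for v in self.ls]


cf = DecayedClusterFeature(dim=1, decay_rate=1.0)
cf.insert([0.0], now=0.0)
cf.insert([4.0], now=1.0)
print(cf.n, cf.center())
```

With `decay_rate=1.0`, the first point has half its original weight by the time the second arrives, so the center lies closer to the newer point than the plain mean would. This is the sense in which a decayed summary can track a stream whose distribution shifts over time.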

Notes

  1. https://aws.amazon.com/.

  2. Source code of the proposed algorithms can be found at https://github.com/jagub2/mgr/tree/master/MyClusTree/src/moa.

  3. https://github.com/jagub2/mgr/tree/master/plots.

Acknowledgments

This work was supported by the Statutory Fund of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Author information

Corresponding author

Correspondence to Jakub Zgraja.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zgraja, J., Woźniak, M. (2018). Drifted Data Stream Clustering Based on ClusTree Algorithm. In: de Cos Juez, F., et al. Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham. https://doi.org/10.1007/978-3-319-92639-1_28

  • DOI: https://doi.org/10.1007/978-3-319-92639-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92638-4

  • Online ISBN: 978-3-319-92639-1

  • eBook Packages: Computer Science, Computer Science (R0)
