CETra: online cluster tracking for clustering of streaming data sources

  • Regular Paper

Knowledge and Information Systems

Abstract

Data stream clustering may be applied either to cluster streaming data objects (clustering by examples) or to cluster streaming data sources based on their temporal behavior (clustering by variables). We focus on the latter problem and propose CETra (Cluster Evolution Tracker), the first online cluster tracking technique designed to provide information on cluster evolution in a streaming clustering-by-variables scenario, with processing efficient enough for real-time problems. CETra can trace different intra- and inter-cluster changes by considering not only statistics of interest but also cluster membership, allowing a deeper understanding of the clustering results. Experimental evaluation using synthetic datasets and real data from meteorological sensors shows that CETra can track both abrupt and gradual cluster transitions, whereas the competing method misses most of the gradual changes. Furthermore, CETra runs efficiently in a clustering environment with multiple streaming data sources, twice as fast as the related method.
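
The abstract states that CETra traces inter-cluster changes through cluster membership, not only through summary statistics. As a rough illustration of what membership-based tracking involves when the clustered objects are data sources, the sketch below labels transitions between two consecutive clusterings by measuring membership overlap, in the spirit of MONIC [14]. It is not CETra's algorithm: the function names, the overlap rule, and the thresholds tau_match and tau_split are hypothetical choices, intra-cluster changes (for example in compactness) are ignored, and CETra's actual procedure is given in the full article and its repository.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical sketch, NOT CETra's algorithm: a clustering maps a cluster id
# to the set of data-source ids (streams) currently assigned to it.
Clustering = Dict[str, Set[str]]


def overlap(old: Set[str], new: Set[str]) -> float:
    """Fraction of the old cluster's members that are also in the new cluster."""
    return len(old & new) / len(old) if old else 0.0


def membership_transitions(prev: Clustering, curr: Clustering,
                           tau_match: float = 0.5,
                           tau_split: float = 0.25) -> List[str]:
    """Label inter-cluster transitions between two consecutive clusterings.

    tau_match and tau_split are illustrative thresholds (assumptions).
    """
    events: List[str] = []
    survivors: Dict[str, List[str]] = defaultdict(list)  # new id -> matching old ids
    covered: Set[str] = set()  # new clusters explained by some old cluster

    for pid, pmembers in prev.items():
        scores = {cid: overlap(pmembers, cmembers) for cid, cmembers in curr.items()}
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] >= tau_match:
            # At least tau_match of the old members moved into one new cluster.
            survivors[best].append(pid)
            covered.add(best)
            continue
        # Otherwise check whether the members spread over several new clusters.
        parts = sorted(cid for cid, s in scores.items() if s >= tau_split)
        if len(parts) > 1 and sum(scores[c] for c in parts) >= tau_match:
            events.append(f"{pid} splits into {parts}")
            covered.update(parts)
        else:
            events.append(f"{pid} disappears")

    for cid, olds in survivors.items():
        if len(olds) == 1:
            events.append(f"{olds[0]} survives as {cid}")
        else:  # several old clusters matched the same new one
            events.append(f"{sorted(olds)} are absorbed into {cid}")

    for cid in curr:
        if cid not in covered:
            events.append(f"{cid} emerges")
    return events


# Toy example with hypothetical sensor ids s1..s17.
prev = {"A": {"s1", "s2", "s3"},
        "B": {"s4", "s5", "s6", "s7", "s8", "s9"},
        "C": {"s10", "s11"},
        "D": {"s12", "s13"},
        "E": {"s14", "s15"}}
curr = {"X": {"s1", "s2", "s3"},
        "Y": {"s4", "s5"}, "Z": {"s6", "s7"}, "W": {"s8", "s9"},
        "U": {"s12", "s13", "s14", "s15"},
        "V": {"s16", "s17"}}

for event in membership_transitions(prev, curr):
    print(event)
```

On the toy example, the sketch reports that B splits into W, Y and Z, C disappears, A survives as X, D and E are absorbed into U, and V emerges. Because overlap is normalized by the size of the older cluster, a cluster that gains many new sources still counts as surviving; flagging that growth would be an intra-cluster (size) transition, which this sketch does not detect.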

The full article contains 12 figures (Figs. 1–12) and four algorithms (Algorithms 1–4).

Data availability statement

The datasets and implementation are provided in a GitHub repository identified in the manuscript.

Code availability

All datasets used and the CETra implementations are available in the following public repository: https://github.com/afonsoMatheus/CETra.

Notes

  1. Available at: https://www.cnpaf.embrapa.br/infoclima/.

References

  1. Atif M, Shafiq M, Leisch F (2023) Applications of monitoring and tracing the evolution of clustering solutions in dynamic datasets. J Appl Stat 50(4):1017–1035. https://doi.org/10.1080/02664763.2021.2008882

  2. Bahri M, Bifet A, Gama J et al (2021) Data stream analysis: foundations, major tasks and tools. Wiley Interdiscip Rev: Data Min Knowl Discov 11(3):e1405. https://doi.org/10.1002/widm.1405

  3. Bifet A, Gavaldà R, Holmes G et al (2023) Machine learning for data streams: with practical examples in MOA. MIT Press

  4. Bones CC, Romani LAS, de Sousa EPM (2016) Clustering multivariate data streams by correlating attributes using fractal dimension. J Inf Data Manag 7(3):249–264

  5. Chaovalit P (2009) Clustering transient data streams by example and by variable. PhD thesis, University of Maryland

  6. Gama J (2010) Knowledge discovery from data streams. Chapman and Hall/CRC, Boca Raton, Florida, USA

  7. Hawwash B, Nasraoui O (2012) Stream-dashboard: a framework for mining, tracking and validating clusters in a data stream. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications. ACM, Beijing, China, pp 109–117

  8. Namitha K, Saju NK, Kumar SG (2018) Tracking cluster transitions using summaries. In: 2018 International Conference on Data Science and Engineering (ICDSE). IEEE, Cochin, India, pp 1–5, https://doi.org/10.1109/ICDSE.2018.8527817

  9. Ntoutsi E, Spiliopoulou M, Theodoridis Y (2012) Fingerprint: summarizing cluster evolution in dynamic environments. Int J Data Warehouse Min 8(3):27–44. https://doi.org/10.4018/jdwm.2012070102

  10. Ntoutsi I, Spiliopoulou M, Theodoridis Y (2009) Tracing cluster transitions for different cluster types. Control Cybern 38(1):239–259

  11. Oliveira MB, Gama J (2010) MEC - monitoring clusters’ transitions. In: Proceedings of the Fifth Starting Artificial Intelligence Researchers’ Symposium, vol 222. IOS Press, Lisbon, Portugal, pp 212–224, https://doi.org/10.3233/978-1-60750-676-8-212

  12. Pereira G, Mendes JM (2016) Monitoring clusters in the telecom industry. In: New Advances in Information Systems and Technologies, Advances in Intelligent Systems and Computing, vol 445. Springer, Germany, pp 631–640, https://doi.org/10.1007/978-3-319-31307-8_65

  13. Silva JA, Faria ER, Barros RC et al (2013) Data stream clustering: a survey. ACM Comput Surv 46(1):13. https://doi.org/10.1145/2522968.2522981

  14. Spiliopoulou M, Ntoutsi I, Theodoridis Y et al (2006) MONIC: modeling and monitoring cluster transitions. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Philadelphia, Pennsylvania, USA, pp 706–711, https://doi.org/10.1145/1150402.1150491

  15. Spiliopoulou M, Ntoutsi E, Theodoridis Y et al (2013) MONIC and followups on modeling and monitoring cluster transitions. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Springer, Heidelberg, Germany, pp 622–626, https://doi.org/10.1007/978-3-642-40994-3_41

  16. Widiputra H, Pears R, Kasabov N (2011) Multiple time-series prediction through multiple time-series relationships profiling and clustered recurring trends. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Shenzhen, China, pp 161–172, https://doi.org/10.1007/978-3-642-20847-8_14

  17. Zubaroğlu A, Atalay V (2021) Data stream clustering: a review. Artif Intell Rev 54(2):1201–1236. https://doi.org/10.1007/s10462-020-09874-x

Funding

This research was supported by CAPES (Brazilian Coordination for the Improvement of Higher Education Personnel) and CNPq (Brazilian National Council for Scientific and Technological Development).

Author information

Corresponding author

Correspondence to Afonso Matheus Sousa Lima.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sousa Lima, A.M., de Sousa, E.P.M. CETra: online cluster tracking for clustering of streaming data sources. Knowl Inf Syst 67, 1455–1479 (2025). https://doi.org/10.1007/s10115-024-02267-4

  • DOI: https://doi.org/10.1007/s10115-024-02267-4

Keywords