Abstract
Data stream clustering may be applied to streaming data objects (clustering by examples) or to streaming data sources based on their temporal behavior (clustering by variables). We focus on the latter problem and propose CETra (Cluster Evolution Tracker), the first online cluster tracking technique designed to report cluster evolution in streaming clustering by variables, with processing efficient enough for real-time problems. CETra traces different intra- and inter-cluster changes by considering not only statistics of interest but also cluster membership, allowing a deeper understanding of the clustering results. Experimental evaluation on synthetic datasets and real data from meteorological sensors shows that CETra tracks both abrupt and gradual cluster transitions, while the competing method misses most of the gradual changes. Furthermore, CETra runs efficiently when clustering multiple streaming data sources, twice as fast as the related method.
















Data availability statement
Datasets and implementation are provided in a GitHub repository referenced in the manuscript.
Code Availability
All datasets used and the CETra implementations are available in the following public repository: https://github.com/afonsoMatheus/CETra.
Notes
Available at: https://www.cnpaf.embrapa.br/infoclima/.
Funding
This research was supported by CAPES (Brazilian Coordination for Improvement of Higher Level Personnel) and CNPq (Brazilian National Council for Supporting Research).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Sousa Lima, A.M., de Sousa, E.P.M. CETra: online cluster tracking for clustering of streaming data sources. Knowl Inf Syst 67, 1455–1479 (2025). https://doi.org/10.1007/s10115-024-02267-4
DOI: https://doi.org/10.1007/s10115-024-02267-4