Abstract
Subspace clustering discovers clusters embedded in multiple, overlapping subspaces of high-dimensional data and has been successfully applied in many domains. Data streams are ordered and potentially infinite sequences of data points generated by a typically non-stationary process. Clustering such data imposes restrictions on processing time and memory. In this paper, we propose the S2G-Stream algorithm, which is based on growing neural gas and soft subspace clustering. We introduce two types of entropy weighting, one for features and one for blocks, and two weighting models (local and global). Experiments on public datasets demonstrate the ability of S2G-Stream to detect relevant features and blocks and to provide the best partitioning of the data.
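The abstract only names the weighting scheme. As a rough illustration of what entropy weighting means in soft subspace clustering, the following sketch uses the generic EWKM-style rule: each weight is a softmax of the negative within-cluster dispersion, scaled by a positive parameter `lam`. It shows a local variant (one weight vector per cluster or node), a global variant (one vector shared by all clusters), and a simplified block-level weight obtained by summing feature dispersions inside each block. This is a sketch under stated assumptions, not S2G-Stream's actual update rules, which also involve growing neural gas node updates and stream handling not described here. All function names and the parameter `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy_weights(dispersions: Sequence[float], lam: float) -> List[float]:
    """w_j = exp(-D_j / lam) / sum_j' exp(-D_j' / lam); smaller dispersion -> larger weight."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = min(dispersions)  # shift by the minimum for numerical stability
    exps = [math.exp(-(d - m) / lam) for d in dispersions]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_dispersions(points, labels, prototypes, k) -> List[float]:
    """Per-feature sum of squared deviations of cluster k's points from prototype k."""
    proto = prototypes[k]
    disp = [0.0] * len(proto)
    for x, lab in zip(points, labels):
        if lab == k:
            for j, (xj, cj) in enumerate(zip(x, proto)):
                disp[j] += (xj - cj) ** 2
    return disp


def local_feature_weights(points, labels, prototypes, lam):
    """Local model (hypothetical): one feature-weight vector per cluster/node."""
    return [entropy_weights(cluster_dispersions(points, labels, prototypes, k), lam)
            for k in range(len(prototypes))]


def global_feature_weights(points, labels, prototypes, lam):
    """Global model (hypothetical): dispersions pooled over clusters, one shared vector."""
    pooled = [0.0] * len(prototypes[0])
    for k in range(len(prototypes)):
        for j, d in enumerate(cluster_dispersions(points, labels, prototypes, k)):
            pooled[j] += d
    return entropy_weights(pooled, lam)


def block_weights(feature_dispersions, blocks, lam):
    """Simplified block weighting: sum feature dispersions per block, then weight blocks."""
    return entropy_weights([sum(feature_dispersions[j] for j in b) for b in blocks], lam)


# Toy data: features 0 and 1 are compact inside each cluster, feature 2 is spread out.
points = [[0.0, 1.0, 0.0], [0.2, 1.2, 6.0], [10.0, 5.0, 0.0], [10.2, 5.2, -6.0]]
labels = [0, 0, 1, 1]
prototypes = [[0.1, 1.1, 3.0], [10.1, 5.1, -3.0]]

local = local_feature_weights(points, labels, prototypes, lam=1.0)
shared = global_feature_weights(points, labels, prototypes, lam=1.0)
blocks = block_weights(cluster_dispersions(points, labels, prototypes, 0), [[0, 1], [2]], lam=1.0)
print(local, shared, blocks)  # feature 2 and block [2] receive near-zero weight
```

With a small `lam`, the weights concentrate on the most compact features or blocks; as `lam` grows, the entropy term dominates and the weights approach uniform.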
Notes
Percentage of time with abnormal short-term variability.
Mean value of short-term variability.
Percentage of time with abnormal long-term variability.
Mean value of long-term variability.
Acknowledgements
The work of Mr. Ghesmoune Mohamed inspired this paper. We extend special thanks to him for his availability and the time he devoted to this work.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Attaoui, M.O., Azzag, H., Lebbah, M. et al. Subspace data stream clustering with global and local weighting models. Neural Comput & Applic 33, 3691–3712 (2021). https://doi.org/10.1007/s00521-020-05184-z
DOI: https://doi.org/10.1007/s00521-020-05184-z