Subspace data stream clustering with global and local weighting models

  • Original Article
Neural Computing and Applications

Abstract

Subspace clustering discovers clusters embedded in multiple, overlapping subspaces of high-dimensional data, and it has been applied successfully in many domains. Data streams are ordered and potentially infinite sequences of data points created by a typically non-stationary data-generating process. Clustering such data imposes constraints on processing time and memory. In this paper, we propose S2G-Stream, an algorithm based on growing neural gas and soft subspace clustering. We introduce two types of entropy weighting, one for features and one for blocks of features, and two weighting models, local and global. Experiments on public datasets demonstrate the ability of S2G-Stream to detect relevant features and blocks and to provide the best partitioning of the data.
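
The abstract names three ingredients: entropy weighting at the feature and block levels, local versus global weighting models, and a growing neural gas (GNG). The sketch below illustrates one common way to realize them, using the closed-form entropy-regularized weights from soft subspace clustering (each weight proportional to exp(-dispersion / temperature)). It is not the authors' S2G-Stream implementation. The function names, the temperatures `lam` and `eta`, the single-pass (non-alternating) update, and the mapping of "local" to per-prototype weights and "global" to one shared weight set are all assumptions made for illustration.

```python
import numpy as np


def entropy_weights(dispersion, temperature):
    """Entropy-regularised weights: w_k is proportional to exp(-D_k / temperature) and sums to 1."""
    z = -np.asarray(dispersion, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; the normalised result is unchanged
    w = np.exp(z)
    return w / w.sum()


def two_level_weights(residuals, blocks, lam, eta):
    """Block weights (alpha) and within-block feature weights (beta) from residuals x - prototype.

    blocks: list of integer index arrays that partition the features.
    lam, eta: hypothetical block- and feature-level temperatures.
    Single pass (beta first, then alpha), not an alternating optimisation.
    """
    sq = np.asarray(residuals, dtype=float) ** 2
    feature_disp = sq.sum(axis=0)  # per-feature dispersion D_j
    beta = np.zeros(sq.shape[1])
    for idx in blocks:
        beta[idx] = entropy_weights(feature_disp[idx], eta)  # sums to 1 inside each block
    block_disp = np.array([(beta[idx] * feature_disp[idx]).sum() for idx in blocks])
    alpha = entropy_weights(block_disp, lam)  # sums to 1 across blocks
    return alpha, beta


def weighted_sq_distance(x, prototype, alpha, beta, blocks):
    """Squared Euclidean distance weighted by block weights and feature weights."""
    diff2 = (np.asarray(x, dtype=float) - np.asarray(prototype, dtype=float)) ** 2
    return float(sum(alpha[b] * (beta[idx] * diff2[idx]).sum() for b, idx in enumerate(blocks)))


def fit_weights(X, prototypes, assign, blocks, lam, eta, local=True):
    """Local model: one (alpha, beta) per prototype. Global model: one pair shared by all."""
    residuals = X - prototypes[assign]
    if local:
        # A prototype with no assigned points gets zero dispersion, hence uniform weights.
        return [two_level_weights(residuals[assign == k], blocks, lam, eta)
                for k in range(len(prototypes))]
    shared = two_level_weights(residuals, blocks, lam, eta)
    return [shared] * len(prototypes)


def nearest_two(x, prototypes, weights, blocks):
    """Winner and runner-up prototypes under the weighted distance, as in a GNG competitive step."""
    d = [weighted_sq_distance(x, p, a, b, blocks) for p, (a, b) in zip(prototypes, weights)]
    order = np.argsort(d)
    return int(order[0]), int(order[1])


# Toy usage: features 0-1 separate the two groups, features 2-3 are noise.
X = np.array([[0.0, 0.0, 5.0, -5.0],
              [0.1, -0.1, -5.0, 5.0],
              [10.0, 10.0, 4.0, -4.0],
              [10.1, 9.9, -4.0, 4.0]])
prototypes = np.array([[0.05, -0.05, 0.0, 0.0],
                       [10.05, 9.95, 0.0, 0.0]])
assign = np.array([0, 0, 1, 1])
blocks = [np.array([0, 1]), np.array([2, 3])]

local_w = fit_weights(X, prototypes, assign, blocks, lam=1.0, eta=1.0, local=True)
print(local_w[0][0])  # block 0 (informative features) gets nearly all of prototype 0's weight
print(nearest_two(X[2], prototypes, local_w, blocks))
```

A larger temperature spreads the weight more evenly across features or blocks, while a smaller one concentrates it on those with the lowest dispersion. In a streaming setting, one plausible (assumed) design is to refit these weights on each incoming batch of points.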

Figs. 1–19

Notes

  1. https://github.com/Clustering4Ever/Clustering4Ever.

  2. http://impca.curtin.edu.au/local/software/synthetic-data-sets.tar.bz2.

  3. https://github.com/mhahsler/streamMOA.

  4. https://github.com/georgekatona/Clique.

  5. https://github.com/OguzhanOktay-Buyuk/PROCLUS-Python3.

  6. https://github.com/Yanis2016/Weighted-K-Means-clustering.

  7. Percentage of time with abnormal short-term variability.

  8. Mean value of short-term variability.

  9. Percentage of time with abnormal long-term variability.

  10. Mean value of long-term variability.

  11. https://github.com/Clustering4Ever/Clustering4Ever.

  12. https://github.com/Spark-clustering-notebook/Clustering4Ever-Notebooks/tree/master/SparkNotebooks/0.9.4.

Acknowledgements

This paper was inspired by the work of Mr. Mohamed Ghesmoune. We especially thank him for his availability and for the time he devoted to this work.

Author information

Corresponding author

Correspondence to Mohammed Oualid Attaoui.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Attaoui, M.O., Azzag, H., Lebbah, M. et al. Subspace data stream clustering with global and local weighting models. Neural Comput & Applic 33, 3691–3712 (2021). https://doi.org/10.1007/s00521-020-05184-z

  • DOI: https://doi.org/10.1007/s00521-020-05184-z
