Abstract
Data stream clustering faces major challenges such as limited memory and processing time, so traditional clustering methods are not suitable for this kind of data. Moreover, most data stream clustering methods do not consider uncertainty and ambiguity in the data: when an object is close to several clusters, it cannot be assigned correctly and simply to any single one of them. The aim of this study is to provide a new method for clustering data streams that contain uncertain and ambiguous data, called clustering data stream using belief function. In the proposed method, belief function theory is used to assign objects either to single clusters or to sets of clusters and to determine the structure of the data. In addition, a window, weighted centers, and a fading function are used to overcome the restrictions of data streams. The experimental results were compared with those of state-of-the-art methods and show the superiority of the proposed method in terms of purity, error rate, and ambiguity rate.
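The abstract names a fading function and weighted centers without giving their form. The following is a minimal sketch, assuming the exponential fading function f(Δt) = 2^(−λΔt) that is common in stream clustering (for example, in DenStream); the decay rate `lam` and the helper names `fading_weight` and `weighted_center` are hypothetical, and the paper's own definitions may differ.

```python
import numpy as np

def fading_weight(t_now, t_arrival, lam=0.1):
    """Assumed exponential fading 2^(-lam * age): a weight halves every 1/lam time units."""
    age = np.asarray(t_now, dtype=float) - np.asarray(t_arrival, dtype=float)
    return np.power(2.0, -lam * age)

def weighted_center(points, t_arrival, t_now, lam=0.1):
    """Fading-weighted mean of the objects currently held in the window."""
    points = np.asarray(points, dtype=float)
    w = fading_weight(t_now, t_arrival, lam)
    return (w[:, None] * points).sum(axis=0) / w.sum()

# Usage: three 2-D objects that arrived at times 0, 5 and 10, evaluated at t_now = 10.
window = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(weighted_center(window, t_arrival=[0, 5, 10], t_now=10))
```

The resulting center (about 1.23 in each coordinate) lies above the unweighted mean of 1.0 because the most recent object carries the largest weight; a larger `lam` forgets old objects faster.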

Notes
Fixed granularity grid.
The MATLAB implementation of ECM can be found at https://www.hds.utc.fr/~tdenoeux/dokuwiki/_media/en/software/ecm.zip.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1
By taking the derivative of Eq. (20) with respect to \( \lambda_{i} \), we have:
By adjusting and taking the derivative of Eq. (20) with respect to \( m_{ij} \), \( m_{il} \), and \( m_{i\emptyset } \), we have:
And by using these equations in Eq. (31), we have:
Then, the equations of masses are obtained using Eq. (35) in Eqs. (32), (33), and (34).
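The equations of this appendix are missing from this copy. For orientation, the closed-form mass update that the same Lagrange-multiplier argument yields in the original ECM algorithm of Masson and Denœux (whose MATLAB implementation is linked in the Notes) is sketched below. This assumes that Eq. (20) has the standard ECM form; the paper's objective may include window or fading weights that change the result. The function names and the defaults for `alpha`, `beta`, and `delta` are illustrative assumptions, not the authors' code.

```python
import itertools
import numpy as np

def focal_sets(c):
    """All non-empty subsets of the cluster indices {0, ..., c-1}, as tuples."""
    return [s for r in range(1, c + 1) for s in itertools.combinations(range(c), r)]

def ecm_masses(X, V, alpha=1.0, beta=2.0, delta=10.0):
    """ECM-style mass update for fixed centers V (assumed form, not the paper's equation).

    Returns (M, sets): M[i, j] is the mass of object i on sets[j], and the last
    column of M is the mass on the empty set (outlier evidence).
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    sets = focal_sets(V.shape[0])
    # Barycenter of the centers in each focal set A_j and its cardinality |A_j|.
    bary = np.array([V[list(s)].mean(axis=0) for s in sets])
    card = np.array([len(s) for s in sets], dtype=float)
    # Squared distances d_ij^2, clamped so an object on a barycenter does not divide by zero.
    d2 = np.maximum(((X[:, None, :] - bary[None, :, :]) ** 2).sum(axis=2), 1e-12)
    p = -1.0 / (beta - 1.0)
    # Unnormalized terms |A_j|^(-alpha/(beta-1)) * d_ij^(-2/(beta-1)).
    num = card[None, :] ** (alpha * p) * d2 ** p
    empty = (delta ** 2) ** p  # empty-set term delta^(-2/(beta-1))
    denom = num.sum(axis=1, keepdims=True) + empty
    return np.hstack([num / denom, empty / denom]), sets

# Usage: two centers on a line; the second object sits halfway between them.
V = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([[0.1, 0.0], [2.0, 0.0]])
M, sets = ecm_masses(X, V)
print(sets)  # [(0,), (1,), (0, 1)]
print(M.round(3))
```

The first object puts almost all of its mass on cluster {0}, while the object halfway between the centers puts most of its mass on the set {0, 1}; assigning mass to a set of clusters rather than forcing a single label is how this family of methods represents ambiguity.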
Appendix 2
Taking the derivative of the objective function with respect to v, we have:
The partial derivatives of the distances with respect to v are given by
Using Eqs. (37) and (38) in Eq. (36), we have:
To simplify the calculations, the centers are obtained from the linear equation given in Eq. (23).
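The appendix equations are not reproduced in this copy. In the original ECM algorithm, setting the derivative of the objective with respect to the centers to zero gives a linear system H V = B, with H_lk = Σ_i Σ_{A_j ⊇ {ω_l, ω_k}} |A_j|^(α−2) m_ij^β and B_l = Σ_i x_i Σ_{A_j ∋ ω_l} |A_j|^(α−1) m_ij^β. The sketch below solves that standard system, assuming the paper's Eq. (23) has the same form. The optional per-object weight `w` is a hypothetical stand-in for the paper's fading weights, and the function names are illustrative.

```python
import itertools
import numpy as np

def focal_sets(c):
    """All non-empty subsets of the cluster indices {0, ..., c-1}, as tuples."""
    return [s for r in range(1, c + 1) for s in itertools.combinations(range(c), r)]

def ecm_centers(X, M, c, alpha=1.0, beta=2.0, w=None):
    """Solve H V = B for the c cluster centers (ECM-style center update).

    M[i, j] is the mass of object i on focal_sets(c)[j] (empty set excluded).
    w is an optional per-object weight (hypothetical stand-in for fading weights).
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    w = np.ones(X.shape[0]) if w is None else np.asarray(w, dtype=float)
    H = np.zeros((c, c))
    B = np.zeros((c, X.shape[1]))
    for j, s in enumerate(focal_sets(c)):
        card = len(s)
        mb = w * M[:, j] ** beta  # weighted m_ij^beta for every object i
        for l in s:
            # B_l accumulates x_i |A_j|^(alpha-1) m_ij^beta over sets A_j containing cluster l.
            B[l] += card ** (alpha - 1) * (mb[:, None] * X).sum(axis=0)
            for k in s:
                # H_lk accumulates |A_j|^(alpha-2) m_ij^beta over sets containing both l and k.
                H[l, k] += card ** (alpha - 2) * mb.sum()
    return np.linalg.solve(H, B)

# Usage: crisp masses on singletons reduce the update to per-cluster means.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
M = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
print(ecm_centers(X, M, c=2))  # centers (1, 0) and (11, 0)
```

Solving the c × c system once per iteration is cheap, because its size depends only on the number of clusters and not on the number of objects; H becomes singular only if some cluster receives no mass at all.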
About this article
Cite this article
Hamidzadeh, J., Ghadamyari, R. Clustering data stream with uncertainty using belief function theory and fading function. Soft Comput 24, 8955–8974 (2020). https://doi.org/10.1007/s00500-019-04422-4