Clustering data stream with uncertainty using belief function theory and fading function

  • Methodologies and Application
Soft Computing

Abstract

Data stream clustering faces major challenges, chiefly limited memory and processing time, which make traditional clustering methods unsuitable for this kind of data. Moreover, most data stream clustering methods do not account for uncertainty and ambiguity in the data, so an object that lies close to several clusters cannot be categorized simply and correctly. This study proposes a new method for clustering data streams that contain uncertain and ambiguous data, called clustering data stream using belief function. The proposed method uses belief function theory to assign objects to single clusters or to sets of clusters, and thereby determines the structure of the data. In addition, a window, weighted centers, and a fading function are used to overcome the restrictions of data streams. Experimental comparison with state-of-the-art methods shows the superiority of the proposed method in terms of purity, error rate, and ambiguity rate.

Notes

  1. Fixed granularity grid.

  2. The MATLAB implementation of ECM can be found in https://www.hds.utc.fr/~tdenoeux/dokuwiki/_media/en/software/ecm.zip.

Author information

Corresponding author

Correspondence to Javad Hamidzadeh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

By taking the derivative of Eq. (20) with respect to \( \lambda_{i} \), we have:

$$ \frac{\partial \mathcal{L}}{\partial \lambda_{i}} = \sum_{j \mid c_{j} > 1} m_{ij} + \sum_{l \mid c_{l} = 1} m_{il} + m_{i\emptyset} - 1 = 0 $$
(31)

Setting the derivatives of Eq. (20) with respect to \( m_{ij} \), \( m_{il} \), and \( m_{i\emptyset } \) to zero and solving for the masses, we have:

$$ m_{ij} = \left( {\frac{{\lambda_{i} }}{{(\beta ){\text{ow}}_{i} 2^{{ - \lambda \Delta t_{i} }} }}} \right)^{{\frac{1}{\beta - 1}}} \cdot \left( {\frac{1}{{\frac{{\mathop \sum \nolimits_{{\omega_{k} \in A_{j} }} d_{ik}^{2} + d_{ij}^{2} }}{{c_{j} + \gamma }}}}} \right)^{{\frac{1}{\beta - 1}}} $$
(32)
$$ m_{il} = \left( {\frac{{\lambda_{i} }}{{(\beta ){\text{ow}}_{i} 2^{{ - \lambda \Delta t_{i} }} }}} \right)^{{\frac{1}{\beta - 1}}} \cdot \left( {\frac{1}{{d_{il}^{2} }}} \right)^{{\frac{1}{\beta - 1}}} $$
(33)
$$ m_{i\emptyset } = \left( {\frac{{\lambda_{i} }}{{(\beta ){\text{ow}}_{i} 2^{{ - \lambda \Delta t_{i} }} }}} \right)^{{\frac{1}{\beta - 1}}} \cdot \left( {\frac{1}{{\delta^{2} }}} \right)^{{\frac{1}{\beta - 1}}} $$
(34)

Substituting Eqs. (32), (33), and (34) into Eq. (31), we have:

$$ \begin{aligned} & \left( \frac{\lambda_{i}}{(\beta)\text{ow}_{i} 2^{-\lambda \Delta t_{i}}} \right)^{\frac{1}{\beta - 1}} \\ & \quad = \left( \sum_{j \mid c_{j} > 1} \frac{1}{\left( \frac{\sum_{\omega_{k} \in A_{j}} d_{ik}^{2} + d_{ij}^{2}}{c_{j} + \gamma} \right)^{\frac{1}{\beta - 1}}} + \sum_{l \mid c_{l} = 1} \frac{1}{d_{il}^{\frac{2}{\beta - 1}}} + \sum_{j \mid c_{j} = 0} \frac{1}{\delta^{\frac{2}{\beta - 1}}} \right)^{-1} \\ \end{aligned} $$
(35)

The mass equations are then obtained by substituting Eq. (35) into Eqs. (32), (33), and (34).
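As an illustration of Eqs. (32) to (35), the following minimal Python sketch computes the masses of one object. It is not the authors' implementation. Several points are assumptions made only for this example: focal sets are represented as tuples of cluster indices (the empty tuple stands for \( \emptyset \)), \( d_{ij} \) for a set \( A_{j} \) is taken as the distance to the mean of the member centers, and the function names and default parameter values are hypothetical. Note that the common factor on the left-hand side of Eq. (35) acts as a normalizer, so the resulting masses depend only on the distance terms.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def set_distance(x, centers, members, gamma):
    """Distance term for a set A_j with c_j > 1, in the form used in Eq. (32).

    Assumption: d_ij is measured to the mean of the member centers.
    """
    c_j = len(members)
    mean_center = [sum(centers[k][d] for k in members) / c_j
                   for d in range(len(x))]
    member_part = sum(sq_dist(x, centers[k]) for k in members)
    return (member_part + sq_dist(x, mean_center)) / (c_j + gamma)


def masses(x, centers, focal_sets, beta=2.0, gamma=1.0, delta=10.0):
    """Masses of object x over the focal sets, normalized as in Eq. (35).

    The default values of beta, gamma, and delta are illustrative only.
    """
    exponent = -1.0 / (beta - 1.0)
    raw = {}
    for members in focal_sets:
        if len(members) == 0:        # empty set, Eq. (34)
            dist = delta ** 2
        elif len(members) == 1:      # single cluster, Eq. (33)
            dist = sq_dist(x, centers[members[0]])
        else:                        # set of clusters, Eq. (32)
            dist = set_distance(x, centers, members, gamma)
        # Guard against a zero distance when x coincides with a center.
        raw[members] = max(dist, 1e-12) ** exponent
    total = sum(raw.values())        # inverse of the normalizer in Eq. (35)
    return {members: value / total for members, value in raw.items()}


centers = [(0.0, 0.0), (4.0, 0.0)]
focal_sets = [(), (0,), (1,), (0, 1)]
print(masses((2.0, 0.0), centers, focal_sets))
```

In this toy configuration, an object midway between the two centers receives its largest mass on the set `(0, 1)`, while an object near the first center receives its largest mass on `(0,)`.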

Appendix 2

Taking the derivative of the objective function with respect to \( v_{l} \), we have:

$$ \frac{\partial J_{\text{DSCBF}}}{\partial v_{l}} = \sum_{i = 1}^{n} \sum_{j \mid c_{j} > 1} 2^{-\lambda \Delta t_{i}} \text{ow}_{i} m_{ij}^{\beta} \frac{\frac{\partial d_{il}^{2}}{\partial v_{l}} + \frac{\partial d_{ij}^{2}}{\partial v_{l}}}{c_{j} + \gamma} + \sum_{i = 1}^{n} \sum_{l \mid c_{l} = 1} 2^{-\lambda \Delta t_{i}} \text{ow}_{i} m_{il}^{\beta} \frac{\partial d_{il}^{2}}{\partial v_{l}} $$
(36)

The partial derivatives of the distances with respect to \( v_{l} \) are given by

$$ \frac{{\partial d_{il}^{2} }}{{\partial v_{l} }} = 2\left( {v_{l} - x_{i} } \right)\quad c_{l} = 1 $$
(37)
$$ \begin{aligned} \frac{{\partial d_{ij}^{2} }}{{\partial v_{l} }} & = \frac{{2\left( {v_{l} - x_{i} } \right) + \frac{2}{{c_{j} }}\left( {\frac{{\mathop \sum \nolimits_{{ \omega_{g} \in A_{j} }} v_{g} }}{{c_{j} }} - x_{i} } \right)}}{{c_{j} + \gamma }} \\ & \quad \omega_{l} \in A_{j} \quad c_{j} > 1 \\ \end{aligned} $$
(38)

Using Eqs. (37) and (38) in Eq. (36), we have:

$$ \begin{aligned} & \left( {\mathop \sum \limits_{i = 1}^{n} 2^{{ - \lambda \Delta t_{i} }} {\text{ow}}_{i} m_{il}^{\beta } + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{{\omega_{l} \in A_{j} }} 2^{{ - \lambda \Delta t_{i} }} {\text{ow}}_{i} m_{ij}^{\beta } \frac{{1 + \frac{1}{{c_{j} }}}}{{c_{j} + \gamma }}} \right)x_{i} \\ & \quad = \mathop \sum \limits_{i = 1}^{n} 2^{{ - \lambda \Delta t_{i} }} {\text{ow}}_{i} m_{il}^{\beta } v_{l} + \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{{\omega_{l} \in A_{j} }} 2^{{ - \lambda \Delta t_{i} }} {\text{ow}}_{i} m_{ij}^{\beta } \frac{{\left( {\frac{{v_{l} + \mathop \sum \nolimits_{{ \omega_{g} \in A_{j} }} v_{g} }}{{c_{j}^{2} }}} \right)}}{{c_{j} + \gamma }} \\ \end{aligned} $$
(39)

To simplify the calculations and obtain the centers, this result is expressed as the linear equation given in Eq. (23).
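To make the role of the fading factor \( 2^{-\lambda \Delta t_{i}} \) concrete, the sketch below works through a simplified special case in Python. It assumes that only single clusters carry mass, so the set terms of Eq. (36) vanish; setting the remaining term to zero with Eq. (37) then gives the center as a weighted mean of the objects. This is not the general update of Eq. (23), which is not reproduced here. Reading \( \text{ow}_{i} \) as an object weight and \( \Delta t_{i} \) as the time elapsed since object \( i \) arrived is an assumption, and the function names are hypothetical.

```python
def fading_weight(ow, dt, lam):
    """Weight ow_i * 2^(-lambda * dt_i) that multiplies each object's term."""
    return ow * 2.0 ** (-lam * dt)


def update_center_singletons(points, masses_l, ows, dts, lam, beta=2.0):
    """Center v_l when only single clusters carry mass (simplifying assumption).

    Solves sum_i w_i * m_il^beta * 2 * (v_l - x_i) = 0 for v_l,
    where w_i is the fading weight of object i.
    """
    dim = len(points[0])
    numerator = [0.0] * dim
    denominator = 0.0
    for x, m, ow, dt in zip(points, masses_l, ows, dts):
        w = fading_weight(ow, dt, lam) * m ** beta
        denominator += w
        for d in range(dim):
            numerator[d] += w * x[d]
    if denominator == 0.0:
        raise ValueError("all weights are zero; the center is undefined")
    return [value / denominator for value in numerator]


# Two objects with equal mass; the second one arrived one time unit earlier.
points = [(0.0, 0.0), (2.0, 0.0)]
print(update_center_singletons(points, [1.0, 1.0], [1.0, 1.0], [0.0, 1.0], lam=1.0))
```

With `lam=1.0` the older object counts half as much as the newer one, so the center moves toward the newer object; with `lam=0.0` both objects count equally and the center is their plain mean.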

Cite this article

Hamidzadeh, J., Ghadamyari, R. Clustering data stream with uncertainty using belief function theory and fading function. Soft Comput 24, 8955–8974 (2020). https://doi.org/10.1007/s00500-019-04422-4

  • DOI: https://doi.org/10.1007/s00500-019-04422-4