
Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

We have entered the era of networked communications, in which concepts such as big data and social networks are emerging. The explosion of available data across a broad range of application domains has made data streams an inevitable part of most real-world applications. The classification of data streams faces four major challenges: infinite length, concept drift, recurring concepts, and evolving concepts. This paper proposes a novel method to address these challenges, with a focus on the last. Unlike existing methods for the detection of evolving concepts, we cast joint classification and detection of evolving concepts as the optimization of an objective function obtained by extending a fuzzy agglomerative clustering method. Moreover, rather than keeping instances or hyper-sphere summaries of previously seen classes, we maintain only class boundaries in the kernel space and generate instances of each class on demand. This approach improves accuracy and reduces the memory usage of the proposed method. Experimental results on several synthetic and real datasets show the superiority of the proposed method over related state-of-the-art methods.




Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments which improved the paper.

Author information

Corresponding author

Correspondence to Hamid Beigy.

Appendix

In this appendix, we give the proof of Theorem 1. The goal of the optimization procedure is to simultaneously find fuzzy memberships U and cluster centers Z that minimize the objective function given in Eq. (11). Onclad adopts an alternating optimization approach to minimize \(J_{\textsc{Onclad}}\). Minimizing \(J_{\textsc{Onclad}}\) subject to the constraints is a constrained nonlinear optimization problem; introducing Lagrange multipliers yields the following Lagrange function.

$$\begin{aligned} J_{\textsc{Onclad}}(U,Z) = {} & \sum _{j=1}^{K}\sum _{i=1}^{N}u_{ij}d_{ij} + \gamma \sum _{j=1}^{K}\sum _{i=1}^{N}u_{ij}\log u_{ij} \\ & + \alpha \sum _{m \mid P_m \in C^i}\; \sum _{\substack{n \mid P_n \in C^i \\ m\ne n}}\; \sum _{k=1}^{K}\sum _{\substack{l=1 \\ l\ne k}}^{K}u_{mk}u_{nl} \\ & + \beta \sum _{m \mid P_m \in C^i}\; \sum _{\substack{n \mid P_n \in C^j \\ i\ne j}}\; \sum _{k=1}^{K}u_{mk}u_{nk} \\ & + \sum _{i=1}^{N}\lambda _i \left( \sum _{j=1}^{K}u_{ij}-1\right) \end{aligned}$$
(11)

This objective function is minimized using an alternating optimization approach: first, we fix the fuzzy memberships U and minimize the objective function with respect to Z; then, we fix the cluster centers Z and minimize it with respect to U. The optimal cluster centers \(Z \equiv [z_{jl}]_{K \times m}\) and fuzzy memberships \(U \equiv [u_{ij}]_{N \times K}\) are obtained by the following lemmas.
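
To make this scheme concrete, the following is a minimal, self-contained Python sketch of the alternating loop under the simplifying assumption that the coupling terms weighted by \(\alpha \) and \(\beta \) are zero, which reduces the objective to its entropy-regularized fuzzy k-means core. The two updates correspond to Eqs. (12) and (15) derived in the lemmas below; all names are illustrative, not the authors' implementation.

```python
import numpy as np

def fit(X, K, gamma=1.0, n_iters=100, seed=0):
    """Alternating minimization of the simplified objective (alpha = beta = 0)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(K), size=len(X))     # random memberships; rows sum to 1
    for _ in range(n_iters):
        Z = (U.T @ X) / U.sum(axis=0)[:, None]     # Eq. (12): weighted means
        d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # squared distances d_ij
        e = np.exp(-(d - d.min(axis=1, keepdims=True)) / gamma)  # stabilized exponentials
        U = e / e.sum(axis=1, keepdims=True)       # Eq. (15) with A = B = 0
    return U, Z
```

In the full method, the penalty terms would be recomputed from the constraint sets at each iteration, and the loop would stop once the decrease in \(J_{\textsc{Onclad}}\) falls below a tolerance.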

Lemma 1

Given that the fuzzy memberships U are fixed, the optimal values of the cluster centers \(Z \equiv [z_{jl}]_{K \times m}\) are obtained using the following equation:

$$\begin{aligned} z_{jl} = \dfrac{\sum \nolimits _{i=1}^{N} u_{ij} x_{il}}{\sum \nolimits _{i=1}^{N} u_{ij}}. \end{aligned}$$
(12)

Proof

By taking the derivative of Eq. (11) with respect to each cluster center and setting it to zero, we obtain:

$$\begin{aligned} \dfrac{\partial J(U,\mathbf Z )}{\partial z_{jl}} = \sum \limits _{i=1}^{N}2u_{ij}(z_{jl}-x_{il})=0 \end{aligned}$$
(13)

Thus, the solution for \(z_{jl}\) is

$$\begin{aligned} z_{jl} = \dfrac{\sum \nolimits _{i=1}^{N} u_{ij} x_{il}}{\sum \nolimits _{i=1}^{N} u_{ij}}, \end{aligned}$$
(14)

which completes the proof of the lemma. \(\square \)
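
Computationally, Eq. (12) is a membership-weighted mean and vectorizes directly. A minimal NumPy sketch, assuming U is the \(N \times K\) membership matrix and X the \(N \times m\) data matrix (the function and argument names are ours, not from the paper):

```python
import numpy as np

def update_centers(X: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Eq. (12): each center z_j is the u_ij-weighted mean of the data."""
    # (K, m) = (K, N) @ (N, m), normalized by each cluster's total membership.
    return (U.T @ X) / U.sum(axis=0)[:, None]
```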

Lemma 2

Given that the cluster centers Z are fixed, the optimal values of the fuzzy memberships are:

$$\begin{aligned} u_{ij}= \dfrac{\exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) }{\sum \nolimits _{l=1}^{K} \exp \left( \dfrac{-d_{il}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{il}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{il}}{\gamma }\right) } \end{aligned}$$
(15)

Proof

Taking the derivative of Eq. (11) with respect to each fuzzy membership and setting it to zero, we obtain:

$$\begin{aligned} \dfrac{\partial J(\mathbf U ,Z)}{\partial u_{ij}} = {} & d_{ij} + \gamma (1 + \log u_{ij}) + \alpha \Bigg( \underbrace{\sum _{\substack{n \mid P_i \in C^m,\, P_n \in C^l \\ m\ne l}} \; \sum _{\substack{k=1 \\ k\ne j}}^{K} u_{nk}}_{A_{ij}}\Bigg) \\ & + \beta \Bigg( \underbrace{\sum _{n \mid P_i,P_n \in C^m}u_{nj}}_{B_{ij}}\Bigg) + \lambda _i = 0 \end{aligned}$$
(16)

Solving the above equation for \(u_{ij}\), we obtain:

$$\begin{aligned} u_{ij}= \exp (-1)\exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) \exp \left( \dfrac{-\lambda _i}{\gamma }\right) \end{aligned}$$
(17)

Because of the constraint \(\sum _{j=1}^{K} u_{ij}=1\), the Lagrange multipliers satisfy

$$\begin{aligned} \sum _{j=1}^{K} u_{ij} & = \sum _{j=1}^{K} \exp (-1)\exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) \exp \left( \dfrac{-\lambda _i}{\gamma }\right) \\ & = \exp (-1)\exp \left( \dfrac{-\lambda _i}{\gamma }\right) \sum _{j=1}^{K} \exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) = 1 \end{aligned}$$
(18)

By some algebraic simplification, we obtain:

$$\begin{aligned} \exp \left( \dfrac{-\lambda _i}{\gamma }\right) = \dfrac{1}{\exp (-1) \sum \nolimits _{j=1}^{K} \exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) } \end{aligned}$$
(19)

By substituting Eq. (19) in Eq. (17), we obtain the closed form solution for the optimal memberships as

$$\begin{aligned} u_{ij}= \dfrac{\exp \left( \dfrac{-d_{ij}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{ij}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{ij}}{\gamma }\right) }{\sum \nolimits _{l=1}^{K} \exp \left( \dfrac{-d_{il}}{\gamma }\right) \exp \left( \dfrac{-\alpha A_{il}}{\gamma }\right) \exp \left( \dfrac{-\beta B_{il}}{\gamma }\right) }, \end{aligned}$$
(20)

which completes the proof of the lemma. \(\square \)
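
Computationally, Eq. (15) is a row-wise softmax over \(-(d_{ij} + \alpha A_{ij} + \beta B_{ij})/\gamma \). A minimal sketch, assuming the distance matrix d and the penalty matrices A and B (all \(N \times K\)) have been precomputed; argument names are illustrative:

```python
import numpy as np

def update_memberships(d, A, B, alpha, beta, gamma):
    """Eq. (15): row-wise softmax of -(d + alpha*A + beta*B) / gamma."""
    logits = -(d + alpha * A + beta * B) / gamma
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the exponentials
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)       # rows sum to one, satisfying the constraint
```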

The following lemma shows that the alternating minimization procedure, which updates Z and U using Eqs. (12) and (15), respectively, converges.

Lemma 3

Let J(Z) be \(J_{\textsc{Onclad}}\) with the fuzzy memberships fixed, let J(U) be \(J_{\textsc{Onclad}}\) with the cluster centers fixed, and let \(\alpha , \beta , \gamma >0\). Then Z and U are a local optimum of \(J_{\textsc{Onclad}}\) if \(z_{jl}\) and \(u_{ij}\) are calculated using Eqs. (12) and (15), respectively.

Proof

The necessity has been proven in Lemmas 1 and 2. To prove sufficiency, we compute the Hessian matrices H(J(Z)) of J(Z) and H(J(U)) of J(U) as follows.

$$\begin{aligned} h_{fg,il}(J(Z)) = \dfrac{\partial }{\partial z_{fg}}\left[ \dfrac{\partial J(Z)}{\partial z_{il}}\right] = {\left\{ \begin{array}{ll} \sum _{n=1}^{N} 2u_{ni}, &{}\quad \text {if}\ f=i,\ g=l\\ 0, &{} \quad \text {otherwise}\\ \end{array}\right. } \end{aligned}$$
(21)
$$\begin{aligned} h_{fg,ij}(J(U)) = \dfrac{\partial }{\partial u_{fg}}\left[ \dfrac{\partial J(U)}{\partial u_{ij}}\right] = {\left\{ \begin{array}{ll} \dfrac{\gamma }{u_{ij}}, &{} \quad \text {if}\ f=i,\ g=j\\ 0, &{}\quad \text {otherwise}\\ \end{array}\right. } \end{aligned}$$
(22)

According to these equations, H(J(Z)) and H(J(U)) are diagonal matrices. Since \(u_{ij}\in (0,1]\) and \(\gamma >0\), all diagonal entries are positive; hence, the Hessian matrices are positive definite, and Eqs. (12) and (15) are sufficient conditions for minimizing J(Z) and J(U), respectively. \(\square \)

Proof of Theorem 1

The necessary conditions for \(J_{\textsc{Onclad}}\) to attain a local minimum were proven in Lemmas 1 and 2. By Lemma 3, each alternating update does not increase the objective, so \(J_{\textsc{Onclad}}(U^{(t+1)},Z^{(t+1)}) \le J_{\textsc{Onclad}}(U^{(t)},Z^{(t)})\). Since the sequence of objective values is non-increasing and bounded below, it converges, which proves convergence to a local minimum. \(\square \)
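
The monotone decrease asserted by Theorem 1 is easy to check numerically. The sketch below runs the simplified updates (again with \(\alpha = \beta = 0\)) on random data and asserts that the corresponding objective never increases; all names are illustrative.

```python
import numpy as np

def objective(X, U, Z, gamma):
    """Simplified J (alpha = beta = 0): weighted distortion plus entropy regularizer."""
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return (U * d).sum() + gamma * (U * np.log(U)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # toy data: 200 points in 2-D
U = rng.dirichlet(np.ones(3), size=200)            # random initial memberships, K = 3
J_prev = np.inf
for _ in range(50):
    Z = (U.T @ X) / U.sum(axis=0)[:, None]         # Eq. (12)
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    e = np.exp(-(d - d.min(axis=1, keepdims=True)))  # gamma = 1, stabilized
    U = e / e.sum(axis=1, keepdims=True)           # Eq. (15) with A = B = 0
    J = objective(X, U, Z, gamma=1.0)
    assert J <= J_prev + 1e-9                      # non-increasing, as Theorem 1 states
    J_prev = J
```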

About this article

Cite this article

ZareMoodi, P., Kamali Siahroudi, S. & Beigy, H. Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach. Knowl Inf Syst 60, 1329–1352 (2019). https://doi.org/10.1007/s10115-018-1266-y
