
Collaborative fuzzy clustering of distributed concept-drifting dynamic data using a gossip-based approach

Applied Intelligence

Abstract

Clustering is a useful method for analyzing large data sets, such as the distributed data streams that are increasingly observed in various applications. In this paper, a collaborative gossip-based approach is proposed for deriving a fuzzy clustering model of distributed dynamic data that involve concept drift. The proposed algorithm consists of a local phase and a collaborative phase. During the two phases, prototypes of the data are constructed, which together constitute a summarized view of the distributed data. This summarized view enables each node to extract a custom subset of the overall clustering model. Scalability is achieved by using gossip as a robust method of communication and by preventing excessive data transfer among nodes. When concept drift is present, the clustering model evolves incrementally and outdated parts of the summarized view are removed. Experimental results under different data distribution scenarios show that the proposed method detects fuzzy clusters efficiently and adapts to concept-drifting data, with bounded communication costs, in comparison with other state-of-the-art algorithms.
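The abstract does not give the algorithm's details, so the following Python sketch is only an illustration of the general ideas it names, not the paper's method. It assumes the standard fuzzy c-means membership formula, a summarized view stored as a dictionary of timestamped prototypes, and a simple pairwise gossip exchange that discards prototypes older than a fixed age. All names (`fcm_memberships`, `merge_views`, `gossip_round`, `max_age`) and the view layout are hypothetical.

```python
import random


def fcm_memberships(point, prototypes, m=2.0):
    """Standard fuzzy c-means membership of one point in each prototype.

    m > 1 is the fuzzifier; this is the textbook formula, used here as an
    assumption about how a node could query its summarized view.
    """
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, proto)) ** 0.5
        for proto in prototypes
    ]
    # A point lying exactly on one or more prototypes belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in dists) for di in dists]


def merge_views(view_a, view_b, now, max_age):
    """Union two summarized views, dropping outdated prototypes.

    A view maps a hypothetical key (node_id, sequence_no) to a tuple
    (center, weight, timestamp). Keying by origin makes the merge
    idempotent, so repeated exchanges do not double-count a prototype.
    """
    merged = {}
    for view in (view_a, view_b):
        for key, (center, weight, stamp) in view.items():
            if now - stamp > max_age:
                continue  # outdated under the assumed drift window
            if key not in merged or stamp > merged[key][2]:
                merged[key] = (center, weight, stamp)
    return merged


def gossip_round(views, now, max_age, rng):
    """Each node exchanges views with one randomly chosen peer."""
    n = len(views)
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        merged = merge_views(views[i], views[j], now, max_age)
        views[i] = dict(merged)
        views[j] = dict(merged)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three nodes, each starting with one locally built prototype.
    views = [
        {(0, 0): ((0.0, 0.0), 10.0, 0)},
        {(1, 0): ((5.0, 5.0), 8.0, 0)},
        {(2, 0): ((9.0, 0.0), 12.0, 0)},
    ]
    for step in range(3):
        gossip_round(views, now=step, max_age=100, rng=rng)
    centers = [center for center, _, _ in views[0].values()]
    print(len(centers), fcm_memberships((1.0, 1.0), centers))
```

In this sketch only prototypes travel between nodes, never raw data, which is one plausible way to keep communication costs bounded. How the paper actually builds, weights, and expires prototypes is not stated in the abstract.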


Notes

  1. http://archive.ics.uci.edu/ml/


Author information

Corresponding author

Correspondence to Hoda Mashayekhi.


Cite this article

Mashayekhi, H. Collaborative fuzzy clustering of distributed concept-drifting dynamic data using a gossip-based approach. Appl Intell 48, 4905–4922 (2018). https://doi.org/10.1007/s10489-018-1260-9


  • DOI: https://doi.org/10.1007/s10489-018-1260-9
