Abstract
Clustering is a useful method for analyzing large data sets, such as the distributed data streams that are increasingly observed in various applications. In this paper, a collaborative gossip-based approach is proposed for deriving a fuzzy clustering model of distributed dynamic data that involve concept drift. The proposed algorithm consists of a local phase and a collaborative phase. During the two phases, data prototypes are constructed, which together constitute a summarized view of the distributed data. This summarized view enables each node to extract a custom subset of the overall clustering model. Scalability is achieved by using gossip as a robust method of communication and by preventing excessive data transfer among nodes. When concept drift is present, the clustering model evolves incrementally and outdated parts of the summarized view are removed. Experimental results under different data distribution scenarios show that the proposed method detects fuzzy clusters efficiently and adapts to concept-drifting data, with bounded communication costs, in comparison with other state-of-the-art algorithms.
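To make the idea of clustering a summarized view concrete, the following is a minimal sketch of one weighted fuzzy c-means update over prototypes. It is not the algorithm proposed in the paper: the function names `memberships` and `update_centers`, the representation of a prototype as a point with a weight, and the fuzzifier value `m = 2.0` are all assumptions made for illustration. Only the standard fuzzy c-means membership formula is taken as given, and the gossip exchange and drift handling are not shown.

```python
import math


def memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in every cluster."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them only.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def update_centers(prototypes, weights, centers, m=2.0):
    """One weighted fuzzy c-means step over (prototype, weight) summaries.

    Assumption: each prototype stands in for `weight` original data items.
    """
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0] * len(centers)
    for p, w in zip(prototypes, weights):
        for i, u in enumerate(memberships(p, centers, m)):
            coef = w * u ** m
            den[i] += coef
            for k in range(dim):
                num[i][k] += coef * p[k]
    # Keep a center unchanged if no prototype contributes to it.
    return [
        [n / den[i] for n in num[i]] if den[i] > 0.0 else list(centers[i])
        for i in range(len(centers))
    ]


# Hypothetical summarized view: four prototypes, each summarizing five items.
prototypes = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
weights = [5.0, 5.0, 5.0, 5.0]
centers = [(1.0, 1.0), (8.0, 1.0)]
for _ in range(20):
    centers = update_centers(prototypes, weights, centers)
print(centers)
```

Working on weighted prototypes rather than raw data is one plausible way for a node to refine cluster centers without holding the full distributed data set. How the prototypes are built and exchanged between nodes is specific to the paper and is left out here.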
Cite this article
Mashayekhi, H. Collaborative fuzzy clustering of distributed concept-drifting dynamic data using a gossip-based approach. Appl Intell 48, 4905–4922 (2018). https://doi.org/10.1007/s10489-018-1260-9
DOI: https://doi.org/10.1007/s10489-018-1260-9