Peer sampling gossip-based distributed clustering algorithm for unstructured P2P networks

  • Original Article
Neural Computing and Applications

Abstract

Clustering, an unsupervised learning method and a core step in data mining, is central to the analysis of large, distributed data sets. In many applications, such as peer-to-peer (P2P) systems, huge volumes of data are distributed across multiple sources. Analyzing these data and identifying appropriate clusters is challenging because of transmission, processing, and storage costs. In this paper, we propose Efficient GBDC-P2P, a gossip-based distributed clustering algorithm for P2P networks. It builds on an improved gossip communication approach that combines the peer sampling and CYCLON protocols with the idea of partitioning-based data clustering. The algorithm is suited to data clustering in unstructured P2P networks and adapts to their dynamic conditions. In Efficient GBDC-P2P, peers perform clustering in a distributed way, using only local communication with their neighbors. The approach neither relies on a central server to carry out the clustering task nor requires operations to be synchronized. Evaluation results verify the efficiency of the proposed algorithm for data clustering in unstructured P2P networks. Furthermore, comparative analyses with other well-established distributed clustering approaches show that the proposed method achieves higher accuracy.
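
The general idea of clustering through local peer exchanges alone can be illustrated with a minimal Python sketch. This is not the Efficient GBDC-P2P algorithm, whose details are in the full article; it is a generic gossip-averaged K-means under simplifying assumptions: 1-D data, identical initial centroids on every peer, no churn, and every peer able to sample any other peer (a stand-in for a peer sampling service such as CYCLON). The function names local_stats, gossip_average, and gossip_kmeans are hypothetical. The simulation also steps through rounds in a loop for readability, whereas a real deployment would exchange messages asynchronously.

```python
import random

def local_stats(points, centroids):
    """Per-cluster [sum, count] of a peer's own 1-D points under the given centroids."""
    stats = [[0.0, 0.0] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        stats[nearest][0] += p
        stats[nearest][1] += 1.0
    return stats

def gossip_average(stats, rng, rounds):
    """Push-pull averaging: each peer in turn pairs with a random other peer,
    and both replace their statistics with the pairwise mean."""
    n = len(stats)
    for _ in range(rounds):
        for i in range(n):
            # Assumption: uniform sampling over all other peers (idealised peer sampling).
            j = rng.choice([p for p in range(n) if p != i])
            mean = [[(a + b) / 2 for a, b in zip(ci, cj)]
                    for ci, cj in zip(stats[i], stats[j])]
            stats[i] = mean
            stats[j] = [row[:] for row in mean]

def gossip_kmeans(peer_data, init_centroids, iterations=5, rounds=40, seed=0):
    """Return one centroid list per peer; no peer ever sees another peer's raw data."""
    rng = random.Random(seed)
    centroids = [list(init_centroids) for _ in peer_data]
    for _ in range(iterations):
        # Each peer summarises its own data under its current centroids.
        stats = [local_stats(d, c) for d, c in zip(peer_data, centroids)]
        # Averaging preserves the ratio sum/count, which is the global cluster mean.
        gossip_average(stats, rng, rounds)
        # A cluster whose averaged count is zero keeps its previous centroid.
        centroids = [
            [s / c if c > 0 else old for (s, c), old in zip(st, cen)]
            for st, cen in zip(stats, centroids)
        ]
    return centroids
```

Because every exchange averages sums and counts with the same weights, each peer's estimate sum/count stays a weighted mean of real data points and approaches the centroid that a centralized K-means step would compute, without any server collecting the data.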

Author information

Correspondence to Hedieh Sajedi.

About this article

Cite this article

Azimi, R., Sajedi, H. Peer sampling gossip-based distributed clustering algorithm for unstructured P2P networks. Neural Comput & Applic 29, 593–612 (2018). https://doi.org/10.1007/s00521-017-3119-0

  • DOI: https://doi.org/10.1007/s00521-017-3119-0
