ABSTRACT
In the ever-expanding world of IoT, data has not only grown in volume and velocity but has also moved from centralized nodes to nodes distributed across multiple locations. Traditional clustering techniques, which rely on centralized operations, do not scale to Big Data, creating a need for clustering in distributed environments. To address the requirements of modularity, flexibility, and scalability, this paper presents a dynamic, hierarchical two-tier architecture and model for cooperative clustering in distributed super-peer P2P networks. The proposed model, Distributed Cooperative Clustering in super-peer P2P networks (DCCP2P), involves a hierarchy of two layers of P2P neighborhoods. In the first layer, the peers in each neighborhood build local cooperative sub-clusters from their local data. Each node sends its super-peer a summarized view of its local data in the form of sub-cluster centroids extracted from the local cooperative clustering, which minimizes the information exchanged between nodes and their super-peers. In the next layer, sub-clusters are merged at each super-peer and then at the root of the hierarchy, where a single global clustering is derived. The distributed cooperative approach finds globally optimized clusters and achieves significant improvement in global clustering solutions without the cost of centralized clustering.
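The two-tier flow described in the abstract can be illustrated with a minimal Python sketch. It is not the DCCP2P implementation: plain weighted k-means stands in for the paper's cooperative clustering, and all names (`weighted_kmeans`, `peer_summary`, `merge_summaries`), the cluster counts, and the synthetic data are assumptions made for illustration. Each peer clusters its own data and forwards only centroids and their sizes; each super-peer merges the summaries of its neighborhood, and the root merges the super-peer summaries in the same way.

```python
import random
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def weighted_kmeans(points, k, weights=None, iters=50, seed=0):
    """Plain weighted Lloyd's k-means (a stand-in, not cooperative clustering).

    Returns (centroids, sizes), where sizes[j] is the total weight in cluster j.
    """
    rng = random.Random(seed)
    weights = weights or [1.0] * len(points)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            total = sum(w for _, w in members)
            if total == 0:  # keep an empty cluster's centroid where it was
                updated.append(centroids[j])
                continue
            dim = len(centroids[j])
            updated.append(tuple(sum(p[d] * w for p, w in members) / total
                                 for d in range(dim)))
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sizes = [sum(w for w, l in zip(weights, labels) if l == j) for j in range(k)]
    return centroids, sizes


def peer_summary(local_data, k_local):
    """Tier 1: a peer clusters its own data and exposes only centroids and sizes."""
    return weighted_kmeans(local_data, k_local)


def merge_summaries(summaries, k):
    """Tier 2 (super-peer or root): cluster the received centroids, weighted by size."""
    centroids = [c for cs, _ in summaries for c in cs]
    sizes = [s for _, ss in summaries for s in ss]
    return weighted_kmeans(centroids, k, weights=sizes)


# Hypothetical data: two neighborhoods of two peers; each peer sees both blobs.
rng = random.Random(42)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


neighborhoods = [[blob(0, 0, 20) + blob(10, 10, 20) for _ in range(2)]
                 for _ in range(2)]

super_peer_summaries = [
    merge_summaries([peer_summary(data, k_local=2) for data in peers], k=2)
    for peers in neighborhoods
]
global_centroids, global_sizes = merge_summaries(super_peer_summaries, k=2)
print(sorted(global_centroids), global_sizes)
```

In this sketch only centroids and cluster sizes travel up the hierarchy, so the traffic per peer depends on the number of local sub-clusters rather than on the number of local data points.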
Index Terms
- Handling Large-Scale Data using Two-Tier Hierarchical Super-Peer P2P Network