DOI: 10.1145/3175684.3175726
research-article

Handling Large-Scale Data using Two-Tier Hierarchical Super-Peer P2P Network

Published: 20 December 2017

ABSTRACT

In the ever-expanding world of IoT, data has not only grown in volume and velocity but has also moved from centralized nodes to nodes distributed across multiple locations. Traditional data clustering technologies, which rely on centralized operations, do not scale to manage Big Data efficiently, creating a need for clustering in distributed environments. To address modularity, flexibility, and scalability, this paper presents a dynamic two-tier hierarchical architecture and model for cooperative clustering in distributed super-peer P2P networks. The proposed model is called Distributed Cooperative Clustering in super-peer P2P networks (DCCP2P). It involves a hierarchy of two layers of P2P neighborhoods. In the first layer, the peers in each neighborhood build local cooperative sub-clusters from their local data. Each node sends its super-peer a summarized view of its local data, in the form of the sub-cluster centroids extracted from the local cooperative clustering, which minimizes the information exchanged between nodes and their super-peers. In the second layer, sub-clusters are merged at each super-peer and at the root of the hierarchy, where a single global clustering can be derived. The distributed cooperative approach finds globally optimized clusters and achieves significant improvement in global clustering solutions without the cost of centralized clustering.
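The two-tier flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' DCCP2P implementation: the abstract does not say which algorithms the cooperative clustering combines or how super-peers merge sub-clusters, so this sketch assumes plain k-means at each peer and a weighted k-means over the received centroids at each super-peer and at the root. All names (`weighted_kmeans`, `peer_summary`, `merge_summaries`) and the synthetic data are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]
Summary = List[Tuple[Point, int]]  # (centroid, number of raw points it stands for)


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def weighted_kmeans(items: Summary, k: int, iters: int = 20, seed: int = 0) -> Summary:
    """Lloyd-style k-means over weighted points (assumes items is non-empty)."""
    rng = random.Random(seed)
    centers = [p for p, _ in rng.sample(items, min(k, len(items)))]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centers]
        totals = [0] * len(centers)
        for p, w in items:
            i = nearest(p, centers)
            totals[i] += w
            for d in range(dim):
                sums[i][d] += p[d] * w
        # Move each non-empty center to the weighted mean of its members.
        centers = [tuple(s / t for s in sums[i]) if t else centers[i]
                   for i, t in enumerate(totals)]
    # Final pass: count how many raw points each center represents.
    weights = [0] * len(centers)
    for p, w in items:
        weights[nearest(p, centers)] += w
    return [(c, w) for c, w in zip(centers, weights) if w > 0]


def peer_summary(local_data: List[Point], k: int) -> Summary:
    """First layer (assumed): a peer clusters its raw data and keeps only centroids."""
    return weighted_kmeans([(p, 1) for p in local_data], k)


def merge_summaries(summaries: List[Summary], k: int) -> Summary:
    """Second layer (assumed): a super-peer or the root re-clusters received centroids."""
    return weighted_kmeans([item for s in summaries for item in s], k)


rng = random.Random(42)


def blob(cx: float, cy: float, n: int) -> List[Point]:
    """Synthetic 2-D points around (cx, cy)."""
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


# Two neighborhoods with two peers each; every peer holds 10 points from each blob.
neighborhoods = [[blob(0, 0, 10) + blob(10, 10, 10) for _ in range(2)] for _ in range(2)]

super_summaries = [
    merge_summaries([peer_summary(data, k=2) for data in peers], k=2)
    for peers in neighborhoods
]
global_summary = merge_summaries(super_summaries, k=2)
print(global_summary)
```

Only `(centroid, weight)` pairs travel up the hierarchy, so the traffic from a peer is bounded by `k` entries rather than by the size of its local data. Carrying the weight alongside each centroid is a design choice of this sketch: it lets the merge step favor centroids that summarize more points.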

        • Published in

          ACM Other conferences
          BDIOT '17: Proceedings of the International Conference on Big Data and Internet of Thing
          December 2017
          251 pages
          ISBN:9781450354301
          DOI:10.1145/3175684

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 75 of 136 submissions, 55%
