DOI: 10.1145/1146847.1146886
Article

PENS: an algorithm for density-based clustering in peer-to-peer systems

Published: 30 May 2006

Abstract

Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for obtaining hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment: peer-to-peer (P2P) systems. The lack of central control and the sheer size of P2P systems make existing clustering techniques inapplicable. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge of performing clustering in peer-to-peer environments, namely cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without requiring central control or message flooding. The complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.
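
To make "density-based clustering" and "clusters and noise" concrete, the following is a minimal single-machine sketch in the style of DBSCAN. It is not the PENS algorithm: the distributed, hierarchical cluster assembly described in the abstract is not shown, and the function names (`region_query`, `density_cluster`) and the parameters `eps` and `min_pts` are illustrative assumptions rather than anything taken from the paper.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def density_cluster(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps; clusters grow outward from core points.
    """
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster_id
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


# Two dense groups and one isolated point (hypothetical data).
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(density_cluster(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

In a P2P setting each peer would hold only part of the data, so local results like these would still have to be combined across peers; that combination is the cluster assembly problem the abstract refers to.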


Published In

InfoScale '06: Proceedings of the 1st international conference on Scalable information systems
May 2006
512 pages
ISBN:1595934286
DOI:10.1145/1146847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Acceptance Rates

InfoScale '06 Paper Acceptance Rate 33 of 91 submissions, 36%;
Overall Acceptance Rate 33 of 91 submissions, 36%

Cited By

  • (2023) DDCM: a decentralized density clustering and its results gathering approach. Neural Computing and Applications 35:35 (24743-24754). DOI: 10.1007/s00521-023-08392-5. Online publication date: 3-Mar-2023
  • (2014) On the gene group problem. 2014 IEEE International Symposium on Bioelectronics and Bioinformatics (IEEE ISBB 2014) (1-4). DOI: 10.1109/ISBB.2014.6820910. Online publication date: Apr-2014
  • (2013) GoSCAN. Computing 95:9 (759-784). DOI: 10.1007/s00607-012-0264-2. Online publication date: 1-Sep-2013
  • (2012) ASCCN. Exploring Advances in Interdisciplinary Data Mining and Analytics (219-232). DOI: 10.4018/978-1-61350-474-1.ch013. Online publication date: 2012
  • (2012) Design and evaluation of decentralized online clustering. ACM Transactions on Autonomous and Adaptive Systems 7:3 (1-31). DOI: 10.1145/2348832.2348837. Online publication date: 1-Oct-2012
  • (2011) Robust clustering. WIREs Data Mining and Knowledge Discovery 2:1 (29-59). DOI: 10.1002/widm.49. Online publication date: 27-Sep-2011
  • (2010) Distributed data clustering in multi-dimensional peer-to-peer networks. Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104 (171-178). DOI: 10.5555/1862242.1862264. Online publication date: 1-Jan-2010
  • (2010) ASCCN. International Journal of Data Warehousing and Mining 6:4 (1-15). DOI: 10.4018/jdwm.2010100101. Online publication date: 1-Oct-2010
  • (2009) Clustering Data in Peer-to-Peer Systems. Encyclopedia of Data Warehousing and Mining, Second Edition (251-257). DOI: 10.4018/978-1-60566-010-3.ch041. Online publication date: 2009
  • (2009) Preserving locality in MMVE applications based on ant clustering. 2009 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurements Systems (58-62). DOI: 10.1109/VECIMS.2009.5068866. Online publication date: May-2009
