Abstract
Clustering is one of the main data mining techniques for analyzing and grouping data, but applications often have to deal with very large amounts of spatially distributed data, for which most existing clustering algorithms are impractical. In this paper we present P2PRASTER, a distributed clustering algorithm that builds on the RASTER algorithm and relies on a gossip-based protocol to handle big data in a decentralized manner. The experiments show that P2PRASTER returns perfect results under both optimal and non-optimal conditions, and that it scales very well.
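As a rough illustration of the idea summarized above, the following Python sketch combines a RASTER-style grid projection with a generic pairwise gossip averaging step. It is not the authors' implementation. All function names, parameters (`precision`, `threshold`, `min_size`, `rounds`), and the sample data are hypothetical, and the gossip step is plain push-pull averaging assumed for illustration, with every peer assumed to know the number of peers.

```python
import math
import random
from collections import Counter


def tile_counts(points, precision=1):
    """Project (x, y) points onto grid tiles and count points per tile."""
    scale = 10 ** precision
    return Counter(
        (math.floor(x * scale), math.floor(y * scale)) for x, y in points
    )


def gossip_average(local_counts, rounds=200, seed=0):
    """Pairwise averaging (assumed protocol): each peer's per-tile value
    converges to the mean of that tile's count over all peers.
    Requires at least two peers."""
    rng = random.Random(seed)
    states = [dict(c) for c in local_counts]
    n = len(states)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        for tile in set(states[i]) | set(states[j]):
            avg = (states[i].get(tile, 0.0) + states[j].get(tile, 0.0)) / 2
            states[i][tile] = states[j][tile] = avg
    return states


def significant_tiles(state, n_peers, threshold):
    """Keep tiles whose estimated global count (mean * peers) meets the threshold."""
    return {t for t, avg in state.items() if round(avg * n_peers) >= threshold}


def cluster_tiles(tiles, min_size=2):
    """Join adjacent significant tiles (8-neighbourhood) into clusters."""
    remaining = set(tiles)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            tx, ty = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (tx + dx, ty + dy)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        stack.append(nb)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Hypothetical data: four peers, each holding a small local share of the points.
peers = [
    [(0.1, 0.2), (0.3, 1.4), (5.2, 5.1)],
    [(0.5, 0.5), (0.6, 1.1), (5.5, 6.5)],
    [(0.7, 0.9), (0.2, 1.8), (5.9, 5.9), (9.0, 9.0)],
    [(0.4, 0.4), (0.9, 1.9), (5.1, 6.2), (5.3, 5.3), (5.4, 6.6)],
]
states = gossip_average([tile_counts(p, precision=0) for p in peers])
for peer_id, state in enumerate(states):
    tiles = significant_tiles(state, len(peers), threshold=3)
    print(peer_id, sorted(sorted(c) for c in cluster_tiles(tiles)))
```

In this sketch each peer only ever exchanges per-tile counts, never raw points, so the message size depends on the number of occupied tiles rather than on the size of the data set. Whether P2PRASTER uses exactly this exchange is not stated in the abstract.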
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mariani, A., Epicoco, I., Cafaro, M., Pulimeno, M. (2023). Grid-Based Contraction Clustering in a Peer-to-Peer Network. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_28
DOI: https://doi.org/10.1007/978-3-031-25891-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science, Computer Science (R0)