Abstract
This paper proposes a parallel fuzzy c-means (PFCM) algorithm for clustering large data sets. The algorithm is designed to run on parallel computers of the Single Program Multiple Data (SPMD) model type using the Message Passing Interface (MPI). PFCM is compared with an existing parallel k-means (PKM) algorithm in terms of parallelisation capability and scalability. In an implementation of PFCM used to cluster a large data set from an insurance company, the algorithm is shown to achieve almost ideal speedups and excellent scaleup with respect to the size of the data sets.
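The abstract does not give the algorithm itself, so the following is only a minimal sketch of how one fuzzy c-means iteration can be split across SPMD processes. It is not the authors' implementation. It assumes the standard fuzzy c-means update with Euclidean distance and fuzzifier m = 2, and it simulates the processes with plain Python lists instead of MPI. The names `partial_sums`, `combine`, and `fcm_step` are hypothetical; `combine` stands in for a global sum reduction across processes.

```python
import math


def partial_sums(block, centers, m=2.0):
    """Work done by one process on its own data block (assumed layout):
    accumulate sum(u^m * x) and sum(u^m) for every cluster."""
    c, d = len(centers), len(centers[0])
    num = [[0.0] * d for _ in range(c)]
    den = [0.0] * c
    exp = 2.0 / (m - 1.0)
    for x in block:
        dist = [math.dist(x, v) for v in centers]
        zeros = [k for k, dk in enumerate(dist) if dk == 0.0]
        if zeros:
            # The point coincides with one or more centers:
            # share full membership among those centers.
            u = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
        else:
            # Standard fuzzy c-means membership of x in each cluster.
            u = [1.0 / sum((dist[i] / dist[k]) ** exp for k in range(c))
                 for i in range(c)]
        for i in range(c):
            w = u[i] ** m
            den[i] += w
            for j in range(d):
                num[i][j] += w * x[j]
    return num, den


def combine(partials):
    """Stand-in for a global sum reduction over all processes."""
    c, d = len(partials[0][1]), len(partials[0][0][0])
    num = [[sum(p[0][i][j] for p in partials) for j in range(d)]
           for i in range(c)]
    den = [sum(p[1][i] for p in partials) for i in range(c)]
    return num, den


def fcm_step(blocks, centers, m=2.0):
    """One iteration: local partial sums, global combine, new centers."""
    num, den = combine([partial_sums(b, centers, m) for b in blocks])
    return [
        [num[i][j] / den[i] for j in range(len(centers[0]))]
        if den[i] > 0.0 else list(centers[i])  # keep a center with no weight
        for i in range(len(centers))
    ]
```

Under these assumptions, each process only needs its own block and the current centers, and the data exchanged per iteration is a fixed-size set of partial sums, independent of the number of data points. Calling `fcm_step` with the data in one block or split into several blocks gives the same centers up to floating-point rounding.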
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kwok, T., Smith, K., Lozano, S., Taniar, D. (2002). Parallel Fuzzy c-Means Clustering for Large Data Sets. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_48
DOI: https://doi.org/10.1007/3-540-45706-2_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive