Parallel Fuzzy c-Means Clustering for Large Data Sets

Kwok, Terence; Smith, Kate; Lozano, Sebastian; Taniar, David

doi:10.1007/3-540-45706-2_48

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2400)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The parallel fuzzy c-means (PFCM) algorithm for clustering large data sets is proposed in this paper. The proposed algorithm is designed to run on parallel computers of the Single Program Multiple Data (SPMD) model type with the Message Passing Interface (MPI). A comparison is made between PFCM and an existing parallel k-means (PKM) algorithm in terms of their parallelisation capability and scalability. In an implementation of PFCM to cluster a large data set from an insurance company, the proposed algorithm is demonstrated to have almost ideal speedups as well as an excellent scaleup with respect to the size of the data sets.
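The data-parallel idea behind an SPMD clustering algorithm of this kind can be illustrated with a minimal sketch. It is not the authors' PFCM implementation: it assumes the standard fuzzy c-means update rules, simulates the processes inside a single Python program, and uses hypothetical names (`partial_sums`, `reduce_sums`, `fcm_step`). The function `reduce_sums` stands in for the global sum that an MPI all-reduce would perform across processes.

```python
import math

def _dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def memberships(x, centres, m=2.0):
    """Standard FCM memberships of one point x in every cluster."""
    d = [_dist(x, v) for v in centres]
    # A point lying exactly on a centre belongs fully to that centre.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        total = sum(hits)
        return [h / total for h in hits]
    p = 2.0 / (m - 1.0)
    inv = [di ** -p for di in d]
    s = sum(inv)
    return [w / s for w in inv]

def partial_sums(block, centres, m=2.0):
    """Local work of one simulated process on its own block of the data."""
    c, dim = len(centres), len(centres[0])
    num = [[0.0] * dim for _ in range(c)]  # sum of u^m * x per cluster
    den = [0.0] * c                        # sum of u^m per cluster
    for x in block:
        u = memberships(x, centres, m)
        for i in range(c):
            w = u[i] ** m
            den[i] += w
            for j in range(dim):
                num[i][j] += w * x[j]
    return num, den

def reduce_sums(parts):
    """Assumed stand-in for an MPI all-reduce: element-wise sum over processes."""
    c, dim = len(parts[0][1]), len(parts[0][0][0])
    num = [[sum(p[0][i][j] for p in parts) for j in range(dim)] for i in range(c)]
    den = [sum(p[1][i] for p in parts) for i in range(c)]
    return num, den

def fcm_step(data, centres, n_procs=1, m=2.0):
    """One centre update; the data are split round-robin over n_procs blocks."""
    blocks = [data[r::n_procs] for r in range(n_procs)]
    num, den = reduce_sums([partial_sums(b, centres, m) for b in blocks])
    return [[num[i][j] / den[i] for j in range(len(num[i]))]
            for i in range(len(den))]
```

Because each block contributes only per-cluster sums, the amount of data exchanged per iteration depends on the number of clusters and dimensions rather than on the number of data points, which is the property that makes good speedup plausible for large data sets. Calling `fcm_step(data, centres, n_procs=1)` and `fcm_step(data, centres, n_procs=4)` on the same inputs should give the same centres up to floating-point rounding.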

Author information

Authors and Affiliations

School of Business Systems, Faculty of Information Technology, Monash University, Australia
Terence Kwok, Kate Smith & David Taniar
Escuela Superior de Ingenieros, University of Seville, Spain
Sebastian Lozano

Editor information

Editors and Affiliations

Fachbereich 17, Mathematik und Informatik, Universität Paderborn, Fürstenallee 11, 33102, Paderborn
Burkhard Monien & Rainer Feldmann

About this paper

Cite this paper

Kwok, T., Smith, K., Lozano, S., Taniar, D. (2002). Parallel Fuzzy c-Means Clustering for Large Data Sets. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_48

DOI: https://doi.org/10.1007/3-540-45706-2_48
Published: 20 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0
eBook Packages: Springer Book Archive
