Abstract
This work presents a parallel implementation of a Fuzzy c-means cluster analysis tool that covers both aspects of cluster investigation: calculating the cluster centers together with the degrees of membership of records to clusters, and determining the optimal number of clusters for the data, using the PBM validity index to evaluate the quality of each partition.
The work’s main contributions are twofold. First, it implements the entire cluster analysis process, integrating the search for the best natural pattern present in the data into the calculation of clusters, which is a new approach in the literature. Second, it provides a parallel implementation of this tool, which allows the approach to be used with very large volumes of data, an increasing need in data analysis for today’s industrial and business databases, and makes cluster analysis a feasible tool to support specialists’ decisions in all fields of knowledge.
The results presented in the paper show that this approach is scalable and that parallel processing reduces the processing time of cluster analysis.
Topics of Interest: Unsupervised Classification, Fuzzy c-Means, Cluster and Grid Computing
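The two steps named in the abstract, computing centers and memberships and then scoring each candidate number of clusters, can be illustrated with a minimal serial sketch. It is not the authors' implementation. It assumes the standard Fuzzy c-means update rules and one common form of the PBM index, written here from general knowledge rather than taken from the paper. The function names (`fcm`, `pbm`, `best_number_of_clusters`), the fuzzifier `m = 2`, the deterministic initialisation, and the toy data are all hypothetical choices made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm(data, c, m=2.0, iters=200, tol=1e-6):
    """Serial Fuzzy c-means sketch: returns (centers, memberships)."""
    n, dim = len(data), len(data[0])
    # Assumption: start from evenly spaced records so runs are reproducible.
    centers = [list(data[(k * n) // c]) for k in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for x in data:
            inv = [max(dist(x, z), 1e-12) ** (-2.0 / (m - 1.0)) for z in centers]
            s = sum(inv)
            u.append([v / s for v in inv])
        # Center update: mean of the records weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * x[d] for wi, x in zip(w, data)) / total for d in range(dim)]
            )
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def pbm(data, centers, u, m=2.0):
    """Assumed PBM form: ((1/c) * (E_1 / E_c) * D_c)^2, larger is better; needs c >= 2."""
    n, dim, c = len(data), len(data[0]), len(centers)
    mean = [sum(x[d] for x in data) / n for d in range(dim)]
    e1 = sum(dist(x, mean) for x in data)  # scatter around a single center
    ec = sum(u[i][k] ** m * dist(data[i], centers[k])
             for i in range(n) for k in range(c))  # fuzzy within-cluster scatter
    dc = max(dist(centers[a], centers[b])
             for a in range(c) for b in range(a + 1, c))  # largest center separation
    return ((e1 / c) * dc / ec) ** 2


def best_number_of_clusters(data, c_min=2, c_max=6, m=2.0):
    """Run FCM for each candidate c and keep the c with the highest PBM score."""
    scores = {}
    for c in range(c_min, c_max + 1):
        centers, u = fcm(data, c, m=m)
        scores[c] = pbm(data, centers, u, m=m)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data (hypothetical): three small grids of points around three centers.
    blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    offsets = (-0.5, 0.0, 0.5)
    data = [(cx + dx, cy + dy) for cx, cy in blobs for dx in offsets for dy in offsets]
    best, scores = best_number_of_clusters(data, 2, 5)
    print("PBM by c:", scores)
    print("selected c:", best)
```

In this sketch the sums over records in the center update and in the index are the natural candidates for distribution across processes, since each process could accumulate partial sums over its own share of the data. That decomposition is not shown here, and the scheme used in the paper may differ.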
© 2007 Springer Berlin Heidelberg
Cite this paper
Modenesi, M.V., Costa, M.C.A., Evsukoff, A.G., Ebecken, N.F.F. (2007). Parallel Fuzzy c-Means Cluster Analysis. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2006. VECPAR 2006. Lecture Notes in Computer Science, vol 4395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71351-7_5
DOI: https://doi.org/10.1007/978-3-540-71351-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71350-0
Online ISBN: 978-3-540-71351-7
eBook Packages: Computer Science, Computer Science (R0)