Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4395)

Abstract

This work presents a parallel implementation of a Fuzzy c-means cluster analysis tool that covers both aspects of cluster investigation: calculating the cluster centers together with the degrees of membership of records in clusters, and determining the optimal number of clusters for the data, using the PBM validity index to evaluate the quality of each partition.
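The two steps named above can be illustrated with a minimal serial sketch. It assumes the standard Fuzzy c-means updates with fuzzifier m = 2 and one common form of the PBM index from Pakhira et al.; the abstract does not state which variant or parameters the paper uses, and the function names `fuzzy_c_means`, `pbm_index` and `best_cluster_count` are hypothetical.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for c fuzzy clusters (assumed defaults)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the records weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            den = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / den for d in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(data[i], centers[j]), 1e-12) for j in range(c)]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def pbm_index(data, centers, u):
    """PBM = ((1/c) * (E_1 / E_c) * D_c) ** 2; larger values are better.

    Requires c >= 2 and data that are not all identical.
    """
    n, c, dim = len(data), len(centers), len(data[0])
    grand = [sum(x[d] for x in data) / n for d in range(dim)]
    e1 = sum(math.dist(x, grand) for x in data)  # scatter around a single center
    ec = sum(
        u[i][j] * math.dist(data[i], centers[j]) for i in range(n) for j in range(c)
    )
    dc = max(
        math.dist(centers[a], centers[b]) for a in range(c) for b in range(a + 1, c)
    )
    return (e1 / ec * dc / c) ** 2


def best_cluster_count(data, candidates):
    """Run FCM for each candidate c and keep the one with the highest PBM."""
    scores = {}
    for c in candidates:
        centers, u = fuzzy_c_means(data, c)
        scores[c] = pbm_index(data, centers, u)
    return max(scores, key=scores.get), scores
```

Calling `best_cluster_count(data, range(2, 6))` on a list of equal-length numeric records returns the candidate with the highest index together with all scores. Because Fuzzy c-means depends on its random start, repeated runs with different seeds may be needed in practice.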

The work’s main contributions are twofold. First, it implements the entire cluster analysis process, which is a new approach in the literature, integrating the search for the best natural pattern present in the data with the calculation of the clusters. Second, it implements this tool with parallel processing, which allows the approach to be used with very large volumes of data, an increasing need in today’s industrial and business databases, and makes cluster analysis a feasible tool for supporting specialists’ decisions in all fields of knowledge.

The results presented in the paper show that this approach is scalable and that parallel processing reduces the processing time of cluster analysis.
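The abstract does not describe how the work is divided among processors. One widely used scheme, shown here only as an assumption, partitions the records, lets each worker compute partial sums for the center update, and then combines them in a reduction step. The sketch uses Python threads purely to show the structure; threads give no real speedup for this pure-Python arithmetic, and on a cluster the reduction would typically be a message-passing collective. The names `partial_sums` and `parallel_centers` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, memberships, m):
    """Per-worker numerators and denominators of the center update."""
    c, dim = len(memberships[0]), len(chunk[0])
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    for x, row in zip(chunk, memberships):
        for j in range(c):
            w = row[j] ** m
            den[j] += w
            for d in range(dim):
                num[j][d] += w * x[d]
    return num, den


def parallel_centers(data, u, m=2.0, workers=2):
    """Compute cluster centers from data split into one block per worker."""
    n, c, dim = len(data), len(u[0]), len(data[0])
    size = (n + workers - 1) // workers
    chunks = [(data[s:s + size], u[s:s + size]) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda p: partial_sums(p[0], p[1], m), chunks))
    # Reduction: add the partial sums from all workers, then divide.
    num = [[sum(p[0][j][d] for p in parts) for d in range(dim)] for j in range(c)]
    den = [sum(p[1][j] for p in parts) for j in range(c)]
    return [[num[j][d] / den[j] for d in range(dim)] for j in range(c)]
```

Only the small per-cluster sums cross worker boundaries in this layout, so communication cost depends on the number of clusters and dimensions rather than on the number of records.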

Topics of Interest: Unsupervised Classification, Fuzzy c-Means, Cluster and Grid Computing

Editor information

Michel Daydé, José M. L. M. Palma, Álvaro L. G. A. Coutinho, Esther Pacitti, João Correia Lopes

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Modenesi, M.V., Costa, M.C.A., Evsukoff, A.G., Ebecken, N.F.F. (2007). Parallel Fuzzy c-Means Cluster Analysis. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2006. VECPAR 2006. Lecture Notes in Computer Science, vol 4395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71351-7_5

  • DOI: https://doi.org/10.1007/978-3-540-71351-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71350-0

  • Online ISBN: 978-3-540-71351-7

  • eBook Packages: Computer Science, Computer Science (R0)
