A Parallel Clustering Algorithm for Categorical Data Set

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3070)

Abstract

In protein structure prediction modeling, partitioning a very large categorical data set into disjoint, homogeneous clusters is a fundamental operation and often a preprocessing step for specific tasks. The classical k-modes algorithm is a partial solution to such problems. This work presents a parallel implementation of the k-modes algorithm based on the message-passing model. The proposed algorithm exploits the data parallelism inherent in k-means-style algorithms. Tested with amino acid data sets on up to 8 nodes, the algorithm demonstrated very good relative speedup and very good scaleup with respect to data set size.
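The full text of the paper is not reproduced here, so the following is only a minimal sketch of how a data-parallel k-modes iteration is commonly organized, not the authors' implementation. It assumes simple matching (Hamming) dissimilarity, a round-robin split of records across nodes, and a global sum of per-cluster category counts as the only communication step. The nodes are simulated by a loop in a single process; in a real message-passing program the `reduce_counts` step would typically be a collective reduction. All function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def local_step(chunk, modes):
    """Work done by one node: assign its records, tally category counts."""
    n_attrs = len(modes[0])
    counts = [[Counter() for _ in range(n_attrs)] for _ in modes]
    cost = 0
    for rec in chunk:
        dists = [hamming(rec, m) for m in modes]
        k = dists.index(min(dists))  # nearest mode; ties go to the lowest index
        cost += dists[k]
        for j, value in enumerate(rec):
            counts[k][j][value] += 1
    return counts, cost


def reduce_counts(partials):
    """Stand-in for a global reduction: sum per-node counts and costs."""
    first_counts, _ = partials[0]
    total = [[Counter() for _ in cluster] for cluster in first_counts]
    total_cost = 0
    for counts, cost in partials:
        total_cost += cost
        for k, cluster in enumerate(counts):
            for j, ctr in enumerate(cluster):
                total[k][j].update(ctr)
    return total, total_cost


def update_modes(total, old_modes):
    """New mode = most frequent category per attribute (ties: smallest value)."""
    new_modes = []
    for k, cluster in enumerate(total):
        if not cluster[0]:  # empty cluster: keep its previous mode
            new_modes.append(old_modes[k])
            continue
        new_modes.append(tuple(
            max(sorted(ctr), key=ctr.__getitem__) for ctr in cluster
        ))
    return new_modes


def kmodes_parallel(data, modes, n_nodes, max_iter=20):
    """Iterate until the modes stop changing; returns (modes, cost).

    The cost is the total dissimilarity measured against the modes used
    in the last assignment pass.
    """
    chunks = [data[i::n_nodes] for i in range(n_nodes)]  # round-robin split
    cost = None
    for _ in range(max_iter):
        partials = [local_step(chunk, modes) for chunk in chunks]
        total, cost = reduce_counts(partials)
        new_modes = update_modes(total, modes)
        if new_modes == modes:
            break
        modes = new_modes
    return modes, cost
```

Because each node only contributes additive counts, the summed counts (and therefore the updated modes) do not depend on how the records are split, which is what makes this style of algorithm straightforward to parallelize over the data.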

This work is partially supported by the National Natural Science Foundation of China (NSFC) under grant 69933030.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, YX., Wang, ZH., Li, XM. (2004). A Parallel Clustering Algorithm for Categorical Data Set. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds) Artificial Intelligence and Soft Computing - ICAISC 2004. ICAISC 2004. Lecture Notes in Computer Science, vol 3070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24844-6_144

  • DOI: https://doi.org/10.1007/978-3-540-24844-6_144

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22123-4

  • Online ISBN: 978-3-540-24844-6

  • eBook Packages: Springer Book Archive
