Abstract
In protein structure prediction modeling, partitioning very large categorical data sets into disjoint, homogeneous clusters is a fundamental operation, and it is often used as a preprocessing step for specific tasks. The classical k-modes algorithm offers a partial solution to this problem. This work presents a parallel implementation of the k-modes algorithm based on the message-passing model. The proposed algorithm exploits the data parallelism inherent in k-means-style algorithms. Tested on amino acid data sets using up to 8 nodes, the algorithm demonstrated very good relative speedup and scaleup with respect to data set size.
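The data-parallel decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the common k-means-style scheme: every node assigns the records in its own shard to the nearest mode using simple-matching dissimilarity (the number of mismatched attributes, as in standard k-modes) and returns per-cluster category frequency counts. A master sums those counts and sets each new mode to the most frequent category of every attribute. Message passing is only simulated here by a loop over shards. In a real MPI program, each `local_step` call would run on a separate node and the summation would be a reduce operation. The function names `mismatch`, `local_step`, and `parallel_k_modes` and the toy records are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]
Counts = List[List[Counter]]  # counts[cluster][attribute] -> Counter of categories


def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def local_step(shard: List[Record], modes: List[Record]) -> Tuple[Counts, int]:
    """Work of one (simulated) node: assign each local record to its nearest mode
    and tally category frequencies per cluster and attribute."""
    k, m = len(modes), len(modes[0])
    counts: Counts = [[Counter() for _ in range(m)] for _ in range(k)]
    cost = 0
    for rec in shard:
        dists = [mismatch(rec, q) for q in modes]
        j = min(range(k), key=dists.__getitem__)  # nearest mode; ties -> lowest index
        cost += dists[j]
        for attr, value in enumerate(rec):
            counts[j][attr][value] += 1
    return counts, cost


def parallel_k_modes(shards: List[List[Record]], init_modes: List[Record],
                     max_iter: int = 20) -> Tuple[List[Record], int]:
    """Data-parallel k-modes; message passing is simulated by a loop over shards."""
    modes = [tuple(q) for q in init_modes]
    k, m = len(modes), len(modes[0])
    for _ in range(max_iter):
        # Broadcast current modes; each node processes its own shard independently.
        results = [local_step(shard, modes) for shard in shards]
        # Reduce: sum the frequency tables returned by all nodes.
        total: Counts = [[Counter() for _ in range(m)] for _ in range(k)]
        for counts, _ in results:
            for j in range(k):
                for a in range(m):
                    total[j][a].update(counts[j][a])
        # New mode: most frequent category per attribute (old value kept if cluster is empty).
        new_modes = [
            tuple(total[j][a].most_common(1)[0][0] if total[j][a] else modes[j][a]
                  for a in range(m))
            for j in range(k)
        ]
        if new_modes == modes:  # converged: no mode changed
            break
        modes = new_modes
    # Total dissimilarity of the final assignment, again summed over shards.
    cost = sum(local_step(shard, modes)[1] for shard in shards)
    return modes, cost


# Hypothetical toy data: two shards of 3-attribute categorical records.
shard1 = [("A", "L", "G"), ("A", "L", "G"), ("V", "K", "P")]
shard2 = [("A", "L", "S"), ("V", "K", "P"), ("V", "E", "P")]
modes, cost = parallel_k_modes([shard1, shard2], [("A", "L", "S"), ("V", "E", "P")])
print(modes, cost)  # [('A', 'L', 'G'), ('V', 'K', 'P')] 2
```

In this sketch each node sends back only frequency tables, whose size is bounded by the number of clusters, attributes, and distinct categories rather than by the number of records in its shard. Splitting the data differently across nodes therefore yields the same summed counts, which is what lets this style of decomposition scale with data set size.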
This work is partially supported by the National Natural Science Foundation of China (NSFC) under grant 69933030.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, YX., Wang, ZH., Li, XM. (2004). A Parallel Clustering Algorithm for Categorical Data Set. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds) Artificial Intelligence and Soft Computing - ICAISC 2004. Lecture Notes in Computer Science, vol 3070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24844-6_144
DOI: https://doi.org/10.1007/978-3-540-24844-6_144
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22123-4
Online ISBN: 978-3-540-24844-6
eBook Packages: Springer Book Archive