A Parallel Clustering Algorithm for Categorical Data Set

Wang, Yong-Xian; Wang, Zheng-Hua; Li, Xiao-Mei

doi:10.1007/978-3-540-24844-6_144

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3070)

Abstract

In modeling for protein structure prediction, partitioning a very large categorical data set into disjoint, homogeneous clusters is a fundamental operation, often performed as a preprocessing step for specific tasks. The classical k-modes algorithm is a partial solution to such problems. This work presents a parallel implementation of the k-modes algorithm based on the message-passing model. The proposed algorithm exploits the data parallelism inherent in k-means-style algorithms. Tested on amino acid data sets with up to 8 nodes, the algorithm demonstrated very good relative speedup, as well as very good scaleup with respect to the size of the data set.
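
As an illustration only (the full paper is not reproduced here, so this is not the authors' implementation), the data-parallel pattern named in the abstract can be sketched in Python. The sketch assumes the simple matching dissimilarity commonly used with k-modes, simulates the processes as slices of a list, and uses an ordinary function, `reduce_counts`, as a stand-in for the global sum that a message-passing library would perform. All function names are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def local_step(chunk, modes):
    """Work of one simulated process on its own slice of the data.

    Assigns each record to its nearest mode and returns, for every
    cluster and attribute, a Counter of category frequencies.
    """
    n_attrs = len(modes[0])
    counts = [[Counter() for _ in range(n_attrs)] for _ in modes]
    for record in chunk:
        nearest = min(range(len(modes)), key=lambda k: mismatch(record, modes[k]))
        for j, value in enumerate(record):
            counts[nearest][j][value] += 1
    return counts


def reduce_counts(all_counts):
    """Stand-in (assumption) for a message-passing global sum reduction."""
    total = all_counts[0]
    for counts in all_counts[1:]:
        for k, cluster in enumerate(counts):
            for j, counter in enumerate(cluster):
                total[k][j].update(counter)
    return total


def update_modes(total, old_modes):
    """New mode = most frequent category per attribute (ties broken by value)."""
    new_modes = []
    for k, cluster in enumerate(total):
        if not cluster[0]:  # empty cluster: keep its previous mode
            new_modes.append(old_modes[k])
        else:
            new_modes.append(tuple(min(c, key=lambda v: (-c[v], v)) for c in cluster))
    return new_modes


def parallel_k_modes(data, modes, n_procs=2, max_iter=20):
    """Iterate local counting, global reduction, and mode update until stable."""
    modes = [tuple(m) for m in modes]
    chunks = [data[p::n_procs] for p in range(n_procs)]  # one slice per process
    for _ in range(max_iter):
        total = reduce_counts([local_step(chunk, modes) for chunk in chunks])
        new_modes = update_modes(total, modes)
        if new_modes == modes:
            break
        modes = new_modes
    return modes


data = [("A", "X"), ("A", "X"), ("A", "Y"), ("B", "Z"), ("B", "Z"), ("C", "Z")]
result = parallel_k_modes(data, [("A", "X"), ("B", "Z")], n_procs=2)
```

Because only the per-cluster frequency tables are combined, the modes do not depend on how the records are split among processes. That property is what makes this style of algorithm a natural fit for message passing: each process needs only its own records plus the current modes.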

This work is partially supported by the National Natural Science Foundation of China (NSFC) under grant 69933030.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, 410073, Changsha, China
Yong-Xian Wang & Zheng-Hua Wang
College of Command and Technology of Equipment, 101416, Beijing, China
Xiao-Mei Li

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Academy of Humanities and Economics, Poland
Leszek Rutkowski
German Research Center of Artificial Intelligence (DFKI), Germany
Jörg H. Siekmann
Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Initiative in Soft Computing (BISC), 94720-1776, Berkeley, CA
Lotfi A. Zadeh

About this paper

Cite this paper

Wang, YX., Wang, ZH., Li, XM. (2004). A Parallel Clustering Algorithm for Categorical Data Set. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds) Artificial Intelligence and Soft Computing - ICAISC 2004. Lecture Notes in Computer Science, vol 3070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24844-6_144

DOI: https://doi.org/10.1007/978-3-540-24844-6_144
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22123-4
Online ISBN: 978-3-540-24844-6
eBook Packages: Springer Book Archive