Abstract
Clustering is a fundamental technique in image processing, pattern recognition, data compression, and related fields. However, most recent clustering algorithms cannot handle large, complex databases and do not always produce accurate clustering results. This paper proposes a parallel clustering algorithm for categorical and mixed data that overcomes these problems. Our contributions are: (1) improving the k-sets algorithm [3] to achieve highly accurate clustering results; and (2) applying parallel techniques to the improved approach to obtain a parallel algorithm. Experiments on a CRAY T3E show that the proposed algorithm can achieve higher accuracy than previous approaches and can reduce processing time; thus, it is practical for use with very large and complex databases.
References
[1] Huang, Z.: Clustering Large Data Sets with Mixed Numeric and Categorical Values. In: Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 21–34. World Scientific, Singapore (1997)
[2] Huang, Z., Ng, M.K.: A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans. Fuzzy Systems 7(4), 446–452 (1999)
[3] Le, S.Q., Ho, T.B.: A k-sets Clustering algorithm for categorical and mixed data. In: Proc. of the 6th SANKEN Int. Symposium, pp. 124–128 (2003)
[4] Kantabutra, S., Couch, A.L.: Parallel K-means Clustering Algorithm on NOWs. NECTEC Technical Journal 1(6), 243–248 (2000)
[5] Stoffel, K., Belkoniene, A.: Parallel K-Means Clustering for Large Databases. In: Amestoy, P.R., Berger, P., Daydé, M., Duff, I.S., Frayssé, V., Giraud, L., Ruiz, D. (eds.) Euro-Par 1999. LNCS, vol. 1685, p. 1451. Springer, Heidelberg (1999)
[6] Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
[7] Hettich, S., Bay, S.D.: The UCI KDD Archive (1999), http://kdd.ics.uci.edu
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hai, N.T.M., Susumu, H. (2004). Performances of Parallel Clustering Algorithm for Categorical and Mixed Data. In: Liew, KM., Shen, H., See, S., Cai, W., Fan, P., Horiguchi, S. (eds) Parallel and Distributed Computing: Applications and Technologies. PDCAT 2004. Lecture Notes in Computer Science, vol 3320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30501-9_55
DOI: https://doi.org/10.1007/978-3-540-30501-9_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24013-6
Online ISBN: 978-3-540-30501-9
eBook Packages: Computer Science, Computer Science (R0)