Abstract
In this paper, a new classification method (ADCC) for high-dimensional data is proposed. In this method, a decision cluster classification model (DCC) consists of a set of disjoint decision clusters, each labeled with a dominant class that determines the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a variable weighting k-means algorithm; the DCC model is then selected from the tree. The Anderson-Darling test is used as the stopping condition for growing the tree. A series of experiments on both synthetic and real data sets has shown that the new method outperforms the existing methods of k-NN, decision tree and SVM in accuracy and scalability. It is particularly suitable for large, high-dimensional data with many classes.
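The classification rule the abstract describes — label each cluster with its dominant class, then assign a new object the label of the cluster it falls into — can be sketched in a few lines. The sketch below substitutes plain k-means for the paper's variable-weighting k-means and a flat partition for the cluster tree; the class `DecisionClusterClassifier` and all function names are illustrative, not from the paper.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd's k-means; the paper uses a variable-weighting
    # variant (W-k-means), for which this is a simplified stand-in.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):  # skip empty clusters
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

class DecisionClusterClassifier:
    """Each cluster is labeled with its dominant (majority) class;
    a new object takes the label of its nearest cluster centroid."""

    def fit(self, X, y, k=4):
        self.centroids, assign = kmeans(X, k)
        # Dominant class per cluster via majority vote.
        self.cluster_labels = np.array([
            np.bincount(y[assign == j]).argmax() if np.any(assign == j) else 0
            for j in range(k)
        ])
        return self

    def predict(self, X):
        nearest = np.argmin(((X[:, None] - self.centroids) ** 2).sum(-1), axis=1)
        return self.cluster_labels[nearest]
```

On well-separated data this recovers the class structure directly; the paper's contribution lies in growing the clusters recursively as a tree, weighting variables to cope with high dimensionality, and stopping the recursion with the Anderson-Darling normality test.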
References
Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., Zaki, M.: What are the grand challenges for data mining? KDD-2006 panel report. SIGKDD Explorations 8, 70–77 (2006)
Kyriakopoulou, A., Kalamboukis, T.: Text classification using clustering. In: ECML-PKDD Discovery Challenge Workshop Proceedings (2006)
Zhang, B., Srihari, S.N.: Fast k-nearest neighbor classification using cluster-based trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 525–528 (2004)
Mui, J., Fu, K.: Automated classification of nucleated blood cells using a binary tree classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence 2, 429–443 (1980)
Huang, Z., Ng, M., Lin, T., Cheung, D.: An interactive approach to building classification models by clustering and cluster validation. In: Leung, K.-S., Chan, L., Meng, H. (eds.) IDEAL 2000, vol. 1983, pp. 23–28. Springer, Heidelberg (2000)
Huang, Z., Lin, T.: A visual method of cluster validation with fastmap. In: Terano, T., Chen, A.L.P. (eds.) PAKDD 2000, vol. 1805, pp. 153–164. Springer, Heidelberg (2000)
Huang, J., Ng, M., Rong, H., Li, Z.: Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 657–668 (2005)
Anderson, T.W., Darling, D.A.: Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes. The Annals of Mathematical Statistics 23, 193–212 (1952)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, Y., Hung, E., Chung, K., Huang, J. (2008). Building a Decision Cluster Classification Model for High Dimensional Data by a Variable Weighting k-Means Method. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science(), vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89377-6
Online ISBN: 978-3-540-89378-3