Abstract
ROCK is a robust clustering algorithm oriented toward categorical attributes. Its main contribution is the introduction of a novel concept called links as a measure of similarity between a pair of data points. Compared with traditional distance-based approaches, links capture global information over the whole data set rather than only local information between two data points. Despite ROCK's success in clustering some categorical databases, it still has some underlying weaknesses. This paper investigates these weaknesses in depth and proposes a novel algorithm, QNNS, based on a Qualified Nearest Neighbors Selection model, which improves clustering quality through an appropriate selection of nearest neighbors. We also discuss a cohesion measure to control the clustering process. Our methods reduce the dependence of clustering quality on pre-specified parameters and make the algorithm more convenient for end users. Experimental results demonstrate that QNNS outperforms ROCK and VBACC.
This work was sponsored by the Natural Science Foundation of China (NSFC) under Grant No. 60373099.
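As a rough illustration of the link concept that the abstract attributes to ROCK, the sketch below counts, for every pair of points, how many neighbors the two points share. It assumes that points are sets of categorical items, that similarity is the Jaccard coefficient, that two points are neighbors when their similarity is at least a threshold `theta`, and that a point is not counted as its own neighbor. These choices and all function names are illustrative assumptions, not details taken from this paper. QNNS itself is not sketched, since the abstract does not specify it.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient between two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(points, theta):
    """Map each point index to the indices of the other points whose
    similarity to it is at least theta (assumed neighbor definition)."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(points, theta):
    """Map each pair (i, j), i < j, to its number of common neighbors."""
    nbrs = neighbors(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }
```

Because the link count of a pair depends on the neighbors of both points, two points that are not similar enough to be neighbors themselves can still have a positive link count when they share neighbors elsewhere in the data set, which is the sense in which links carry global rather than purely pairwise information.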
References
Dubes, R.C., Jain, A.K.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1998)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 103–114 (1996)
Guha, S., Rastogi, R., Shim, K.: ROCK: A Robust Clustering Algorithm for Categorical Attributes. In: Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, pp. 1–11 (1999)
Gupta, G.K., Ghosh, J.: Value Balanced Agglomerative Connectivity Clustering. In: SPIE Conference on Data Mining and Knowledge Discovery III (April 2001)
Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. Technical Report #00-034, University of Minnesota, 0-34
Sebastiani, F.: A Tutorial on Automatic Text Categorization. In: Proceedings of ASAI 1999, 1st Argentinean Symposium on Artificial Intelligence, Buenos Aires, AR, pp. 7–35 (1999)
Dutta, M., Mahanta, A.K., Pujari, A.K.: QROCK: A Quick Version of the ROCK Algorithm for Clustering of Categorical Data, www.uohyd.ernet.in/smcis/dcis/./akpcs/qrock.pdf.1-12
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, Y., Zuo, W. (2006). Clustering Categorical Data Using Qualified Nearest Neighbors Selection Model. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_118
DOI: https://doi.org/10.1007/11941439_118
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)