Clustering Categorical Data Using Qualified Nearest Neighbors Selection Model

Jin, Yang; Zuo, Wanli

doi:10.1007/11941439_118


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence


Abstract

ROCK is a robust clustering algorithm designed for categorical attributes. Its main contribution is the concept of links, which measure the similarity between a pair of data points. Compared with traditional distance-based approaches, links capture global information over the whole data set rather than local information between two data points. Although ROCK has clustered some categorical databases successfully, it has underlying weaknesses. This paper examines these weaknesses in depth and proposes a novel algorithm, QNNS, based on a Qualified Nearest Neighbors Selection model, which improves clustering quality through an appropriate selection of nearest neighbors. We also discuss a cohesion measure for controlling the clustering process. Our methods reduce the dependence of clustering quality on pre-specified parameters and make the algorithm more convenient for end users. Experimental results demonstrate that QNNS outperforms ROCK and VBACC.
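
As an illustration of the link concept mentioned in the abstract, the following is a minimal sketch of how link counts are commonly computed in ROCK: two points are neighbors when their similarity reaches a threshold, and the link count of a pair is the number of neighbors they share. It does not implement QNNS, whose selection model is not described on this page. The function names, the use of Jaccard similarity, the threshold value `theta`, the sample `baskets`, and the choice not to count a point as its own neighbor are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of categorical items."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty records are treated as identical (an assumption).
    return len(a & b) / len(union) if union else 1.0


def neighbor_sets(points, theta):
    """For each point index, collect the indices with similarity >= theta.

    Assumption: a point is not counted as its own neighbor.
    """
    nbrs = [set() for _ in points]
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def link_counts(points, theta):
    """Map each pair (i, j), i < j, to its number of common neighbors."""
    nbrs = neighbor_sets(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical market-basket records, used only for illustration.
baskets = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
    {"x", "y", "w"},
]
links = link_counts(baskets, theta=0.5)
```

In this sketch, a pair's link count depends on how both points relate to the rest of the data set, which is the sense in which links carry global rather than purely pairwise information.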

This work was sponsored by Natural Science Foundation of China (NSFC) under Grant No. 60373099.



References

Dubes, R.C., Jain, A.K.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1998)
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 103–114 (1996)
Guha, S., Rastogi, R., Shim, K.: ROCK: A Robust Clustering Algorithm for Categorical Attributes. In: Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, pp. 1–11 (1999)
Gupta, G.K., Ghosh, J.: Value Balanced Agglomerative Connectivity Clustering. In: SPIE Conference on Data Mining and Knowledge Discovery III (April 2001)
Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. Technical Report #00-034, University of Minnesota, 0-34
Sebastiani, F.: A Tutorial on Automatic Text Categorization. In: Proceedings of ASAI 1999, 1st Argentinean Symposium on Artificial Intelligence, Buenos Aires, AR, pp. 7–35 (1999)
Dutta, M., Mahanta, A.K., Pujari, A.K.: QROCK: A Quick Version of the ROCK Algorithm for Clustering of Categorical Data, www.uohyd.ernet.in/smcis/dcis/./akpcs/qrock.pdf.1-12


Author information

Authors and Affiliations

College of Computer Science & Technology, Jilin University, Changchun, P.R. China
Yang Jin & Wanli Zuo


Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang


About this paper

Cite this paper

Jin, Y., Zuo, W. (2006). Clustering Categorical Data Using Qualified Nearest Neighbors Selection Model. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_118


DOI: https://doi.org/10.1007/11941439_118
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
