Abstract
This paper presents two novel graph-clustering algorithms: Clustering based on a Near Neighbor Graph (CNNG) and Clustering based on a Grid Cell Graph (CGCG). CNNG, inspired by the idea of near neighbors, is an improved graph-clustering method based on the Minimum Spanning Tree (MST). To analyze massive data sets more efficiently, we also present CGCG, a graph-clustering method that builds the MST at the level of grid cells. To describe the two algorithms clearly, we define several key concepts, including the near neighbor point set, the near neighbor undirected graph, and the grid cell. To implement them effectively, we use efficient partitioning and indexing methods, such as multidimensional grid partitioning and a multidimensional index tree. Simulation experiments on several artificial data sets and seven real data sets show that the time cost of CNNG can be reduced through improvement techniques and approximate methods while maintaining acceptable clustering quality, and that CGCG can approximately analyze some dense data sets in linear time. Moreover, compared with some classical clustering algorithms, CNNG often achieves better clustering quality or faster clustering speed.
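The general idea of MST-based clustering on a sparse neighbor graph can be illustrated with a minimal sketch. It is not the CNNG algorithm itself. It assumes a near neighbor graph defined by a fixed distance threshold (`radius`) and a fixed edge-length cutoff (`cut`) for splitting the spanning forest; both parameters, all function names, and the brute-force neighbor search (standing in for a multidimensional index tree) are assumptions made for illustration.

```python
import math
from itertools import combinations


def near_neighbor_edges(points, radius):
    """Return (distance, i, j) for every pair of points within `radius`.

    Brute force, O(n^2); an index structure would replace this in practice.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= radius:
            edges.append((d, i, j))
    return edges


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_forest(n, edges):
    """Minimum spanning forest of a (possibly disconnected) graph, via Kruskal."""
    parent = list(range(n))
    forest = []
    for d, i, j in sorted(edges):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            forest.append((d, i, j))
    return forest


def cluster_labels(n, forest, cut):
    """Drop forest edges longer than `cut`; label the remaining components."""
    parent = list(range(n))
    for d, i, j in forest:
        if d <= cut:
            parent[_find(parent, i)] = _find(parent, j)
    roots = {}
    return [roots.setdefault(_find(parent, i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    edges = near_neighbor_edges(pts, radius=3.0)
    forest = kruskal_forest(len(pts), edges)
    print(cluster_labels(len(pts), forest, cut=2.0))
```

Restricting the graph to near neighbors keeps the edge list much smaller than the complete graph when `radius` is small relative to the data extent, which is the motivation for computing the spanning tree on a sparse graph rather than on all point pairs.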
Acknowledgements
The author thanks the editors, the anonymous reviewers and Dr. Peijie Hang for their useful comments and suggestions.
Cite this article
Chen, X. Clustering based on a near neighbor graph and a grid cell graph. J Intell Inf Syst 40, 529–554 (2013). https://doi.org/10.1007/s10844-013-0236-9
DOI: https://doi.org/10.1007/s10844-013-0236-9