Clustering based on a near neighbor graph and a grid cell graph

Journal of Intelligent Information Systems

Abstract

This paper presents two novel graph-clustering algorithms: Clustering based on a Near Neighbor Graph (CNNG) and Clustering based on a Grid Cell Graph (CGCG). The CNNG algorithm, inspired by the idea of near neighbors, is an improved graph-clustering method based on the Minimum Spanning Tree (MST). To analyze massive data sets more efficiently, the CGCG algorithm, a graph-clustering method based on an MST built at the level of grid cells, is also presented. To describe the two algorithms clearly, we define several key concepts, such as the near neighbor point set, the near neighbor undirected graph, and the grid cell. To implement the two algorithms efficiently, we use partitioning and indexing methods such as a multidimensional grid partition and a multidimensional index tree. In simulation experiments on several artificial data sets and seven real data sets, we observe that the time cost of the CNNG algorithm can be reduced by using some improving techniques and approximate methods while still attaining acceptable clustering quality, and that the CGCG algorithm can approximately analyze some dense data sets with linear time cost. Moreover, compared with some classical clustering algorithms, the CNNG algorithm often achieves better clustering quality or faster clustering speed.
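
The abstract does not give the details of CNNG, so the following is only a generic sketch of the family of methods it belongs to, not the author's algorithm. It assumes a near neighbor graph that links points within a radius `delta`, found with a uniform grid whose cell side equals `delta`. It then builds a minimum spanning forest with Kruskal's algorithm and cuts forest edges longer than `cut`. The names `delta`, `cut`, `near_neighbor_edges`, `minimum_spanning_forest`, and `cluster_labels` are all hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import product


class DisjointSet:
    """Union-find with path halving, used for Kruskal and for labeling."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def near_neighbor_edges(points, delta):
    """Edges (distance, i, j) for all pairs at distance <= delta.

    Assumption: a uniform grid with cell side `delta`, so any such pair
    lies in the same cell or in two adjacent cells.
    """
    dim = len(points[0])
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / delta) for c in p)].append(idx)
    offsets = list(product((-1, 0, 1), repeat=dim))
    edges = []
    for cell, members in cells.items():
        for off in offsets:
            other = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j:  # visit each unordered pair once
                        d = math.dist(points[i], points[j])
                        if d <= delta:
                            edges.append((d, i, j))
    return edges


def minimum_spanning_forest(n, edges):
    """Kruskal's algorithm; yields a forest if the graph is disconnected."""
    ds = DisjointSet(n)
    return [(d, i, j) for d, i, j in sorted(edges) if ds.union(i, j)]


def cluster_labels(n, forest, cut):
    """Drop forest edges longer than `cut`; label the remaining components."""
    ds = DisjointSet(n)
    for d, i, j in forest:
        if d <= cut:
            ds.union(i, j)
    roots = {}
    return [roots.setdefault(ds.find(i), len(roots)) for i in range(n)]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
edges = near_neighbor_edges(points, delta=1.0)
forest = minimum_spanning_forest(len(points), edges)
print(cluster_labels(len(points), forest, cut=0.5))
```

In this toy input the three points near the origin form one cluster, the two points near (5, 5) form a second, and the isolated point gets its own label. Restricting the MST to near neighbor edges keeps the edge set far smaller than the complete graph on dense data, which is the general motivation for this kind of design; the number of neighboring cells visited grows as 3^d with the dimension d, so the sketch is only practical in low dimensions.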

Acknowledgements

The author thanks the editors, the anonymous reviewers and Dr. Peijie Hang for their useful comments and suggestions.

Author information

Correspondence to Xinquan Chen.

About this article

Cite this article

Chen, X. Clustering based on a near neighbor graph and a grid cell graph. J Intell Inf Syst 40, 529–554 (2013). https://doi.org/10.1007/s10844-013-0236-9

  • DOI: https://doi.org/10.1007/s10844-013-0236-9
