ABSTRACT
Most current data clustering algorithms in data mining rely on distance computations in some metric space. For spatial database systems (SDBS), the Euclidean distance between two data points is often used to represent their relationship. In some spatial settings and in many other applications, however, distance alone cannot capture all the attributes of the relation between data points; a more expressive model is needed to record additional relational information between data objects. This paper adopts a graph model in which a database is regarded as a graph: each vertex represents a data point, and each edge, weighted or unweighted, records the relation between the two data points it connects. Based on this graph model, the paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and to help improve the quality of clustering. Further, a customizable algorithm based on the criteria is proposed and implemented; it produces clusters according to users' specifications. Preliminary experiments show encouraging results.
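The paper itself does not publish code, but the graph model it describes can be illustrated with a minimal sketch: vertices stand for data points, weighted edges record pairwise relations, and one simple (assumed, not the paper's own) clustering criterion is to keep only edges whose weight falls below a user-chosen threshold and treat each connected component as a cluster. The function name `cluster_graph` and the threshold parameter `max_weight` are hypothetical, introduced here only for illustration.

```python
from collections import defaultdict


def cluster_graph(vertices, edges, max_weight):
    """Cluster a weighted graph: keep only edges with weight <= max_weight,
    then return the connected components as clusters.

    vertices: iterable of hashable vertex ids (data points)
    edges:    iterable of (u, v, weight) tuples (relations between points)
    """
    adj = defaultdict(set)
    for u, v, w in edges:
        if w <= max_weight:  # edge is "strong" enough to keep within a cluster
            adj[u].add(v)
            adj[v].add(u)

    seen, clusters = set(), []
    for start in vertices:
        if start in seen:
            continue
        # iterative DFS collecting one connected component
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Example: the heavy edge ("b", "c", 5.0) is dropped, splitting the data
# into two clusters {a, b} and {c, d}.
points = ["a", "b", "c", "d"]
relations = [("a", "b", 1.0), ("b", "c", 5.0), ("c", "d", 1.5)]
print(cluster_graph(points, relations, max_weight=2.0))
```

Because the user supplies `max_weight`, this sketch also reflects the paper's emphasis on customizability: changing the threshold changes which relations count as intra-cluster, yielding different clusterings from the same graph.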