ABSTRACT
Most current data clustering algorithms in data mining rely on distance computations in some metric space. For spatial database systems (SDBS), the Euclidean distance between two data points is often used to represent their relationship. In some spatial settings and in many other applications, however, distance alone cannot capture all the attributes of the relation between data points; a more expressive model is needed to record additional relational information between data objects. This paper adopts a graph model in which a database is regarded as a graph: each vertex represents a data point, and each edge, weighted or unweighted, records the relation between the two data points it connects. Based on this graph model, the paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and to help improve the quality of clustering. Further, a customizable algorithm based on the criteria is proposed and implemented; it produces clusters according to users' specifications. Preliminary experiments show encouraging results.
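The paper itself does not publish code, but the graph model it describes can be illustrated with a minimal sketch: vertices stand for data points, weighted edges record pairwise relations, and one simple (assumed, not the paper's own) clustering criterion is to keep only edges whose weight falls below a user-chosen threshold and treat each connected component as a cluster. The function name `cluster_graph` and the threshold parameter `max_weight` are hypothetical, introduced here only for illustration.

```python
from collections import defaultdict


def cluster_graph(vertices, edges, max_weight):
    """Cluster a weighted graph: keep only edges with weight <= max_weight,
    then return the connected components as clusters.

    vertices: iterable of hashable vertex ids (data points)
    edges:    iterable of (u, v, weight) tuples (relations between points)
    """
    adj = defaultdict(set)
    for u, v, w in edges:
        if w <= max_weight:  # edge is "strong" enough to keep within a cluster
            adj[u].add(v)
            adj[v].add(u)

    seen, clusters = set(), []
    for start in vertices:
        if start in seen:
            continue
        # iterative DFS collecting one connected component
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Example: the heavy edge ("b", "c", 5.0) is dropped, splitting the data
# into two clusters {a, b} and {c, d}.
points = ["a", "b", "c", "d"]
relations = [("a", "b", 1.0), ("b", "c", 5.0), ("c", "d", 1.5)]
print(cluster_graph(points, relations, max_weight=2.0))
```

Because the user supplies `max_weight`, this sketch also reflects the paper's emphasis on customizability: changing the threshold changes which relations count as intra-cluster, yielding different clusterings from the same graph.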