DOI: 10.1145/952532.952628
Article

A customizable hybrid approach to data clustering

Published: 9 March 2003

ABSTRACT

Most current data clustering algorithms in data mining are based on a distance calculation in a certain metric space. In Spatial Database Systems (SDBS), the Euclidean distance between two data points is often used to represent the relationship between them. However, in some spatial settings and in many other applications, distance alone cannot represent all the attributes of the relation between data points. We need a more powerful model that records more relational information between data objects. This paper adopts a graph model in which a database is regarded as a graph: each vertex represents a data point, and each edge, weighted or unweighted, records the relation between the two data points it connects. Based on the graph model, this paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and help improve the quality of clustering. Further, a customizable algorithm using the criteria is proposed and implemented. This algorithm can produce clusters according to users' specifications. Preliminary experiments show encouraging results.
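The abstract does not spell out the paper's criteria or its algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It stores a data set as an undirected graph (one vertex per data point, one weighted edge per relation) and uses a hypothetical user-tunable rule to form clusters. That rule drops edges heavier than `max_weight` and reports connected components. A simple illustrative criterion then compares the mean intra-cluster and inter-cluster edge weights. The names `Graph`, `graph_from_points`, `threshold_clusters` and `mean_edge_weights`, the default weight of 1.0 for unweighted relations, and both the threshold rule and the criterion are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Python 3.8+


class Graph:
    """Undirected graph: each vertex is a data point, each edge records a relation."""

    def __init__(self):
        self.adj = {}  # vertex -> {neighbour: weight}

    def add_vertex(self, v):
        self.adj.setdefault(v, {})

    def add_edge(self, u, v, weight=1.0):
        # Assumption: an unweighted relation is stored with weight 1.0.
        self.adj.setdefault(u, {})[v] = weight
        self.adj.setdefault(v, {})[u] = weight


def graph_from_points(points):
    """Build a complete graph whose edge weights are Euclidean distances."""
    g = Graph()
    for v in points:
        g.add_vertex(v)
    for u, v in combinations(points, 2):
        g.add_edge(u, v, dist(points[u], points[v]))
    return g


def threshold_clusters(g, max_weight):
    """Hypothetical user-specified rule: ignore edges heavier than max_weight and
    return the connected components of the remaining graph as sorted lists."""
    seen, clusters = set(), []
    for start in g.adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for v, w in g.adj[u].items():
                if w <= max_weight and v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters


def _mean(values):
    return sum(values) / len(values) if values else None


def mean_edge_weights(g, clusters):
    """Illustrative criterion (not the paper's): mean weight of intra-cluster edges
    and of inter-cluster edges. With distance weights, compact and well-separated
    clusters give intra much smaller than inter. A mean with no edges is None.
    Assumes vertex ids are mutually comparable (used to visit each edge once)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    intra, inter = [], []
    for u in g.adj:
        for v, w in g.adj[u].items():
            if u < v:
                (intra if label[u] == label[v] else inter).append(w)
    return _mean(intra), _mean(inter)


if __name__ == "__main__":
    points = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (10, 10), 4: (11, 10), 5: (10, 11)}
    g = graph_from_points(points)
    clusters = threshold_clusters(g, max_weight=2.0)
    print(clusters)  # [[0, 1, 2], [3, 4, 5]]
    print(mean_edge_weights(g, clusters))
```

The graph view shows its value when a relation is not spatial. Calling `g.add_edge(2, 3)` replaces the long distance edge between points 2 and 3 with a weight-1.0 relation. The same threshold of 2.0 then merges the two spatial groups into a single cluster, which a distance-only model could not express.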


    Published in

    SAC '03: Proceedings of the 2003 ACM Symposium on Applied Computing
    March 2003, 1268 pages
    ISBN: 1581136242
    DOI: 10.1145/952532

    Copyright © 2003 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
