ABSTRACT
Clustering has been widely studied because it arises in many application domains. One open issue in the clustering process is the evaluation of clustering results: estimating the quality of the obtained cluster structure is the main subject of cluster validity. Over the years, many cluster validity indexes have been proposed in the research community, but no general approach to clustering evaluation has been developed. In our work we aim to develop a methodology for cluster validity estimation and to construct a framework for measuring it that combines several existing methods in a single tool. We expect these investigations to help a wide range of analysts in their work with clustering.
- Validating cluster structures in data mining tasks