ABSTRACT

Clustering is one of the most useful tasks in the data mining process. It allows groups to be discovered and interesting distributions and patterns to be identified in the underlying data. There are two major types of clustering approaches: supervised (Daumé III & Marcu, 2005) and unsupervised (Bouveyron, 2009; Erman et al., 2006; Lung Wu & Shen Yang, 2007; Ozdemir et al., 2007). Clustering is a mechanism by which a set of objects, usually multidimensional in nature, is classified into groups (classes or clusters) such that members of one group are similar according to a predefined criterion. Clustering methods have been successfully used to segment an image into a number of clusters (segments). However, clustering-based segmentation techniques rely on several control parameters, e.g., the predefined number of clusters to be found or some tunable thresholds. These parameters must be adjusted to obtain the best image segmentation, and choosing their values is a nontrivial task. Almost every clustering algorithm depends on the characteristics of the dataset and on the input parameters. Incorrect input parameters may lead to clusters that deviate from those actually present in the dataset. To determine the input parameters that lead to the clusters that best fit a given dataset, we need reliable guidelines for evaluating the clusters. Quantitative evaluation functions, known under the general term of cluster validity indexes, are used for this purpose. Many criteria have been developed to determine cluster validity (Rendón et al., 2011; Stein et al., 2003; Ammor et al., 2007; Yu Yen & Cios, 2008; Pakhiraa et al., 2005; Xu et al., 2005; Wu & Yang, 2005). All of them share a common goal: to find the clustering that results in compact clusters that are well separated.

In this work, a new validity index is proposed and compared to the Davies-Bouldin (Davies & Bouldin, 1979) and Dunn (Dunn, 1974) indexes. The Davies-Bouldin (DB) and Dunn indexes are based on two accepted concepts: a cluster's compactness and a cluster's separation. The number of clusters that minimizes DB is taken as the optimal number of clusters. The Dunn index is limited to the interval [0, 1] and should be maximized. These two indexes cannot improve the clustering result; they only evaluate the clustering quality. In addition, they do not take into consideration the notion of cluster density. The proposed index exploits this property, which allows classification results to be corrected. After a training step with the SOM algorithm, the first level of the proposed index performs a clustering estimation, and the clustering correction is performed in the second level. This index has been applied to medical images, in particular to mammography images. Mammography is one of the most reliable methods for the early detection of breast carcinomas. However, it is difficult for radiologists to provide both accurate and uniform evaluation for the enormous number of mammograms generated in widespread screening. Human observers have limitations: 10-30% of breast lesions are missed during routine screening (Chabriais, 2001). The quality of image interpretation depends on the clustering result.
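To make the two reference indexes concrete, the following is a minimal sketch of their textbook forms, not the implementation used in this work. It assumes Euclidean distance, NumPy arrays, and at least two clusters; the function names `davies_bouldin` and `dunn` and the toy data are illustrative assumptions, and the normalization may differ from the variants evaluated here.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Textbook Davies-Bouldin index (lower is better). Assumes >= 2 clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Compactness: mean distance of each cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    n = len(ids)
    worst = np.zeros(n)
    for i in range(n):
        # Similarity of cluster i to every other cluster: summed scatter
        # divided by the separation between the two centroids.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(n) if j != i
        ]
        worst[i] = max(ratios)
    return worst.mean()


def dunn(X, labels):
    """Textbook Dunn index (higher is better). Assumes >= 2 clusters."""
    groups = [X[labels == k] for k in np.unique(labels)]

    def pairwise(A, B):
        # All Euclidean distances between rows of A and rows of B.
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    # Largest cluster diameter (compactness) and smallest gap between
    # points of two different clusters (separation).
    max_diameter = max(pairwise(g, g).max() for g in groups)
    min_separation = min(
        pairwise(groups[i], groups[j]).min()
        for i in range(len(groups)) for j in range(i + 1, len(groups))
    )
    return min_separation / max_diameter


# Toy data (an assumption for illustration): two tight, well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
good = np.array([0] * 20 + [1] * 20)  # labels that follow the blobs
bad = np.array([0, 1] * 20)           # labels that mix the blobs

db_good, db_bad = davies_bouldin(X, good), davies_bouldin(X, bad)
dunn_good, dunn_bad = dunn(X, good), dunn(X, bad)
```

On this toy data the labeling that follows the blobs gets the lower DB value and the higher Dunn value. Both functions only score a given partition; neither changes the labels, which is the limitation the proposed index is meant to address.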