Clustering is the task of forming groups of similar objects on the basis of a predefined proximity (similarity/dissimilarity) measure and grouping criteria. Many approaches, for example agglomerative/divisive hierarchical clustering, k-means, and EM algorithms, have been proposed in the literature [1,2] and are widely used for exploratory analysis of real-world data. To find the partition of objects that maximizes both intra-cluster homogeneity and inter-cluster isolation, clustering methods often employ geometric measures such as the variance of distances.

However, it becomes difficult to form appropriate clusters if only a proximity matrix is available as intrinsic information for analysis and the raw attribute values of the data are unavailable or inaccessible. This is because, without attribute values, global properties of groups such as centroids cannot be computed directly. Additionally, the choice of global coherence/isolation measures is limited if the proximity is defined as a subjective or relative measure, because such a measure may violate the triangle inequality for some triplets of objects. Although conventional hierarchical clustering methods are known to handle relative or subjective measures, they have other problems: intermediate objects lying between large clusters can cause erosion or expansion of the data space, and the results depend on the order in which objects are processed [1].
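The two obstacles described above can be made concrete with a small sketch, assuming the proximity data are given only as an n × n dissimilarity matrix (a list of lists) with no attribute values. The helper names `triangle_violations` and `medoid` are hypothetical illustrations, not part of the chapter's method. The first lists ordered triplets that break the triangle inequality; the second computes a medoid, a standard substitute for a centroid that needs only pairwise dissimilarities.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def triangle_violations(d: Matrix, tol: float = 1e-12) -> List[Tuple[int, int, int]]:
    """List ordered triplets (i, j, k) of distinct objects where
    d[i][k] > d[i][j] + d[j][k], i.e. the detour through j is shorter
    than the direct dissimilarity. Every ordered triplet is checked,
    so asymmetric (relative) measures are covered as well."""
    n = len(d)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if d[i][k] > d[i][j] + d[j][k] + tol
    ]


def medoid(d: Matrix, members: Sequence[int]) -> int:
    """Return the member whose summed dissimilarity to all members is
    smallest (ties go to the first such member). Unlike a centroid,
    this needs only the proximity matrix, not attribute values."""
    return min(members, key=lambda i: sum(d[i][j] for j in members))


# Distances between points at positions 0, 1 and 3 on a line (a metric).
metric = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
# Hypothetical subjective ratings: 0 and 2 are "very different",
# yet each is "close" to 1, which breaks the triangle inequality.
subjective = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]

print(triangle_violations(metric))      # []
print(triangle_violations(subjective))  # [(0, 1, 2), (2, 1, 0)]
print(medoid(metric, [0, 1, 2]))        # 1
```

Checking all ordered triplets costs O(n³) comparisons, which is fine for small illustrative matrices. The ratings in `subjective` are invented purely to show a violation.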
References
[1] B. S. Everitt, S. Landau, and M. Leese (2001): Cluster Analysis, Fourth Edition. Arnold Publishers.
[2] P. Berkhin (2002): Survey of Clustering Data Mining Techniques. Accrue Software Research Paper. URL: http://www.accrue.com/products/researchpapers.html.
[3] Z. Pawlak (1991): Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht.
[4] J. W. Grzymala-Busse and M. Noordeen (1988): “CRS – A Program for Clustering Based on Rough Set Theory,” Research report TR-88-3, Department of Computer Science, University of Kansas, 13.
[5] J. Neyman and E. L. Scott (1958): “Statistical Approach to Problems of Cosmology,” Journal of the Royal Statistical Society, Series B 20: 1–43.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hirano, S., Tsumoto, S. (2008). Discovery of Clusters from Proximity Data: An Approach Using Iterative Adjustment of Binary Classifications. In: Iwata, S., Ohsawa, Y., Tsumoto, S., Zhong, N., Shi, Y., Magnani, L. (eds) Communications and Discoveries from Multidisciplinary Data. Studies in Computational Intelligence, vol 123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78733-4_15
DOI: https://doi.org/10.1007/978-3-540-78733-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78732-7
Online ISBN: 978-3-540-78733-4
eBook Packages: Engineering, Engineering (R0)