Abstract:
Clustering is an invaluable data analysis tool in a variety of applications. However, existing algorithms often assume that the clusters have no structural relationship to one another. Hence, they may not work well when such relationships are present (e.g., when the document clusters are known to reside in a hierarchy). The recently developed kernel-based structured clustering algorithm CLUHSIC [9] tries to alleviate this problem. However, because its input kernel matrix is defined purely from the feature vectors of the input data, it does not take the output clustering structure into account. Consequently, directly aligning the input and output kernel matrices may not ensure good performance. In this paper, we reduce this mismatch by learning a better input kernel matrix using techniques from semi-supervised kernel learning. We combine manifold information and output structure information with pairwise clustering constraints that are automatically generated during the clustering process. Experiments on a number of data sets show that the proposed method outperforms existing structured clustering algorithms.
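To make the notion of aligning input and output kernel matrices concrete, the following is a minimal sketch and not the paper's method. It assumes the HSIC-style score tr(HKHL) commonly associated with CLUHSIC, where K is the input kernel matrix, H is the centering matrix, and L = ΠAΠ^T is the output kernel induced by a cluster assignment Π and a structure matrix A. The function names, the toy data, and the choice of a linear kernel are all illustrative assumptions.

```python
def center(K):
    """Double-center K, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def output_kernel(labels, A):
    """Build L = Pi A Pi^T, so that L[i][j] = A[labels[i]][labels[j]]."""
    return [[A[a][b] for b in labels] for a in labels]

def alignment(K, labels, A):
    """HSIC-style score tr(H K H L), up to a constant factor (assumed form)."""
    Kc = center(K)
    L = output_kernel(labels, A)
    n = len(K)
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))

# Hypothetical toy data: four 1-D points forming two obvious groups.
x = [0.0, 0.1, 5.0, 5.1]
K = [[xi * xj for xj in x] for xi in x]   # linear input kernel (assumption)
A = [[1.0, 0.0], [0.0, 1.0]]              # identity A: no structure between clusters

good = alignment(K, [0, 0, 1, 1], A)      # assignment that follows the groups
bad = alignment(K, [0, 1, 0, 1], A)       # assignment that splits the groups
print(good > bad)
```

Replacing the identity A with a matrix whose off-diagonal entries encode, say, a hierarchy would reward assignments that respect that structure. Note that K here depends only on the feature vectors, which is the mismatch the abstract describes.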
Date of Conference: 31 July 2011 - 05 August 2011
Date Added to IEEE Xplore: 03 October 2011