Abstract:
As a promising approach to heterogeneous data analytics, consensus clustering has attracted increasing attention in recent decades. Among the many existing solutions, co-association matrix based methods are a landmark: they recast consensus clustering as a graph partition problem. Nevertheless, their relatively high time and space complexities preclude them from wide real-life application. We therefore propose Spectral Ensemble Clustering (SEC), which leverages the advantages of the co-association matrix in information integration but runs more efficiently. We disclose the theoretical equivalence between SEC and weighted K-means clustering, which dramatically reduces the algorithmic complexity. We also derive the latent consensus function of SEC, which to the best of our knowledge is the first to bridge co-association matrix based methods and methods with explicit global objective functions. Further, we prove theoretically that SEC has robustness, generalizability, and convergence properties. Finally, we extend SEC to handle incomplete basic partitions, and on that basis propose a row-segmentation scheme for big data clustering. Experiments on various real-world data sets in both ensemble and multi-view clustering scenarios demonstrate the superiority of SEC over some state-of-the-art methods. In particular, SEC appears to be a promising candidate for big data clustering.
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 29, Issue: 5, 01 May 2017)
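
For readers unfamiliar with the co-association matrix mentioned in the abstract, the following is a minimal illustrative sketch of how such a matrix is commonly built from a set of basic partitions. It is not the SEC algorithm itself and is not taken from the paper. The function name `co_association` and the toy label lists are assumptions made for illustration only.

```python
def co_association(partitions):
    """Build an n x n co-association matrix from r basic partitions.

    Entry (i, j) is the fraction of basic partitions in which points
    i and j are assigned to the same cluster. Hypothetical helper for
    illustration; not the paper's implementation.
    """
    r = len(partitions)        # number of basic partitions
    n = len(partitions[0])     # number of data points
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize co-occurrence counts by the number of partitions.
    return [[c / r for c in row] for row in counts]


# Toy input (assumed): three basic partitions of four points.
basic_partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
S = co_association(basic_partitions)
```

The matrix has one row and one column per data point, so its size grows quadratically with the number of points. This is the kind of time and space cost that the abstract cites as the obstacle to applying co-association matrix based methods to large data sets.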