Abstract:
Most existing large-scale multiview clustering algorithms attempt to capture the data distribution across multiple views either by selecting view-wise anchor representations beforehand with k-means or by direct matrix factorization of the original observations. Despite impressive performance, few of them have paid attention to the semantic correlations between anchor bases and cluster centroids, or to the underlying relations between clusters and data samples. In view of this, we propose a Concept Factorization based Multiview Clustering for Large-scale Data (CFMC) method with nearly linear complexity. Anchor basis learning, coefficient expression with clear semantic cues, and partitioning are integrated in a unified model. Meanwhile, explicit connections among multiview data, anchor bases, and clusters are modeled via coefficient representations with semantic meanings. A four-step alternating minimization algorithm is designed to solve the optimization problem and is proved to have linear time complexity w.r.t. the sample size. Extensive experiments on several challenging large-scale datasets confirm the superiority of the method over state-of-the-art methods.
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 36, Issue: 11, November 2024)