Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters


Abstract:

In cluster analysis, data attributes may contribute differently to the detection of different clusters. To address this problem, subspace clustering techniques have been developed, which group data objects into clusters based on subsets of attributes rather than the entire data space. However, most existing subspace clustering methods are applicable to either numerical or categorical data, but not both. This paper therefore studies soft subspace clustering of data with both numerical and categorical attributes (called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure and the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is presented, which can effectively improve the stability and accuracy of k-means-type clustering methods on numerical, categorical, and mixed data. Experimental results on different benchmark data sets show the efficacy of the proposed approach.
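
To make the idea of an attribute-weighted object-cluster similarity on mixed data more concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not give. The function name `object_cluster_similarity`, the frequency-based categorical term, and the exponential numerical term are all illustrative assumptions; the per-cluster weights are taken as given rather than learned.

```python
import math
from collections import Counter


def object_cluster_similarity(obj, cluster, weights, categorical):
    """Weighted similarity between one object and one cluster (hypothetical sketch).

    obj         -- tuple of attribute values for one data object
    cluster     -- non-empty list of tuples (the current cluster members)
    weights     -- per-attribute weights for this cluster (assumed to sum to 1)
    categorical -- set of attribute indices treated as categorical
    """
    total = 0.0
    for r, w in enumerate(weights):
        column = [member[r] for member in cluster]
        if r in categorical:
            # Assumption: share of cluster members with the same category value.
            sim = Counter(column)[obj[r]] / len(column)
        else:
            # Assumption: similarity decays with squared distance to the cluster mean.
            mean = sum(column) / len(column)
            sim = math.exp(-0.5 * (obj[r] - mean) ** 2)
        total += w * sim
    return total


# Toy cluster with one categorical attribute (index 0) and one numerical attribute (index 1).
cluster = [("red", 1.0), ("red", 3.0), ("blue", 2.0)]
score = object_cluster_similarity(("red", 2.0), cluster, [0.5, 0.5], {0})
```

In a soft subspace setting, each cluster would carry its own weight vector, so an attribute that separates one cluster well can count heavily there and little elsewhere. How those weights are derived from intercluster difference and intracluster similarity is specified in the paper itself, not in this sketch.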
Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 29, Issue: 8, August 2018)
Page(s): 3308 - 3325
Date of Publication: 03 August 2017

PubMed ID: 28792907
