ABSTRACT
A large number of clustering algorithms exist for either numeric or categorical data sets, but relatively few exist for clustering data with mixed attributes. This paper proposes Mutual Information based Weighted Clustering for Mixed Attributes (MI-WCMA). It uses Euclidean distance for numeric attributes, a similarity-based distance measure derived from rough sets for categorical attributes, and feature weights based on average mutual information. Accuracy, silhouette width, and the kappa coefficient are used to evaluate MI-WCMA and to compare it with existing algorithms.
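The abstract does not give MI-WCMA's exact formulas, so the Python sketch below only illustrates the general idea of a mutual-information-weighted mixed distance under several assumptions. Each attribute's weight is taken as its mean mutual information with every other attribute, with numeric columns first discretized into equal-width bins. Numeric attributes contribute a weighted squared difference (the Euclidean part), and categorical attributes contribute a weighted 0/1 mismatch. The 0/1 mismatch is a simple stand-in and is not the rough-set similarity measure used by MI-WCMA. The function names (`average_mi_weights`, `mixed_distance`), the binning scheme, and the additive combination rule are hypothetical. Numeric attributes are assumed to be scaled to comparable ranges beforehand.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return mi


def discretize(values, bins=3):
    """Equal-width binning (an assumption) so numeric columns can enter the MI computation."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def average_mi_weights(columns, numeric, bins=3):
    """Hypothetical weighting: mean MI of each attribute with all others, normalised to sum to 1."""
    discrete = [discretize(col, bins) if is_num else list(col)
                for col, is_num in zip(columns, numeric)]
    m = len(discrete)
    if m == 1:
        return [1.0]
    totals = [0.0] * m
    for i, j in combinations(range(m), 2):
        mi = mutual_information(discrete[i], discrete[j])
        totals[i] += mi
        totals[j] += mi
    avg = [t / (m - 1) for t in totals]
    s = sum(avg)
    if s == 0:
        # No shared information anywhere: fall back to uniform weights
        return [1.0 / m] * m
    return [a / s for a in avg]


def mixed_distance(x, y, weights, numeric):
    """Weighted squared difference for numeric attributes plus weighted 0/1 mismatch
    for categorical ones (a stand-in for the rough-set-based categorical measure)."""
    d = 0.0
    for xv, yv, w, is_num in zip(x, y, weights, numeric):
        if is_num:
            d += w * (xv - yv) ** 2
        else:
            d += w * (0.0 if xv == yv else 1.0)
    return d


if __name__ == "__main__":
    # Toy mixed data: one numeric attribute and two categorical attributes
    rows = [(1.0, "a", "x"), (1.2, "a", "x"), (5.0, "b", "y"), (5.3, "b", "y")]
    numeric = [True, False, False]
    columns = list(zip(*rows))
    w = average_mi_weights(columns, numeric, bins=2)
    print(w)
    print(mixed_distance(rows[0], rows[2], w, numeric))
```

In the toy data all three attributes split the rows the same way, so each attribute receives an equal weight of 1/3. A distance of this form could then replace plain Euclidean distance in the assignment step of a k-means or k-prototypes style loop.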
Index Terms
- Mutual information based weighted clustering for mixed attributes