poster

Mutual information based weighted clustering for mixed attributes

Authors:
Yogalakshmi Jayabal, International Institute of Information Technology, Bangalore, India
Chandrashekar Ramanathan, International Institute of Information Technology, Bangalore, India

CODS '15: Proceedings of the 2nd ACM IKDD Conference on Data Sciences, March 2015, Pages 136–137
https://doi.org/10.1145/2732587.2732616

Published: 18 March 2015

ABSTRACT

A large number of clustering algorithms exist for either numeric or categorical data sets, but relatively few handle mixed attributes. This paper proposes Mutual Information based Weighted Clustering for Mixed Attributes (MI-WCMA), which combines Euclidean distance for numeric attributes, a rough-set-based similarity distance for categorical attributes, and feature weights derived from average mutual information. Accuracy, silhouette width, and the kappa coefficient are used to evaluate the method and compare it with existing algorithms.
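
The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea of mutual-information feature weights feeding a mixed-attribute distance. It is not the authors' MI-WCMA implementation. Several things are assumptions made for illustration: "average mutual information" is read as a feature's mean mutual information with every other feature; numeric columns are discretized by equal-width binning before estimating mutual information; numeric values are assumed to be pre-scaled to [0, 1]; and a simple 0/1 mismatch stands in for the rough-set-based categorical similarity, which the abstract does not specify. All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def discretize(values, bins=3):
    """Equal-width binning (an assumption) so numeric columns can enter the MI estimate."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def ami_weights(columns, numeric):
    """Weight each feature by its average MI with the other features, normalized to sum to 1."""
    disc = [discretize(c) if is_num else list(c) for c, is_num in zip(columns, numeric)]
    m = len(disc)
    avg = [
        sum(mutual_information(disc[i], disc[j]) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    total = sum(avg)
    # If no feature shares information with any other, fall back to uniform weights.
    return [a / total for a in avg] if total > 0 else [1.0 / m] * m


def mixed_distance(a, b, numeric, weights):
    """Weighted distance: squared difference for numeric attributes,
    0/1 mismatch for categorical ones (a stand-in for the rough-set measure)."""
    total = 0.0
    for x, y, is_num, w in zip(a, b, numeric, weights):
        diff = (x - y) if is_num else (0.0 if x == y else 1.0)
        total += w * diff * diff
    return math.sqrt(total)


# Toy data: one numeric column (already scaled to [0, 1]) and two categorical columns.
columns = [[0.1, 0.2, 0.8, 0.9], ["a", "a", "b", "b"], ["x", "y", "x", "y"]]
numeric = [True, False, False]
weights = ami_weights(columns, numeric)
rows = list(zip(*columns))
d = mixed_distance(rows[0], rows[3], numeric, weights)
```

In this toy data the third column is statistically independent of the other two, so under the assumed weighting it receives zero weight and does not contribute to the distance. A distance of this kind could then be plugged into any partitioning routine that accepts a custom dissimilarity.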

Index Terms

Mutual information based weighted clustering for mixed attributes
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering


Published in
CODS '15: Proceedings of the 2nd ACM IKDD Conference on Data Sciences
March 2015
150 pages
ISBN: 9781450334365
DOI: 10.1145/2732587
General Chairs: Manish Gupta (Xerox Research Center, India), Y. Narahari (Indian Institute of Science, Bangalore)
Program Chairs: V. S. Subrahmanian (University of Maryland, College Park), Indrajit Bhattacharya (IBM Research, India)
Copyright © 2015 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
feature weighting
mixed attributes
mutual information
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 197 of 680 submissions, 29%