DOI: 10.1145/2732587.2732616
poster

Mutual information based weighted clustering for mixed attributes

Published: 18 March 2015

ABSTRACT

Many clustering algorithms exist for purely numeric or purely categorical data sets, but relatively few handle mixed attributes. This paper proposes Mutual Information based Weighted Clustering for Mixed Attributes (MI-WCMA), which uses Euclidean distance for numeric attributes, a similarity-based distance measure built on rough sets for categorical attributes, and feature weights derived from average mutual information. Accuracy, silhouette width, and the kappa coefficient are used to evaluate the algorithm and compare it with existing algorithms.
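
The abstract does not give the formulas, so the following is only a rough sketch of how a mutual-information-weighted distance over mixed attributes might be assembled. It is not the authors' MI-WCMA. Several pieces are assumptions made for illustration: the weight of a feature is taken to be its mean pairwise mutual information with the other features, numeric columns are discretized by equal-width binning before the mutual information estimate, and a simple 0/1 mismatch stands in for the rough-set-based categorical measure. All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def discretize(values, bins=3):
    """Equal-width binning (assumption) so numeric columns can enter the MI estimate."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]


def average_mi_weights(columns):
    """Assumed weighting: mean MI of each feature with every other, normalized to sum to 1."""
    k = len(columns)
    avg = [
        sum(mutual_information(columns[i], columns[j]) for j in range(k) if j != i)
        / (k - 1)
        for i in range(k)
    ]
    total = sum(avg)
    return [a / total for a in avg] if total > 0 else [1.0 / k] * k


def mixed_distance(a, b, is_numeric, weights, ranges):
    """Weighted distance: range-scaled squared difference for numeric features,
    0/1 mismatch for categorical ones (a stand-in, not the rough-set measure)."""
    total = 0.0
    for x, y, num, w, r in zip(a, b, is_numeric, weights, ranges):
        if num:
            d = ((x - y) / r) ** 2 if r else 0.0
        else:
            d = 0.0 if x == y else 1.0
        total += w * d
    return math.sqrt(total)


# Toy data (made up for illustration): one numeric and one categorical attribute.
rows = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.3, "b")]
is_numeric = [True, False]

numeric_col = [r[0] for r in rows]
categorical_col = [r[1] for r in rows]

weights = average_mi_weights([discretize(numeric_col), categorical_col])
ranges = [max(numeric_col) - min(numeric_col), None]

d_near = mixed_distance(rows[0], rows[1], is_numeric, weights, ranges)
d_far = mixed_distance(rows[0], rows[2], is_numeric, weights, ranges)
```

A distance of this form could then be plugged into any clustering routine that accepts a custom dissimilarity. Scaling numeric differences by the column range keeps both attribute types on a comparable 0 to 1 scale before the weights are applied, which is a design choice of this sketch rather than something stated in the abstract.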


Published in

CODS '15: Proceedings of the 2nd ACM IKDD Conference on Data Sciences
March 2015
150 pages
ISBN: 9781450334365
DOI: 10.1145/2732587

Copyright © 2015 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 197 of 680 submissions, 29%