DOI: 10.1145/1389095.1389363
research-article

Genetic-guided semi-supervised clustering algorithm with instance-level constraints

Published: 12 July 2008

ABSTRACT

Semi-supervised clustering with instance-level constraints is one of the most active research topics in pattern recognition, machine learning, and data mining. Several recent studies have shown that instance-level constraints can significantly increase the accuracy of a variety of clustering algorithms. However, instance-level constraints may split the search space of the optimal clustering solution into pieces, thus significantly compounding the difficulty of the search task. This paper explores a genetic approach to the problem of semi-supervised clustering with instance-level constraints. In particular, a novel semi-supervised clustering algorithm with instance-level constraints, termed the hybrid genetic-guided semi-supervised clustering algorithm with instance-level constraints (Cop-HGA), is proposed. Cop-HGA uses a hybrid genetic algorithm to search for a high-quality clustering solution that strikes a good balance between the predefined clustering criterion and the available instance-level background knowledge. The effectiveness of Cop-HGA is confirmed by experimental results on several real data sets with artificial instance-level constraints.
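
As an illustration of what "balancing" a clustering criterion against instance-level constraints could look like, the sketch below scores a candidate cluster assignment by adding a penalty for each violated constraint to the within-cluster sum of squared errors. This is not the Cop-HGA objective, which the abstract does not specify. The choice of criterion, the must-link and cannot-link pair format, the function names, and the `penalty` weight are all assumptions made for this example. A genetic algorithm could use a cost of this kind as a fitness value to minimize.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]  # indices of two instances (hypothetical constraint format)


def within_cluster_sse(points: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        sse += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return sse


def count_violations(labels: List[int], must_link: List[Pair],
                     cannot_link: List[Pair]) -> int:
    """Number of must-link pairs that are split plus cannot-link pairs that are merged."""
    split = sum(1 for i, j in must_link if labels[i] != labels[j])
    merged = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return split + merged


def penalized_cost(points: List[Point], labels: List[int], must_link: List[Pair],
                   cannot_link: List[Pair], penalty: float = 10.0) -> float:
    """Clustering criterion plus an assumed fixed penalty per violated constraint."""
    violations = count_violations(labels, must_link, cannot_link)
    return within_cluster_sse(points, labels) + penalty * violations


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    must_link = [(0, 1)]
    cannot_link = [(1, 2)]
    for labels in ([0, 0, 1, 1], [0, 1, 1, 1]):
        print(labels, penalized_cost(points, labels, must_link, cannot_link))
```

In this toy data set the assignment `[0, 0, 1, 1]` satisfies both constraints, while `[0, 1, 1, 1]` violates both and also has looser clusters, so it receives a higher cost. A larger `penalty` makes constraints harder to violate; a smaller one lets the clustering criterion dominate.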

    Published in

      GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation
      July 2008
      1814 pages
      ISBN: 9781605581309
      DOI: 10.1145/1389095
      Conference Chair: Conor Ryan
      Editor: Maarten Keijzer

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
