DOI: 10.1145/2695664.2695738
research-article

Semi-supervised clustering using multi-assistant-prototypes to represent each cluster

Published: 13 April 2015

ABSTRACT

The incorporation of semi-supervision into the cluster detection process has proved especially useful when one wants to achieve high consistency between the data partitioning and the user's knowledge of the data domain. In recent years, several semi-supervised clustering strategies have been proposed. These strategies use constraints to guide cluster detection in one of two ways: by interfering with the allocation of elements to the most appropriate cluster at each iteration of the algorithm, or by modifying the objective function employed. This paper proposes a novel approach for incorporating semi-supervision into the well-known k-means algorithm. The proposed semi-supervised clustering method uses constraint information to define multiple assistant representatives for the centroids used at each iteration of k-means. A refinement process is designed to reduce the number of assistant representatives considered for each centroid without losing clustering quality. Experimental results on eight synthetic datasets show the potential of the proposed approach for dealing with complex data structures composed of clusters of different shapes.
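The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of representing each k-means cluster by its centroid plus several assistant prototypes derived from constraint information. It rests on assumptions that are not taken from the paper: constraints are supplied as seed labels (seeds with the same label are must-linked, seeds with different labels are cannot-linked); each cluster's assistant prototypes are simply its seed points; and the refinement rule (drop an assistant that lies within distance tau of one already kept) is a hypothetical stand-in for the authors' refinement process. The names multi_prototype_kmeans, refine_assistants, and tau are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def refine_assistants(assistants: List[Point], tau: float) -> List[Point]:
    """Hypothetical refinement: keep an assistant only if it is farther than
    tau from every assistant already kept, so near-duplicates are dropped."""
    kept: List[Point] = []
    for a in assistants:
        if all(math.dist(a, b) > tau for b in kept):
            kept.append(a)
    return kept


def multi_prototype_kmeans(points: List[Point], seeds: Dict[int, int], k: int,
                           tau: float = 0.5, max_iter: int = 100) -> List[int]:
    """k-means sketch in which each cluster is represented by its centroid plus
    assistant prototypes taken from its seed (labeled) points (an assumption)."""
    seeded = {c: [points[i] for i, lab in seeds.items() if lab == c] for c in range(k)}
    if any(not pts for pts in seeded.values()):
        raise ValueError("every cluster needs at least one seed point")
    # Assistants stay fixed; centroids start at the mean of each cluster's seeds.
    assistants = {c: refine_assistants(pts, tau) for c, pts in seeded.items()}
    centroids = {c: mean(pts) for c, pts in seeded.items()}

    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = []
        for i, p in enumerate(points):
            if i in seeds:
                # Seeds keep their label: same-label seeds stay together
                # (must-link) and different-label seeds stay apart (cannot-link).
                new_labels.append(seeds[i])
                continue
            # Distance to a cluster = distance to its nearest representative,
            # whether that is the centroid or one of the assistants.
            new_labels.append(min(range(k), key=lambda c: min(
                math.dist(p, r) for r in [centroids[c], *assistants[c]])))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Standard k-means update, applied to the centroids only.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


if __name__ == "__main__":
    # An elongated cluster along y = 0 and a compact cluster near (10.5, 3.5).
    pts = [(float(x), 0.0) for x in range(11)] + [
        (10.0, 3.0), (10.0, 4.0), (11.0, 3.0), (11.0, 4.0)]
    print(multi_prototype_kmeans(pts, seeds={0: 0, 9: 0, 11: 1}, k=2))
    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run the point (10, 0) ends up closer to cluster 1's centroid (10.5, 3.5) than to cluster 0's centroid (5, 0), so a nearest-centroid rule alone would assign it to cluster 1; the assistant prototype at (9, 0) keeps it in the elongated cluster 0. This is the kind of non-spherical structure that multiple representatives per cluster are meant to capture.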


Published in
SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing
April 2015
2418 pages
ISBN: 9781450331968
DOI: 10.1145/2695664

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      SAC '15 paper acceptance rate: 291 of 1,211 submissions (24%). Overall acceptance rate: 1,650 of 6,669 submissions (25%).
