Skip to main content
Log in

A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints

  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

The paper discusses a new approach for incorporating hard constraints into the K-means algorithm for semi-supervised clustering. An analytic modification of the objective function of K-means is proposed that has not been previously considered in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

References

  • Barbier, G., Zafarani, R., Gao, H., Fung, G., Liu, H. (2012). Maximizing benefits from crowdsourced data. Computational and Mathematical Organization Theory, 18, 257–279.

    Article  Google Scholar 

  • Basu, S., Banerjee, A., Mooney, R. (2002). Semi-supervised clustering by seeding. In Proceedings of the 19th international conference on machine learning (pp. 19–26).

  • Basu, S., Banerjee, A., Mooney, R. (2004). Active semi-supervision for pairwise constrained clustering. In Proceedings of the SIAM international conference on data mining.

  • Basu, S., Davidson, I., Wagstaff, K. (2008). Constrained clustering: advances in algorithms, theory, and applications. Boca Raton: CRC Press.

    Book  Google Scholar 

  • Bilenko, M., & Mooney, J.R. (2003). Adaptive duplicate detection using learnable string similarity measures. In International conference on knowledge discovery and data mining (pp. 39–48).

  • Celeux, G., & Govaert, G. (1993). Comparison of the mixture and the classification maximum likelihood in cluster analysis. Journal of Statistical Computation and Simulation, 47, 127–146.

    Article  Google Scholar 

  • Covões, T.F., Hruschka, E.R., Ghosh, J. (2013). A study of k-means-based algorithms for constrained clustering. Intelligent Data Analysis, 17, 485–505.

    Article  Google Scholar 

  • Davidson, I., & Ravi, S. (2005). Clustering with constraints: feasibility issues and the k-means algorithm. In Proceedings of the 2005 SIAM international conference on data mining (pp. 138–149): SIAM.

  • Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood for incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.

    MATH  Google Scholar 

  • DeSarbo, W.S., & Mahajan, V. (1984). Constrained classification: the use of a priori information in cluster analysis. Psychometrika, 49, 187–215.

    Article  Google Scholar 

  • Dinler, D., & Tural, M.K. (2016). A survey of constrained clustering. In Unsupervised learning algorithms (pp. 207–235): Springer.

  • Fatehi, K., Bozorgi, A., Zahedi, M.S., Asgarian, E. (2014). Improving semi-supervised constrained k-means clustering method using user feedback. Journal of Computing and Security, 1, 273–261.

    Google Scholar 

  • Gu, L., & Lu, X. (2012). Semi-supervised subtractive clustering by seeding. In 2012 9th international conference on fuzzy systems and knowledge discovery (pp. 738–741): IEEE.

  • Hennig, C., Meila, M., Murtagh, F., Rocci, R. (2015). Handbook of cluster analysis. Boca Raton: CRC Press.

    Book  Google Scholar 

  • Liu, H., & Fu, Y. (2015). Clustering with partition level side information. In 2015 IEEE international conference on data mining (pp. 877–882): IEEE.

  • Maitra, R., & Melnykov, V. (2010). Simulating data to study performance of finite mixture modeling and clustering algorithms. Journal of Computational and Graphical Statistics, 19, 354–376.

    Article  MathSciNet  Google Scholar 

  • McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.

    Book  Google Scholar 

  • Melnykov, V., Chen, W.-C., Maitra, R. (2012). Mixsim: an R package for simulating data to study performance of clustering algorithms. Journal of Statistical Software, 51, 1–25.

    Article  Google Scholar 

  • Melnykov, V., Melnykov, I., Michael, S. (2016). Semi-supervised model-based clustering with positive and negative constraints. Advances in data analysis and classification, 10, 327–349.

    Article  MathSciNet  Google Scholar 

  • Nimmo, D.W.R., Herrmann, S.J., Sublette, J.E., Melnykov, I.V., Helland, L.K., Romine, J.A., Carsella, J.S., Herrmann-Hoesing, L.M., Turner, J.A., Vanden Heuvel, B.D. (2018). Occurrence of Chironomid species (Diptera: Chironomidae) in the high Se-78 concentrations and high pH of Fountain Creek Watershed, Colorado, USA. Western North American Naturalist, 78, 39–64–26.

    Article  Google Scholar 

  • Ruiz, C., Spiliopoulou, M., Menasalvas, E. (2010). Density-based semi-supervised clustering. Data Mining and Knowledge Discovery, 21, 345–370.

    Article  MathSciNet  Google Scholar 

  • Śmieja, M., & Wiercioch, M. (2017). Constrained clustering with a complex cluster structure. Advances in Data Analysis and Classification, 11, 493–518.

    Article  MathSciNet  Google Scholar 

  • Steinley, D., & Brusco, M.J. (2011). Evaluating mixture modeling for clustering: recommendations and cautions. Psychological Methods, 16, 63.

    Article  Google Scholar 

  • Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S. (2001). Constrained K-means clustering with background knowledge. In Proceedings of the eighteenth international conference on machine learning (ICML-2001) (pp. 577–584).

  • Wang, X., Wang, C., Shen, J. (2011). Semi–supervised K-means clustering by optimizing initial cluster centers. In International conference on web information systems and mining (pp. 178–187): Springer.

  • Yu, Z., Luo, P., You, J., Wong, H.-S., Leung, H., Wu, S., Zhang, J., Han, G. (2015). Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Transactions on Knowledge and Data Engineering, 28, 701–714.

    Article  Google Scholar 

  • Zhigang, C., Xuan, L., Fan, Y. (2013). Constrained k-means with external information. In 2013 8th International conference on computer science & education (pp. 490–493): IEEE.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Igor Melnykov.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Melnykov, I., Melnykov, V. A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints. J Classif 37, 789–809 (2020). https://doi.org/10.1007/s00357-019-09349-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00357-019-09349-x

Keywords

Navigation