A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints

Melnykov, Igor; Melnykov, Volodymyr

doi:10.1007/s00357-019-09349-x

A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints

Published: 10 January 2020

Volume 37, pages 789–809, (2020)
Cite this article

Journal of Classification Aims and scope Submit manuscript

234 Accesses
2 Citations
Explore all metrics

Abstract

The paper discusses a new approach for incorporating hard constraints into the K-means algorithm for semi-supervised clustering. An analytic modification of the objective function of K-means is proposed that has not been previously considered in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Barbier, G., Zafarani, R., Gao, H., Fung, G., Liu, H. (2012). Maximizing benefits from crowdsourced data. Computational and Mathematical Organization Theory, 18, 257–279.
Article Google Scholar
Basu, S., Banerjee, A., Mooney, R. (2002). Semi-supervised clustering by seeding. In Proceedings of the 19th international conference on machine learning (pp. 19–26).
Basu, S., Banerjee, A., Mooney, R. (2004). Active semi-supervision for pairwise constrained clustering. In Proceedings of the SIAM international conference on data mining.
Basu, S., Davidson, I., Wagstaff, K. (2008). Constrained clustering: advances in algorithms, theory, and applications. Boca Raton: CRC Press.
Book Google Scholar
Bilenko, M., & Mooney, J.R. (2003). Adaptive duplicate detection using learnable string similarity measures. In International conference on knowledge discovery and data mining (pp. 39–48).
Celeux, G., & Govaert, G. (1993). Comparison of the mixture and the classification maximum likelihood in cluster analysis. Journal of Statistical Computation and Simulation, 47, 127–146.
Article Google Scholar
Covões, T.F., Hruschka, E.R., Ghosh, J. (2013). A study of k-means-based algorithms for constrained clustering. Intelligent Data Analysis, 17, 485–505.
Article Google Scholar
Davidson, I., & Ravi, S. (2005). Clustering with constraints: feasibility issues and the k-means algorithm. In Proceedings of the 2005 SIAM international conference on data mining (pp. 138–149): SIAM.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood for incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
MATH Google Scholar
DeSarbo, W.S., & Mahajan, V. (1984). Constrained classification: the use of a priori information in cluster analysis. Psychometrika, 49, 187–215.
Article Google Scholar
Dinler, D., & Tural, M.K. (2016). A survey of constrained clustering. In Unsupervised learning algorithms (pp. 207–235): Springer.
Fatehi, K., Bozorgi, A., Zahedi, M.S., Asgarian, E. (2014). Improving semi-supervised constrained k-means clustering method using user feedback. Journal of Computing and Security, 1, 273–261.
Google Scholar
Gu, L., & Lu, X. (2012). Semi-supervised subtractive clustering by seeding. In 2012 9th international conference on fuzzy systems and knowledge discovery (pp. 738–741): IEEE.
Hennig, C., Meila, M., Murtagh, F., Rocci, R. (2015). Handbook of cluster analysis. Boca Raton: CRC Press.
Book Google Scholar
Liu, H., & Fu, Y. (2015). Clustering with partition level side information. In 2015 IEEE international conference on data mining (pp. 877–882): IEEE.
Maitra, R., & Melnykov, V. (2010). Simulating data to study performance of finite mixture modeling and clustering algorithms. Journal of Computational and Graphical Statistics, 19, 354–376.
Article MathSciNet Google Scholar
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Book Google Scholar
Melnykov, V., Chen, W.-C., Maitra, R. (2012). Mixsim: an R package for simulating data to study performance of clustering algorithms. Journal of Statistical Software, 51, 1–25.
Article Google Scholar
Melnykov, V., Melnykov, I., Michael, S. (2016). Semi-supervised model-based clustering with positive and negative constraints. Advances in data analysis and classification, 10, 327–349.
Article MathSciNet Google Scholar
Nimmo, D.W.R., Herrmann, S.J., Sublette, J.E., Melnykov, I.V., Helland, L.K., Romine, J.A., Carsella, J.S., Herrmann-Hoesing, L.M., Turner, J.A., Vanden Heuvel, B.D. (2018). Occurrence of Chironomid species (Diptera: Chironomidae) in the high Se-78 concentrations and high pH of Fountain Creek Watershed, Colorado, USA. Western North American Naturalist, 78, 39–64–26.
Article Google Scholar
Ruiz, C., Spiliopoulou, M., Menasalvas, E. (2010). Density-based semi-supervised clustering. Data Mining and Knowledge Discovery, 21, 345–370.
Article MathSciNet Google Scholar
Śmieja, M., & Wiercioch, M. (2017). Constrained clustering with a complex cluster structure. Advances in Data Analysis and Classification, 11, 493–518.
Article MathSciNet Google Scholar
Steinley, D., & Brusco, M.J. (2011). Evaluating mixture modeling for clustering: recommendations and cautions. Psychological Methods, 16, 63.
Article Google Scholar
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S. (2001). Constrained K-means clustering with background knowledge. In Proceedings of the eighteenth international conference on machine learning (ICML-2001) (pp. 577–584).
Wang, X., Wang, C., Shen, J. (2011). Semi–supervised K-means clustering by optimizing initial cluster centers. In International conference on web information systems and mining (pp. 178–187): Springer.
Yu, Z., Luo, P., You, J., Wong, H.-S., Leung, H., Wu, S., Zhang, J., Han, G. (2015). Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Transactions on Knowledge and Data Engineering, 28, 701–714.
Article Google Scholar
Zhigang, C., Xuan, L., Fan, Y. (2013). Constrained k-means with external information. In 2013 8th International conference on computer science & education (pp. 490–493): IEEE.

Download references

Author information

Authors and Affiliations

University of Minnesota - Duluth, Duluth, MN, 55812, USA
Igor Melnykov
The University of Alabama, Tuscaloosa, AL, 35487, USA
Volodymyr Melnykov

Authors

Igor Melnykov
View author publications
You can also search for this author in PubMed Google Scholar
Volodymyr Melnykov
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Igor Melnykov.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Melnykov, I., Melnykov, V. A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints. J Classif 37, 789–809 (2020). https://doi.org/10.1007/s00357-019-09349-x

Download citation

Published: 10 January 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00357-019-09349-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints

Abstract

Access this article

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation