Attainable accuracy guarantee for the k-medians clustering in [0, 1]

  • Original Paper
  • Published in Optimization Letters

Abstract

We consider the well-known k-medians clustering problem in the setting of a zero-sum two-player game, defined as follows. For given integers \(n>1\) and \(k>1\), the strategy set of the first player consists of n-samples drawn from the unit segment [0, 1], and the strategy set of the second player consists of partitions of the index set \(\{1,\ldots , n\}\) into k nonempty subsets (clusters). The payoff is the loss function of k-medians clustering, evaluated on the sample chosen by the first player and the partition chosen by the second. In fact, the payoff coincides with the sum of distances between points of the sample and the nearest center of a cluster. It is easy to verify that this game has no value. In this paper, for any \(n>1\) and \(k>1\), we show that \(0.5n/(2k-1)\) is an upper bound for the lower value of this game. Furthermore, for any k, we prove that this bound is attained for every \(n\ge {\bar{n}}\), where \({\bar{n}}={\bar{n}}(k)\) is some threshold depending on k. As a consequence, any n-sample from [0, 1] can be partitioned into k clusters so that the value of the k-medians clustering criterion does not exceed this bound, and the bound is tight for sufficiently large n.

Notes

  1. If k is a part of an instance.

  2. The set of optimal solutions.

Acknowledgements

This research is supported by RFBR, grants no. 16-07-00266, 16-01-00505, and 17-08-01385.

Author information

Corresponding author

Correspondence to Daniel Khachay.

About this article

Cite this article

Khachay, M., Khachay, D. Attainable accuracy guarantee for the k-medians clustering in [0, 1]. Optim Lett 13, 1837–1853 (2019). https://doi.org/10.1007/s11590-018-1305-3

  • DOI: https://doi.org/10.1007/s11590-018-1305-3
