Abstract
We consider the well-known k-medians clustering problem in the setting of a zero-sum two-player game, defined as follows. For given integers \(n>1\) and \(k>1\), the strategy set of the first player consists of n-samples drawn from the unit segment [0, 1], and the strategy set of the second player consists of partitions of the index set \(\{1,\ldots , n\}\) into k nonempty subsets (clusters). The payoff is the loss function of k-medians clustering evaluated on the sample chosen by the first player and the partition chosen by the second player; that is, the sum of distances between the sample points and the nearest cluster centers. It is easy to verify that this game has no value. In this paper, for any \(n>1\) and \(k>1\), we show that \(0.5n/(2k-1)\) is an upper bound for the lower value of this game. Furthermore, for any k, we prove that this bound is attained for some \({\bar{n}}={\bar{n}}(k)\) and every \(n\ge {\bar{n}}\). As a consequence, any n-sample from [0, 1] can be partitioned into k clusters so that the value of the k-medians clustering criterion does not exceed this bound, and the bound is tight for sufficiently large n.
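The payoff and the bound can be illustrated numerically with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: the payoff is read as the sum, over clusters, of the distances from each point to the median of its own cluster; indices are 0-based rather than \(1,\ldots ,n\); and the search over partitions is restricted to contiguous blocks of the sorted sample, which is the standard structure of optimal one-dimensional k-medians clusterings. The function names `kmedians_loss`, `best_contiguous_loss`, and `guarantee` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from statistics import median


def kmedians_loss(sample, partition):
    """Sum over clusters of |x - median of the cluster| (assumed payoff)."""
    total = 0.0
    for cluster in partition:
        points = [sample[i] for i in cluster]
        center = median(points)  # a median minimizes the sum of absolute deviations
        total += sum(abs(x - center) for x in points)
    return total


def best_contiguous_loss(sample, k):
    """Brute-force minimum loss over splits of the sorted sample into k nonempty blocks.

    Exponential in k; intended only for small illustrative inputs.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    best = float("inf")
    # Choose k - 1 cut positions strictly inside the sorted order.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        partition = [order[a:b] for a, b in zip(bounds, bounds[1:])]
        best = min(best, kmedians_loss(sample, partition))
    return best


def guarantee(n, k):
    """The bound 0.5 n / (2k - 1) from the abstract."""
    return 0.5 * n / (2 * k - 1)
```

For the sample (0, 0.5, 1) with k = 2, both `best_contiguous_loss([0.0, 0.5, 1.0], 2)` and `guarantee(3, 2)` evaluate to 0.5, so in this small case the second player's best response meets the bound exactly.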
Notes
If k is part of the instance.
The set of optimal solutions.
Acknowledgements
This research is supported by RFBR, grants no. 16-07-00266, 16-01-00505, and 17-08-01385.
Cite this article
Khachay, M., Khachay, D. Attainable accuracy guarantee for the k-medians clustering in [0, 1]. Optim Lett 13, 1837–1853 (2019). https://doi.org/10.1007/s11590-018-1305-3
DOI: https://doi.org/10.1007/s11590-018-1305-3