Abstract
The Bregman k-median problem is defined as follows. Given a Bregman divergence \(D_\varphi\) and a finite set \(P \subseteq {\mathbb R}^d\) of size n, our goal is to find a set C of size k such that the sum of errors \(\mathrm{cost}(P,C) = \sum_{p \in P} \min_{c \in C} D_\varphi(p,c)\) is minimized. The Bregman k-median problem plays an important role in many applications, e.g., information theory, statistics, text classification, and speech processing. We study a generalization of the k-means++ seeding of Arthur and Vassilvitskii (SODA ’07). We prove for an almost arbitrary Bregman divergence that if the input set consists of k well separated clusters, then with probability \(2^{-{\mathcal O}(k)}\) this seeding step alone finds an \({\mathcal O}(1)\)-approximate solution. Thereby, we generalize an earlier result of Ostrovsky et al. (FOCS ’06) from the case of the Euclidean k-means problem to the Bregman k-median problem. Additionally, this result leads to a constant factor approximation algorithm for the Bregman k-median problem using at most \(2^{{\mathcal O}(k)}n\) arithmetic operations, including evaluations of the Bregman divergence \(D_\varphi\).
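The abstract does not spell out the seeding step, so the following is only a minimal sketch of one natural reading: the first center is drawn uniformly from P, and each further center is drawn with probability proportional to a point's divergence to its closest already chosen center (the k-means++ rule with the squared Euclidean distance replaced by \(D_\varphi\)). The function names `bregman_seeding`, `cost`, `squared_euclidean`, and `generalized_kl` are hypothetical and chosen for illustration; the two divergences are standard examples of Bregman divergences, not ones singled out by the paper.

```python
import math
import random

def squared_euclidean(p, c):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def generalized_kl(p, c):
    """Generalized Kullback-Leibler divergence, generated by
    phi(x) = sum_i x_i log x_i. Assumes strictly positive coordinates."""
    value = sum(pi * math.log(pi / ci) - pi + ci for pi, ci in zip(p, c))
    return max(0.0, value)  # guard against tiny negative rounding errors

def cost(points, centers, divergence):
    """cost(P, C) = sum over p in P of min over c in C of D(p, c)."""
    return sum(min(divergence(p, c) for c in centers) for p in points)

def bregman_seeding(points, k, divergence, rng=None):
    """Assumed generalization of k-means++ seeding: sample each new center
    with probability proportional to its divergence to the nearest center."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # nearest[i] is the divergence from points[i] to its closest center so far
    nearest = [divergence(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(nearest) <= 0:
            # every point coincides with a chosen center; fall back to uniform
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=nearest, k=1)[0]
        centers.append(new_center)
        nearest = [min(d, divergence(p, new_center))
                   for d, p in zip(nearest, points)]
    return centers
```

Note that the divergence is always evaluated as D(point, center), matching the argument order in the cost function above; this matters because Bregman divergences are in general not symmetric. Each round costs one pass over the input, so the seeding uses O(kn) divergence evaluations per run.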
Keywords
- Mahalanobis Distance
- Dissimilarity Measure
- Input Instance
- Approximation Guarantee
- Constant Factor Approximation
Research supported by Deutsche Forschungsgemeinschaft (DFG), grant BL-314/6-1.
References
Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC ’98), pp. 106–113 (1998)
Kolliopoulos, S.G., Rao, S.: A nearly linear-time approximation scheme for the Euclidean k-median problem. In: Nešetřil, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 378–389. Springer, Heidelberg (1999)
Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC’02), pp. 250–257. Association for Computing Machinery (2002)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC’04), pp. 291–300. Association for Computing Machinery (2004)
Kumar, A., Sabharwal, Y., Sen, S.: Linear time algorithms for clustering problems in any dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1374–1385. Springer, Heidelberg (2005)
Chen, K.: On k-median clustering in high dimensions. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’06), pp. 1177–1185. Society for Industrial and Applied Mathematics (2006)
Matoušek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24(1), 61–84 (2000)
Fernandez de la Vega, W., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), pp. 50–58. Association for Computing Machinery (2003)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’04), pp. 454–462. IEEE Computer Society, Los Alamitos (2004)
Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing 39(3), 923–947 (2009)
Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd ACM Symposium on Computational Geometry (SCG ’07), pp. 11–18. Association for Computing Machinery (2007)
Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’08), pp. 799–808. Society for Industrial and Applied Mathematics (2008); Full version to appear in ACM Transactions on Algorithms (special issue on SODA ’08).
Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’09), pp. 1088–1097. Society for Industrial and Applied Mathematics (2009)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS ’09). IEEE Computer Society Press, Los Alamitos (2009) (to appear)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry (SCG ’09), pp. 324–332. Association for Computing Machinery (2009)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-type methods for the k-means problem. In: Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS ’06), pp. 165–176. IEEE Computer Society, Los Alamitos (2006)
Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Proceedings of the 12th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX ’09), pp. 15–28. Springer, Heidelberg (2009)
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)
Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory 51(7), 2664–2669 (2005)
Manthey, B., Röglin, H.: Worst-case and smoothed analysis of k-means clustering with Bregman divergences. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1024–1033. Springer, Heidelberg (2009)
Nock, R., Luosto, P., Kivinen, J.: Mixed Bregman clustering with approximation guarantees. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 154–169. Springer, Heidelberg (2008)
Sra, S., Jegelka, S., Banerjee, A.: Approximation algorithms for Bregman clustering, co-clustering and tensor clustering. Technical Report MPIK-TR-177, Max Planck Institute for Biological Cybernetics (2008)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Proceedings of the 7th Pacific Symposium on Biocomputing (PSB ’02), pp. 6–17. World Scientific, Singapore (2002)
Bregman, L.M.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 200–217 (1967)
Mahalanobis, P.C.: On the generalized distance in statistics. In: Proceedings of the National Institute of Sciences of India, vol. 2(1), pp. 49–55. Indian National Science Academy (1936)
Ackermann, M.R.: Algorithms for the Bregman k-Median Problem. PhD thesis, University of Paderborn, Department of Computer Science (2009)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ackermann, M.R., Blömer, J. (2010). Bregman Clustering for Separable Instances. In: Kaplan, H. (eds) Algorithm Theory - SWAT 2010. SWAT 2010. Lecture Notes in Computer Science, vol 6139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13731-0_21
DOI: https://doi.org/10.1007/978-3-642-13731-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13730-3
Online ISBN: 978-3-642-13731-0
eBook Packages: Computer Science, Computer Science (R0)