Approximation algorithms for the metric maximum clustering problem with given cluster sizes

https://doi.org/10.1016/S0167-6377(02)00235-3

Abstract

The input to the METRIC MAXIMUM CLUSTERING PROBLEM WITH GIVEN CLUSTER SIZES consists of a complete graph G=(V,E) with edge weights satisfying the triangle inequality, and integers c1,…,cp that sum to |V|. The goal is to find a partition of V into disjoint clusters of sizes c1,…,cp, that maximizes the sum of weights of edges whose two ends belong to the same cluster. We describe approximation algorithms for this problem.

Introduction

In this paper we approximate the METRIC MAXIMUM CLUSTERING PROBLEM WITH GIVEN CLUSTER SIZES. The input for the problem consists of a complete graph G=(V,E), V={1,…,n}, with non-negative edge weights w(i,j), (i,j)∈E, that satisfy the triangle inequality. In the general case of the problem, cluster sizes c1⩾c2⩾⋯⩾cp⩾1 such that c1+⋯+cp=n are given. In the uniform case, c1=c2=⋯=cp. The problem is to partition V into sets of the given sizes, so that the total weight of edges inside the clusters is maximized. See [6] and its references for some applications.

Hassin and Rubinstein [3] gave an approximation algorithm whose error ratio is bounded by 1/(2√2)≈0.353 for the general problem. We improve this result for the case in which cluster sizes are large. In particular, as the minimum cluster size increases, the performance guarantee of our algorithm increases asymptotically to 0.375.

Feo and Khellaf [2] treated the uniform case and developed a polynomial algorithm whose error ratio is bounded by c/(2(c−1)) or (c+1)/(2c), where c=n/p is the cluster size and c is even or odd, respectively. The bound decreases to 1/2 as c approaches ∞. The algorithm's time complexity is dominated by the computation of a maximum weight perfect matching. (Without the triangle inequality assumption, the bound is 1/(c−1) or 1/c, respectively, but Feo, Goldschmidt and Khellaf [1] improved the bound to 1/2 in the cases c=3 and c=4.) We describe an alternative algorithm for the uniform case that achieves the ratio of 1/2 and has a lower O(n²) complexity.

Hassin, Rubinstein and Tamir [4] generalized the algorithm of [2] and obtained a bound of 1/2 for computing k clusters of size c each (1⩽k⩽n/c) with maximum total weight. Our discussion concerning the uniform case does not apply to this generalization.

For E′⊂E we denote by w(E′) the total weight of edges in E′. For V′⊆V we denote by E(V′) the edge set of the subgraph induced by V′. To simplify the presentation, we denote the weight w(E(V′)) of the edges in the subgraph induced by a vertex set V′ by w(V′). We denote by opt the optimal solution value, and by apx the approximate value returned by a given approximation algorithm. A p-matching is a set of p vertex-disjoint edges in a graph. A p-matching with p=⌊n/2⌋ is called perfect. A greedy p-matching is obtained by sorting the edges in non-increasing order of their weights, and then scanning the list and selecting edges as long as they are vertex-disjoint from the previously selected edges and the number of selected edges does not exceed p. A greedy perfect matching has p=⌊n/2⌋.
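The greedy p-matching just described is straightforward to implement; sorting the O(n²) edges of a complete graph dominates the running time. The following is a minimal Python sketch, not the paper's code: the function name and the (weight, u, v) edge representation are our own illustrative choices.

def greedy_p_matching(edges, p):
    # Return up to p vertex-disjoint edges chosen greedily by weight.
    # edges: iterable of (weight, u, v) triples over hashable vertices.
    matching = []
    used = set()  # vertices covered by edges selected so far
    # Scan edges in non-increasing order of weight.
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if len(matching) == p:
            break
        if u not in used and v not in used:
            matching.append((w, u, v))
            used.update((u, v))
    return matching

A greedy perfect matching is then obtained by calling greedy_p_matching(edges, n // 2).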

Section snippets

A 3/8-approximation algorithm

Lemma 1

Let Mg be a greedy k-matching. Let M′ be an arbitrary 2k-matching. Then, for i=1,…,k, the weight of the ith largest edge in Mg is greater than or equal to the weight of the (2i−1)st largest edge in M′.

Proof

Let e′1,…,e′2k be the edges of M′ in non-increasing order of weight. By the greedy construction, every edge e′∈M′⧹Mg is incident to an edge e∈Mg with w(e)⩾w(e′). Since every edge of Mg can take the above role at most twice, it follows that for e′1,…,e′2i−1 we use at least i edges of Mg all of
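The lemma's claim is easy to check numerically on random metric instances. The sketch below is ours, not the paper's: it reuses greedy_p_matching from above, draws points in the unit square so that the Euclidean edge weights satisfy the triangle inequality, and compares the ith largest greedy edge against the (2i−1)st largest edge of a random 2k-matching.

import itertools, random

def check_lemma1(n=20, k=4, trials=200, seed=1):
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    # Euclidean distances between points in the plane are metric weights.
    w = {(u, v): ((pts[u][0] - pts[v][0]) ** 2
                  + (pts[u][1] - pts[v][1]) ** 2) ** 0.5
         for u, v in itertools.combinations(range(n), 2)}
    edges = [(wt, u, v) for (u, v), wt in w.items()]
    # Greedy selection already yields edges in non-increasing weight order.
    greedy = [wt for wt, _, _ in greedy_p_matching(edges, k)]
    for _ in range(trials):
        verts = rng.sample(range(n), 4 * k)  # 2k disjoint edges use 4k vertices
        m2k = sorted((w[tuple(sorted((verts[2 * i], verts[2 * i + 1])))]
                      for i in range(2 * k)), reverse=True)
        # ith largest greedy edge >= (2i-1)st largest edge of M' (1-indexed).
        assert all(greedy[i] >= m2k[2 * i] for i in range(k))

check_lemma1()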

The uniform case

We now consider the uniform case, that is, ci=c for i=1,…,p. Consider the set of partitions of V into clusters of size c each. A random solution is obtained by randomly (uniformly) selecting such a partition. The following theorem states a bound on the expected value of a random solution. When the cluster sizes are not identical, Example 1 in Section 4 shows that the expected weight of a random solution is not a good approximation. Also note that in contrast to a similar bound for the related
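A random solution is simple to realize: shuffle the vertices and cut the resulting order into consecutive blocks of size c; every partition into clusters of size c is then equally likely. A minimal sketch under the same illustrative conventions as above (w maps pairs (u, v) with u < v to weights; the function name is ours):

import random

def random_uniform_partition_weight(w, n, c, rng=random):
    # Weight of one uniformly random partition of {0, ..., n-1} into
    # n // c clusters of size c (c must divide n).
    order = list(range(n))
    rng.shuffle(order)
    total = 0.0
    for start in range(0, n, c):
        cluster = order[start:start + c]
        for i in range(c):
            for j in range(i + 1, c):
                u, v = sorted((cluster[i], cluster[j]))
                total += w[(u, v)]
    return total

Averaging this quantity over many independent draws estimates the expected weight of a random solution, which the theorem bounds from below.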

Some bad examples

In this section, we propose some natural algorithms and provide for each of them an instance for which the algorithm performs badly.

In Section 3 we have shown that the expected weight of a random solution is at least opt/2 when the clusters have a common size. The following example shows that when the cluster sizes are not identical the expected weight of a random solution may be very small relative to opt.

Example 1

Consider an instance with weights w(1,j)=1, j=2,…,n, and w(i,j)=0 otherwise. Let c1=2 and c

