Parallel approximation for partial set cover

https://doi.org/10.1016/j.amc.2021.126358

Highlights

  • A parallel local ratio algorithm for the minimum partial set cover problem.

  • A parallel algorithm for the minimum power partial cover problem.

  • Both algorithms run in a logarithmic number of rounds.

  • Their approximation ratios are close to those of the best known sequential algorithms.

Abstract

In the minimum partial set cover problem (MinPSC), we are given a ground set $E$ with $n$ elements, a collection $\mathcal{S}$ of subsets of $E$ with $|\mathcal{S}|=m$, a cost function $c:\mathcal{S}\rightarrow\mathbb{R}^{+}$, and an integer $k\leq n$; the goal is to find a minimum cost sub-collection of $\mathcal{S}$ that covers at least $k$ elements of $E$. In this paper, we design a parallel algorithm for MinPSC which yields a solution with approximation ratio at most $\frac{f}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon})$ rounds, where $f$ is the maximum number of sets containing a common element, and $0<\varepsilon<1/2$ is a constant. We also design a parallel algorithm for a special MinPSC problem, the minimum power partial cover problem (MinPPC), which achieves approximation ratio at most $\frac{(3+2\varepsilon)^{\alpha}}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon}\log^{2}m)$ rounds, where $\alpha\geq 1$ is the attenuation factor of power.

Introduction

In the minimum set cover problem (MinSC), we are given a set $E$ of $n$ elements, a collection of sets $\mathcal{S}\subseteq 2^{E}$ with $|\mathcal{S}|=m$, and a cost function $c:\mathcal{S}\rightarrow\mathbb{R}^{+}$; the goal is to find a minimum cost sub-collection $\mathcal{F}\subseteq\mathcal{S}$ such that all elements are covered, i.e., $\bigcup_{S\in\mathcal{F}}S=E$. MinSC is a classic combinatorial optimization problem that has many real-world applications and plays an important role in computational complexity theory and approximation algorithms [27].

In real-world applications, it is often too costly to satisfy all covering requirements. This consideration leads to the minimum partial set cover problem (MinPSC): besides $E$, $\mathcal{S}$, and $c$, an integer $k\leq n$ is given, and the goal is to find a minimum cost sub-collection of $\mathcal{S}$ that covers at least $k$ elements of $E$. Partial cover has been studied extensively in data science and facility location, where the uncovered elements are treated as “outliers” [8], [21].
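To make the definition concrete, here is a minimal sketch that solves a tiny MinPSC instance by exhaustive search. The function name `brute_force_min_psc` and the toy instance are hypothetical and serve only as an illustration; exhaustive search is exponential in the number of sets and is unrelated to the algorithms of this paper.

```python
from itertools import combinations

def brute_force_min_psc(sets, cost, k):
    """Return (best_cost, best_subcollection) by exhaustive search.

    sets: dict mapping a set name to a frozenset of elements
    cost: dict mapping a set name to a positive cost
    k:    number of elements that must be covered
    """
    names = sorted(sets)
    best_cost, best_choice = float("inf"), None
    for r in range(len(names) + 1):
        for choice in combinations(names, r):
            # Elements covered by the chosen sub-collection.
            covered = set().union(*(sets[s] for s in choice))
            total = sum(cost[s] for s in choice)
            if len(covered) >= k and total < best_cost:
                best_cost, best_choice = total, choice
    return best_cost, best_choice

# Hypothetical toy instance with ground set E = {1, ..., 5}.
toy_sets = {"A": frozenset({1, 2, 3}), "B": frozenset({3, 4}), "C": frozenset({4, 5})}
toy_cost = {"A": 3.0, "B": 1.0, "C": 2.0}

full_cover = brute_force_min_psc(toy_sets, toy_cost, k=5)     # cover all of E
partial_cover = brute_force_min_psc(toy_sets, toy_cost, k=4)  # cover any 4 elements
```

On this toy instance, relaxing the requirement from $k=5$ to $k=4$ lowers the optimal cost from 5 to 4, which is the kind of saving that motivates partial cover.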

Most existing algorithms for partial cover are centralized and sequential, but in the era of big data, parallel algorithms are more desirable. Although parallel algorithms for MinSC have been studied extensively, parallel studies on its partial version MinPSC are rare. Gandhi et al. [9] designed a parallel algorithm for MinPSC which achieves approximation ratio $\frac{f}{1-\varepsilon}$ in $(1+f\log\frac{1}{\varepsilon})(1+\log n)$ rounds, where $f$ is the maximum frequency of elements, i.e., the maximum number of sets containing a common element. Since $f$ can be as large as $|\mathcal{S}|$, the running time of this algorithm might not be satisfactory. This observation motivates us to study NC parallel algorithms for MinPSC whose running time is independent of $f$.

Although the approximation ratio $f$ is tight under the unique games conjecture [14], it can be large in a general setting. Is it possible to obtain a better approximation in some special setting, for example, for coverage problems with geometric properties? This question motivates us to study parallel algorithms for the minimum power partial cover problem (MinPPC). Suppose $X$ is a set of points and $S$ is a set of sensors on the plane. Each sensor can adjust its power, and the covering range of a sensor $s$ with power $p(s)$ is a disk centered at $s$ whose radius $r(s)$ satisfies $p(s)=b\cdot r(s)^{\alpha}$, where $b$ and $\alpha$ are constants, and $\alpha$ is called the attenuation factor of power. We use $ce(D)$ and $r(D)$ to denote the center and the radius of disk $D$, respectively, so $Disk(ce(D),r(D))$ represents the disk with center $ce(D)$ and radius $r(D)$. Given an integer $k\leq|X|$, the MinPPC problem is to determine a power assignment for the sensors such that at least $k$ points are covered and the total power consumption is minimized. This problem is motivated by the goal of extending the lifetime of wireless sensor networks under limited energy supply [18].

The MinPPC problem can be viewed as a special case of the MinPSC problem. In a MinPPC instance, an optimal power assignment must be such that for each sensor with positive power, there is a point on the boundary of its covering range. For a sensor $s\in S$ and a point $x\in X$, we call $Disk(s,\|sx\|)$ a canonical disk and set its cost to be $c(Disk(s,\|sx\|))=b\|sx\|^{\alpha}$, where $\|sx\|$ is the Euclidean distance between $s$ and $x$. Denote the collection of canonical disks by $\mathcal{D}$, and view each disk $D\in\mathcal{D}$ as the subset of $X$ consisting of the points covered by $D$. Then a MinPPC instance can be transformed into a MinPSC instance $(X,\mathcal{D},c,k)$. Since each sensor can be assigned only one power, the transformed instance carries an extra constraint: among the disks corresponding to the same sensor, at most one can be chosen. An optimal solution to the transformed MinPSC instance trivially satisfies this constraint: if two disks corresponding to the same sensor coexist in a solution, then removing the one with smaller radius keeps feasibility and reduces the power. So the MinPPC problem is a special case of the MinPSC problem.
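The reduction above can be sketched in a few lines. This is an illustrative sketch only: the function name `canonical_disks`, the tuple layout of its output, the default values of `b` and `alpha`, and the small floating-point tolerance are all assumptions made for the example and do not come from the paper.

```python
import math

def canonical_disks(sensors, points, b=1.0, alpha=2.0):
    """Build the canonical disks Disk(s, ||sx||) for every sensor s and point x.

    sensors, points: lists of (x, y) coordinates in the plane.
    Returns a list of (sensor_index, radius, cost, covered_point_indices),
    where cost = b * radius ** alpha.
    """
    disks = []
    for i, s in enumerate(sensors):
        for x in points:
            radius = math.dist(s, x)  # Euclidean distance ||sx||
            cost = b * radius ** alpha
            # View the disk as the subset of points it covers.
            covered = frozenset(
                j for j, y in enumerate(points)
                if math.dist(s, y) <= radius + 1e-12  # assumed round-off tolerance
            )
            disks.append((i, radius, cost, covered))
    return disks

# Hypothetical instance: one sensor at the origin and two points.
example_disks = canonical_disks([(0.0, 0.0)], [(1.0, 0.0), (0.0, 2.0)])
```

The output has one disk per (sensor, point) pair, so the transformed MinPSC instance has $|S|\cdot|X|$ sets. The sensor index is kept in each tuple so that the "at most one disk per sensor" constraint can be checked on a candidate solution.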

For the MinPSC instance obtained by the above reduction, the maximum frequency $f$ is equal to the number of sensors, which is too large to be a good approximation factor. In [18], a $3^{\alpha}$-approximation algorithm was obtained for MinPPC using the local ratio technique. This algorithm is centralized, and parallelizing it requires new insight.

Kearns [13] was the first to study the MinPSC problem and presented a greedy algorithm with approximation ratio $2H(n)+3$, where $H(n)$ is the $n$th Harmonic number and $\ln n\leq H(n)\leq\ln n+1$. Later, Slavik [26] gave an improved greedy algorithm with approximation ratio $H(\min\{k,\Delta\})$. Bar-Yehuda [1] obtained an $f$-approximation using the local ratio method. Using a primal-dual method, Gandhi et al. [9] also obtained an $f$-approximation. Könemann et al. [16] obtained a $(\frac{4}{3}+\varepsilon)f$-approximation for a generalized version of MinPSC, in which each element has a profit, and the goal is to select a minimum cost sub-collection of sets such that the total profit of covered elements is beyond some threshold. Inamdar and Varadarajan [11] showed that a feasible solution to the standard linear program for set cover can be rounded to a $(2\beta+2)$-approximation for MinPSC, where $\beta$ is the integrality gap for the set cover LP. These are all centralized algorithms. To the best of our knowledge, only one paper studies parallel algorithms for MinPSC [9], in which Gandhi et al. presented a parallel algorithm with approximation ratio $\frac{f}{1-\varepsilon}$ in $(1+f\log\frac{1}{\varepsilon})(1+\log n)$ rounds.

In contrast to the few studies on parallel MinPSC algorithms, there are many works on parallel or distributed algorithms for MinSC. Berger et al. [3] provided the first parallel algorithm using a bucketing approach, obtaining a $(1+\varepsilon)H(n)$-approximation in $O(\log^{5}M)$ rounds, where $M$ is the sum of the sizes of the sets. Rajagopalan and Vazirani [25] improved the number of rounds to $O(\log^{3}(Mn))$ at the cost of a larger approximation ratio $2(1+\varepsilon)H(n)$. Blelloch et al. [7] further improved this result to a $(1+\varepsilon)H(n)$-approximation in $O(\log^{2}M)$ rounds. Khuller et al. [15] presented a parallel $(f+\varepsilon)$-approximation algorithm in $O(f\log n\log\frac{1}{\varepsilon})$ rounds, where the maximum frequency $f$ is assumed to be a constant. Koufogiannakis and Young [17] and Harvey et al. [10] independently designed distributed $f$-approximation algorithms in polylogarithmic communication rounds. Bar-Yehuda et al. [2] used a distributed local ratio method to design a $(2+\varepsilon)$-approximation algorithm for the minimum-weight vertex cover problem in $O(\frac{\log\Delta}{\varepsilon\log\log\Delta})$ communication rounds, where $\Delta$ is the maximum degree of the graph and $0<\varepsilon<1$ is a constant.

The MinPSC problem is a special case of the minimum submodular cover problem (MinSMC). Given a monotone nondecreasing submodular function $g:2^{V}\rightarrow\mathbb{R}^{+}$ and an integer $k\leq g(V)$, the goal of MinSMC is to find a subset $A\subseteq V$ with the minimum cost such that $g(A)\geq k$. MinPSC is a special case of MinSMC since, for any $\mathcal{S}'\subseteq\mathcal{S}$, the function $g(\mathcal{S}')=\min\{|\bigcup_{S\in\mathcal{S}'}S|,k\}$ is monotone nondecreasing and submodular. For MinSMC, a greedy algorithm [28] can achieve approximation ratio $H(\gamma)$ with $\gamma=\max_{v\in V}f(v)$. Distributed algorithms for the cardinality version of the MinSMC problem have emerged recently. Mirzasoleiman et al. [23] proposed a distributed algorithm which yields a solution of size $2\alpha|OPT|+72\log(k)|OPT|\min(M,\alpha|OPT|)$ in $\log(\alpha|OPT|)+36\min(M,\alpha|OPT|)\log(k)/\alpha+1$ communication rounds, where $M$ denotes the number of machines, $0<\alpha\leq 1$ is a constant, and $OPT$ is an optimal solution. Afterwards, Mirzasoleiman et al. [24] proposed a faster distributed algorithm which yields a solution of size at most $\ln(k)|OPT|/(1-\varepsilon)$ in at most $\log_{3/2}(n/(M|OPT|))\log(\gamma)/\varepsilon+\log(k)$ rounds. Note that these algorithms are distributed, and their approximation ratios are measured in terms of $\ln k$. In this paper, by contrast, we aim at a parallel algorithm which can be implemented in an NC model and whose approximation ratio is measured in terms of $f$.
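The claim that the truncated coverage function is monotone and submodular can be checked by brute force on a small instance. The sketch below is only a sanity check under stated assumptions: the names `coverage` and `is_monotone_submodular` and the toy collection are hypothetical, and passing the check on one instance is of course not a proof.

```python
from itertools import combinations

def coverage(sets, chosen, k):
    """g(S') = min{|union of the chosen sets|, k}."""
    covered = set().union(*(sets[name] for name in chosen))
    return min(len(covered), k)

def is_monotone_submodular(sets, k):
    """Exhaustively test monotonicity and diminishing returns of g."""
    names = sorted(sets)
    subsets = [frozenset(c) for r in range(len(names) + 1)
               for c in combinations(names, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            if coverage(sets, a, k) > coverage(sets, b, k):
                return False  # monotonicity violated
            for s in names:
                if s in b:
                    continue
                gain_a = coverage(sets, a | {s}, k) - coverage(sets, a, k)
                gain_b = coverage(sets, b | {s}, k) - coverage(sets, b, k)
                if gain_a < gain_b:
                    return False  # diminishing returns violated
    return True

# Hypothetical toy collection over the ground set {1, ..., 5}.
demo_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
demo_ok = is_monotone_submodular(demo_sets, k=4)
```

The check uses the diminishing-returns form of submodularity: for $\mathcal{A}\subseteq\mathcal{B}$ and a set $S\notin\mathcal{B}$, adding $S$ to $\mathcal{A}$ gains at least as much as adding it to $\mathcal{B}$.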

For the minimum power (full) cover problem (MinPC), Bilò et al. [6] presented a PTAS. There are many studies on the minimum power multi-cover problem (MinPMC), in which points are required to be covered multiple times, and constant approximation ratios have been obtained [4], [5]. These works address the full cover version. For the partial version of the minimum power cover problem, MinPPC, studies are rare. Li et al. presented $3^{\alpha}$-approximation algorithms for MinPPC using the local ratio technique [18] and the primal-dual method [19], respectively. Liang et al. [20] studied the minimum power partial multi-cover problem on a line (MinPPMC-Line) and presented a polynomial-time exact algorithm when the maximum covering requirement is a constant. As far as we know, there is no study of parallel algorithms for the minimum power partial cover problem.

In this paper, we first design a parallel algorithm for MinPSC, which achieves approximation ratio at most $\frac{f}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon})$ rounds, where $0<\varepsilon<1/2$ is a constant.

By comparison, the parallel algorithm in [9] has approximation ratio $\frac{f}{1-\varepsilon}$ and runs in $(1+f\log\frac{1}{\varepsilon})(1+\log n)$ rounds. Since $f$ can be as large as $m$, its running time might not be logarithmic in the input size. The running time of our algorithm is logarithmic in the input size and independent of $f$.

The method used in our algorithm is inspired by the sequential local-ratio algorithm in [1]. To parallelize this sequential algorithm, we decompose the cost function into a series of cost functions depending on a varying parameter $\alpha$. The key trick is to let $\alpha$ increase as a geometric progression, so that the cost decreases quickly and the number of rounds can be bounded by a logarithm. In the sequential local ratio algorithm of [1], a set is chosen into the solution when its cost is decreased to zero. With the above trick, however, a cost might never be decreased to exactly zero. Our strategy is to select $S$ into the solution as soon as its cost is less than $\varepsilon c(S)$. This is where $\varepsilon$ enters the approximation ratio and the running time.
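As a rough illustration of the selection rule only, the following sketch applies a threshold to the classical sequential local-ratio scheme for full set cover. It is not the parallel algorithm of this paper and not the partial cover algorithm of [1]: it ignores $k$, it does not use the geometrically increasing parameter, and the function name `local_ratio_cover` and the toy instance are hypothetical. It assumes positive costs and that every element lies in at least one set.

```python
def local_ratio_cover(sets, cost, eps=0.0):
    """Sequential local-ratio sketch for full set cover with a threshold rule.

    sets: dict name -> set of elements; cost: dict name -> positive cost.
    A set is selected once its residual cost is at most eps * cost[name];
    eps = 0 gives the classical rule (select when the residual cost hits zero).
    """
    residual = dict(cost)
    chosen, covered = [], set()
    for e in sorted(set().union(*sets.values())):
        if e in covered:
            continue
        containing = sorted(name for name in sets if e in sets[name])
        # Pay the same amount delta on every set containing e.
        delta = min(residual[name] for name in containing)
        for name in containing:
            residual[name] -= delta
            if residual[name] <= eps * cost[name]:
                chosen.append(name)
                covered |= sets[name]
    return chosen

# Hypothetical toy instance.
lr_sets = {"A": {1, 2}, "B": {2, 3}, "C": {3}}
lr_cost = {"A": 2.0, "B": 3.0, "C": 1.0}

exact_rule = local_ratio_cover(lr_sets, lr_cost, eps=0.0)
threshold_rule = local_ratio_cover(lr_sets, lr_cost, eps=0.7)
```

The sketch uses a non-strict comparison so that `eps=0` recovers the zero-cost rule. With a positive threshold, a set whose residual cost is small but nonzero is also picked, which is the slack that $\varepsilon$ introduces.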

Then we design a parallel algorithm for MinPPC with approximation ratio $\frac{(3+2\varepsilon)^{\alpha}}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon}\log^{2}m)$ rounds, where $\alpha\geq 1$ is the attenuation factor of power. Geometric properties play a crucial role in the analysis. This is the first parallel algorithm for MinPPC.

Parallel algorithm for MinPSC

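Denote by $I=(E,\mathcal{S},c,k)$ a MinPSC instance. For any sub-collection of sets $\mathcal{F}\subseteq\mathcal{S}$, denote by $U(\mathcal{F})=\bigcup_{S\in\mathcal{F}}S$ the set of elements covered by $\mathcal{F}$. The algorithm will guess the largest cost of a set in $OPT$, where $OPT$ is an optimal solution. Suppose $S_{0}$ is the guessed set. The residual instance with respect to $S_{0}$ is $I_{res}(S_{0})=(E_{res},\mathcal{S}_{res},c_{res},k_{res})$, where $E_{res}=E\setminus S_{0}$, $\mathcal{S}_{res}=\mathcal{S}\setminus\{S\in\mathcal{S}:c(S)>c(S_{0})\}$, $c_{res}(S)=c(S)$ for $S\in\mathcal{S}_{res}$, and $k_{res}=\max\{0,k-|S_{0}|\}$. The algorithm is executed in parallel on $m$ machines, where $m=|\mathcal{S}|$. Each machine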

Parallel algorithm for MinPPC

In this section, we present a parallel algorithm for the minimum power partial cover problem (MinPPC). In [18], Li et al. proposed a $3^{\alpha}$-approximation algorithm for MinPPC. They first guess the largest cost in $OPT$. Then for the residual instance, a feasible solution $\mathcal{F}$ is obtained using the local ratio method. A maximal independent set is found in $\mathcal{F}$ by a specific rule, and expanding the disks in this maximal independent set yields a feasible solution. The main purpose of this section is to parallelize

Conclusion

In this paper, we design a parallel algorithm for MinPSC that obtains a solution with approximation ratio at most $\frac{f}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon})$ rounds, where $0<\varepsilon<1/2$ is an arbitrary constant. For the minimum power partial cover problem (MinPPC), we design a parallel algorithm with approximation ratio $\frac{(3+2\varepsilon)^{\alpha}}{1-2\varepsilon}$ in $O(\frac{1}{\varepsilon}\log\frac{mn}{\varepsilon}\log^{2}m)$ rounds, where $\alpha\geq 1$ is the attenuation factor of power. How to obtain a parallel algorithm with approximation ratio exactly $f$ might be an interesting topic.

Note that our method

Acknowledgments

This research work is supported in part by NSFC (11901533, U20A2068, 11771013), and ZJNSFC (LD19A010001).

References (28)

  • V. Bilò et al., Geometric clustering to minimize the sum of cluster sizes, ESA (2005)

  • G.E. Blelloch et al., Linear-work greedy parallel approximate set cover and variants, SPAA (2011)

  • M. Charikar et al., Algorithms for facility location with outliers, SODA (2001)

  • N.J.A. Harvey et al., Greedy and local ratio algorithms in the MapReduce model, CoRR (2018)