An improved approximation algorithm for the minimum common integer partition problem☆
Introduction
The minimum common integer partition (MCIP) problem was introduced to the computational biology community by Chen et al. [7], formulated from their work on ortholog assignment and DNA fingerprint assembly. Mathematically, a partition of a positive integer $x$ is a multiset $\sigma(x) = \{a_1, a_2, \ldots, a_t\}$ of positive integers such that $a_1 + a_2 + \cdots + a_t = x$, where each $a_i$ is called a part of the partition of $x$ [2], [3]. For example, $\{3, 2, 2, 1\}$ is a partition of $x = 8$; so is $\{6, 1, 1\}$. A partition of a multiset $X$ of positive integers is the multiset union of the partitions $\sigma(x)$ for all $x$ of $X$, i.e. $\sigma(X) = \biguplus_{x \in X} \sigma(x)$. For example, as $\{3, 2, 2, 1\}$ is a partition of $x_1 = 8$ and $\{3, 2\}$ is a partition of $x_2 = 5$, $\{3, 3, 2, 2, 2, 1\}$ is a partition for $X = \{8, 5\}$.
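Given a collection of multisets $\{X_1, X_2, \ldots, X_k\}$ ($k \ge 2$), a multiset $S$ is a common integer partition (CIP) for them if $S$ is an integer partition of every multiset $X_i$, $1 \le i \le k$. For example, when $k = 2$ and $X_1 = \{8, 5\}$ and $X_2 = \{6, 4, 3\}$, $\{3, 3, 2, 2, 2, 1\}$ is a CIP for them since, besides being a partition for $X_1$ ($3 + 3 + 2 = 8$ and $2 + 2 + 1 = 5$), it is also a partition for $X_2$: $3 + 3 = 6$, $2 + 2 = 4$, and $2 + 1 = 3$. The minimum common integer partition (MCIP) problem is to find a CIP for $\{X_1, X_2, \ldots, X_k\}$ with the minimum cardinality. For example, one can verify that, for the above $X_1$ and $X_2$, $\{6, 3, 2, 2\}$ is a minimum cardinality CIP. We use k-MCIP to denote the restricted version of the MCIP problem when the number of input multisets is fixed to be k.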
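The CIP definition can be checked mechanically on small inputs. The following is a minimal brute-force sketch, not part of the paper: the function names `is_partition_of` and `is_cip` and the sample multisets $\{8, 5\}$ and $\{6, 4, 3\}$ are illustrative assumptions, and the backtracking search is exponential in the worst case.

```python
from typing import List, Sequence


def is_partition_of(parts: Sequence[int], targets: Sequence[int]) -> bool:
    """Return True if `parts` can be split into groups whose sums are exactly `targets`.

    Hypothetical helper for illustration; plain backtracking, exponential time.
    """
    if sum(parts) != sum(targets):
        return False
    remaining: List[int] = list(targets)    # amount still needed by each target
    items = sorted(parts, reverse=True)     # place large parts first to prune early

    def place(idx: int) -> bool:
        if idx == len(items):
            return all(r == 0 for r in remaining)
        tried = set()                       # skip targets with identical remaining need
        for i, r in enumerate(remaining):
            if r >= items[idx] and r not in tried:
                tried.add(r)
                remaining[i] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[i] += items[idx]
        return False

    return place(0)


def is_cip(parts: Sequence[int], multisets: Sequence[Sequence[int]]) -> bool:
    """A multiset is a CIP if it is an integer partition of every input multiset."""
    return all(is_partition_of(parts, m) for m in multisets)


if __name__ == "__main__":
    instance = [[8, 5], [6, 4, 3]]               # illustrative related multisets
    print(is_cip([3, 3, 2, 2, 2, 1], instance))  # a CIP with six parts
    print(is_cip([6, 3, 2, 2], instance))        # a CIP with four parts
    print(is_cip([6, 4, 3], instance))           # not a partition of [8, 5]
```

Since any CIP needs at least as many parts as the largest input multiset, and the only three-part candidate here fails, four parts is the minimum for this sample instance.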
For simplicity, we denote the optimal, i.e. a minimum cardinality, CIP for $\{X_1, X_2, \ldots, X_k\}$ as $\mathrm{OPT}(X_1, X_2, \ldots, X_k)$, or simply OPT when the input multisets are clear from the context. Analogously, we denote the CIP for $\{X_1, X_2, \ldots, X_k\}$ produced by an algorithm $A$ as $\mathrm{CIP}_A(X_1, X_2, \ldots, X_k)$, or simply $\mathrm{CIP}_A$; without the algorithm subscript, we use CIP to denote any feasible common integer partition.
As mentioned earlier, the MCIP problem was introduced by Chen et al. [7], who formulated it from ortholog assignment and DNA fingerprint assembly. Interested readers may refer to their paper for more detailed descriptions and the mappings between the problems. More recently, another application of the MCIP problem, to similarity comparison between two unlabeled pedigrees, was presented in [10]. Pedigrees, commonly known as family trees, record the occurrence and appearance (or phenotypes) of a particular gene or organism and its ancestors from one generation to the next. They are important to geneticists for linkage analysis: with a valid pedigree, recombination events can be deduced more accurately [8] and disease loci can be mapped consistently [12], [13]. Jiang et al. [10] considered the isomorphism and similarity comparison problems for two-generation pedigrees, and formulated them as the minimum common integer pair partition (MCIPP) problem, which generalizes the MCIP problem. By exploiting certain structural properties of the optimal solutions for the 2-MCIP problem, they were able to show that their MCIPP problem is also fixed-parameter tractable [10].
For a positive integer $x$, the number of its integer partitions increases very rapidly with $x$. For example, the integer 3 has three partitions, namely $\{3\}$, $\{2, 1\}$, and $\{1, 1, 1\}$; the integer 4 has five partitions, namely $\{4\}$, $\{3, 1\}$, $\{2, 2\}$, $\{2, 1, 1\}$, and $\{1, 1, 1, 1\}$; while the integer 100 has 190,569,292 partitions according to [2].
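The growth of the partition count can be reproduced with a standard dynamic program over the allowed part sizes. This is a minimal sketch for illustration; the function name `count_partitions` is an assumption and not notation from the paper.

```python
def count_partitions(n: int) -> int:
    """Count the integer partitions of n (multisets of positive integers summing to n)."""
    # ways[t] = number of partitions of t using only the part sizes considered so far
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


if __name__ == "__main__":
    for n in (3, 4, 10, 100):
        print(n, count_partitions(n))
```

The count is 3 for the integer 3, 5 for the integer 4, 42 for the integer 10, and 190,569,292 for the integer 100.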
A collection of multisets $\{X_1, X_2, \ldots, X_k\}$ ($k \ge 2$) has a CIP if and only if all the multisets have the same summation over their elements. Multisets with this property are called related [6], and we assume throughout the paper that the multisets in any instance of MCIP are related, as the verification takes only linear time.
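The linear-time verification amounts to comparing element sums. The sketch below also shows why equal sums suffice for a CIP to exist: the multiset consisting only of ones is always a (typically far from minimum) CIP. The names `are_related` and `trivial_cip` are illustrative assumptions.

```python
from typing import List, Sequence


def are_related(multisets: Sequence[Sequence[int]]) -> bool:
    """Multisets are related when they all have the same element sum."""
    return len({sum(m) for m in multisets}) == 1


def trivial_cip(multisets: Sequence[Sequence[int]]) -> List[int]:
    """Return the all-ones CIP, which exists for any related collection."""
    if not are_related(multisets):
        raise ValueError("multisets are not related, so no CIP exists")
    return [1] * sum(multisets[0])


if __name__ == "__main__":
    print(are_related([[8, 5], [6, 4, 3]]))   # both sum to 13
    print(are_related([[8, 5], [6, 4, 2]]))   # 13 versus 12
    print(len(trivial_cip([[8, 5], [6, 4, 3]])))
```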
One can see that the 2-MCIP problem generalizes the well-known subset sum problem [9], based on the following observation: Given a positive integer $x$ and a multiset of positive integers $X = \{a_1, a_2, \ldots, a_m\}$, there exists a sub-multiset of $X$ summing to $x$ if and only if for the two multisets $X$ and $Y = \{x, \sum_{i=1}^{m} a_i - x\}$, $|\mathrm{OPT}(X, Y)| = m$. Thus 2-MCIP is NP-hard [6]. Chen et al. showed that 2-MCIP is APX-hard [6], via a linear reduction (also called an approximation preserving reduction) from the maximum bounded 3-dimensional matching problem [11]. After the preliminary version of this paper, You et al. presented a fixed-parameter tractable (FPT) algorithm for 2-MCIP in [15].
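The reduction from subset sum can be made concrete as follows. This is a sketch under stated assumptions: the function names are hypothetical, the target $x$ is assumed to lie strictly between 0 and the sum of $X$, and the check uses the reasoning that any CIP of $X$ and $Y$ has at least $|X|$ parts, with equality only when $X$ itself is a partition of $Y$.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def subset_sum_to_2mcip(X: Sequence[int], x: int) -> Tuple[List[int], List[int]]:
    """Build the 2-MCIP instance (X, Y) with Y = {x, sum(X) - x}."""
    total = sum(X)
    if not 0 < x < total:
        raise ValueError("x must lie strictly between 0 and sum(X)")
    return list(X), [x, total - x]


def optimum_has_size_m(X: Sequence[int], Y: Sequence[int]) -> bool:
    """Check whether X itself is a CIP of X and the two-element multiset Y.

    Brute force over sub-multisets of X, for illustration only.
    """
    target = Y[0]
    return any(
        sum(choice) == target
        for size in range(1, len(X))
        for choice in combinations(X, size)
    )


if __name__ == "__main__":
    X, Y = subset_sum_to_2mcip([3, 5, 7], 8)   # 3 + 5 = 8, so a subset exists
    print(Y, optimum_has_size_m(X, Y))
    X, Y = subset_sum_to_2mcip([3, 5, 7], 4)   # no sub-multiset sums to 4
    print(Y, optimum_has_size_m(X, Y))
```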
Let $M$ denote the total number of integers in the k-MCIP problem. For the positive algorithmic results, Chen et al. presented a linear time 2-approximation algorithm and a polynomial-time 5/4-approximation algorithm for 2-MCIP [6], based on a heuristic for the maximum weighted set packing problem [11]. The 5/4-approximation can be taken as a subroutine to design a 0.625k-approximation algorithm for k-MCIP (when k is even; when k is odd, the approximation ratio is $0.625k + 0.375$) [14]. Woodruff developed a framework for capturing the frequencies of the integers across the input multisets and presented a randomized polynomial-time approximation algorithm for k-MCIP, with a worst-case performance ratio of $0.6139k(1 + o(1))$ [14]. The basic idea is that, when there are not too many distinct integers in the input multisets, most of the low frequency integers will have to be split into at least two parts in any common partition. Inspired by this idea, Zhao et al. [16] formulated the k-MCIP problem as a flow decomposition problem in an acyclic k-layer network, with the goal of finding a minimum number of directed simple paths from the source to the sink. Since this minimum number can be bounded by the number of arcs in the network according to the well-known flow decomposition theorem [1], Zhao et al. presented a scheme to reduce the number of arcs in the network, resulting in a de-randomized approximation algorithm with a performance ratio of $0.5625k(1 + o(1))$, which is currently the best.
In this paper, we present a polynomial-time 6/5-approximation algorithm for 2-MCIP. Subsequently, we obtain a 0.6k-approximation algorithm for k-MCIP when k is even (when k is odd, the approximation ratio is $0.6k + 0.4$). It is worth pointing out that the ratio of 0.5625k in [16] is asymptotic, in that it holds only for sufficiently large k, while our ratio of 0.6k is absolute, in that it holds for all $k \ge 2$.
The rest of the paper is organized as follows: In the next section, we first introduce some known bounds on the cardinality of the optimal CIPs for 2-MCIP, then present our 6/5-approximation algorithm and its performance analysis, assuming an important inequality stated in Lemma 4. The entire Section 3 is devoted to the proof of Lemma 4, where multiple amortized analyses are employed. We note that while conceptually simple, some of the amortized analyses are technical and involved, with a number of notations set up for token counting purposes. In Section 4, we extend the 6/5-approximation algorithm to a 0.6k-approximation for k-MCIP when k is even (a $(0.6k + 0.4)$-approximation when k is odd). We conclude the paper with some future work in Section 5.
A 6/5-approximation algorithm for 2-MCIP
In this section, we deal with the 2-MCIP problem. For ease of presentation, we denote the two multisets of positive integers in an instance as $X = \{a_1, a_2, \ldots, a_m\}$ and $Y = \{b_1, b_2, \ldots, b_n\}$, and assume without loss of generality that they are related. Recall that OPT denotes the optimal solution, that is, the minimum cardinality CIP for $\{X, Y\}$, and $\mathrm{CIP}_A$ denotes the CIP produced by the algorithm $A$.
Proof of Lemma 4
This section is devoted to the proof of Lemma 4, stating that . By Eq. (2.4), it is sufficient to show that , which is stated as Lemma 10. To this purpose, we consider the bipartite subgraph of the graph H induced by the vertex subsets and P. By associating two tokens for each vertex of and one token for each vertex of , we re-distribute these tokens to the vertices of P through adjacencies by distinguishing five
A 0.6k-approximation algorithm for k-MCIP
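Given an instance of the k-MCIP problem $\{X_1, X_2, \ldots, X_k\}$, we first divide these k multisets into pairs $\{X_{2i-1}, X_{2i}\}$, $i = 1, 2, \ldots, \lfloor k/2 \rfloor$, plus the last multiset $X_k$ if k is odd. Next, we run the algorithm Apx65 on each pair to obtain a solution $Y_i$, for $i = 1, 2, \ldots, \lfloor k/2 \rfloor$, plus $Y_{\lceil k/2 \rceil} = X_k$ if k is odd. We continue this dividing and running Apx65 on $\{Y_1, Y_2, \ldots, Y_{\lceil k/2 \rceil}\}$ if $\lceil k/2 \rceil \ge 2$, and repeat until we have only one multiset left, denoted as $Z$. Clearly, $Z$ is a common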
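The pairing scheme described in this section can be sketched as follows. The details of Apx65 are not given here, so this sketch substitutes a simple greedy 2-MCIP routine (`greedy_cip`, a stand-in that is not the paper's algorithm and carries no 6/5 guarantee); the function names and the sample instance are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


def greedy_cip(X: Sequence[int], Y: Sequence[int]) -> List[int]:
    """Stand-in 2-MCIP solver (NOT Apx65): match one element from each side at a time.

    Emits min(a, b) as a part and pushes the leftover difference back,
    so the result has at most len(X) + len(Y) - 1 parts.
    """
    if sum(X) != sum(Y):
        raise ValueError("multisets must be related (equal sums)")
    xs, ys = list(X), list(Y)
    cip: List[int] = []
    while xs and ys:
        a, b = xs.pop(), ys.pop()
        part = min(a, b)
        cip.append(part)
        if a > part:
            xs.append(a - part)   # remainder of a still has to be covered
        elif b > part:
            ys.append(b - part)   # remainder of b still has to be covered
    return cip


def pairwise_cip(
    multisets: Sequence[Sequence[int]],
    solve_pair: Callable[[Sequence[int], Sequence[int]], List[int]] = greedy_cip,
) -> List[int]:
    """Repeatedly pair up multisets and replace each pair by a CIP of that pair."""
    current = [list(m) for m in multisets]
    while len(current) > 1:
        merged = [
            solve_pair(current[i], current[i + 1])
            for i in range(0, len(current) - 1, 2)
        ]
        if len(current) % 2 == 1:
            merged.append(current[-1])   # odd one out is carried to the next round
        current = merged
    return current[0]


if __name__ == "__main__":
    print(sorted(pairwise_cip([[8, 5], [6, 4, 3], [7, 6]])))
```

The final multiset is a common partition of all inputs because a partition of a partition of a multiset is again a partition of that multiset. Any other 2-MCIP routine can be passed in through the `solve_pair` parameter.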
Conclusions
We presented an improved 6/5-approximation algorithm for the 2-MCIP problem; the previous best approximation algorithm has a performance ratio of 5/4 and was designed by Chen et al. in 2006 [5], [6]. Subsequently, we obtained an absolute 0.6k-approximation algorithm for k-MCIP when k is even (when k is odd, the approximation ratio is $0.6k + 0.4$). It is worth pointing out that the ratio of 0.5625k in [16] is asymptotic, in that it holds only for sufficiently large k.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
We are very grateful to the anonymous reviewers for their many helpful comments and suggestions to improve the presentation.
This research is supported by NSERC Canada.
References (16)
- Maximum bounded 3-dimensional matching is MAX SNP-complete. Inf. Process. Lett. (1991)
- et al. Fixed-parameter tractability for minimum tree cut/paste distance and minimum common integer partition. Theor. Comput. Sci. (2020)
- et al. A network flow approach to the minimum common integer partition problem. Theor. Comput. Sci. (2006)
- et al. Network Flows: Theory, Algorithm, and Applications (2005)
- The Theory of Partitions (1976)
- et al. The Integer Partitions (2004)
- A d/2 approximation for maximum weight independent set in d-claw free graphs
- et al. On the minimum common integer partition problem
☆ An extended abstract appears in ISAAC 2014.