An improved approximation algorithm for the minimum common integer partition problem

doi:10.1016/j.ic.2021.104784

Information and Computation

Volume 281, December 2021, 104784

Abstract

Given a collection of multisets $\{X_{1}, X_{2}, \dots, X_{k}\}$ ($k \geq 2$) of positive integers, a multiset S is a common integer partition (CIP) for them if S is an integer partition of every multiset $X_{i}$, $1 \leq i \leq k$. The minimum common integer partition (k-MCIP) problem is to find a CIP for $\{X_{1}, X_{2}, \dots, X_{k}\}$ of minimum cardinality. We present a $\frac{6}{5}$-approximation algorithm for the 2-MCIP problem, improving on the previous best algorithm, of performance ratio $\frac{5}{4}$, designed by Chen et al. in 2006. We then extend it to obtain an absolute 0.6k-approximation algorithm for k-MCIP when k is even (when k is odd, the approximation ratio is $0.6 k + 0.4$).

Introduction

The minimum common integer partition (MCIP) problem was introduced to the computational biology community by Chen et al. [7], formulated from their work on ortholog assignment and DNA fingerprint assembly. Mathematically, a partition of a positive integer x is a multiset $\sigma(x) = \{a_{1}, a_{2}, \dots, a_{t}\}$ of positive integers such that $a_{1} + a_{2} + \dots + a_{t} = x$, where each $a_{i}$ is called a part of the partition of x [2], [3]. For example, $\{3, 2, 2, 1\}$ is a partition of $x = 8$; so is $\{6, 1, 1\}$. A partition of a multiset X of positive integers is the multiset union of the partitions $\sigma(x)$ for all x in X, i.e. $\sigma(X) = \biguplus_{x \in X} \sigma(x)$. For example, as $\{3, 2, 2, 1\}$ is a partition of $x_{1} = 8$ and $\{3, 2\}$ is a partition of $x_{2} = 5$, $\{3, 3, 2, 2, 2, 1\}$ is a partition of $X = \{8, 5\}$.

Given a collection of multisets $\{X_{1}, X_{2}, \dots, X_{k}\}$ ($k \geq 2$), a multiset S is a common integer partition (CIP) for them if S is an integer partition of every multiset $X_{i}$, $1 \leq i \leq k$. For example, when $k = 2$, $X_{1} = \{8, 5\}$ and $X_{2} = \{6, 4, 3\}$, the multiset $\{3, 3, 2, 2, 2, 1\}$ is a CIP for them, since it is also a partition of $X_{2} = \{6, 4, 3\}$: $3 + 3 = 6$, $2 + 2 = 4$, and $2 + 1 = 3$. The minimum common integer partition (MCIP) problem is to find a CIP for $\{X_{1}, X_{2}, \dots, X_{k}\}$ of minimum cardinality. For example, one can verify that, for the above $X_{1} = \{8, 5\}$ and $X_{2} = \{6, 4, 3\}$, $\{6, 3, 2, 2\}$ is a minimum cardinality CIP. We use k-MCIP to denote the restricted version of the MCIP problem in which the number of input multisets is fixed to be k.
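The definition of a CIP can be checked mechanically on small inputs. The following is a minimal sketch, not taken from the paper: the function names `is_partition_of` and `is_cip` are hypothetical, and the backtracking search is exponential in the worst case, so it is meant only for toy instances such as the example above.

```python
def is_partition_of(S, X):
    """Return True if the multiset S is an integer partition of the multiset X,
    i.e. S can be split into |X| groups whose sums are exactly the elements of X."""
    if sum(S) != sum(X):
        return False
    parts = sorted(S, reverse=True)   # placing large parts first prunes earlier
    remaining = list(X)               # capacity still to be filled for each x in X

    def place(idx):
        if idx == len(parts):
            return all(r == 0 for r in remaining)
        tried = set()                 # skip bins with an identical remaining capacity
        for b, r in enumerate(remaining):
            if r >= parts[idx] and r not in tried:
                tried.add(r)
                remaining[b] -= parts[idx]
                if place(idx + 1):
                    return True
                remaining[b] += parts[idx]
        return False

    return place(0)


def is_cip(S, multisets):
    """S is a common integer partition if it is a partition of every multiset."""
    return all(is_partition_of(S, X) for X in multisets)


instance = [[8, 5], [6, 4, 3]]
print(is_cip([3, 3, 2, 2, 2, 1], instance))
print(is_cip([6, 3, 2, 2], instance))
print(is_cip([6, 4, 3], instance))
```

On this instance the first two candidates are accepted and the third is rejected, because the parts 6, 4 and 3 cannot be grouped into sums 8 and 5.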

For simplicity, we denote an optimal (i.e. minimum cardinality) CIP for $\{X_{1}, X_{2}, \dots, X_{k}\}$ as OPT $(X_{1}, X_{2}, \dots, X_{k})$, or simply OPT when the input multisets are clear from the context. Analogously, we denote the CIP for $\{X_{1}, X_{2}, \dots, X_{k}\}$ produced by an algorithm A as CIP $_{A} (X_{1}, X_{2}, \dots, X_{k})$, or simply CIP $_{A}$; without the algorithm subscript, we use CIP to denote any feasible common integer partition.

We mentioned earlier that the MCIP problem was introduced by Chen et al. [7], formulated out of ortholog assignment and DNA fingerprint assembly. Interested readers may refer to their paper for more detailed descriptions and the mappings between the problems. More recently, another application of the MCIP problem, in similarity comparison between two unlabeled pedigrees, was presented in [10]. Pedigrees, commonly known as family trees, record the occurrence and appearance (or phenotypes) of a particular gene or organism and its ancestors from one generation to the next. They are important to geneticists for linkage analysis, as with a valid pedigree the recombination events can be deduced more accurately [8], or disease loci can be mapped consistently [12], [13]. Jiang et al. [10] considered the isomorphism and similarity comparison problems for two-generation pedigrees, and formulated them as the minimum common integer pair partition (MCIPP) problem, which generalizes the MCIP problem. By exploiting certain structural properties of the optimal solutions for the 2-MCIP problem, they were able to show that their MCIPP problem is also fixed-parameter tractable [10].

For an integer $x \in \mathbb{Z}^{+}$, the number of its integer partitions increases very rapidly with x. For example, the integer 3 has three partitions, namely $\{3\}$, $\{2, 1\}$, and $\{1, 1, 1\}$; the integer 4 has five partitions, namely $\{4\}$, $\{3, 1\}$, $\{2, 2\}$, $\{2, 1, 1\}$, and $\{1, 1, 1, 1\}$; while the integer 100 has 190,569,292 partitions according to [2].
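The growth of the partition function can be reproduced with a short dynamic program. This is a standard counting sketch added for illustration; the function name `count_partitions` is an assumption and does not come from the paper.

```python
def count_partitions(n):
    """Count the integer partitions of n.

    ways[t] holds the number of partitions of t that use only the part sizes
    processed so far; adding part sizes one at a time counts each multiset once.
    """
    ways = [1] + [0] * n              # exactly one way to partition 0: no parts
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


for n in (3, 4, 10, 100):
    print(n, count_partitions(n))     # 3 -> 3, 4 -> 5, 10 -> 42, 100 -> 190569292
```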

A collection of multisets $\{X_{1}, X_{2}, \dots, X_{k}\}$ ($k \geq 2$) has a CIP if and only if all the multisets have the same sum over their elements. Multisets with this property are called related [6], and we assume throughout the paper that the multisets in any instance of MCIP are related, as the verification takes only linear time.

One can see that the 2-MCIP problem generalizes the well-known subset sum problem [9], based on the following observation: given a positive integer x and a multiset of positive integers $X = \{a_{1}, a_{2}, \dots, a_{m}\}$, there exists a sub-multiset of X summing to x if and only if, for the two multisets $X = \{a_{1}, a_{2}, \dots, a_{m}\}$ and $Y = \{x, \sum_{i = 1}^{m} a_{i} - x\}$, $| OPT (X, Y) | = m$. Thus 2-MCIP is NP-hard [6]. Chen et al. showed that 2-MCIP is APX-hard [6], via a linear reduction (also called an approximation preserving reduction) from the maximum bounded 3-dimensional matching problem [11]. After the preliminary version of this paper, You et al. presented a fixed-parameter tractable (FPT) algorithm for 2-MCIP in [15].
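The reduction from subset sum can be tried out on toy inputs with an exhaustive 2-MCIP solver. The sketch below is an illustration only and is not an algorithm from the paper: `min_cip_size` and `has_subset_sum` are hypothetical names, and the search takes exponential time. It relies on the assumption, argued in the comments, that some optimal CIP can be built by repeatedly matching one element of X with one element of Y and emitting the smaller of the two as a part.

```python
from functools import lru_cache


def min_cip_size(X, Y):
    """Exhaustive search for |OPT(X, Y)| on small related multisets.

    Assumption behind the search: view a CIP as a bipartite graph between X and Y
    with one edge per part. Cycles can be removed without adding parts, so some
    optimal solution is a forest, and a forest always has a leaf, i.e. an element
    that is emitted whole against a single partner. Trying every pair (x, y)
    therefore reaches an optimal solution.
    """
    if sum(X) != sum(Y):
        raise ValueError("the two multisets must be related (equal sums)")

    @lru_cache(maxsize=None)
    def solve(xs, ys):
        if not xs:                        # related, so ys is empty as well
            return 0
        best = len(xs) + len(ys)          # safe upper bound
        for x in set(xs):
            for y in set(ys):
                part = min(x, y)
                nx, ny = list(xs), list(ys)
                nx.remove(x)
                ny.remove(y)
                if x > part:
                    nx.append(x - part)   # leftover of x still to be covered
                if y > part:
                    ny.append(y - part)   # leftover of y still to be covered
                best = min(best, 1 + solve(tuple(sorted(nx)), tuple(sorted(ny))))
        return best

    return solve(tuple(sorted(X)), tuple(sorted(Y)))


def has_subset_sum(X, x):
    """Decide subset sum through the 2-MCIP observation: build Y = {x, sum(X) - x}
    and test whether |OPT(X, Y)| equals |X|."""
    total = sum(X)
    if x >= total:
        return x == total                 # Y would contain a non-positive integer
    return min_cip_size(X, [x, total - x]) == len(X)


print(has_subset_sum([3, 5, 2, 7], 10))   # 3 + 7 = 10
print(has_subset_sum([3, 5, 2, 7], 4))    # no sub-multiset sums to 4
```

In the second call the optimum has 5 parts rather than 4, because X cannot be split into two groups with sums 4 and 13.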

Let $M = | X_{1} | + | X_{2} | + \dots + | X_{k} |$ denote the total number of integers in the k-MCIP problem. For the positive algorithmic results, Chen et al. presented a linear time 2-approximation algorithm and an $O (M^{9})$-time 5/4-approximation algorithm for 2-MCIP [6], based on a heuristic for the maximum weighted set packing problem [11]. The 5/4-approximation can be taken as a subroutine to design a 0.625k-approximation algorithm for k-MCIP (when k is even; when k is odd, the approximation ratio is $0.625 k + 0.375$) [14]. Woodruff developed a framework for capturing the frequencies of the integers across the input multisets and presented a randomized $O (M \log k)$-time approximation algorithm for k-MCIP, with a worst-case performance ratio $0.6139 k (1 + o (1))$ [14]. The basic idea is that, when there are not too many distinct integers in the input multisets, most of the low frequency integers will have to be split into at least two parts in any common partition. Inspired by this idea, Zhao et al. [16] formulated the k-MCIP problem as a flow decomposition problem in an acyclic k-layer network, with the goal of finding a minimum number of directed simple paths from the source to the sink. Since this minimum number can be bounded by the number of arcs in the network according to the well-known flow decomposition theorem [1], Zhao et al. presented a scheme to reduce the number of arcs in the network, resulting in a de-randomized approximation algorithm with a performance ratio $0.5625 k (1 + o (1))$, which is currently the best.

In this paper, we present a polynomial-time 6/5-approximation algorithm for 2-MCIP. Subsequently, we obtain a 0.6k-approximation algorithm for k-MCIP when k is even (when k is odd, the approximation ratio is $0.6 k + 0.4$). It is worth pointing out that the ratio of 0.5625k in [16] is asymptotic, in that it holds only for sufficiently large k, while our ratio of 0.6k is absolute, in that it holds for all $k \geq 2$.

The rest of the paper is organized as follows: In the next section, we introduce some known bounds on the cardinality of the optimal CIPs for 2-MCIP first, then present our 6/5-approximation algorithm and its performance analysis, assuming an important inequality stated in Lemma 4. The entire Section 3 is devoted to the proof of Lemma 4, where multiple amortized analyses are employed. We note that while conceptually simple, some of the amortized analyses are technical and involved, with a number of notations set up for token counting purposes. In Section 4, we extend the 6/5-approximation algorithm to a 0.6k-approximation for k-MCIP when $k > 2$ (a $(0.6 k + 0.4)$ -approximation when k is odd). We conclude the paper with some future work in Section 5.

A 6/5-approximation algorithm for 2-MCIP

In this section, we deal with the 2-MCIP problem. For ease of presentation, we denote the two multisets of positive integers in an instance as $X = \{x_{1}, x_{2}, \dots, x_{m}\}$ and $Y = \{y_{1}, y_{2}, \dots, y_{n}\}$, and assume without loss of generality that they are related. Recall that OPT $(X, Y)$ denotes the optimal solution, that is, the minimum cardinality CIP for $\{X, Y\}$, and CIP $_{A} (X, Y)$ denotes the CIP produced by the algorithm A.

Proof of Lemma 4

This section is devoted to the proof of Lemma 4, stating that $3 q_{3}^{*} + 2 q_{4}^{*} + q_{5}^{*} \leq 5 (p_{3} + p_{4} + p_{5})$. By Eq. (2.4), it is sufficient to show that $2 q_{31}^{*} + q_{32}^{*} + q_{41}^{*} \leq 2 p_{3} + p_{4}$, which is stated as Lemma 10. To this end, we consider the bipartite subgraph $H'$ of the graph H induced by the vertex subsets $Q_{31}^{*} \cup Q_{32}^{*} \cup Q_{41}^{*}$ and P. By associating two tokens with each vertex of $Q_{31}^{*}$ and one token with each vertex of $Q_{32}^{*} \cup Q_{41}^{*}$, we re-distribute these tokens to the vertices of P through adjacencies by distinguishing five

A 0.6k-approximation algorithm for k-MCIP

Given an instance of the k-MCIP problem $\{X_{1}, X_{2}, \dots, X_{k}\}$, we first divide these k multisets into $\lfloor k / 2 \rfloor$ pairs $\{X_{2 i - 1}, X_{2 i}\}$, $i = 1, 2, \dots, \lfloor k / 2 \rfloor$, plus the last multiset $X_{k}$ if k is odd. Next, we run the algorithm Apx65 on each pair $\{X_{2 i - 1}, X_{2 i}\}$ to obtain a solution $Z_{i}$, for $i = 1, 2, \dots, \lfloor k / 2 \rfloor$, plus $Z_{(k + 1) / 2} = X_{k}$ if k is odd. We continue this dividing and running Apx65 on $\{Z_{1}, Z_{2}, \dots, Z_{\lfloor (k + 1) / 2 \rfloor}\}$ if $\lfloor (k + 1) / 2 \rfloor \geq 2$, and repeat until we have only one multiset left, denoted as $\mathrm{CIP}_{final}$. Clearly, $\mathrm{CIP}_{final}$ is a common
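The pairing scheme described in this section can be sketched as a loop that halves the number of multisets in each round. The code below is a hedged illustration, not the paper's implementation: Apx65 is not specified in this excerpt, so a simple greedy routine, here called `greedy_cip` (a hypothetical name), stands in for it and carries no 6/5 guarantee. The driver `pairwise_reduce` (also a hypothetical name) accepts any 2-MCIP solver through its `solve_pair` parameter.

```python
def greedy_cip(X, Y):
    """Stand-in 2-MCIP solver (NOT Apx65): repeatedly match one element from each
    side and emit the smaller value as a part. For related inputs this yields a
    CIP with at most |X| + |Y| - 1 parts."""
    xs, ys = list(X), list(Y)
    parts = []
    while xs and ys:
        x, y = xs.pop(), ys.pop()
        part = min(x, y)
        parts.append(part)
        if x > part:
            xs.append(x - part)       # leftover of x still to be covered
        if y > part:
            ys.append(y - part)       # leftover of y still to be covered
    return parts


def pairwise_reduce(multisets, solve_pair=greedy_cip):
    """Pair up the multisets, replace each pair by a CIP of the pair, carry an
    unpaired last multiset over unchanged, and repeat until one multiset is left.
    A partition of a CIP of a pair is again a partition of both members of the
    pair, so the final multiset is a CIP of all the inputs."""
    current = [list(X) for X in multisets]
    if len({sum(X) for X in current}) != 1:
        raise ValueError("the input multisets must be related (equal sums)")
    while len(current) > 1:
        nxt = [solve_pair(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:     # odd count: the last multiset is carried over
            nxt.append(current[-1])
        current = nxt
    return sorted(current[0], reverse=True)


print(pairwise_reduce([[8, 5], [6, 4, 3], [7, 6]]))   # [6, 3, 2, 2]
```

Passing the solver as a parameter keeps the driver independent of the pairwise subroutine, so a solver with a proven ratio could be substituted without changing the loop.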

Conclusions

We presented an improved $\frac{6}{5}$-approximation algorithm for the 2-MCIP problem; the previous best approximation algorithm has a performance ratio of $\frac{5}{4}$ and was designed by Chen et al. in 2006 [5], [6]. Subsequently, we obtained an absolute 0.6k-approximation algorithm for k-MCIP when k is even (when k is odd, the approximation ratio is $0.6 k + 0.4$). It is worth pointing out that the ratio of 0.5625k in [16] is asymptotic, in that it holds only for sufficiently large k

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

We are very grateful to the anonymous reviewers for their many helpful comments and suggestions to improve the presentation.

This research is supported by the NSERC Canada.

References (16)

V. Kann, Maximum bounded 3-dimensional matching is MAX SNP-complete, Inf. Process. Lett. (1991)
J. You et al., Fixed-parameter tractability for minimum tree cut/paste distance and minimum common integer partition, Theor. Comput. Sci. (2020)
W. Zhao et al., A network flow approach to the minimum common integer partition problem, Theor. Comput. Sci. (2006)
R.K. Ahuja et al., Network Flows: Theory, Algorithm, and Applications (2005)
G. Andrews, The Theory of Partitions (1976)
G. Andrews et al., The Integer Partitions (2004)
P. Berman, A $d / 2$ approximation for maximum weight independent set in d-claw free graphs
X. Chen et al., On the minimum common integer partition problem

There are more references available in the full text version of this article.

An extended abstract of this paper appears in ISAAC 2014.
