Algorithms for Computing the Family-Free Genomic Similarity Under DCJ

Rubert, Diego P.; Medeiros, Gabriel L.; Hoshino, Edna A.; Braga, Marília D. V.; Stoye, Jens; Martinez, Fábio V.

doi:10.1007/978-3-319-67979-2_5

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Included in the following conference series:

RECOMB International Workshop on Comparative Genomics

Abstract

The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. Here we propose an exact ILP algorithm to solve it, we show its APX-hardness, and we present three combinatorial heuristics, with computational experiments comparing their results to the ILP. Experiments on simulated datasets show that the proposed heuristics are very fast and even competitive with respect to the ILP algorithm for some instances.

References

1. Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: Efficient tools for computing the number of breakpoints and the number of adjacencies between two genomes with duplicate genes. J. Comput. Biol. 15(8), 1093–1115 (2008)
2. Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: On the approximability of comparing genomes with duplicates. J. Graph Algorithms Appl. 13(1), 19–53 (2009)
3. Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates. J. Comput. Biol. 14(4), 379–393 (2007)
4. Ausiello, G., Protasi, M., Marchetti-Spaccamela, A., Gambosi, G., Crescenzi, P., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer (1999)
5. Bafna, V., Pevzner, P.: Genome rearrangements and sorting by reversals. In: Proceedings of the FOCS 1993, pp. 148–157 (1993)
6. Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS, vol. 4175, pp. 163–173. Springer, Heidelberg (2006). doi:10.1007/11851561_16
7. Berman, P.: A d/2 approximation for maximum weight independent set in d-claw free graphs. In: Halldórsson, M.M. (ed.) SWAT 2000. LNCS, vol. 1851, pp. 214–219. Springer, Heidelberg (2000). doi:10.1007/3-540-44985-X_19
8. Berman, P., Karpinski, M.: On some tighter inapproximability results (extended abstract). In: Wiedermann, J., van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 200–209. Springer, Heidelberg (1999). doi:10.1007/3-540-48523-6_17
9. Braga, M.D.V., Willing, E., Stoye, J.: Double cut and join with insertions and deletions. J. Comput. Biol. 18(9), 1167–1184 (2011)
10. Braga, M.D.V., Chauve, C., Dörr, D., Jahn, K., Stoye, J., Thévenin, A., Wittler, R.: The potential of family-free genome comparison. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds.) Models and Algorithms for Genome Evolution, vol. 19, pp. 287–307. Springer, London (2013). doi:10.1007/978-1-4471-5298-9_13. Chap. 13
11. Bryant, D.: The complexity of calculating exemplar distances. In: Sankoff, D., Nadeau, J.H. (eds.) Comparative Genomics, pp. 207–211. Kluwer Academic Publishers, Dordrecht (2000)
12. Bulteau, L., Jiang, M.: Inapproximability of (1,2)-exemplar distance. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(6), 1384–1390 (2013)
13. Crescenzi, P.: A short guide to approximation preserving reductions. In: Twelfth Annual IEEE Conference on Proceedings of Computational Complexity, pp. 262–273 (1997). doi:10.1109/CCC.1997.612321
14. Dalquen, D.A., Anisimova, M., Gonnet, G.H., Dessimoz, C.: ALF - a simulation framework for genome evolution. Mol. Biol. Evol. 29(4), 1115 (2012)
15. Dörr, D., Thévenin, A., Stoye, J.: Gene family assignment-free comparative genomics. BMC Bioinform. 13(Suppl 19), S3 (2012)
16. Hannenhalli, S., Pevzner, P.: Transforming men into mice (polynomial algorithm for genomic distance problem). In: Proceedings of the FOCS 1995, pp. 581–592 (1995). doi:10.1109/SFCS.1995.492588
17. Håstad, J.: Some optimal inapproximability results. J. ACM (JACM) 48(4), 798–859 (2001)
18. Hawick, K.A., James, H.A.: Enumerating circuits and loops in graphs with self-arcs and multiple-arcs. Technical report CSTN-013, Massey University (2008)
19. Johnson, D.: Finding all the elementary circuits of a directed graph. SIAM J. Comput. 4(1), 77–84 (1975)
20. Martinez, F.V., Feijão, P., Braga, M.D.V., Stoye, J.: On the family-free DCJ distance and similarity. Algorithms Mol. Biol. 10, 13 (2015)
21. Raman, V., Ravikumar, B., Rao, S.S.: A simplified NP-complete MAXSAT problem. Inf. Process. Lett. 65(1), 1–6 (1998)
22. Rubert, D.P., Feijão, P., Braga, M.D.V., Stoye, J., Martinez, F.V.: Approximating the DCJ distance of balanced genomes in linear time. Algorithms Mol. Biol. 12, 3 (2017)
23. Sankoff, D.: Edit distance for genome comparison based on non-local operations. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) CPM 1992. LNCS, vol. 644, pp. 121–135. Springer, Heidelberg (1992). doi:10.1007/3-540-56024-6_10
24. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)
25. Shao, M., Lin, Y.: Approximating the edit distance for genomes with duplicate genes under DCJ, insertion and deletion. BMC Bioinform. 13(Suppl 19), S13 (2012)
26. Shao, M., Lin, Y., Moret, B.: An exact algorithm to compute the DCJ distance for genomes with duplicate genes. In: Sharan, R. (ed.) RECOMB 2014. LNCS, vol. 8394, pp. 280–292. Springer, Cham (2014). doi:10.1007/978-3-319-05269-4_22
27. Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchanges. Bioinformatics 21(16), 3340–3346 (2005)

Acknowledgments

We would like to thank Pedro Feijão and Daniel Doerr for helping us with hints on how to get the simulated data for our experiments.

Author information

Authors and Affiliations

Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS, Brazil
Diego P. Rubert, Gabriel L. Medeiros, Edna A. Hoshino & Fábio V. Martinez
Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
Marília D. V. Braga & Jens Stoye

Corresponding author

Correspondence to Fábio V. Martinez.

Editor information

Editors and Affiliations

University of Campinas, Campinas, São Paulo, Brazil
Joao Meidanis
Rice University, Houston, Texas, USA
Luay Nakhleh

A Proof of APX-hardness and Approximation Ratio Lower Bound

For the APX-hardness proof of problem ffdcj-similarity, we first give some definitions based on [13]. We restrict ourselves to maximization problems and feasible solutions.

Given an instance x of an optimization problem P and a solution y of x, $val (x,y)$ denotes the value of y, which is a positive integer measure of y. The function $val $, also referred to as objective function, must be computable in polynomial time. The value of an optimal solution (which maximizes the objective function) is defined as $\text {opt}(x)$. Thus, the performance ratio of y with respect to x is defined as:

$$\begin{aligned} R_P(x,y) = \frac{\text {opt}(x)}{val (x,y)}. \end{aligned}$$

(6)

Given two optimization problems P and $P'$, let f be a polynomial-time computable function that maps an instance x of P into an instance f(x) of $P'$, and let g be a polynomial-time computable function that maps a solution y for the instance f(x) of $P'$ into a solution g(x, y) of P. A reduction is a pair (f, g). A reduction from P to $P'$ is frequently denoted by $P \le P'$, and we say that P is reduced to $P'$. A reduction $P \le P'$ preserves membership in a class $\mathcal {C}$ if $P' \in \mathcal {C}$ implies $P \in \mathcal {C}$. An approximation-preserving reduction preserves membership in either APX, PTAS, or both classes. The strict reduction, which is the simplest type of approximation-preserving reduction, preserves membership in both APX and PTAS classes and must satisfy the following condition:

$$\begin{aligned} R_P(x,g(x,y)) \le R_{P'}(f(x),y). \end{aligned}$$

(7)
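As a small illustration of (6) and (7), the sketch below computes performance ratios with exact rational arithmetic and checks the strict-reduction inequality for one pair of solutions. The function names and the sample values are hypothetical and chosen only for illustration; they do not come from any instance discussed here.

```python
from fractions import Fraction

def performance_ratio(opt_value, solution_value):
    """R_P(x, y) = opt(x) / val(x, y) for a maximization problem."""
    if solution_value <= 0:
        raise ValueError("val(x, y) must be positive")
    return Fraction(opt_value) / Fraction(solution_value)

def satisfies_strict_condition(ratio_p, ratio_p_prime):
    """Condition (7): R_P(x, g(x, y)) <= R_P'(f(x), y)."""
    return ratio_p <= ratio_p_prime

# Hypothetical values: opt = 10 and val = 8 in P, opt = 3 and val = 2 in P'.
r_p = performance_ratio(10, 8)
r_p_prime = performance_ratio(3, 2)
print(r_p, r_p_prime, satisfies_strict_condition(r_p, r_p_prime))
```

A ratio of 1 corresponds to an optimal solution, and larger values indicate worse solutions, which is why (7) asks that the mapped-back solution be at least as good, in relative terms, as the solution it came from.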

We consider the following optimization problem, to be used within the proof of Theorem 1 below:

Problem max-2sat3($\phi $): Given a 2-cnf formula (i.e., with at most 2 literals per clause) $\phi = \{C_1, \cdots , C_m\}$ with n variables $X = \{x_1, \cdots , x_n\}$, where each variable appears in at most 3 clauses, find an assignment that satisfies the largest number of clauses.

The formula $\phi $ as defined above is called a 2sat3 formula. max-2sat3 [4, 8] is a special case of max-2sat B (also known as B-occ-max-2sat), where each variable occurs in at most B clauses for some B, which in turn is a restricted version of max-2sat [21].
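To make the problem statement concrete, the following sketch checks the 2sat3 restriction and solves a tiny instance by exhaustive search. The encoding (signed integers for literals, tuples for clauses) and the function names are assumptions made for this illustration only, and the exhaustive search takes exponential time, so it is useful only for very small formulas.

```python
from collections import Counter
from itertools import product

def is_2sat3(clauses):
    """Check: 1 or 2 literals per clause, each variable in at most 3 clauses."""
    occurrences = Counter()
    for clause in clauses:
        if not 1 <= len(clause) <= 2:
            return False
        for var in {abs(lit) for lit in clause}:
            occurrences[var] += 1
    return all(count <= 3 for count in occurrences.values())

def max_2sat_brute_force(clauses):
    """Return (best_count, assignment) by trying all 2^n assignments."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    best_count, best_assignment = -1, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        count = sum(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment

# Literals are signed integers: 1 stands for x_1 and -1 for its negation.
phi = [(1, 2), (-1, 3), (1, -2), (-3,), (2, 3)]
print(is_2sat3(phi), max_2sat_brute_force(phi)[0])
```

In this example every variable occurs in exactly three clauses, and at most four of the five clauses can be satisfied simultaneously.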

Theorem 1

ffdcj-similarity is APX-hard and cannot be approximated with approximation ratio better than $22/21$, unless $P = NP$.

Proof

(Theorem 1, first part). We give a strict reduction (f, g) from max-2sat3 to ffdcj-similarity, showing that

$$\begin{aligned} R_{\textsc {max}\text {-}\textsc {2sat3}}(\phi ,g(f(\phi ),\gamma )) \le R_{\textsc {ffdcj}\text {-}\textsc {similarity}}(f(\phi ),\gamma ), \end{aligned}$$

for any instance $\phi $ of max-2sat3 and solution $\gamma $ of ffdcj-similarity with instance $f(\phi )$. A variable occurring only once can be assigned so that its clause is satisfied, which may in turn make other clauses trivially satisfied; we therefore consider only clauses that are not trivially satisfied in the instance. The same applies to clauses containing both literals $x_i$ and $\overline{x_i}$, for some variable $x_i$.

(Function f.) We show progressively how to build $G\!S_\sigma (A, B)$ and define genes and their sequences in chromosomes of A and B. For each variable $x_i$ occurring three times, let $Cx_i^1$, $Cx_i^2$ and $Cx_i^3$ be aliases for the clauses where $x_i$ occurs (notice that a clause composed of two literals has two aliases). We define a variable component $\mathcal {C}_i$ adding vertices (genes) $x_i^1$, $x_i^2$ and $x_i^3$ to $\mathcal {A}$, vertices (genes) $Cx_i^1$, $Cx_i^2$ and $Cx_i^3$ to $\mathcal {B}$, and edges $ex_i^j = (Cx_i^j,x_i^j)$ and $e\overline{x_i}^j = (Cx_i^j,x_i^k)$ for $j \in \{1, 2, 3\}$ and $k =(j+1)\bmod {}3+1$. An edge $ex_i^j$ ($e\overline{x_i}^j$) has weight 1 (0) if the literal $x_i$ ($\overline{x_i}$) belongs to the clause $Cx_i^j$. Edges in the variable component $\mathcal {C}_i$ form a cycle of length 6 (Fig. 6). Variable components for variables occurring two times are defined in a similar manner. Genomes are $A=\{(x_i^j)$ for each occurrence j of each variable $x_i \in X \}$ and $B = \{(Cx_i^j) : Cx_i^j$ is an alias to a clause in $\phi $ with only one literal$\} \cup \{(Cx_i^j\;Cx_{i'}^{j'}) : Cx_i^j$ and $Cx_{i'}^{j'}$ are aliases to the same clause in $\phi \}$.
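The structure of a variable component can be checked mechanically. The sketch below builds the six edges of $\mathcal {C}_i$ for a variable occurring three times, using the index rule $k = (j+1) \bmod 3 + 1$, and verifies that they form a single cycle of length 6. The tuple encoding of vertices and the function names are assumptions of this illustration, and edge weights are deliberately left out.

```python
def variable_component(i):
    """Edges of the variable component for x_i occurring three times.

    ("x", i, j) stands for gene x_i^j and ("C", i, j) for the
    clause alias Cx_i^j (hypothetical encoding for this sketch).
    """
    positive, negative = [], []
    for j in (1, 2, 3):
        k = (j + 1) % 3 + 1
        positive.append((("C", i, j), ("x", i, j)))  # edge ex_i^j
        negative.append((("C", i, j), ("x", i, k)))  # edge for the negated literal
    return positive, negative

def is_single_cycle(edges):
    """True if the edges form one cycle that visits every vertex."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    if any(len(adj) != 2 for adj in neighbours.values()):
        return False
    start = edges[0][0]
    previous, current, steps = None, start, 0
    while True:
        a, b = neighbours[current]
        nxt = b if a == previous else a
        previous, current = current, nxt
        steps += 1
        if current == start:
            break
    return steps == len(neighbours)

pos, neg = variable_component(1)
print(is_single_cycle(pos + neg), len(pos + neg))
```

Each of the two edge sets is a perfect matching of the component, which matches the later observation that a solution selects either all edges $ex_i^j$ or all edges $e\overline{x_i}^j$ of a component.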

The function f as defined here maps an instance $\phi $ of max-2sat3 (a 2-cnf formula) to an instance $f(\phi )$ of ffdcj-similarity (genomes A and B and $G\!S_\sigma (A, B)$) and is clearly computable in polynomial time. Besides, since all chromosomes are circular, the corresponding weighted adjacency graph $AG_{\!\sigma }(A,B)$ (or $AG_{\!\sigma }(A^M,B^M)$ for some matching M) is a collection of cycles only.

Now, notice that any maximal matching in $G\!S_\sigma (A,B)$ covers all genes in both $\mathcal {A}$ and $\mathcal {B}$, inducing in $AG_{\!\sigma }(A,B)$ only cycles of length 2, composed of (genes in) chromosomes $(x_i^j)$ and $(Cx_i^{j'})$, or cycles of length 4, composed of chromosomes $(x_i^j)$, $(x_k^l)$ and $(Cx_i^{j'}\;Cx_k^{l'})$.

Define the normalized weight of cycle C as $\mu (C) = w(C) / |C|$. In this transformation, each cycle C is such that $\mu (C) = 0, 0.5$ or 1. A cycle C such that $\mu (C) > 0$ is a helpful cycle and represents a clause satisfied by one or two literals ($\mu (C) = 0.5$ or $\mu (C) = 1$, respectively). See an example in Fig. 7.

In this scenario, however, a solution of ffdcj-similarity with performance ratio r could lead to a solution of max-2sat3 with ratio 2r, since the total normalized weight for two cycles $C_1$ and $C_2$ with $\mu (C_1) = \mu (C_2) = 0.5$ (two clauses satisfied by one literal each) is the same as for one cycle C with $\mu (C) = 1.0$ (one clause satisfied by two literals). Therefore, achieving the desired ratio requires some modifications to f. It is not possible to make these two types of cycles have the same weight, but it suffices to make their weights close enough.

We introduce special genes into the genomes called extenders. For some p even, for each edge $ex_i^j = (Cx_i^j,x_i^j)$ of weight 1 in $G\!S_\sigma (A,B)$ we introduce p extenders $\alpha _1, \cdots , \alpha _{p}$ into A (as a consequence, they are also introduced into $\mathcal {A}$) and p extenders $\alpha _{p+1}, \cdots , \alpha _{2p}$ into B (each $ex_i^j$ of weight 1 has its own set of extenders). Edge $ex_i^j$ is replaced by edges $(Cx_i^j,\alpha _1)$ with weight 1 (which we consider equivalent to $ex_i^j$) and $(\alpha _{p+1},x_i^j)$ with weight 0, and edges $(\alpha _k,\alpha _{p+k})$ with weight 0 are added to $G\!S_\sigma (A,B)$ for each $1 \le k \le p$ (extenders $\alpha _1$ and $\alpha _{p+1}$ are now part of the variable component $\mathcal {C}_i$). Regarding new chromosomes in genomes A and B, A is updated to $A \cup \{(\alpha _1\;{-\alpha _p})\} \cup \{(\alpha _k\;{-\alpha _{k+1}}) : k \in \{2, 4, \cdots , p-2\}\}$ and B to $B \cup \{(\alpha _k\;{-\alpha _{k+1}}) : k \!\in \! \{p+1, p+3, \cdots , 2p-1\}\}$. By this construction, which is still polynomial, the path from $x_i^{jt}$ to $Cx_i^{jt}$ in $AG_{\!\sigma }(A,B)$ is extended from 1 to $1 + p$ edges, from $\{(x_i^{jt},Cx_i^{jt})\}$ to $\{(x_i^{jt},\alpha _p^t), (\alpha _{p+1}^t,\alpha _2^t), (\alpha _3^t,\alpha _{p+2}^t), (\alpha _{p+3}^t,\alpha _4^t), \cdots , (\alpha _1^t,Cx_i^{jt})\}$. The same occurs for the path from $x_i^{jh}$ to $Cx_i^{jh}$ (see Fig. 8). Now, cycles in $AG_{\!\sigma }(A,B)$ induced by edges of weight 0 in $G\!S_\sigma (A,B)$ have normalized weight 0, cycles previously with normalized weight 1 are extended and have normalized weight $\frac{1}{1+p}$, and cycles previously with normalized weight 0.5 are extended and have normalized weight $\frac{1}{2+p}$. Notice that, for a sufficiently large p, $\frac{1}{1+p}$ is quite close to $\frac{1}{2+p}$, hence the problem of finding the maximum similarity in this graph is very similar to finding the maximum number of helpful cycles.
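The effect of the extenders on the two kinds of helpful cycles can be seen numerically. The sketch below evaluates the normalized weights $\frac{1}{1+p}$ and $\frac{1}{2+p}$ stated above for a few even values of p and prints their quotient; the function name is hypothetical and the values of p are arbitrary.

```python
from fractions import Fraction

def extended_weights(p):
    """Normalized weights of helpful cycles after adding p extenders."""
    two_literals = Fraction(1, 1 + p)  # cycle that previously had weight 1
    one_literal = Fraction(1, 2 + p)   # cycle that previously had weight 0.5
    return two_literals, one_literal

for p in (2, 20, 200):
    full, half = extended_weights(p)
    # The quotient (2 + p) / (1 + p) approaches 1 as p grows.
    print(p, full, half, float(full / half))
```

Without extenders the quotient of the two weights is 2, which is the source of the factor 2r discussed earlier; with extenders it equals $(2+p)/(1+p)$ and tends to 1 as p grows.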

(Function g.) By the structure of variable components in $G\!S_\sigma (A, B)$, and since solutions of ffdcj-similarity are restricted to maximal matchings only, any solution $\gamma $ for $f(\phi )$ is a matching that covers only edges $ex_i^j$ or $e\overline{x_i}^j$ for each variable component $\mathcal {C}_i$. For a $\mathcal {C}_i$, if edges $ex_i^j$ ($e\overline{x_i}^j$) are in the solution then the variable $x_i$ is assigned to true (false), inducing in polynomial time an assignment for each $x_i \in X$ and therefore a solution $g(f(\phi ),\gamma )$ to max-2sat3. A clause is satisfied if vertices (or the only vertex) corresponding to its aliases are in a helpful cycle.

(Approximation Ratio.) Given $f(\phi )$ and a feasible solution $\gamma $ of ffdcj-similarity with the maximum number of helpful cycles, denote by $c'$ the number of helpful cycles in $\gamma $. Notice that $c'$ is also the maximum number of satisfied clauses of max-2sat3, that is, the value of an optimal solution for max-2sat3 for any instance $\phi $, denoted here by $\text {opt}_{\textsc {2sat3}}(\phi )$. Thus, $c' = \text {opt}_{\textsc {2sat3}}(\phi )$.

To achieve the desired ratio we must establish some properties and relations between the parameters of max-2sat3 and ffdcj-similarity and set some parameters to specific values.

Let $n := |A| = |B|$ before extenders are added. We choose for p (the number of extenders added for each edge of weight 1 in $G\!S_\sigma (A,B)$) the value 2n and define $\omega = \frac{1}{2+p} = \frac{1}{2+2n}$ and

$$\begin{aligned} \varepsilon = \frac{1}{1+p} - \frac{1}{2+p} = \frac{1}{4n^2 + 6n + 2}, \end{aligned}$$

which implies that $\omega + \varepsilon = \frac{1}{p+1}$. Thus, it is easy to see that $n\varepsilon < \omega $, i.e.,

$$\begin{aligned} \varepsilon< \frac{\omega }{n} < 1. \end{aligned}$$

(8)
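The identities behind (8) are easy to confirm with exact arithmetic. The following sketch, with a hypothetical helper name, recomputes p, $\omega $ and $\varepsilon $ from n as defined above and checks the closed form of $\varepsilon $, the sum $\omega + \varepsilon $, and inequality (8) for a range of n.

```python
from fractions import Fraction

def parameters(n):
    """Return (p, omega, epsilon) for p = 2n as chosen in the proof."""
    p = 2 * n
    omega = Fraction(1, 2 + p)
    epsilon = Fraction(1, 1 + p) - Fraction(1, 2 + p)
    return p, omega, epsilon

for n in range(1, 200):
    p, omega, epsilon = parameters(n)
    assert epsilon == Fraction(1, 4 * n * n + 6 * n + 2)
    assert omega + epsilon == Fraction(1, p + 1)
    assert epsilon < omega / n < 1  # inequality (8)

print(parameters(1))
```

A finite check like this is of course no substitute for the algebraic argument; it only guards against slips when reproducing the constants.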

If $\text {opt}_{\textsc {sim}}(f(\phi ))$ denotes the value of an optimal solution for ffdcj-similarity with instance $f(\phi )$ and $c^*$ denotes the number of helpful cycles in an optimal solution of ffdcj-similarity, then we have immediately that

$$\begin{aligned} \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{\omega + \varepsilon } \le c^* \le \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{\omega }. \end{aligned}$$

(9)

In addition,

$$\begin{aligned} 0 \le c^* \le n, \end{aligned}$$

(10)

and

$$\begin{aligned} c^*\omega \le \text {opt}_{\textsc {sim}}(f(\phi )) \le c^*(\omega + \varepsilon ). \end{aligned}$$

(11)

Thus, we have

$$\begin{aligned} c^*(\omega + \varepsilon )&= c^*\omega + c^*\varepsilon \\&< c^*\omega + \frac{c^*\omega }{n}&&\qquad \text {(12)} \\&\le c^*\omega + 1 \cdot \omega&&\qquad \text {(13)} \\&= c^*\omega + \omega ,&&\qquad \text {(14)} \end{aligned}$$

where (12) comes from (8) and (13) is valid due to (10).

Now, let $c^r$ be the number of helpful cycles given by an approximate solution for the ffdcj-similarity with approximation ratio r. Then,

$$\begin{aligned} R_{\textsc {max-2sat3}}(\phi ,g(f(\phi ),\gamma )) = \frac{\text {opt}_{\textsc {2sat3}}(\phi )}{c^r} = \frac{c'}{c^r} \le r, \end{aligned}$$

where the last inequality is given by Proposition 2 below. This concludes the first part of the proof. $\square $

Proposition 1

Let $c'$ be the number of helpful cycles in a feasible solution of ffdcj-similarity with the greatest number of helpful cycles possible. Let $c^*$ be the number of helpful cycles in an optimal solution of ffdcj-similarity. Then,

$$\begin{aligned} c' = c^*. \end{aligned}$$

Proof

Since $c'$ is the greatest number of helpful cycles possible, it is immediate that $c^* \le c'$.

Let us now show that $c^* \ge c'$. Suppose for a moment that $c^* < c'$. Since $c^*$ and $c'$ are integers, this implies that $c^* + 1 \le c'$, i.e.,

$$\begin{aligned} c^* \le c' - 1. \end{aligned}$$

(15)

Let $\mathcal {C}'$ be the set of cycles with $c'$ cycles, i.e., with the maximum number of helpful cycles possible. Let $\mu (\mathcal {C}') := \sum _{C \in \mathcal {C}'} \mu (C) = \sum _{C \in \mathcal {C}'} w(C)/|C|$. Then

$$\begin{aligned} \mu (\mathcal {C}') \ge c'\omega&= (c'-1)\omega + \omega \\&\ge c^*\omega + \omega&&\qquad \text {(16)} \\&> c^*(\omega +\varepsilon )&&\qquad \text {(17)} \\&\ge \text {opt}_{\textsc {sim}}(f(\phi )),&&\qquad \text {(18)} \end{aligned}$$

where (16) follows from (15), (17) comes from (14), and (18) is valid due to (11). It means that $\mu (\mathcal {C}') > \text {opt}_{\textsc {sim}}(f(\phi ))$, which is a contradiction.

Therefore, $c' = c^*$. $\square $
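The idea behind Proposition 1 can also be explored by enumeration. The sketch below assumes, as in the construction above, that a solution's value is the sum of the normalized weights of its helpful cycles, with $\frac{1}{2+p}$ for a clause satisfied by one literal and $\frac{1}{1+p}$ for a clause satisfied by two, and $p = 2n$. It then checks that among all abstract combinations of at most n helpful cycles, one with more helpful cycles always has strictly larger value. The function names are hypothetical, and not every enumerated combination need correspond to an actual matching.

```python
from fractions import Fraction
from itertools import product

def similarity(one_literal_cycles, two_literal_cycles, n):
    """Total normalized weight of the helpful cycles, with p = 2n."""
    p = 2 * n
    return (one_literal_cycles * Fraction(1, 2 + p)
            + two_literal_cycles * Fraction(1, 1 + p))

def more_cycles_always_win(n):
    """Check that value is strictly increasing in the number of helpful cycles."""
    configs = [(a, b) for a, b in product(range(n + 1), repeat=2) if a + b <= n]
    return all(
        similarity(a1, b1, n) < similarity(a2, b2, n)
        for a1, b1 in configs
        for a2, b2 in configs
        if a1 + b1 < a2 + b2
    )

print(all(more_cycles_always_win(n) for n in range(1, 9)))
```

The tightest case compares $c - 1$ cycles of weight $\omega + \varepsilon $ against c cycles of weight $\omega $, which is exactly where $n\varepsilon < \omega $ from (8) is needed.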

Proposition 2

Let $c^r$ be the number of helpful cycles given by an approximate solution for ffdcj-similarity with approximation ratio r. Let $c'$ be the same as defined in Proposition 1. Then,

$$\begin{aligned} c^r \ge \frac{c'}{r}. \end{aligned}$$

Proof

Given an instance $f(\phi )$ of ffdcj-similarity, let $\gamma ^r$ be an approximate solution of $f(\phi )$ with performance ratio r, i.e., $val (f(\phi ), \gamma ^r) \ge \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r}$. Let $c^r$ be the number of helpful cycles of $\gamma ^r$. Then

$$\begin{aligned} c^r&\ge \frac{\big (\frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r}\big )}{\omega + \varepsilon } \\&> \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r(\omega +\omega /n)}&&\qquad \text {(19)} \\&= \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r\omega } \cdot \frac{n}{n+1} \\&\ge \frac{c'\omega }{r\omega } \cdot \frac{n}{n+1}&&\qquad \text {(20)} \\&= \frac{c'}{r} \cdot \big (1 - \frac{1}{n+1}\big ) \\&= \frac{c'}{r} - \frac{c'}{r(n+1)} \\&\ge \frac{c'}{r} - 1&&\qquad \text {(21)} \end{aligned}$$

where (19) follows from (8), (20) is valid from (11) and Proposition 1. Then, from (21) we know that $c^r > \frac{c'}{r} - 1$ and, since $c^r$ is an integer number, the result follows. $\square $

We now continue with the proof of Theorem 1.

Proof

(Theorem 1, second part). First, notice that if a problem is APX-hard, the existence of a PTAS for it implies P $=$ NP. Since a strict reduction preserves membership in the class PTAS, finding a PTAS for ffdcj-similarity implies a PTAS for every problem in APX and hence P $=$ NP. Moreover, ffdcj-similarity cannot be approximated with ratio better than $2012/2011$, unless P $=$ NP. This follows immediately from the reduction in Theorem 1 with $R_{\textsc {max}\text {-}\textsc {2sat3}} = R_{\textsc {ffdcj}\text {-}\textsc {similarity}}$ and the fact that max-2sat3 is shown in [8] to be NP-hard to approximate within a factor of $2012/2011 - \varepsilon $ for any $\varepsilon > 0$.

However, our result is slightly stronger. Notice particularly that the reduction $\textsc {max}\text {-}\textsc {2sat3} \le \textsc {ffdcj}\text {-}\textsc {similarity}$ from the first part of the proof can be trivially extended to $\textsc {max}\text {-}\textsc {2sat} \le \textsc {ffdcj}\text {-}\textsc {similarity}$ by extending variable components to arbitrary sizes. This increases the lower bound to $22/21$ [17]. $\square $

About this paper

Cite this paper

Rubert, D.P., Medeiros, G.L., Hoshino, E.A., Braga, M.D.V., Stoye, J., Martinez, F.V. (2017). Algorithms for Computing the Family-Free Genomic Similarity Under DCJ. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_5

DOI: https://doi.org/10.1007/978-3-319-67979-2_5
Published: 15 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science (R0)
