Algorithms for Computing the Family-Free Genomic Similarity Under DCJ

  • Conference paper
Comparative Genomics (RECOMB-CG 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Abstract

The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. Here we propose an exact ILP algorithm to solve it, we show its APX-hardness, and we present three combinatorial heuristics, with computational experiments comparing their results to the ILP. Experiments on simulated datasets show that the proposed heuristics are very fast and even competitive with respect to the ILP algorithm for some instances.

References

  1. Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: Efficient tools for computing the number of breakpoints and the number of adjacencies between two genomes with duplicate genes. J. Comput. Biol. 15(8), 1093–1115 (2008)

  2. Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: On the approximability of comparing genomes with duplicates. J. Graph Algorithms Appl. 13(1), 19–53 (2009)

  3. Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates. J. Comput. Biol. 14(4), 379–393 (2007)

  4. Ausiello, G., Protasi, M., Marchetti-Spaccamela, A., Gambosi, G., Crescenzi, P., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer (1999)

  5. Bafna, V., Pevzner, P.: Genome rearrangements and sorting by reversals. In: Proceedings of the FOCS 1993, pp. 148–157 (1993)

  6. Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS, vol. 4175, pp. 163–173. Springer, Heidelberg (2006). doi:10.1007/11851561_16

  7. Berman, P.: A d/2 approximation for maximum weight independent set in d-claw free graphs. In: Halldórsson, M.M. (ed.) SWAT 2000. LNCS, vol. 1851, pp. 214–219. Springer, Heidelberg (2000). doi:10.1007/3-540-44985-X_19

  8. Berman, P., Karpinski, M.: On some tighter inapproximability results (extended abstract). In: Wiedermann, J., van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 200–209. Springer, Heidelberg (1999). doi:10.1007/3-540-48523-6_17

  9. Braga, M.D.V., Willing, E., Stoye, J.: Double cut and join with insertions and deletions. J. Comput. Biol. 18(9), 1167–1184 (2011)

  10. Braga, M.D.V., Chauve, C., Dörr, D., Jahn, K., Stoye, J., Thévenin, A., Wittler, R.: The potential of family-free genome comparison. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds.) Models and Algorithms for Genome Evolution, vol. 19, pp. 287–307. Springer, London (2013). doi:10.1007/978-1-4471-5298-9_13. Chap. 13

  11. Bryant, D.: The complexity of calculating exemplar distances. In: Sankoff, D., Nadeau, J.H. (eds.) Comparative Genomics, pp. 207–211. Kluwer Academic Publishers, Dordrecht (2000)

  12. Bulteau, L., Jiang, M.: Inapproximability of (1,2)-exemplar distance. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(6), 1384–1390 (2013)

  13. Crescenzi, P.: A short guide to approximation preserving reductions. In: Twelfth Annual IEEE Conference on Proceedings of Computational Complexity, pp. 262–273 (1997). doi:10.1109/CCC.1997.612321

  14. Dalquen, D.A., Anisimova, M., Gonnet, G.H., Dessimoz, C.: ALF - a simulation framework for genome evolution. Mol. Biol. Evol. 29(4), 1115 (2012)

  15. Dörr, D., Thévenin, A., Stoye, J.: Gene family assignment-free comparative genomics. BMC Bioinform. 13(Suppl 19), S3 (2012)

  16. Hannenhalli, S., Pevzner, P.: Transforming men into mice (polynomial algorithm for genomic distance problem). In: Proceedings of the FOCS 1995, pp. 581–592 (1995). doi:10.1109/SFCS.1995.492588

  17. Håstad, J.: Some optimal inapproximability results. J. ACM (JACM) 48(4), 798–859 (2001)

  18. Hawick, K.A., James, H.A.: Enumerating circuits and loops in graphs with self-arcs and multiple-arcs. Technical report CSTN-013, Massey University (2008)

  19. Johnson, D.: Finding all the elementary circuits of a directed graph. SIAM J. Comput. 4(1), 77–84 (1975)

  20. Martinez, F.V., Feijão, P., Braga, M.D.V., Stoye, J.: On the family-free DCJ distance and similarity. Algorithms Mol. Biol. 10, 13 (2015)

  21. Raman, V., Ravikumar, B., Rao, S.S.: A simplified NP-complete MAXSAT problem. Inf. Process. Lett. 65(1), 1–6 (1998)

  22. Rubert, D.P., Feijão, P., Braga, M.D.V., Stoye, J., Martinez, F.V.: Approximating the DCJ distance of balanced genomes in linear time. Algorithms Mol. Biol. 12, 3 (2017)

  23. Sankoff, D.: Edit distance for genome comparison based on non-local operations. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) CPM 1992. LNCS, vol. 644, pp. 121–135. Springer, Heidelberg (1992). doi:10.1007/3-540-56024-6_10

  24. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)

  25. Shao, M., Lin, Y.: Approximating the edit distance for genomes with duplicate genes under DCJ, insertion and deletion. BMC Bioinform. 13(Suppl 19), S13 (2012)

  26. Shao, M., Lin, Y., Moret, B.: An exact algorithm to compute the DCJ distance for genomes with duplicate genes. In: Sharan, R. (ed.) RECOMB 2014. LNCS, vol. 8394, pp. 280–292. Springer, Cham (2014). doi:10.1007/978-3-319-05269-4_22

  27. Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchanges. Bioinformatics 21(16), 3340–3346 (2005)

Acknowledgments

We would like to thank Pedro Feijão and Daniel Doerr for helping us with hints on how to get the simulated data for our experiments.

Author information

Correspondence to Fábio V. Martinez.

A Proof of APX-hardness and Approximation Ratio Lower Bound

For the APX-hardness proof of problem ffdcj-similarity, we first give some definitions based on [13]. Thereby we restrict ourselves to maximization problems and feasible solutions.

Given an instance x of an optimization problem P and a solution y of x, \(val (x,y)\) denotes the value of y, which is a positive integer measure of y. The function \(val \), also referred to as objective function, must be computable in polynomial time. The value of an optimal solution (which maximizes the objective function) is defined as \(\text {opt}(x)\). Thus, the performance ratio of y with respect to x is defined as:

$$\begin{aligned} R_P(x,y) = \frac{\text {opt}(x)}{val (x,y)}. \end{aligned}$$
(6)

Given two optimization problems P and \(P'\), let f be a polynomial-time computable function that maps an instance x of P into an instance f(x) of \(P'\), and let g be a polynomial-time computable function that maps a solution y for the instance f(x) of \(P'\) into a solution g(x, y) of P. A reduction is a pair (f, g). A reduction from P to \(P'\) is frequently denoted by \(P \le P'\), and we say that P is reduced to \(P'\). A reduction \(P \le P'\) preserves membership in a class \(\mathcal {C}\) if \(P' \in \mathcal {C}\) implies \(P \in \mathcal {C}\). An approximation-preserving reduction preserves membership in either APX, PTAS, or both classes. The strict reduction, which is the simplest type of approximation-preserving reduction, preserves membership in both APX and PTAS classes and must satisfy the following condition:

$$\begin{aligned} R_P(x,g(x,y)) \le R_{P'}(f(x),y). \end{aligned}$$
(7)

We consider the following optimization problem, to be used within the proof of Theorem 1 below:

Problem max-2sat3(\(\phi \)): Given a 2-cnf formula (i.e., with at most 2 literals per clause) \(\phi = \{C_1, \cdots , C_m\}\) with n variables \(X = \{x_1, \cdots , x_n\}\), where each variable appears in at most 3 clauses, find an assignment that satisfies the largest number of clauses.

The formula \(\phi \) as defined above is called a 2sat3 formula. max-2sat3 [4, 8] is a special case of max-2sat B (also known as B-occ-max-2sat), where each variable occurs in at most B clauses for some B, which in turn is a restricted version of max-2sat [21].
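To make the problem statement concrete, the following is a minimal brute-force sketch of max-2sat3. It is exponential in the number of variables and serves only as an illustration; it is not part of the reduction. The integer encoding of literals (`k` for \(x_k\), `-k` for \(\overline{x_k}\)) and the function names are assumptions introduced here.

```python
from itertools import product

def occurrences_ok(clauses, num_vars, bound=3):
    """Check that every variable occurs at most `bound` times (the 2sat3 restriction)."""
    counts = [0] * num_vars
    for clause in clauses:
        for lit in clause:
            counts[abs(lit) - 1] += 1
    return all(c <= bound for c in counts)

def max_2sat_brute_force(clauses, num_vars):
    """Try all assignments; return (max number of satisfied clauses, assignment).

    Hypothetical encoding: literal k means x_k, literal -k means its negation.
    """
    best_count, best_assignment = -1, None
    for bits in product([False, True], repeat=num_vars):
        satisfied = sum(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied > best_count:
            best_count, best_assignment = satisfied, bits
    return best_count, best_assignment

# Clauses used in the caption of Fig. 6:
# C1 = (x1 v x2), C2 = (~x1), C3 = (~x1 v x2)
phi = [(1, 2), (-1,), (-1, 2)]
print(occurrences_ok(phi, 2), max_2sat_brute_force(phi, 2))
```

For this small formula all three clauses can be satisfied by setting \(x_1\) to false and \(x_2\) to true.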

Theorem 1

ffdcj-similarity is APX-hard and cannot be approximated with approximation ratio better than \(22/21\), unless \(P = NP\).

Proof

(Theorem 1, first part). We give a strict reduction (f, g) from max-2sat3 to ffdcj-similarity, showing that

$$\begin{aligned} R_{\textsc {max}\text {-}\textsc {2sat3}}(\phi ,g(f(\phi ),\gamma )) \le R_{\textsc {ffdcj}\text {-}\textsc {similarity}}(f(\phi ),\gamma ), \end{aligned}$$

for any instance \(\phi \) of max-2sat3 and any solution \(\gamma \) of ffdcj-similarity with instance \(f(\phi )\). Since variables occurring only once cause their clauses, and possibly others, to be trivially satisfied, we consider only clauses that are not trivially satisfied in their instance. The same applies to clauses containing both literals \(x_i\) and \(\overline{x_i}\), for some variable \(x_i\).

(Function f.) We show progressively how to build \(G\!S_\sigma (A, B)\) and define genes and their sequences in chromosomes of A and B. For each variable \(x_i\) occurring three times, let \(Cx_i^1\), \(Cx_i^2\) and \(Cx_i^3\) be aliases for the clauses where \(x_i\) occurs (notice that a clause composed of two literals has two aliases). We define a variable component \(\mathcal {C}_i\) adding vertices (genes) \(x_i^1\), \(x_i^2\) and \(x_i^3\) to \(\mathcal {A}\), vertices (genes) \(Cx_i^1\), \(Cx_i^2\) and \(Cx_i^3\) to \(\mathcal {B}\), and edges \(ex_i^j = (Cx_i^j,x_i^j)\) and \(e\overline{x_i}^j = (Cx_i^j,x_i^k)\) for \(j \in \{1, 2, 3\}\) and \(k =(j+1)\bmod {}3+1\). An edge \(ex_i^j\) (\(e\overline{x_i}^j\)) has weight 1 (0) if the literal \(x_i\) (\(\overline{x_i}\)) belongs to the clause \(Cx_i^j\). Edges in the variable component \(\mathcal {C}_i\) form a cycle of length 6 (Fig. 6). Variable components for variables occurring two times are defined in a similar manner. Genomes are \(A=\{(x_i^j)\) for each occurrence j of each variable \(x_i \in X \}\) and \(B = \{(Cx_i^j) : Cx_i^j\) is an alias to a clause in \(\phi \) with only one literal\(\} \cup \{(Cx_i^j\;Cx_{i'}^{j'}) : Cx_i^j\) and \(Cx_{i'}^{j'}\) are aliases to the same clause in \(\phi \}\).

The function f as defined here maps an instance \(\phi \) of max-2sat3 (a 2-cnf formula) to an instance \(f(\phi )\) of ffdcj-similarity (genomes A and B and \(G\!S_\sigma (A, B)\)) and is clearly polynomial. Besides, since all chromosomes are circular, the corresponding weighted adjacency graph \(AG_{\!\sigma }(A,B)\) (or \(AG_{\!\sigma }(A^M,B^M)\) for some matching M) is a collection of cycles only.
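The variable component for a variable with three occurrences can be sketched as follows. This is an illustrative sketch, not the authors' implementation: gene names are plain strings chosen here, and the weight rule is read as "an edge has weight 1 exactly when its literal is the one appearing in the clause alias, and 0 otherwise". The helper `is_single_cycle` (also hypothetical) checks that the six edges form one cycle of length 6.

```python
def variable_component(i, positive):
    """Edges (gene in B, gene in A, weight) of the component for variable x_i.

    positive[j-1] is True if occurrence j of x_i is the literal x_i,
    and False if it is the negated literal (assumed input format).
    """
    edges = []
    for j in (1, 2, 3):
        k = (j + 1) % 3 + 1  # index shift from the definition of the negated edge
        w = 1 if positive[j - 1] else 0
        edges.append((f"Cx{i}^{j}", f"x{i}^{j}", w))      # edge e x_i^j
        edges.append((f"Cx{i}^{j}", f"x{i}^{k}", 1 - w))  # edge e ~x_i^j
    return edges

def is_single_cycle(edges):
    """True if the edges form one connected cycle (every vertex has degree 2)."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    stack, seen = [next(iter(adj))], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node] - seen)
    return len(seen) == len(adj)

# x_1 occurs positively in C1 and negated in C2 and C3 (the formula of Fig. 6).
component = variable_component(1, [True, False, False])
print(component, is_single_cycle(component))
```

Under this reading, each clause alias is incident to exactly one edge of weight 1, so a matching that picks all positive edges or all negated edges of a component corresponds to one truth value for the variable.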

Fig. 6. \(G\!S_\sigma (A,B)\) and \(AG_{\!\sigma }(A,B)\) for genomes \(A=\{(x_1^1),(x_1^2),(x_1^3),(x_2^1),(x_2^2)\}\) and \(B = \{(Cx_1^1\;Cx_2^1),(Cx_1^2),(Cx_1^3\;Cx_2^2)\}\) given by function f (Theorem 1) applied to 2sat3 clauses \(C_1 = (x_1 \vee x_2)\), \(C_2 = (\overline{x_1})\) and \(C_3 = (\overline{x_1} \vee x_2)\). In \(G\!S_\sigma (A,B)\), solid edges correspond to \(ex_i^j\) and dashed edges correspond to \(e\overline{x_i}^j\). In \(AG_{\!\sigma }(A,B)\), the shaded region corresponds to genes of genome B, and solid (dashed) edges correspond to solid (dashed) edges of \(G\!S_\sigma (A,B)\).

Fig. 7. A matching M of \(G\!S_\sigma (A,B)\) and cycles induced by M in \(AG_{\!\sigma }(A^M,B^M)\) for genomes of Fig. 6. This solution of ffdcj-similarity represents clauses \(C_1\) and \(C_3\) of max-2sat3 satisfied.

Fig. 8. Detail of graphs \(G\!S_\sigma (A,B)\) and \(AG_{\!\sigma }(A,B)\) for genomes of Fig. 6 including extenders for edge \((x_1^1,Cx_1^1)\) for \(p = 4\). Shaded regions correspond to genes of genome B. Extending all edges of weight 1 and selecting the matching of Fig. 7, this helpful cycle (only half of it is in this figure) would have normalized weight \(\frac{4}{4(p+1)} = \frac{1}{p+1} = \frac{1}{5} = 0.2\).

Now, notice that any maximal matching in \(G\!S_\sigma (A,B)\) covers all genes in both \(\mathcal {A}\) and \(\mathcal {B}\), inducing in \(AG_{\!\sigma }(A,B)\) only cycles of length 2, composed of (genes in) chromosomes \((x_i^j)\) and \((Cx_i^{j'})\), or cycles of length 4, composed of chromosomes \((x_i^j)\), \((x_k^l)\) and \((Cx_i^{j'}\;Cx_k^{l'})\).

Define the normalized weight of cycle C as \(\mu (C) = w(C) / |C|\). In this transformation, each cycle C is such that \(\mu (C) = 0, 0.5\) or 1. A cycle C such that \(\mu (C) > 0\) is a helpful cycle and represents a clause satisfied by one or two literals (\(\mu (C) = 0.5\) or \(\mu (C) = 1\), respectively). See an example in Fig. 7.

In this scenario, however, a solution of ffdcj-similarity with performance ratio r could lead to a solution of max-2sat3 with ratio 2r, since the total normalized weight for two cycles \(C_1\) and \(C_2\) with \(\mu (C_1) = \mu (C_2) = 0.5\) (two clauses satisfied by one literal each) is the same as for one cycle C with \(\mu (C) = 1.0\) (one clause satisfied by two literals). Therefore, achieving the desired ratio requires some modifications in f. It is not possible to make these two types of cycles have the same weight, but it suffices to get close enough.

We introduce special genes into the genomes called extenders. For some p even, for each edge \(ex_i^j = (Cx_i^j,x_i^j)\) of weight 1 in \(G\!S_\sigma (A,B)\) we introduce p extenders \(\alpha _1, \cdots , \alpha _{p}\) into A (as a consequence, they are also introduced into \(\mathcal {A}\)) and p extenders \(\alpha _{p+1}, \cdots , \alpha _{2p}\) into B (each \(ex_i^j\) of weight 1 has its own set of extenders). Edge \(ex_i^j\) is replaced by edges \((Cx_i^j,\alpha _1)\) with weight 1 (which we consider equivalent to \(ex_i^j\)) and \((\alpha _{p+1},x_i^j)\) with weight 0, and edges \((\alpha _k,\alpha _{p+k})\) with weight 0 are added to \(G\!S_\sigma (A,B)\) for each \(1 \le k \le p\) (extenders \(\alpha _1\) and \(\alpha _{p+1}\) are now part of the variable component \(\mathcal {C}_i\)). Regarding new chromosomes in genomes A and B, A is updated to \(A \cup \{(\alpha _1\;{-\alpha _p})\} \cup \{(\alpha _k\;{-\alpha _{k+1}}) : k \in \{2, 4, \cdots , p-2\}\}\) and B to \(B \cup \{(\alpha _k\;{-\alpha _{k+1}}) : k \!\in \! \{p+1, p+3, \cdots , 2p-1\}\}\). By this construction, which is still polynomial, the path from \(x_i^{jt}\) to \(Cx_i^{jt}\) in \(AG_{\!\sigma }(A,B)\) is extended from 1 to \(1 + p\) edges, from \(\{(x_i^{jt},Cx_i^{jt})\}\) to \(\{(x_i^{jt},\alpha _p^t), (\alpha _{p+1}^t,\alpha _2^t), (\alpha _3^t,\alpha _{p+2}^t), (\alpha _{p+3}^t,\alpha _4^t), \cdots , (\alpha _1^t,Cx_i^{jt})\}\). The same occurs for the path from \(x_i^{jh}\) to \(Cx_i^{jh}\) (see Fig. 8). Now, cycles in \(AG_{\!\sigma }(A,B)\) induced by edges of weight 0 in \(G\!S_\sigma (A,B)\) have normalized weight 0, cycles previously with normalized weight 1 are extended and have normalized weight \(\frac{1}{1+p}\), and cycles previously with normalized weight 0.5 are extended and have normalized weight \(\frac{1}{2+p}\). 
Notice that, for a sufficiently large p, \(\frac{1}{1+p}\) is quite close to \(\frac{1}{2+p}\), hence the problem of finding the maximum similarity in this graph is very similar to finding the maximum number of helpful cycles.
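The effect of the extenders on normalized weights can be reproduced with a few lines of exact arithmetic. The sketch below assumes, as in the construction above, that each weight-1 edge of a cycle is stretched into a path of \(1 + p\) edges of total weight 1, while weight-0 edges are left unchanged; the function name and its parameters are introduced here for illustration only.

```python
from fractions import Fraction

def normalized_weight(weight1_edges, weight0_edges, p=0):
    """mu(C) = w(C) / |C| for a cycle of the adjacency graph.

    weight1_edges / weight0_edges: number of cycle edges of weight 1 / 0
    before extension. With p extenders per weight-1 edge, each such edge
    becomes a path of 1 + p edges whose total weight is still 1.
    """
    length = weight1_edges * (1 + p) + weight0_edges
    return Fraction(weight1_edges, length)

# Without extenders: two half-weight cycles tie with one full-weight cycle.
tie = 2 * normalized_weight(2, 2) == normalized_weight(4, 0)

# With p = 4 extenders (the setting of Fig. 8).
both_literals = normalized_weight(4, 0, p=4)  # 1 / (1 + p)
one_literal = normalized_weight(2, 2, p=4)    # 1 / (2 + p)
print(tie, both_literals, one_literal)
```

After extension, two cycles of the second kind together weigh \(\frac{2}{2+p}\), which exceeds \(\frac{1}{1+p}\), so satisfying more clauses is rewarded over satisfying one clause twice.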

(Function g.) By the structure of variable components in \(G\!S_\sigma (A, B)\), and since solutions of ffdcj-similarity are restricted to maximal matchings only, any solution \(\gamma \) for \(f(\phi )\) is a matching that covers only edges \(ex_i^j\) or \(e\overline{x_i}^j\) for each variable component \(\mathcal {C}_i\). For a \(\mathcal {C}_i\), if edges \(ex_i^j\) (\(e\overline{x_i}^j\)) are in the solution then the variable \(x_i\) is assigned to true (false), inducing in polynomial time an assignment for each \(x_i \in X\) and therefore a solution \(g(f(\phi ),\gamma )\) to max-2sat3. A clause is satisfied if vertices (or the only vertex) corresponding to its aliases are in a helpful cycle.

(Approximation Ratio.) Given \(f(\phi )\) and a feasible solution \(\gamma \) of ffdcj-similarity with the maximum number of helpful cycles, denote by \(c'\) the number of helpful cycles in \(\gamma \). Notice that \(c'\) is also the maximum number of satisfied clauses of max-2sat3, that is, the value of an optimal solution for max-2sat3 for any instance \(\phi \), denoted here by \(\text {opt}_{\textsc {2sat3}}(\phi )\). Thus, \(c' = \text {opt}_{\textsc {2sat3}}(\phi )\).

To achieve the desired ratio we must establish some properties and relations between the parameters of max-2sat3 and ffdcj-similarity and set some parameters to specific values.

Let \(n := |A| = |B|\) before extenders are added. We choose for p (the number of extenders added for each edge of weight 1 in \(G\!S_\sigma (A,B)\)) the value 2n and define \(\omega = \frac{1}{2+p} = \frac{1}{2+2n}\) and

$$\begin{aligned} \varepsilon = \frac{1}{1+p} - \frac{1}{2+p} = \frac{1}{4n^2 + 6n + 2}, \end{aligned}$$

which implies that \(\omega + \varepsilon = \frac{1}{p+1}\). Thus, it is easy to see that \(n\varepsilon < \omega \), i.e.,

$$\begin{aligned} \varepsilon< \frac{\omega }{n} < 1. \end{aligned}$$
(8)
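The inequality \(n\varepsilon < \omega \) can be checked directly from the definitions. The following expansion is a sketch of that step, assuming \(n \ge 1\) and using \(p = 2n\), so that \(\varepsilon = \frac{1}{(1+2n)(2+2n)} = \frac{\omega }{1+2n}\):

$$\begin{aligned} n\varepsilon = \frac{n}{1+2n} \cdot \omega< \frac{\omega }{2} < \omega . \end{aligned}$$

Dividing by n gives \(\varepsilon < \frac{\omega }{n}\), and \(\frac{\omega }{n} < 1\) holds because \(\omega = \frac{1}{2+2n} \le \frac{1}{4}\).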

If \(\text {opt}_{\textsc {sim}}(f(\phi ))\) denotes the value of an optimal solution for ffdcj-similarity with instance \(f(\phi )\) and \(c^*\) denotes the number of helpful cycles in an optimal solution of ffdcj-similarity, then we have immediately that

$$\begin{aligned} \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{\omega + \varepsilon } \le c^* \le \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{\omega }. \end{aligned}$$
(9)

Besides that

$$\begin{aligned} 0 \le c^* \le n, \end{aligned}$$
(10)

and

$$\begin{aligned} c^*\omega \le \text {opt}_{\textsc {sim}}(f(\phi )) \le c^*(\omega + \varepsilon ). \end{aligned}$$
(11)

Thus, we have

$$\begin{aligned} c^*(\omega + \varepsilon )&= c^*\omega + c^*\varepsilon \nonumber \\&< c^*\omega + \frac{c^*\omega }{n} \end{aligned}$$
(12)
$$\begin{aligned}&\le c^*\omega + 1 \cdot \omega \end{aligned}$$
(13)
$$\begin{aligned}&= c^*\omega + \omega , \end{aligned}$$
(14)

where (12) comes from (8) and (13) is valid due to (10).

Now, let \(c^r\) be the number of helpful cycles given by an approximate solution for the ffdcj-similarity with approximation ratio r. Then,

$$\begin{aligned} R_{\textsc {max-2sat3}}(\phi ,g(f(\phi ),\gamma )) = \frac{\text {opt}_{\textsc {2sat3}}(\phi )}{c^r} = \frac{c'}{c^r} \le r, \end{aligned}$$

where the last inequality is given by Proposition 2 below. This concludes the first part of the proof.    \(\square \)

Proposition 1

Let \(c'\) be the number of helpful cycles in a feasible solution of ffdcj-similarity with the greatest number of helpful cycles possible. Let \(c^*\) be the number of helpful cycles in an optimal solution of ffdcj-similarity. Then,

$$\begin{aligned} c' = c^*. \end{aligned}$$

Proof

Since \(c'\) is the greatest number of helpful cycles possible, it is immediate that \(c^* \le c'\).

Let us now show that \(c^* \ge c'\). Suppose for a moment that \(c^* < c'\). Since \(c^*\) and \(c'\) are integers, this implies that \(c^* + 1 \le c'\), i.e.,

$$\begin{aligned} c^* \le c' - 1. \end{aligned}$$
(15)

Let \(\mathcal {C}'\) be the set of cycles with \(c'\) cycles, i.e., with the maximum number of helpful cycles possible. Let \(\mu (\mathcal {C}') := \sum _{C \in \mathcal {C}'} \mu (C) = \sum _{C \in \mathcal {C}'} w(C)/|C|\). Then

$$\begin{aligned} \mu (\mathcal {C}') \ge c'\omega&= (c'-1)\omega + \omega \nonumber \\&\ge c^*\omega + \omega \end{aligned}$$
(16)
$$\begin{aligned}&> c^*(\omega +\varepsilon ) \end{aligned}$$
(17)
$$\begin{aligned}&\ge \text {opt}_{\textsc {sim}}(f(\phi )), \end{aligned}$$
(18)

where (16) follows from (15), (17) comes from (14), and (18) is valid due to (11). It means that \(\mu (\mathcal {C}') > \text {opt}_{\textsc {sim}}(f(\phi ))\), which is a contradiction.

Therefore, \(c' = c^*\).    \(\square \)
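The key step of Proposition 1 is that a solution with more helpful cycles always has a larger value than any solution with fewer, whatever mix of cycle weights \(\omega \) and \(\omega + \varepsilon \) each contains. A small numeric check of this claim, with \(p = 2n\) as chosen above, might look as follows; the function names are hypothetical and the check covers only the tested range of n.

```python
from fractions import Fraction

def parameters(n):
    """Return (p, omega, eps) for p = 2n, as exact fractions."""
    p = 2 * n
    omega = Fraction(1, 2 + p)
    eps = Fraction(1, 1 + p) - omega
    return p, omega, eps

def similarity_bounds(c, n):
    """Smallest and largest total normalized weight of c helpful cycles."""
    _, omega, eps = parameters(n)
    return c * omega, c * (omega + eps)

def more_helpful_cycles_always_win(n):
    """Check (c + 1) * omega > c * (omega + eps) for every 0 <= c < n."""
    return all(
        similarity_bounds(c + 1, n)[0] > similarity_bounds(c, n)[1]
        for c in range(n)
    )

print(parameters(1), all(more_helpful_cycles_always_win(n) for n in range(1, 30)))
```

Exact fractions are used instead of floating point so that the strict inequalities are not affected by rounding.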

Proposition 2

Let \(c^r\) be the number of helpful cycles given by an approximate solution for ffdcj-similarity with approximation ratio r. Let \(c'\) be the same as defined in Proposition 1. Then,

$$\begin{aligned} c^r \ge \frac{c'}{r}. \end{aligned}$$

Proof

Given an instance \(f(\phi )\) of ffdcj-similarity, let \(\gamma ^r\) be an approximate solution of \(f(\phi )\) with performance ratio r, i.e., \(val (f(\phi ), \gamma ^r) \ge \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r}\). Let \(c^r\) be the number of helpful cycles of \(\gamma ^r\). Then

$$\begin{aligned} c^r&\ge \frac{\big (\frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r}\big )}{\omega + \varepsilon } \nonumber \\&> \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r(\omega +\omega /n)} \end{aligned}$$
(19)
$$\begin{aligned}&= \frac{\text {opt}_{\textsc {sim}}(f(\phi ))}{r\omega } \cdot \frac{n}{n+1}\nonumber \\&\ge \frac{c'\omega }{r\omega } \cdot \frac{n}{n+1} \end{aligned}$$
(20)
$$\begin{aligned}&= \frac{c'}{r} \cdot \big (1 - \frac{1}{n+1}\big ) \nonumber \\&= \frac{c'}{r} - \frac{c'}{r(n+1)} \nonumber \\&\ge \frac{c'}{r} - 1 \end{aligned}$$
(21)

where (19) follows from (8), (20) is valid from (11) and Proposition 1. Then, from (21) we know that \(c^r > \frac{c'}{r} - 1\) and, since \(c^r\) is an integer number, the result follows.    \(\square \)

We now continue with the proof of Theorem 1.

Proof

(Theorem 1, second part). First, notice that if a problem is APX-hard, the existence of a PTAS for it implies P \(=\) NP. Since a strict reduction preserves membership in the class PTAS, finding a PTAS for ffdcj-similarity implies a PTAS for every APX-hard problem and P \(=\) NP. A PTAS for ffdcj-similarity would also imply an approximation ratio better than \(2012/2011\), unless P \(=\) NP. This follows immediately from the reduction in Theorem 1 with \(R_{\textsc {max}\text {-}\textsc {2sat3}} = R_{\textsc {ffdcj}\text {-}\textsc {similarity}}\) and the fact that max-2sat3 is shown in [8] to be NP-hard to approximate within a factor of \(2012/2011 - \varepsilon \) for any \(\varepsilon > 0\).

However, our result is slightly stronger. Notice particularly that the reduction \(\textsc {max}\text {-}\textsc {2sat3} \le \textsc {ffdcj}\text {-}\textsc {similarity}\) from the first part of the proof can be trivially extended to \(\textsc {max}\text {-}\textsc {2sat} \le \textsc {ffdcj}\text {-}\textsc {similarity}\) by extending variable components to arbitrary sizes. This increases the lower bound to \(22/21\) [17].    \(\square \)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Rubert, D.P., Medeiros, G.L., Hoshino, E.A., Braga, M.D.V., Stoye, J., Martinez, F.V. (2017). Algorithms for Computing the Family-Free Genomic Similarity Under DCJ. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_5

  • DOI: https://doi.org/10.1007/978-3-319-67979-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67978-5

  • Online ISBN: 978-3-319-67979-2

  • eBook Packages: Computer Science, Computer Science (R0)
