A General Framework for Genome Rearrangement with Biological Constraints

Simonaitis, Pijus; Chateau, Annie; Swenson, Krister M.

doi:10.1007/978-3-030-00834-5_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11183)

Included in the following conference series:

RECOMB International conference on Comparative Genomics

Abstract

This paper generalizes previous studies on genome rearrangement under biological constraints, using double cut and join (DCJ). We propose a model for weighted DCJ, along with a family of optimization problems called $\varphi $-MCPS (Minimum Cost Parsimonious Scenario), that are based on edge labeled graphs. After embedding known results in our framework, we show how to compute solutions to general instances of $\varphi $-MCPS, given an algorithm to compute $\varphi $-MCPS on a circular genome with exactly one occurrence of each gene. These general instances can have an arbitrary number of circular and linear chromosomes, and arbitrary gene content. The practicality of the framework is displayed by generalizing the results of Bulteau, Fertin, and Tannier on the Sorting by wDCJs and indels in intergenes problem, and by generalizing previous results on the Minimum Local Parsimonious Scenario problem.

References

1. Bafna, V., Pevzner, P.A.: Genome rearrangements and sorting by reversals. SIAM J. Comput. 25(2), 272–289 (1996)
2. Baudet, C., Dias, U., Dias, Z.: Sorting by weighted inversions considering length and symmetry. BMC Bioinform. 16(19), S3 (2015)
3. Baudet, C., Lemaitre, C., Dias, Z., Gautier, C., Tannier, E., Sagot, M.-F.: Cassis: detection of genomic rearrangement breakpoints. Bioinformatics 26(15), 1897–1898 (2010)
4. Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS, vol. 4175, pp. 163–173. Springer, Heidelberg (2006). https://doi.org/10.1007/11851561_16
5. Bhuiyan, H., Chen, J., Khan, M., Marathe, M.: Fast parallel algorithms for edge-switching to achieve a target visit rate in heterogeneous graphs. In: 43rd International Conference on Parallel Processing (ICPP), pp. 60–69. IEEE (2014)
6. Bienstock, D., Günlük, O.: A degree sequence problem related to network design. Networks 24(4), 195–205 (1994)
7. Biller, P., Knibbe, C., Guéguen, L., Tannier, E.: Breaking good: accounting for the diversity of fragile regions for estimating rearrangement distances. Genome Biol. Evol. 8, 1427–1439 (2016)
8. Bitner, J.R.: An asymptotically optimal algorithm for the Dutch national flag problem. SIAM J. Comput. 11(2), 243–262 (1982)
9. Blanchette, M., Kunisawa, T., Sankoff, D.: Parametric genome rearrangement. Gene 172(1), 11–17 (1996)
10. Braga, M.D.V., Sagot, M.-F., Scornavacca, C., Tannier, E.: The solution space of sorting by reversals. In: Măndoiu, I., Zelikovsky, A. (eds.) ISBRA 2007. LNCS, vol. 4463, pp. 293–304. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-72031-7_27
11. Braga, M.D.V., Stoye, J.: The solution space of sorting by DCJ. J. Comput. Biol. 17(9), 1145–1165 (2010)
12. Bulteau, L., Fertin, G., Tannier, E.: Genome rearrangements with indels in intergenes restrict the scenario space. BMC Bioinform. 17(14), 426 (2016)
13. Caprara, A.: Sorting by reversals is difficult. In: Proceedings of the First Annual International Conference on Computational Molecular Biology, pp. 75–83. ACM (1997)
14. Farnoud, F., Milenkovic, O.: Sorting of permutations by cost-constrained transpositions. IEEE Trans. Inf. Theory 58(1), 3–23 (2012)
15. Fertin, G., Jean, G., Tannier, E.: Algorithms for computing the double cut and join distance on both gene order and intergenic sizes. Algorithms Mol. Biol. 12(1), 16 (2017)
16. Lieberman-Aiden, E., Van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al.: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950), 289–293 (2009)
17. Nadeau, J.H., Taylor, B.A.: Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. 81(3), 814–818 (1984)
18. Ohno, S.: Evolution by Gene Duplication. Springer, Heidelberg (1970)
19. Pulicani, S., Simonaitis, P., Rivals, E., Swenson, K.M.: Rearrangement scenarios guided by chromatin structure. In: Meidanis, J., Nakhleh, L. (eds.) Comparative Genomics. RECOMB-CG 2017. LNCS, vol. 10562, pp. 141–155. Springer, Cham (2017)
20. Shao, M., Lin, Y.: Approximating the edit distance for genomes with duplicate genes under DCJ, insertion and deletion. In: BMC Bioinformatics, vol. 13, p. S13. BioMed Central (2012)
21. Shao, M., Lin, Y., Moret, B.M.E.: Sorting genomes with rearrangements and segmental duplications through trajectory graphs. In: BMC Bioinformatics, vol. 14, p. S9. BioMed Central (2013)
22. Simonaitis, P., Swenson, K.M.: Finding local genome rearrangements. Algorithms Mol. Biol. 13(1), 9 (2018)
23. Swenson, K.M., Simonaitis, P., Blanchette, M.: Models and algorithms for genome rearrangement with positional constraints. Algorithms Mol. Biol. 11(1), 13 (2016)
24. Veron, A., Lemaitre, C., Gautier, C., Lacroix, V., Sagot, M.-F.: Close 3D proximity of evolutionary breakpoints argues for the notion of spatial synteny. BMC Genomics 12(1), 303 (2011)
25. Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21(16), 3340–3346 (2005)

Acknowledgments

This work is partially supported by the IBC (Institut de Biologie Computationnelle) (ANR-11-BINF-0002), by the Labex NUMEV flagship project GEM, and by the CNRS project Osez l’Interdisciplinarité.

Author information

Authors and Affiliations

LIRMM, CNRS – Université Montpellier, 161 rue Ada, 34392, Montpellier, France
Pijus Simonaitis, Annie Chateau & Krister M. Swenson
Institut de Biologie Computationnelle (IBC), Montpellier, France
Annie Chateau & Krister M. Swenson

Corresponding author

Correspondence to Krister M. Swenson.

Editor information

Editors and Affiliations

McGill University, Montréal, QC, Canada
Mathieu Blanchette
Université de Sherbrooke, Sherbrooke, QC, Canada
Aïda Ouangraoua

A Proofs

A.1 Lemma 1

Lemma

The minimum length of a 2-break scenario on a graph G is $d_{2b} (G) = e(G) - c(G)$.

Proof

A 2-break can increase the size of a MAECD by at most 1, and a MAECD of a terminal graph has size e(G). Any 2-break scenario on G therefore has length at least $e(G)-c(G)$, which gives the inequality $d_{2b} (G)\ge e(G)-c(G)$.

In this paragraph the length of a cycle will be its number of black edges. For any cycle c of length $l > 1$ there is a 2-break transforming c into a union of length 1 and length $l-1$ cycles. This way we obtain a scenario of length $l-1$ for c, and can transform every cycle of a MAECD of G independently, obtaining a 2-break scenario of length $e(G)-c(G)$. Thus, $d_{2b} (G)\le e(G)-c(G)$. $\square $
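As an illustration of the formula $d_{2b}(G) = e(G) - c(G)$, the following minimal sketch handles only the simplest case, assuming every vertex is incident to exactly one black and one gray edge, so that the alternating cycles are uniquely determined and c(G) is simply their number. The function name `two_break_distance` and the edge-list encoding are assumptions made for this example, not notation from the paper.

```python
def two_break_distance(black, gray):
    """Return e(G) - c(G) for a graph given as lists of black and gray edges.

    Assumption for this sketch: every vertex is incident to exactly one
    black edge and exactly one gray edge, so the graph is a disjoint union
    of alternating cycles and c(G) is the number of those cycles.
    """
    black_nbr, gray_nbr = {}, {}
    for u, v in black:
        black_nbr[u], black_nbr[v] = v, u
    for u, v in gray:
        gray_nbr[u], gray_nbr[v] = v, u

    seen = set()
    cycles = 0
    for start in black_nbr:
        if start in seen:
            continue
        cycles += 1
        v = start
        # Walk the alternating cycle: black edge, then gray edge, and so on.
        while v not in seen:
            seen.add(v)
            w = black_nbr[v]
            seen.add(w)
            v = gray_nbr[w]
    return len(black) - cycles


# One alternating cycle with 3 black edges needs 3 - 1 = 2 two-breaks.
black = [(0, 1), (2, 3), (4, 5)]
gray = [(1, 2), (3, 4), (5, 0)]
print(two_break_distance(black, gray))
```

A terminal graph, in which every black edge is parallel to a gray edge, gives distance 0 under the same encoding.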

A.2 Lemma 2

Lemma

The minimum length of a DCJ scenario transforming genome A into B is equal to $d_{2b} (G(A,B)) = e(G(A,B)) - c(G(A,B))$.

Proof

G(A, B) is constructed in such a way that for every DCJ $A\rightarrow A'$ the transformation $G(A,B)\rightarrow G(A',B)$ is a 2-break. Notably, a DCJ $\{a,b\}\rightarrow \{a\},\{b\}$ results in a transformation $\{a,b\},\{\circ ,\circ \}\rightarrow \{a,\circ \},\{b,\circ \}$, as the construction of a breakpoint graph guarantees that there are enough black loops $\{\circ ,\circ \}$ to realize such a 2-break. For any 2-break $G(A,B)\rightarrow G'$ with $G'\ne G(A,B)$ there exists a DCJ $A\rightarrow A'$ such that $G(A',B)=G'$. Since G(B, B) is terminal, it follows that the minimum length of a scenario transforming A into B is $d_{2b} (G(A,B))$ and we conclude using Lemma 1. $\square $

A.3 Theorem 1

Theorem

If $\mathcal {D} (G,\rho )$ has k connected components then $\rho $ can be partitioned into k subscenarios $\rho ^{i}$ and G can be partitioned into k edge-disjoint Eulerian subgraphs $H^{i}$ in such a way that $\rho ^{i}$ is a scenario for $H^{i}$ for every $i\in \{1,\ldots ,k\}$. If $\rho $ is parsimonious, then $k=c(G)$ and $C(\rho ) = \{H^{1}, \ldots , H^{k}\}$ is a MAECD of G.

Proof

Take a connected component C of $\mathcal {D} (G,\rho )$. It has an equal number of vertices of indegree 0 and vertices of outdegree 0. Its edges incident to the vertices of indegree 0 are labeled with the black edges of G and its edges incident to the vertices of outdegree 0 are labeled with the gray edges of G. Together these labels define a subgraph H of G that we will prove to be Eulerian.

Define $C_{l}$ to be a subgraph of $\mathcal {D} (G,\rho _{l})$ consisting of its connected components containing the vertices of indegree 0 of C. This way $C_{m}=C$. Define $H_{l}$ to be a subgraph of $G_{l}$ containing the gray edges of H and the black edges of $G_{l}$ labeling the edges of $C_{l}$ incident to the vertices of outdegree 0. This way $H_{0}=H$ and $H_{m}$ is a terminal graph.

We prove that H is Eulerian by induction. $H_{m}$ is Eulerian as it is terminal. Suppose that $H_{l}$ is Eulerian. By construction the two edges of $G_{l}$ replaced by the l-th 2-break of $\rho $ either both belong to $H_{l-1}$ or both are outside of $H_{l-1}$. In the first case, $H_{l}$ is obtained from $H_{l-1}$ via a 2-break and as $H_{l}$ is Eulerian this means that $H_{l-1}$ is also Eulerian. In the second case, $H_{l}=H_{l-1}$, thus the latter stays Eulerian. Thus $H=H_{0}$ is Eulerian and we obtain a subsequence of $\rho $ that is a scenario for H.

$\mathcal {D} (G,\rho _0)$ has e(G) connected components. The l-th 2-break of $\rho $ merges two vertices of $\mathcal {D} (G,\rho _{l-1})$, and thus reduces the number of connected components by at most 1. This means that the number k of connected components of $\mathcal {D} (G,\rho )$ is greater than or equal to $e(G)-m$.

If $\rho $ is parsimonious, then its length m is $e(G) - c(G)$ using Lemma 1. This means that $k\ge c(G)$ and G can be partitioned into k edge-disjoint Eulerian subgraphs. Due to the maximality of c(G), we have that $k=c(G)$ and all of the obtained edge-disjoint Eulerian subgraphs of G are simple cycles. $\square $

A.4 Lemma 3

Lemma

${\textsc {MCPS}} _{\varphi }(S,\lambda )=\text {min}\{{\textsc {MCPS}} _{\hat{\varphi }}(\hat{S},\hat{\lambda })|~(\hat{S},\hat{\lambda })\in ~S_{1}\}$

Proof

For a labeled graph $(H,\mu )$ on vertices $\hat{V}$ we denote by $r(H,\mu )$ the labeled graph obtained from $(H,\mu )$ by merging the two vertices that were split in S. For $(\hat{S},\hat{\lambda })\in ~S_{1}$ we have $r(\hat{S},\hat{\lambda })=(S,\lambda )$ by construction. An operation in $\hat{\mathcal {O}}$ transforms $(\hat{S},\hat{\lambda })$ into some $(\hat{S}',\hat{\lambda }')$ for which there exists a unique operation in $\mathcal {O} $ of the same cost transforming $(S,\lambda )$ into $r(\hat{S}',\hat{\lambda }')$. It follows that for every $\hat{\mathcal {O}}$-scenario for $(\hat{S},\hat{\lambda })$ there exists an $\mathcal {O} $-scenario of the same cost and the same 2-break-length for $(S,\lambda )$.

Conversely, for an operation in $\mathcal {O} $ transforming $(S,\lambda )$ into $(S',\lambda ')$ there exists an operation in $\hat{\mathcal {O}}$ of the same cost transforming every $(\hat{S},\hat{\lambda })\in S_{1}$ into $(\hat{S}',\hat{\lambda }')$ such that $r(\hat{S}',\hat{\lambda }'_{S})=(S',\lambda ')$. It follows that an $\mathcal {O} $-scenario for $(S,\lambda )$ provides us with a sequence $\hat{\rho }_{\hat{\mathcal {O}}}$ of $\hat{\mathcal {O}}$ operations of the same cost and 2-break-length transforming every $(\hat{S},\hat{\lambda })\in S_{1}$ into some $(\overline{\hat{S}},\overline{\hat{\lambda }})$ for which $r(\overline{\hat{S}},\overline{\hat{\lambda }})$ is a terminal graph with equal multi-sets of labeled gray and black edges. As the latter graph is obtained by merging two vertices of degree one of the former, the structure of the former is also fairly simple. We can check all the possible cases by hand and show that there is $(\hat{S},\hat{\lambda })\in S_{1}$ such that $(\overline{\hat{S}},\overline{\hat{\lambda }})$ is itself a terminal graph with equal multi-sets of labeled gray and black edges.

If $S_{1}$ is of size 1, then there is a single choice for $(\overline{\hat{S}},\overline{\hat{\lambda }})$ such that $r(\overline{\hat{S}},\overline{\hat{\lambda }})$ is a terminal graph with equal multi-sets of labeled gray and black edges (see the upper right corner of Fig. 4). If $S_{1}$ is of size 2, then there are more cases, but they are all easy to check and one of them is given in the bottom right corner of Fig. 4. $\square $

A.5 Lemma 4

Lemma

For a function f and an O(f(r)) time algorithm for $\varphi $-MCPS on a labeled circle on r vertices, there exists an ${O(p^2f(P)+p^3+f(n))}$ time algorithm for $\varphi $-MCPS on a labeled breakpoint graph. If $f(r)=O(r^t)$ for some constant $t\ge 1$, then $\varphi $-MCPS on a labeled breakpoint graph can be solved in $O(pP^t+p^3+n^t)$ time.

Proof

The $p^2$ edges of a bipartite graph H can be weighted in $O(p^2f(P))$ time due to Theorem 3 and the fact that the simple cycles of G(A, B) have at most 1 vertex of degree 2. A minimum weight maximum matching of H can be found in $O(p^3)$ time using the Hungarian algorithm. Finally, ${\textsc {MCPS}} _{\varphi }$ for the labeled circles in G(A, B) can be computed in O(f(n)) time. Combining these results we obtain an $O(p^{2}f(P)+p^3+f(n))$ time algorithm for computing ${\textsc {MCPS}} _{\varphi }(G(A,B),\lambda )$.
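The matching step can be illustrated on a toy instance. The sketch below is a brute-force stand-in for the Hungarian algorithm, not the $O(p^3)$ method itself: it enumerates all perfect matchings of a small complete bipartite graph whose weights are given as a $p \times p$ matrix. The function name and the example weights are hypothetical; in the setting of the lemma, entry `weights[i][j]` would hold the ${\textsc {MCPS}} _{\varphi }$ value computed for the pair formed by the i-th AA path and the j-th BB path.

```python
from itertools import permutations


def min_weight_perfect_matching(weights):
    """Brute-force minimum weight perfect matching on a p x p weight matrix.

    Returns (cost, matching) where matching[i] is the column matched to
    row i. Exponential in p; meant only to illustrate what the Hungarian
    algorithm computes in O(p^3) time.
    """
    p = len(weights)
    best_cost, best_perm = None, None
    for perm in permutations(range(p)):
        cost = sum(weights[i][perm[i]] for i in range(p))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


# Hypothetical weights for p = 3 AA paths (rows) and 3 BB paths (columns).
weights = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
print(min_weight_perfect_matching(weights))  # (5, [1, 0, 2])
```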

Now suppose that $f(r)=O(r^t)$ for some constant $t\ge 1$. Let $a_{1},\ldots ,a_{p}$ and $b_{1},\ldots ,b_{p}$ denote the numbers of edges in the AA paths and the BB paths respectively, with $\sum _{i=1}^{p}a_{i}=P_{A}$, $\sum _{j=1}^{p}b_{j}=P_{B}$ and $P=P_{A}+P_{B}$.

${\textsc {MCPS}} _{\varphi }$ for a union of an AA path and a BB path having a and b edges respectively can be obtained by computing ${\textsc {MCPS}} _{\varphi }$ for at most two circles on $a+b$ vertices due to Theorem 3. This can be done in less than $c(a+b)^{t}$ steps for some constant c using the $O(r^t)$ time algorithm for computing ${\textsc {MCPS}} _{\varphi }$ for a circle. ${\textsc {MCPS}} _{\varphi }$ for every pair of AA and BB paths of $G(A,B)'$ can be computed in a number of steps bounded by:

$$\begin{aligned}&\sum _{i=1}^{p}\sum _{j=1}^{p}c(a_{i}+b_{j})^{t}=c\sum _{i=1}^{p}\sum _{j=1}^{p}\sum _{l=0}^{t}\binom{t}{l} a_{i}^{l}b_{j}^{t-l}=c\sum _{l=0}^{t}\binom{t}{l} \sum _{i=1}^{p}\sum _{j=1}^{p}a_{i}^{l}b_{j}^{t-l}\\&=c\sum _{j=1}^{p}\sum _{i=1}^{p}b_{j}^{t}+c\sum _{i=1}^{p}\sum _{j=1}^{p}a_{i}^{t}+c\sum _{l=1}^{t-1}\binom{t}{l} \sum _{i=1}^{p}a_{i}^{l}\sum _{j=1}^{p}b_{j}^{t-l}\\&=cp\sum _{j=1}^{p}b_{j}^{t}+cp\sum _{i=1}^{p}a_{i}^{t}+c\sum _{l=1}^{t-1}\binom{t}{l} \sum _{i=1}^{p}a_{i}^{l}\sum _{j=1}^{p}b_{j}^{t-l}\\&\le cp\left( \sum _{j=1}^{p}b_{j}\right) ^{t}+cp\left( \sum _{i=1}^{p}a_{i}\right) ^{t}+c\sum _{l=1}^{t-1}\binom{t}{l} \left( \sum _{i=1}^{p}a_{i}\right) ^{l}\left( \sum _{j=1}^{p}b_{j}\right) ^{t-l}\\&\le c(pP_{B}^{t}+p P_{A}^{t})+p c\sum _{l=1}^{t-1}\binom{t}{l} P_{B}^{t-l}P_{A}^{l}=cp(P_{B}+P_{A})^t=cpP^t \end{aligned}$$

Thus, the weighting of H can be performed in $O(pP^{t})$ time. This provides us with an $O(pP^t+p^3+n^t)$ time algorithm for computing ${\textsc {MCPS}} _{\varphi }(G(A,B),\lambda )$. $\square $
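The bound derived above can be sanity-checked numerically. The following sketch takes the constant c to be 1 and uses made-up path lengths; it compares the exact pairwise sum $\sum _{i}\sum _{j}(a_{i}+b_{j})^{t}$ with the bound $pP^{t}$. The function names are introduced here for illustration only.

```python
def pairwise_cost(a, b, t):
    """Exact value of the double sum over all (AA path, BB path) pairs, with c = 1."""
    return sum((x + y) ** t for x in a for y in b)


def upper_bound(a, b, t):
    """The bound p * P^t, where p = len(a) = len(b) and P = sum(a) + sum(b)."""
    p = len(a)
    return p * (sum(a) + sum(b)) ** t


# Hypothetical edge counts for p = 3 AA paths and 3 BB paths, with t = 2.
a = [1, 2, 3]
b = [2, 2, 1]
print(pairwise_cost(a, b, 2), upper_bound(a, b, 2))  # 129 363
```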

A.6 Theorem 5

Theorem

For a constant $t\ge 2$ and an $O(r^t)$ time $\alpha $-approximation algorithm for $\varphi $-MCPS on a labeled circle on r vertices, there exists an $O(n^{t+1})$ time $\alpha $-approximation algorithm for $\varphi $-MCPS on a labeled breakpoint graph.

Proof

In Theorem 3, ${\textsc {MCPS}} _{\varphi }$ on a simple cycle is expressed as the minimum of the ${\textsc {MCPS}} _{\varphi }$ for a set of corresponding circles. In Theorem 2, ${\textsc {MCPS}} _{\varphi }$ on a graph is expressed as the minimum of the sums of the ${\textsc {MCPS}} _{\varphi }$ for the simple cycles. We prove an auxiliary lemma establishing the following:

1. An $\alpha $-approximation for ${\textsc {MCPS}} _{\varphi }$ on a simple cycle can be obtained by taking the minimum of the $\alpha $-approximations for the corresponding circles.
2. An $\alpha $-approximation for ${\textsc {MCPS}} _{\varphi }$ on a graph can be obtained by taking the minimum of the sums of the $\alpha $-approximations for ${\textsc {MCPS}} _{\varphi }$ on the simple cycles.

Lemma

Take $k\in \mathbb {N}$ and two sets of positive numbers $\{q_{1}^{*},\ldots , q_{k}^{*}\}$ and $\{q_{1},\ldots , q_{k}\}$ with $q_{i}\le \alpha q_{i}^{*}$ for every i. The following inequalities hold:

1. $\min \{q_{i}|i\in \{1,\ldots ,k\}\}\le \alpha \min \{q_{i}^{*}|i\in \{1,\ldots ,k\}\}$
2. $\sum _{i=1}^{k} q_{i}\le \alpha \sum _{i=1}^{k} q_{i}^{*}$

Proof

Take u and v such that $q_{u}^{*}=\min \{q_{i}^{*}|i\in \{1,\ldots ,k\}\}$ and $q_{v}=\min \{q_{i}|i\in \{1,\ldots ,k\}\}$. By construction $q_{v}\le q_{u}\le \alpha q_{u}^{*}$, which proves the first inequality. For the second inequality it suffices to observe that $\sum _{i=1}^{k} q_{i}\le \sum _{i=1}^{k} \alpha q_{i}^{*}=\alpha \sum _{i=1}^{k} q_{i}^{*}$. $\square $
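Both inequalities of the auxiliary lemma can be checked on a small made-up instance. The sketch below assumes the exact values $q_{i}^{*}$ and the approximate values $q_{i}$ are given as Python lists; the function name `check_alpha_bounds` is introduced here for illustration only.

```python
def check_alpha_bounds(q_star, q, alpha):
    """Check both inequalities of the lemma for given exact and approximate values.

    Raises ValueError if the hypothesis q_i <= alpha * q_i^* fails for some i.
    Returns a pair (min_inequality_holds, sum_inequality_holds).
    """
    if len(q) != len(q_star):
        raise ValueError("q and q_star must have the same length")
    if any(qi > alpha * qs for qi, qs in zip(q, q_star)):
        raise ValueError("hypothesis q_i <= alpha * q_i^* is violated")
    min_ok = min(q) <= alpha * min(q_star)
    sum_ok = sum(q) <= alpha * sum(q_star)
    return min_ok, sum_ok


# Hypothetical exact costs and 2-approximations of them.
print(check_alpha_bounds([3, 5, 4], [6, 7, 5], 2))  # (True, True)
```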

A simple cycle of a breakpoint graph has at most one vertex of degree 2. This means that it has at most two corresponding circles (see Theorem 6). Taking the minimum of the $\alpha $-approximations for ${\textsc {MCPS}} _{\varphi }$ on these circles provides us with an $\alpha $-approximation for the simple cycle due to Theorem 6 and the first part of the lemma above. This way we obtain an $\alpha $-approximation algorithm for $\varphi $-MCPS on a simple cycle of a breakpoint graph that runs in $O(r^t)$ time where r is the number of the vertices in the simple cycle.

We can reuse the structure of a bipartite graph H presented in Sect. 7 with the weights of the edges now being the $\alpha $-approximations for the ${\textsc {MCPS}} _{\varphi }$ on the corresponding simple cycles. Following the same reasoning as in Sect. 7, we know that the minimum cost maximum matching of H leads to a MAECD of a breakpoint graph minimizing the sum of the $\alpha $-approximations for the ${\textsc {MCPS}} _{\varphi }$ on its simple cycles. Combining Theorem 2, both parts of the lemma above, and the proof of Lemma 4, we obtain an $O(n^{t+1})$ time $\alpha $-approximation algorithm for $\varphi $-MCPS on a breakpoint graph. $\square $

A.7 Lemma 5

Lemma

If $\rho _{\mathcal {O}}$ is a minimum 2-break-length $\mathcal {O} $-scenario for a labeled circle $(O,\lambda )$, then $\mathcal {T} (\rho _{\mathcal {O}})$ is a planar tree on $(O,\lambda )$. In addition to that, for a planar tree $\mathcal {T} $ on $(O,\lambda )$ there exists an $\mathcal {O} $-scenario $\rho _{\mathcal {O}}$ such that $\mathcal {T} (\rho _{\mathcal {O}})=\mathcal {T} $.

Proof

We prove the first statement by induction. It is trivially true if O has 2 vertices. We suppose it to be true for all circles having fewer than 2l vertices and prove it for a circle having 2l vertices. Fix a minimum 2-break-length scenario $\rho _{\mathcal {O}}$. Its length is $l-1$ due to Lemma 1. The first labeled 2-break of $\rho _{\mathcal {O}}$ transforms $(O,\lambda )$ into two vertex-disjoint labeled circles $(O_{1},\lambda _1)$ and $(O_{2},\lambda _2)$, both having fewer vertices than O. The rest of the scenario $\rho _{\mathcal {O}}$ can be partitioned into $\rho _{\mathcal {O}}^{1}$ acting on the edges of $O_{1}$ and $\rho _{\mathcal {O}}^{2}$ acting on the edges of $O_{2}$. As $\rho _{\mathcal {O}}$ is a minimum 2-break-length scenario, $\rho _{\mathcal {O}}^{1}$ and $\rho _{\mathcal {O}}^{2}$ must also be minimum 2-break-length scenarios. By the inductive hypothesis, $\mathcal {T} (\rho _{\mathcal {O}}^{1})$ and $\mathcal {T} (\rho _{\mathcal {O}}^{2})$ are planar trees on $(O_{1},\lambda _{1})$ and $(O_{2},\lambda _{2})$ respectively. $\mathcal {T} (\rho _{\mathcal {O}})$ is obtained from $\mathcal {T} (\rho _{\mathcal {O}}^{1})$ and $\mathcal {T} (\rho _{\mathcal {O}}^{2})$ by taking the union of their edges and adding an edge corresponding to the first 2-break of $\rho _{\mathcal {O}}$. This way we obtain a planar tree $\mathcal {T} (\rho _{\mathcal {O}})$ on $(O,\lambda )$, proving the first statement of the lemma.

Now define the distance of an edge $\{x,y\}$ in $\mathcal {T} $ as the minimum number of vertices between x and y in the fixed circular embedding of $\mathcal {T} $. For example, in the rightmost tree on the top of Fig. 5 the distance of the edge $\{w,z\}$ is one, because t is in between w and z, while the distance of the edge $\{x,y\}$ is 0. An edge is said to be short if its distance is 0. We prove an auxiliary lemma.

Lemma

A planar tree $\mathcal {T} $ on $(O,\lambda )$ has a short edge incident to a leaf.

Proof

Choose a leaf x in $\mathcal {T} $ incident to an edge of minimum distance d. If $d\ne 0$, then between the leaf and the vertex that it is adjacent to there are d other vertices. Since $\mathcal {T} $ is planar on $(O,\lambda )$, there is at least one other leaf among these d vertices, and the edge incident to it has a distance smaller than d, which contradicts the minimality of d. $\square $

Now take a short edge $\{x,y\}$ incident to a leaf x in $\mathcal {T} $. Take the black edges $\{a,b\}$ and $\{c,d\}$ in $(O,\lambda )$ labeled with x and y respectively and separated by a gray edge $\{b,c\}$. Perform a labeled 2-break $(\{b,a\},x),(\{c,d\},y)\rightarrow (\{b,c\},x),(\{a,d\},y)$. This 2-break results in two labeled circles. One of them is a terminal graph having two edges $\{b,c\}$ with the black one labeled with x. Remove the edge $\{x,y\}$ from $\mathcal {T} $. This way we have reduced the size of the problem. The number of the vertices in the circle was reduced by two and the number of the edges in the tree was reduced by 1. We iterate this procedure to construct a required scenario. See the bottom part of Fig. 5 for an example. $\square $
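The iterative procedure above can be sketched at the level of the tree alone. The code below assumes the vertices of $\mathcal {T} $ (the labels of the black edges) are given in their circular order as a list and the tree as a list of pairs; it repeatedly picks a short edge incident to a leaf, records it, and removes the leaf from the circle. The order in which edges are recorded corresponds to the order of the 2-breaks in the constructed scenario. The function name and the data encoding are assumptions for this example, and the 2-breaks on $(O,\lambda )$ themselves are not simulated.

```python
def scenario_order(circle, tree_edges):
    """Order the edges of a planar tree by repeatedly removing a short leaf edge.

    circle: list of tree vertices in their circular order.
    tree_edges: list of pairs (x, y) forming a tree on those vertices.
    Raises ValueError if at some step no short edge incident to a leaf exists,
    which by the auxiliary lemma cannot happen for a planar tree.
    """
    circle = list(circle)
    edges = [tuple(e) for e in tree_edges]
    order = []
    while edges:
        degree = {}
        for x, y in edges:
            degree[x] = degree.get(x, 0) + 1
            degree[y] = degree.get(y, 0) + 1
        for x, y in edges:
            i, j = circle.index(x), circle.index(y)
            # Distance 0: x and y are neighbors on the current circle.
            adjacent = abs(i - j) in (1, len(circle) - 1)
            leaf = x if degree[x] == 1 else y if degree[y] == 1 else None
            if adjacent and leaf is not None:
                order.append((x, y))
                edges.remove((x, y))
                circle.remove(leaf)
                break
        else:
            raise ValueError("no short edge incident to a leaf")
    return order


print(scenario_order([0, 1, 2, 3], [(0, 2), (0, 1), (2, 3)]))
# [(0, 1), (0, 2), (2, 3)]
```

On a tree with two crossing edges, such as `[(0, 2), (1, 3), (0, 1)]` on the same circle, the sketch raises `ValueError` because no short edge incident to a leaf is available.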

A.8 Lemma 6

Lemma

A minimum cost planar tree on a circle can be found in $O(r^4)$ time, where r is the number of vertices of the tree.

Proof

Farnoud and Milenkovic pose the problem of sorting permutations by cost-constrained mathematical transpositions (a sorting scenario is called a decomposition) [14]. They define a cost function on the set of transpositions and treat the problem, called MIN-COST-MLD, of finding a minimum cost decomposition among the minimum length transposition decompositions of a permutation. They reduce this problem to finding a minimum cost planar tree on a circle, and propose the following $O(r^4)$ time dynamic programming algorithm for a tree having r vertices.

Enumerate the vertices 1 to r while respecting their order on the circle. Define cost(i, j) as the minimum cost of a planar tree on the vertices $\{i,\ldots , j\}$ for $1\le i<j\le r$ and set $cost(i,i)=0$ for $1\le i\le r$.

Take a planar tree $\mathcal {T} $ on the vertices $\{1,\ldots , r\}$. If $deg(1)=1$ and 1 is on the edge $\{1,q\}$, then the cost of $\mathcal {T} $ is equal to $\varPhi (1,q)$ plus the costs of the subgraphs of $\mathcal {T} $ induced by the vertices $\{2,\ldots ,q\}$ and $\{q,\ldots , r\}$. If $deg(1)>1$, then take $q=\text {max}(\{u|\{1,u\}\text { belongs to }\mathcal {T} \})$ and $s=\text {max}(\{u|\text { there is a path in }\mathcal {T} \text { joining 1 and }u\text { but not visiting }q\})$. The cost of $\mathcal {T} $ is equal to $\varPhi (1,q)$ plus the costs of the subgraphs of $\mathcal {T} $ induced by the vertices $\{1,\ldots ,s\}$, $\{s+1,\ldots , q\}$ and $\{q,\ldots ,r\}$. This observation provides us with the following equality:

$$\begin{aligned} cost(i,j)=\text {min}(cost(i,s)+cost(s+1,q)+cost(q,j)+\varPhi (i,q)\mid i\le s<q\le j) \end{aligned}$$

for $1\le i<j\le r$, which leads to an $O(r^4)$ time dynamic programming algorithm for finding cost(1, r). $\square $
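The recurrence can be written as a short memoized function. The sketch below is a direct transcription, taking the minimum over all pairs (s, q) with $i\le s<q\le j$, as the definition of cost(i, j) as a minimum cost requires. It is not the implementation of Farnoud and Milenkovic; the function name and the example cost function `phi` are assumptions for illustration. There are $O(r^2)$ intervals and $O(r^2)$ pairs (s, q) per interval, which gives the $O(r^4)$ bound.

```python
from functools import lru_cache


def min_cost_planar_tree(r, phi):
    """Minimum cost of a planar tree on vertices 1..r placed on a circle.

    phi(i, q) is the cost of the tree edge {i, q}. Implements
    cost(i, j) = min over i <= s < q <= j of
                 cost(i, s) + cost(s + 1, q) + cost(q, j) + phi(i, q),
    with cost(i, i) = 0.
    """
    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == j:
            return 0
        return min(
            cost(i, s) + cost(s + 1, q) + cost(q, j) + phi(i, q)
            for s in range(i, j)
            for q in range(s + 1, j + 1)
        )

    return cost(1, r)


# Hypothetical edge cost: squared difference of the endpoint indices.
# The path 1-2-3-4 uses three edges of cost 1, which is optimal.
print(min_cost_planar_tree(4, lambda i, q: (q - i) ** 2))  # 3
```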

About this paper

Cite this paper

Simonaitis, P., Chateau, A., Swenson, K.M. (2018). A General Framework for Genome Rearrangement with Biological Constraints. In: Blanchette, M., Ouangraoua, A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham. https://doi.org/10.1007/978-3-030-00834-5_3

DOI: https://doi.org/10.1007/978-3-030-00834-5_3
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00833-8
Online ISBN: 978-3-030-00834-5
eBook Packages: Computer Science, Computer Science (R0)
