Abstract
The breakpoint distance employed in comparative genomics is not a geodesic distance, which makes it difficult to study genomes (i.e. permutations) that are intermediate between two given genomes G and \(G'\). An intermediate genome, also called a geodesic point, is a genome whose sum of breakpoint distances to G and \(G'\) is equal to the breakpoint distance of G and \(G'\). To construct an intermediate genome M, it is necessary to find sets of gene adjacencies I and J selected from G and \(G'\) whose union forms M. This means that the set of adjacencies of M is \(I\cup J\). Any given set of adjacencies I selected from G may put some constraints on some adjacencies of \(G'\) so that they cannot be used in J to construct M or if they can, they must be used in specific ways. For instance, a gene adjacency of \(G'\) whose gene extremities are used in the “middle” of segments of I cannot be used to construct M. Based on these constraints, we classify the set of all adjacencies of \(G'\) with respect to I into four distinct groups. For two unichromosomal random genomes of the same gene-content, namely \(\xi _1\) and \(\xi _2\), as the number of genes tends to infinity, we study the limiting behaviour of the frequencies of adjacencies of each type in \(\xi _2\) with respect to a random or deterministic set of adjacencies selected from \(\xi _1\). We use the limiting results to provide necessary conditions for the size and the shape of the set of adjacencies selected from the first genome for the purpose of constructing an intermediate genome between \(\xi _1\) and \(\xi _2\). These results can help to shed light on how to construct “accessible breakpoint medians” far from the input genomes (corners).
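The definition of an intermediate genome can be illustrated with a minimal sketch. It assumes that the breakpoint distance between two unsigned linear genomes on the same genes is the number of adjacencies of one that are absent from the other; the function names and the example genomes below are hypothetical and chosen only for illustration.

```python
def adjacencies(perm):
    """Unordered adjacencies {perm[i], perm[i+1]} of a linear unsigned genome."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def breakpoint_distance(g, h):
    """Assumed definition: number of adjacencies of g that do not occur in h."""
    return len(adjacencies(g) - adjacencies(h))


def is_intermediate(m, g, h):
    """True if d(g, m) + d(m, h) equals d(g, h), i.e. m is a geodesic point."""
    return (breakpoint_distance(g, m) + breakpoint_distance(m, h)
            == breakpoint_distance(g, h))


# Hypothetical example genomes on six genes.
G = (1, 2, 3, 4, 5, 6)
G_PRIME = (1, 3, 5, 2, 4, 6)
M = (1, 2, 3, 5, 4, 6)

# I and J are the adjacencies of M taken from G and from G' respectively.
I = adjacencies(M) & adjacencies(G)
J = adjacencies(M) & adjacencies(G_PRIME)

print(is_intermediate(M, G, G_PRIME), I | J == adjacencies(M))
```

In this example every adjacency of M comes from G or from G', which is what makes the two distances add up.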
Appendix A: Proofs
Here we provide the proofs of Lemma 1, Lemma 2, Theorem 1, Theorem 2, Theorem 3 and Lemma 3.
Proof of Lemma 1. Consider a segment set \(I=\{s_1,...,s_k\}\), with k non-empty segments and m adjacencies, that is contained in \(x\in S_n\). Then \(|\Vert \overline{I}_{x}\Vert -k|\le 1\), and therefore we represent the segments of \(\overline{I}_{x}\) by \(s'_1,...,s'_{k+1}\), where \(s'_j\) is non-empty for \(2\le j\le k\), and \(s'_1\) and \(s'_{k+1}\) may be empty. Note that \(\sum _{i=1}^{k} |s_i|=m\) and \(\sum _{j=1}^{k+1} |s'_j|=n-1-m\) with \(|s_i|\ge 1\) for \(1\le i\le k\) and \(|s'_j|\ge 1\) for \(2\le j\le k\). Hence, by a stars-and-bars count, the number of solutions of these two equations is equal to
\[\binom{m-1}{k-1}\binom{n-m}{k},\]
where the first factor counts the solutions of the first equation and the second factor counts those of the second.
In other words, that is the number of ways we can choose k segments with m adjacencies of x. \(\square \)
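The count in the proof of Lemma 1 can be checked by brute force for small n. The sketch below assumes that the adjacencies of x are indexed by positions 1 to n-1 and that a segment is a maximal run of consecutive selected positions; it compares an exhaustive enumeration with the stars-and-bars value \(\binom{m-1}{k-1}\binom{n-m}{k}\) implied by the two equations. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_segments(positions):
    """Number of maximal runs of consecutive integers in a sorted tuple."""
    return sum(
        1
        for i, p in enumerate(positions)
        if i == 0 or p != positions[i - 1] + 1
    )


def brute_force_count(n, m, k):
    """Enumerate all m-subsets of the n-1 adjacency positions with k runs."""
    return sum(
        1
        for chosen in combinations(range(1, n), m)
        if count_segments(chosen) == k
    )


def lemma1_count(n, m, k):
    """Stars-and-bars value derived from the two equations (requires m >= 1)."""
    return comb(m - 1, k - 1) * comb(n - m, k)


# Compare both counts for every admissible (m, k) with n = 8.
n = 8
agree = all(
    brute_force_count(n, m, k) == lemma1_count(n, m, k)
    for m in range(1, n)
    for k in range(1, m + 1)
)
print(agree)
```

Summing `lemma1_count(n, m, k)` over k recovers \(\binom{n-1}{m}\), the total number of ways to select m of the n-1 adjacencies.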
Proof of Lemma 2. As the segment set I has m adjacencies and k segments, each permutation containing I has \(n-m-k\) external points with respect to I. A permutation containing I is therefore an arrangement of k segments and \(n-m-k\) external points, and, noting that each segment has two directions, we have \(2^{k}(k+(n-m-k))!=2^{k}(n-m)!\) permutations containing I. \(\square \)
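The count \(2^{k}(n-m)!\) can likewise be checked exhaustively for small n. In this sketch a segment is represented as a tuple of genes, which is a convention assumed here for illustration, and a permutation is said to contain I when every adjacency of every segment appears among its adjacencies. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def adjacency_set(perm):
    """Unordered adjacencies of a linear arrangement of genes."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def count_containing(n, segment_set):
    """Brute-force count of permutations of 1..n containing all segments."""
    target = set()
    for seg in segment_set:
        target |= adjacency_set(seg)
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if target <= adjacency_set(perm)
    )


def lemma2_count(n, segment_set):
    """Closed form 2^k (n - m)! with k segments and m adjacencies."""
    k = len(segment_set)
    m = sum(len(seg) - 1 for seg in segment_set)
    return 2 ** k * factorial(n - m)


# Hypothetical segment set with k = 2 segments and m = 3 adjacencies, n = 6.
example = [(1, 2, 3), (5, 6)]
print(count_containing(6, example), lemma2_count(6, example))
```

The enumeration also counts permutations in which two segments of I happen to sit next to each other, which is consistent with treating segments and external points as freely ordered blocks.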
Proof of Theorem 1. As \(\alpha _{m,k}\) is independent of \(\mathcal {I}_{m,k}\) and \(\mathcal {L}(\mathcal {I}_{m,k})=\mathcal {L}(\mathcal {I}_m\mid \Vert \mathcal {I}_m\Vert =k)\), we have
So for the first part, we only need to compute \(\mathbbm {E}[\alpha _m\mid \Vert \mathcal {I}_m\Vert =k]\). To this end, note that there are \(n-m-k\) external points (genes), \(m-k\) internal points, and 2k end points in any segment set with m adjacencies and k segments. Sampling a random adjacency from \(\mathcal {I}_m\), conditional on \(\Vert \mathcal {I}_m\Vert =k\), the chances of obtaining a 2-free-end, a 1-free-end, and a trivial segment adjacency are, respectively,
while the chance to have a 0-free-end adjacency is given by
Now, for \(i=1,...,n-1\), let \(\hat{\alpha }_{m,i}\) be a random variable such that \(\hat{\alpha }_{m,i}=1\) if the \(i^{th}\) adjacency of \(\xi \), i.e. \(\{\xi _i,\xi _{i+1}\}\), is 2-free-end w.r.t. \(\mathcal {I}_m\) and \(\hat{\alpha }_{m,i}=0\) otherwise. Then, for every \(i=1,...,n-1\), we have
implying that \(\mathbbm {E}[\alpha _m \mid \Vert \mathcal {I}_m\Vert =k]\) is equal to
The other conditional expected values in the statement of the theorem are computed similarly. For the second part of the theorem, averaging over the possible number of segment sets, we have
Since \(\Vert \mathcal {I}_m\Vert \sim H(n-1,m-n,m)\), its moments are given in (1). Therefore, after some simplification, we obtain
Similarly,
and
\(\square \)
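The averaging step in the proof of Theorem 1 relies on the law of the number of segments \(\Vert \mathcal {I}_m\Vert \). As a sketch, assume that \(\mathcal {I}_m\) is a uniformly chosen set of m of the n-1 adjacencies; the count from Lemma 1 then gives a hypergeometric probability mass function with population size n-1, n-m marked elements and m draws (this parametrisation is an assumption made here). The code computes it exactly and compares its mean and variance with the standard hypergeometric formulas. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def segment_count_pmf(n, m):
    """P(number of segments = k) for a uniform m-subset of n-1 adjacencies."""
    total = comb(n - 1, m)
    return {
        k: Fraction(comb(m - 1, k - 1) * comb(n - m, k), total)
        for k in range(1, m + 1)
    }


def mean_and_variance(n, m):
    """Exact first two central moments of the segment count."""
    pmf = segment_count_pmf(n, m)
    mean = sum(k * p for k, p in pmf.items())
    second = sum(k * k * p for k, p in pmf.items())
    return mean, second - mean * mean


def hypergeometric_mean(n, m):
    """Standard hypergeometric mean: draws * marked / population."""
    return Fraction(m * (n - m), n - 1)


def hypergeometric_variance(n, m):
    """Standard hypergeometric variance for the same parameters (n >= 3)."""
    return Fraction(m * (n - m) * (m - 1) * (n - 1 - m), (n - 1) ** 2 * (n - 2))


print(mean_and_variance(10, 4))
```

Exact rational arithmetic is used so that the comparison with the closed forms does not depend on floating-point tolerance.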
Proof of Theorem 2. There are two options for choosing two adjacencies of \(\xi \). They are either consecutive, \(\{\xi _i,\xi _{i+1}\},\{\xi _{i+1},\xi _{i+2}\}\), or nonconsecutive, \(\{\xi _i,\xi _{i+1}\}\), \(\{\xi _{j},\xi _{j+1}\}\) for \(i+1< j\). If we select two consecutive adjacencies of \(\xi \) at random, the chances that both are 2-free-end, both are 1-free-end, and both are trivial segment adjacencies are respectively given by
while the chance that both are 0-free-end is
Similarly, if we pick two nonconsecutive adjacencies of \(\xi \) at random, the chances that both are 2-free-end, both are 1-free-end, and both are trivial segment adjacencies are respectively given by
and finally the chance that both are 0-free-end is readily obtained as
Now, for the first part of the theorem, as before we only need to compute the left-hand side of
For \(i=1,\dots ,n-1\), recall the definition of \(\hat{\alpha }_{m,i}\) from the proof of Theorem 1, and let \(\hat{\alpha }_{m,k,i}\) be a random variable such that \(\hat{\alpha }_{m,k,i}=1\) if the \(i^{th}\) adjacency of \(\xi \), i.e. \(\{\xi _i,\xi _{i+1}\}\), is 2-free-end w.r.t. \(\mathcal {I}_{m,k}\) and \(\hat{\alpha }_{m,k,i}=0\) otherwise. Then, for every \(i=1,...,n-1\)
Note that
and
Hence,
Exactly the same calculations give \(Var(\alpha (\xi ,I))\). Similarly we can compute \(Var(\beta _{m,k})=Var(\beta (\xi ,I))\), \(Var(\gamma _{m,k})=Var(\gamma (\xi ,I))\) and \(Var(\delta _{m,k})=Var(\delta (\xi ,I))\). Now to compute \(Var(\alpha _m)\), write \(\mathbbm {E}[\alpha _m^2]\) as
Letting \(A_{m,k}=A_{m,k}^{(n)}:=\{\Vert \mathcal {I}_m^{(n)}\Vert =k\}\), note that
and
Therefore from (1)
In the same way, we can show that
and finally,
\(\square \)
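The second-moment computations in the proof of Theorem 2 rest on splitting pairs of adjacencies into consecutive and nonconsecutive ones. As a sketch, write \(N=\sum _{i=1}^{n-1}\hat{\alpha }_{i}\) for a generic sum of the indicator variables; the symbol N and the shortened subscripts are notation introduced here for illustration, not taken from the statements of the theorems. Since \(\hat{\alpha }_i^2=\hat{\alpha }_i\), expanding the square gives

```latex
\[
\mathbbm{E}[N^2]
  = \sum_{i=1}^{n-1}\mathbbm{E}[\hat{\alpha}_i]
  + 2\sum_{i=1}^{n-2}\mathbbm{E}[\hat{\alpha}_i\hat{\alpha}_{i+1}]
  + 2\sum_{\substack{1\le i<j\le n-1\\ i+1<j}}\mathbbm{E}[\hat{\alpha}_i\hat{\alpha}_j],
\qquad
Var(N)=\mathbbm{E}[N^2]-\big(\mathbbm{E}[N]\big)^2 .
\]
```

The middle sum ranges over the \(n-2\) consecutive pairs and the last sum over the \(\binom{n-1}{2}-(n-2)=\frac{(n-2)(n-3)}{2}\) nonconsecutive pairs, which is why the two families of pair probabilities are computed separately.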
Proof of Theorem 3. First observe that, by Theorem 1, as \(n\rightarrow \infty \),
Also, following Theorem 2, the variances of all these sequences converge to 0. Hence, the convergence in \(L^2\) and in probability holds. \(\square \)
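The final step of the proof of Theorem 3 uses a standard fact, spelled out here for a generic sequence \(X_n\) with limit constant c (both symbols are placeholders introduced for this sketch). If \(\mathbbm {E}[X_n]\rightarrow c\) and \(Var(X_n)\rightarrow 0\), then

```latex
\[
\mathbbm{E}\big[(X_n-c)^2\big]
  = Var(X_n) + \big(\mathbbm{E}[X_n]-c\big)^2 \longrightarrow 0,
\qquad
\mathbbm{P}\big(|X_n-c|>\varepsilon\big)
  \le \frac{\mathbbm{E}\big[(X_n-c)^2\big]}{\varepsilon^2}
  \quad\text{for every } \varepsilon>0 .
\]
```

The first relation is the bias-variance decomposition and gives convergence in \(L^2\); the second is Chebyshev's inequality and gives convergence in probability.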
Proof of Lemma 3. Suppose \(\{a,b\} \in F(x,I)\setminus \mathcal {A}_\pi \). Since \(a,b\in Ext(I)\), the neighbours of a in \(\pi \) must come from the set \(\mathcal {N}_x(a)\setminus \{b\}\) and the neighbours of b in \(\pi \) must come from the set \(\mathcal {N}_x(b)\setminus \{a\}\), so \(|\mathcal {N}_\pi (a)|,|\mathcal {N}_\pi (b)| \le 1\). Neither \(|\mathcal {N}_\pi (a)|\) nor \(|\mathcal {N}_\pi (b)|\) can be 0, since in that case a or b could not be connected to the rest of the numbers to construct \(\pi \). Therefore \(|\mathcal {N}_\pi (a)|=|\mathcal {N}_\pi (b)|=1\), which means that a and b are the extremities of the permutation \(\pi \), i.e. \(\{\pi _1,\pi _n\}=\{a,b\}\). In other words, there exists at most one adjacency \(\{a,b\}\in F(x,I)\setminus \mathcal {A}_\pi \). This proves part (a). For part (b), suppose \(\pi '\in \overline{[id,x]}\) and there exists an adjacency \(\{a,b\}\) such that \(\{a,b\}\in F(x,I)\setminus \mathcal A_{\pi '}\). As shown above, \(\{\pi '_1,\pi '_n\}=\{a,b\}\). Moreover, a and b are connected in \(\pi '\) through a segment of \(\pi '\) containing at least one segment of I, so there exists at least one \({<}1,2{>}\)-adjacency (\({<}1,2{>}\)-segment), say e, in the segment of \(\pi '\) connecting a to b, and hence e is not in F(x, I). Therefore, we can construct a new permutation \(\pi \) by cutting e in \(\pi '\) and joining a to b. This proves part (b). \(\square \)
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
da Silva, P.H., Jamshidpey, A., Sankoff, D. (2024). Sampling Gene Adjacencies and Geodesic Points of Random Genomes. In: Scornavacca, C., Hernández-Rosales, M. (eds) Comparative Genomics. RECOMB-CG 2024. Lecture Notes in Computer Science(), vol 14616. Springer, Cham. https://doi.org/10.1007/978-3-031-58072-7_10
DOI: https://doi.org/10.1007/978-3-031-58072-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58071-0
Online ISBN: 978-3-031-58072-7
eBook Packages: Computer Science (R0)