Abstract
In genome sequencing there is a trend not to complete the sequence of the whole genomes. Motivated by this Muñoz et al. recently studied the (one-sided) problem of filling an incomplete multichromosomal genome (or scaffold) H with respect to a complete target genome C such that the resulting genomic (or double-cut-and-join, DCJ for short) distance between H′ and C is minimized, where H′ is the corresponding filled scaffold. Jiang et al. recently extended this result to both the breakpoint distance and the DCJ distance and to the (two-sided) case when even C has some missing genes, and solved all these problems in polynomial time. However, when H and C contain duplicated genes, the corresponding breakpoint distance problem becomes NP-complete and there has been no efficient approximation or FPT algorithms for it. In this paper, we mainly consider the one-sided problem of filling scaffolds with gene repetitions so as to maximize the number of adjacencies between the two resulting sequences; namely, given an incomplete genome I and a complete genome G, both with gene repetitions, fill in the missing genes to obtain I′ such that the number of adjacencies between I′ and G is maximized. We prove that this problem is also NP-complete and present an efficient 1.33-approximation for the problem. The hardness result also holds for the two-sided problem for which a trivial factor-2 approximation exists. We also present FPT algorithms for some special cases of this problem.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Blin, G., Fertin, G., Sikora, F., Vialette, S.: The exemplar breakpoint distance for non-trivial genomes cannot be approximated. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 357–368. Springer, Heidelberg (2009)
Chain, P., Grafham, D., Fulton, R., Fitzgerald, M., Hostetler, J., Muzny, D., Ali, J., et al.: Genome project standards in a new era of sequencing. Science 326, 236–237 (2009)
Chen, Z., Fu, B., Fowler, R., Zhu, B.: On the inapproximability of the exemplar conserved interval distance problem of genomes. J. Combinatorial Optimization 15(2), 201–221 (2008)
Chen, Z., Fu, B., Xu, J., Yang, B., Zhao, Z., Zhu, B.: Non-breaking similarity of genomes with gene repetitions. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 119–130. Springer, Heidelberg (2007)
Chen, Z., Fu, B., Zhu, B.: The approximability of the exemplar breakpoint distance problem. In: Cheng, S.-W., Poon, C.K. (eds.) AAIM 2006. LNCS, vol. 4041, pp. 291–302. Springer, Heidelberg (2006)
Chen, X., Zheng, J., Fu, Z., Nan, P., Zhong, Y., Lonardi, S., Jiang, T.: Computing the assignment of orthologous genes via genome rearrangement. In: Proc. of the 3rd Asia-Pacific Bioinformatics Conf. (APBC 2005), pp. 363–378 (2005)
Chrobak, M., Kolman, P., Sgall, J.: The greedy algorithm for the minimum common string partition problem. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS, vol. 3122, pp. 84–95. Springer, Heidelberg (2004)
Damaschke, P.: Minimum Common String Partition Parameterized. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 87–98. Springer, Heidelberg (2008)
Downey, R., Fellows, M.: Parameterized Complexity. Springer, Heidelberg (1999)
Flum, J., Grohe, M.: Parameterized Complexity Theory. Springer, Heidelberg (2006)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York (1979)
Goldstein, A., Kolman, P., Zheng, J.: Minimum common string partition problem: Hardness and approximations. In: Fleischer, R., Trippen, G. (eds.) ISAAC 2004. LNCS, vol. 3341, pp. 484–495. Springer, Heidelberg (2004)
Jiang, H., Zheng, C., Sankoff, D., Zhu, B.: Scaffold filling under the breakpoint distance. In: Tannier, E. (ed.) RECOMB-CG 2010. LNCS, vol. 6398, pp. 83–92. Springer, Heidelberg (2010)
Jiang, H., Zhu, B., Zhu, D., Zhu, H.: Minimum common string partition revisited. In: Lee, D.-T., Chen, D.Z., Ying, S. (eds.) FAW 2010. LNCS, vol. 6213, pp. 45–52. Springer, Heidelberg (2010)
Jiang, M.: The zero exemplar distance problem. In: Tannier, E. (ed.) RECOMB-CG 2010. LNCS(LNBI), vol. 6398, pp. 74–82. Springer, Heidelberg (2010)
Kaplan, H., Shafrir, N.: The greedy algorithm for edit distance with moves. Inf. Process. Lett. 97(1), 23–27 (2006)
Muñoz, A., Zheng, C., Zhu, Q., Albert, V., Rounsley, S., Sankoff, D.: Scaffold filling, contig fusion and gene order comparison. BMC Bioinformatics 11, 304 (2010)
Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 16(11), 909–917 (1999)
Tesler, G.: Efficient algorithms for multichromosomal genome rearrangements. J. Computer and System Sciences 65, 587–609 (2002)
Watterson, G., Ewens, W., Hall, T., Morgan, A.: The chromosome inversion problem. J. Theoretical Biology 99, 1–7 (1982)
Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21, 3340–3346 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, H., Zhong, F., Zhu, B. (2011). Filling Scaffolds with Gene Repetitions: Maximizing the Number of Adjacencies. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_7
Download citation
DOI: https://doi.org/10.1007/978-3-642-21458-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer ScienceComputer Science (R0)