Filling Scaffolds with Gene Repetitions: Maximizing the Number of Adjacencies

  • Conference paper
Combinatorial Pattern Matching (CPM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Abstract

In genome sequencing there is a trend not to complete the sequence of whole genomes. Motivated by this, Muñoz et al. recently studied the (one-sided) problem of filling an incomplete multichromosomal genome (or scaffold) H with respect to a complete target genome C such that the resulting genomic (or double-cut-and-join, DCJ for short) distance between H′ and C is minimized, where H′ is the corresponding filled scaffold. Jiang et al. recently extended this result to both the breakpoint distance and the DCJ distance, and to the (two-sided) case in which C also has some missing genes, and solved all these problems in polynomial time. However, when H and C contain duplicated genes, the corresponding breakpoint distance problem becomes NP-complete, and no efficient approximation or FPT algorithm has been known for it. In this paper, we mainly consider the one-sided problem of filling scaffolds with gene repetitions so as to maximize the number of adjacencies between the two resulting sequences; namely, given an incomplete genome I and a complete genome G, both with gene repetitions, fill in the missing genes to obtain I′ such that the number of adjacencies between I′ and G is maximized. We prove that this problem is also NP-complete and present an efficient 1.33-approximation for it. The hardness result also holds for the two-sided problem, for which a trivial factor-2 approximation exists. We also present FPT algorithms for some special cases of this problem.
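The one-sided objective stated in the abstract can be made concrete with a small sketch. The abstract does not define "adjacency" formally, so the code below assumes one reading: an adjacency is an unordered pair of consecutive genes, and the number of adjacencies shared by two sequences is the size of the intersection of their adjacency multisets. The function names (`adjacency_multiset`, `count_common_adjacencies`, `best_filling`) are hypothetical, and the exhaustive search is only an illustration for tiny inputs; it is not the paper's 1.33-approximation or any of its FPT algorithms.

```python
from collections import Counter


def adjacency_multiset(seq):
    """Multiset of unordered pairs of consecutive genes in seq (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(seq, seq[1:]))


def count_common_adjacencies(s, t):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacency_multiset(s) & adjacency_multiset(t)).values())


def best_filling(incomplete, complete):
    """Exhaustively insert the missing genes into `incomplete`.

    Returns (score, filled), where `filled` keeps `incomplete` as a
    subsequence, has the same gene counts as `complete`, and shares the
    maximum number of adjacencies with `complete`. Exponential time.
    """
    incomplete, complete = tuple(incomplete), tuple(complete)
    if Counter(incomplete) - Counter(complete):
        raise ValueError("incomplete genome has genes absent from the complete one")

    best_score, best_seq = -1, None
    seen = set()
    stack = [incomplete]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        remaining = Counter(complete) - Counter(current)
        if not remaining:
            # All missing genes placed: evaluate this candidate filling.
            score = count_common_adjacencies(current, complete)
            if score > best_score:
                best_score, best_seq = score, current
            continue
        # Insert one missing gene at every possible position.
        for gene in remaining:
            for pos in range(len(current) + 1):
                stack.append(current[:pos] + (gene,) + current[pos:])
    return best_score, best_seq


if __name__ == "__main__":
    G = "abcad"   # complete genome with a repeated gene 'a'
    I = "acd"     # incomplete genome; genes 'a' and 'b' are missing
    print(count_common_adjacencies(I, G))  # adjacencies before filling
    print(best_filling(I, G))              # best score and one optimal filling
```

For this toy input, I shares one adjacency with G before filling, and inserting the missing genes `a` and `b` can recover all four adjacencies of G. The search space grows exponentially with the number of missing genes, which is consistent with the abstract's focus on approximation and FPT algorithms for the general problem.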

References

  1. Blin, G., Fertin, G., Sikora, F., Vialette, S.: The exemplar breakpoint distance for non-trivial genomes cannot be approximated. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 357–368. Springer, Heidelberg (2009)

  2. Chain, P., Grafham, D., Fulton, R., Fitzgerald, M., Hostetler, J., Muzny, D., Ali, J., et al.: Genome project standards in a new era of sequencing. Science 326, 236–237 (2009)

  3. Chen, Z., Fu, B., Fowler, R., Zhu, B.: On the inapproximability of the exemplar conserved interval distance problem of genomes. J. Combinatorial Optimization 15(2), 201–221 (2008)

  4. Chen, Z., Fu, B., Xu, J., Yang, B., Zhao, Z., Zhu, B.: Non-breaking similarity of genomes with gene repetitions. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 119–130. Springer, Heidelberg (2007)

  5. Chen, Z., Fu, B., Zhu, B.: The approximability of the exemplar breakpoint distance problem. In: Cheng, S.-W., Poon, C.K. (eds.) AAIM 2006. LNCS, vol. 4041, pp. 291–302. Springer, Heidelberg (2006)

  6. Chen, X., Zheng, J., Fu, Z., Nan, P., Zhong, Y., Lonardi, S., Jiang, T.: Computing the assignment of orthologous genes via genome rearrangement. In: Proc. of the 3rd Asia-Pacific Bioinformatics Conf. (APBC 2005), pp. 363–378 (2005)

  7. Chrobak, M., Kolman, P., Sgall, J.: The greedy algorithm for the minimum common string partition problem. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004 and APPROX 2004. LNCS, vol. 3122, pp. 84–95. Springer, Heidelberg (2004)

  8. Damaschke, P.: Minimum Common String Partition Parameterized. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 87–98. Springer, Heidelberg (2008)

  9. Downey, R., Fellows, M.: Parameterized Complexity. Springer, Heidelberg (1999)

  10. Flum, J., Grohe, M.: Parameterized Complexity Theory. Springer, Heidelberg (2006)

  11. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York (1979)

  12. Goldstein, A., Kolman, P., Zheng, J.: Minimum common string partition problem: Hardness and approximations. In: Fleischer, R., Trippen, G. (eds.) ISAAC 2004. LNCS, vol. 3341, pp. 484–495. Springer, Heidelberg (2004)

  13. Jiang, H., Zheng, C., Sankoff, D., Zhu, B.: Scaffold filling under the breakpoint distance. In: Tannier, E. (ed.) RECOMB-CG 2010. LNCS, vol. 6398, pp. 83–92. Springer, Heidelberg (2010)

  14. Jiang, H., Zhu, B., Zhu, D., Zhu, H.: Minimum common string partition revisited. In: Lee, D.-T., Chen, D.Z., Ying, S. (eds.) FAW 2010. LNCS, vol. 6213, pp. 45–52. Springer, Heidelberg (2010)

  15. Jiang, M.: The zero exemplar distance problem. In: Tannier, E. (ed.) RECOMB-CG 2010. LNCS (LNBI), vol. 6398, pp. 74–82. Springer, Heidelberg (2010)

  16. Kaplan, H., Shafrir, N.: The greedy algorithm for edit distance with moves. Inf. Process. Lett. 97(1), 23–27 (2006)

  17. Muñoz, A., Zheng, C., Zhu, Q., Albert, V., Rounsley, S., Sankoff, D.: Scaffold filling, contig fusion and gene order comparison. BMC Bioinformatics 11, 304 (2010)

  18. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)

  19. Tesler, G.: Efficient algorithms for multichromosomal genome rearrangements. J. Computer and System Sciences 65, 587–609 (2002)

  20. Watterson, G., Ewens, W., Hall, T., Morgan, A.: The chromosome inversion problem. J. Theoretical Biology 99, 1–7 (1982)

  21. Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21, 3340–3346 (2005)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, H., Zhong, F., Zhu, B. (2011). Filling Scaffolds with Gene Repetitions: Maximizing the Number of Adjacencies. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_7

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer Science, Computer Science (R0)
