
Constraint Database Solutions to the Genome Map Assembly Problem

  • Conference paper
Constraint Databases (CDB 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3074)

Abstract

Long DNA sequences have to be cut using restriction enzymes into small fragments whose lengths and/or nucleotide sequences can be analyzed by currently available technology. Cutting multiple copies of the same long DNA sequence using different restriction enzymes yields many fragments with overlaps that allow the fragments to be assembled in the order in which they appear on the original DNA sequence. This basic idea admits several NP-complete abstractions of the genome map assembly problem. However, it is not obvious which variation is computationally the best in practice. By extensive computer experiments, we show that in the average case the running time of a constraint automata solution of the big-bag matching abstraction increases linearly, while the running time of a greedy search solution of the shortest common superstring in an overlap multigraph abstraction increases exponentially with the size of real genome input data. Hence the first abstraction is much more efficient computationally for very large real genomes.
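To make the overlap idea concrete, here is a minimal Python sketch of the textbook greedy merge heuristic for the shortest common superstring. It is an illustration only and not the authors' algorithm: the paper's second abstraction works on an overlap multigraph, and the constraint automata solution is a different technique altogether. The function names `overlap` and `greedy_superstring`, the toy sequence, and the two hypothetical "digests" are all assumptions made up for this example.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Repeatedly merge the ordered pair of fragments with the largest overlap."""
    # Drop duplicates and fragments fully contained in another fragment.
    frags = [f for f in dict.fromkeys(fragments)
             if not any(f != g and f in g for g in fragments)]
    while len(frags) > 1:
        # Find the ordered pair (a, b) whose suffix/prefix overlap is largest.
        k, a, b = max(
            ((overlap(a, b), a, b) for a, b in permutations(frags, 2)),
            key=lambda t: t[0],
        )
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[k:])  # glue b onto a, sharing the k overlapping bases
    return frags[0] if frags else ""


# Toy data (hypothetical): two different cuts of the same 14-base sequence.
digest_1 = ["ATGCG", "TACGT", "TAGC"]
digest_2 = ["ATG", "CGTACG", "TTAGC"]

assembled = greedy_superstring(digest_1 + digest_2)
print(assembled)  # prints ATGCGTACGTTAGC
```

Fragments from a single digest do not overlap each other; the overlaps that drive the merge come from pairing a fragment of one digest with a fragment of the other. On this toy input the greedy merge happens to recover the original sequence, but in general it is only a heuristic and can return a superstring that is neither shortest nor equal to the source.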

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ramanathan, V., Revesz, P. (2004). Constraint Database Solutions to the Genome Map Assembly Problem. In: Kuijpers, B., Revesz, P. (eds) Constraint Databases. CDB 2004. Lecture Notes in Computer Science, vol 3074. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25954-1_6

  • DOI: https://doi.org/10.1007/978-3-540-25954-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22126-5

  • Online ISBN: 978-3-540-25954-1

  • eBook Packages: Springer Book Archive
