Abstract
Long DNA sequences must be cut with restriction enzymes into small fragments whose lengths and/or nucleotide sequences can be analyzed by currently available technology. Cutting multiple copies of the same long DNA sequence with different restriction enzymes yields many overlapping fragments, and the overlaps allow the fragments to be assembled in the order in which they appear on the original DNA sequence. This basic idea admits several NP-complete abstractions of the genome map assembly problem. However, it is not obvious which variation is computationally best in practice. Through extensive computer experiments, we show that, in the average case, the running time of a constraint automata solution of the big-bag matching abstraction increases linearly with the size of real genome input data, while the running time of a greedy search solution of the shortest common superstring in an overlap multigraph abstraction increases exponentially. Hence the first abstraction is computationally much more efficient for very large real genomes.
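To make the overlap-based assembly idea concrete, the following is a minimal sketch of the textbook greedy heuristic for the shortest common superstring, applied to plain character strings. It is not the implementation evaluated in the paper and does not model the overlap multigraph or the big-bag matching abstraction. The function names `overlap` and `greedy_superstring` and the sample fragments are illustrative assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Return the length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Greedy heuristic: repeatedly merge the ordered pair with the largest overlap.

    Assumption: fragments are error-free strings. The result is a common
    superstring, but it is not guaranteed to be the shortest one.
    """
    # Drop duplicates, then drop fragments contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(frags)), 2):
            k = overlap(frags[i], frags[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Merge the best pair, keeping the overlapping part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


if __name__ == "__main__":
    # Hypothetical fragments obtained by cutting copies of one sequence.
    sample = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_superstring(sample))
```

Each merge step in this sketch compares every ordered pair of remaining fragments, so it is meant only to illustrate the idea on small inputs and says nothing about the running times reported in the experiments.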
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ramanathan, V., Revesz, P. (2004). Constraint Database Solutions to the Genome Map Assembly Problem. In: Kuijpers, B., Revesz, P. (eds) Constraint Databases. CDB 2004. Lecture Notes in Computer Science, vol 3074. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25954-1_6
DOI: https://doi.org/10.1007/978-3-540-25954-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22126-5
Online ISBN: 978-3-540-25954-1