Abstract
The codeword design problem is an important problem in DNA computing and its applications. Several theoretical analyses as well as practical solutions for short oligonucleotides (up to 20-mers) have been generated recently. These solutions have, in turn, suggested new applications to DNA-based indexing and natural language processing, in addition to the obvious applications to the problems of reliability and scalability that generated them. Here we continue the exploration of this type of DNA-based indexing for biological applications. First, we show that DNA noncrosshybridizing (nxh) sets can be successfully applied to infer ab initio phylogenetic trees by providing a way to measure distances among different genomes indexed by sets of short oligonucleotides selected so as to minimize crosshybridization. The phylogenies recovered this way are solidly established and well accepted in biology. The new technique is much more effective in terms of signal-to-noise ratio, cost and time than current methods. Second, we demonstrate that DNA indexing provides novel and principled insights into the phylogenesis of organisms hitherto inaccessible by current methods, such as a prediction that the Salmonella plasmid 50 was acquired horizontally, likely from some bacteria somewhat related to Yersinia. Finally, DNA indexing can be scaled up to universal DNA chips now readily available both in vitro and in silico. In particular, we show how one such recently obtained set of nxh 16-mers can be used as a universal coordinate system in DNA spaces to characterize very large groups (families, genera, and even phyla) of organisms on a uniform biomarker reference system, a veritable and comprehensive “Atlas of Life”, as it is or as it could be on Earth.
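The idea of indexing genomes by a probe set and measuring distances between them can be illustrated with a minimal sketch. It is not the authors' method: it assumes a probe "hybridizes" only to an exact reverse-complement window of the genome, whereas the abstract refers to sets selected to minimize crosshybridization under more realistic models. The function names (`signature`, `distance`, `distance_matrix`) and the choice of Euclidean distance are hypothetical, chosen here for illustration only.

```python
from itertools import combinations
from math import sqrt

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def signature(genome, probes):
    """Index a genome on a probe set (illustrative assumption).

    For each probe, count the genome windows equal to the probe's
    reverse complement (a crude stand-in for hybridization) and
    normalize by the number of windows of that length.
    """
    sig = []
    for probe in probes:
        target = reverse_complement(probe)
        k = len(probe)
        n_windows = len(genome) - k + 1
        hits = sum(1 for i in range(n_windows) if genome[i:i + k] == target)
        sig.append(hits / max(n_windows, 1))
    return sig


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def distance_matrix(genomes, probes):
    """Pairwise distances between named genomes, keyed by name pairs."""
    sigs = {name: signature(seq, probes) for name, seq in genomes.items()}
    return {
        (a, b): distance(sigs[a], sigs[b])
        for a, b in combinations(sorted(sigs), 2)
    }
```

A pairwise distance table of this kind could then be passed to a standard tree-building procedure such as neighbor joining or UPGMA to obtain a candidate phylogeny; which procedure the article itself uses is not stated in the abstract.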
Acknowledgments
Many thanks to Abishek Logishetty and Jason Knisley for their help in producing the visualization of the signatures and trees above.
Cite this article
Garzon, M.H., Wong, TY. DNA chips for species identification and biological phylogenies. Nat Comput 10, 375–389 (2011). https://doi.org/10.1007/s11047-010-9232-y