Abstract
We explore the problem of designing oligonucleotides that help locate organisms along a known phylogenetic tree. We develop a suffix-tree-based algorithm that finds such short sequences efficiently. In the worst case, our algorithm requires O(Nm) time and O(N) space, where m is the number of genomes classified by the phylogeny and N is their total length. We implemented the algorithm and used it to find these discriminating sequences in both small and large phylogenies. We believe the algorithm will have wide applications, including high-throughput classification and identification, design of oligo arrays that optimally differentiate genes within gene families, and markers for closely related strains and populations. It will also have scientific significance as a new way to assess confidence in a given classification.
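The abstract does not state exactly when an oligo "discriminates" a node of the tree. What follows is a minimal sketch built on one plausible reading: a length-k oligo (k-mer) is a signature for a clade if it occurs in every genome of that clade and in no genome outside it. For simplicity, the sketch replaces the paper's suffix tree with a plain hash map from each k-mer to its "color set" (the genomes that contain it). It therefore does not achieve the stated O(Nm) time and O(N) space bounds. It also ignores reverse complements, mismatches, and hybridization thermodynamics. The tree encoding, the function names, and the toy sequences are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Assumption (hypothetical encoding): a leaf is a genome name (str) and an
# internal node is a tuple of children, e.g. (("A", "B"), "C").
Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> List[Clade]:
    """Leaf sets of all internal nodes, children listed before parents."""
    out: List[Clade] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def kmer_color_sets(genomes: Dict[str, str], k: int) -> Dict[str, Clade]:
    """Map every k-mer to the set of genomes in which it occurs.

    Hash-map stand-in for a generalized suffix tree (not the paper's method).
    """
    colors: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return {kmer: frozenset(names) for kmer, names in colors.items()}


def clade_signatures(genomes: Dict[str, str], tree: Tree, k: int) -> Dict[Clade, List[str]]:
    """For each clade, the k-mers found in every member and in no other genome."""
    by_color: Dict[Clade, List[str]] = defaultdict(list)
    for kmer, color in kmer_color_sets(genomes, k).items():
        by_color[color].append(kmer)
    return {clade: sorted(by_color.get(clade, [])) for clade in clades(tree)}


def matching_clades(read: str, signatures: Dict[Clade, List[str]], k: int) -> List[Clade]:
    """Clades with at least one signature k-mer occurring in `read`."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    return [clade for clade, kmers in signatures.items() if read_kmers.intersection(kmers)]


# Usage on toy (made-up) sequences.
genomes = {"A": "ACGTACGT", "B": "ACGTTTGA", "C": "GGGCCCAA"}
tree = (("A", "B"), "C")
sigs = clade_signatures(genomes, tree, k=4)
print(sigs[frozenset({"A", "B"})])  # ['ACGT']
print(matching_clades("TTACGTT", sigs, k=4))  # only the {A, B} clade matches
```

Grouping k-mers by color set lets the signatures of every clade be read off with a single dictionary lookup. `matching_clades` shows how such signatures could place an unknown read on the tree.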
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Angelov, S., Harb, B., Kannan, S., Khanna, S., Kim, J., Wang, LS. (2004). Genome Identification and Classification by Short Oligo Arrays. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_34
DOI: https://doi.org/10.1007/978-3-540-30219-3_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive