Abstract
We present algorithms that reduce the time and space needed to solve problems of finding all motifs common to a set of sequences. In particular, we give algorithms that (1) require time and space linear in the size of the input, (2) succinctly encode the output so that the time and space requirements depend on the number of motifs, not directly on motif length, and (3) efficiently parallelize the enumeration.
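To make the problem statement concrete, here is a minimal baseline sketch. The abstract does not define "motif", so this assumes the simplest case: a motif is an exact substring of fixed length `k` that occurs in every input sequence. The function names `kmers` and `common_motifs` are hypothetical, and this naive set-intersection approach is not the authors' method; it only illustrates what "all motifs common to a set of sequences" could mean.

```python
from typing import Iterable, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def common_motifs(sequences: Iterable[str], k: int) -> Set[str]:
    """Return every length-k substring that occurs in all sequences.

    Assumption: motifs are exact substrings; approximate matching
    is outside the scope of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    common = None
    for seq in sequences:
        found = kmers(seq, k)
        # Intersect with the motifs shared by all sequences seen so far.
        common = found if common is None else common & found
        if not common:
            break  # nothing can be common to all sequences any more
    return common or set()


if __name__ == "__main__":
    print(sorted(common_motifs(["ACGTAC", "TACGT", "GGACGT"], 3)))
```

This baseline stores every candidate motif explicitly, so its memory use grows with the motif length `k`. That is the kind of cost a succinct output encoding, as described in point (2) of the abstract, would aim to avoid.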
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Evans, P.A., Smith, A.D. (2003). Toward Optimal Motif Enumeration. In: Dehne, F., Sack, J.-R., Smid, M. (eds) Algorithms and Data Structures. WADS 2003. Lecture Notes in Computer Science, vol 2748. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45078-8_5
DOI: https://doi.org/10.1007/978-3-540-45078-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40545-0
Online ISBN: 978-3-540-45078-8
eBook Packages: Springer Book Archive