Fast Structured Motif Search in DNA Sequences

Halachev, Mihail; Shiri, Nematollaah

doi:10.1007/978-3-540-70600-7_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Included in the following conference series:

International Conference on Bioinformatics Research and Development

Abstract

We study the problem of structured motif search in DNA sequences, a fundamental task in bioinformatics that contributes to a better understanding of genome characteristics and properties. We propose an efficient algorithm for Exact Match, Overlapping Structured motif search (EMOS), which uses a suffix tree index we proposed earlier and runs on a typical desktop computer. We have conducted numerous experiments to evaluate EMOS and compared its performance with that of the best known solution, SMOTIF1 [1]. In some cases the search time of EMOS is comparable to that of SMOTIF1, but on average EMOS is 5 to 6 times faster.
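
To make the search problem concrete, the following is a minimal sketch of exact-match structured motif search. It is not EMOS and it uses no suffix tree index: the abstract does not describe the algorithm itself, so this is a brute-force scan for illustration only. It assumes the common formulation in which a structured motif is a list of simple motifs separated by gaps whose lengths fall within given bounds. The function names `find_all` and `structured_motif_search` and the example sequence are hypothetical.

```python
from typing import List, Tuple


def find_all(text: str, pattern: str) -> List[int]:
    """Return every start index of pattern in text, including overlapping hits."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        # Advance by one character only, so overlapping occurrences are kept.
        pos = text.find(pattern, pos + 1)
    return hits


def structured_motif_search(
    text: str,
    components: List[str],
    gaps: List[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Brute-force exact-match search for a structured motif (illustration only).

    components: the simple motifs, in order.
    gaps: one (min_gap, max_gap) pair between each pair of consecutive
          components, counted in characters (an assumed convention).
    Returns one tuple of component start positions per occurrence.
    """
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap range between consecutive components")

    # Start with every occurrence of the first component.
    results = [(p,) for p in find_all(text, components[0])]

    for i in range(1, len(components)):
        starts = set(find_all(text, components[i]))
        min_gap, max_gap = gaps[i - 1]
        prev_len = len(components[i - 1])
        extended = []
        for partial in results:
            prev_end = partial[-1] + prev_len  # index just past the previous component
            for gap in range(min_gap, max_gap + 1):
                if prev_end + gap in starts:
                    extended.append(partial + (prev_end + gap,))
        results = extended

    return results


if __name__ == "__main__":
    # "ACG" followed by "TT" after a gap of 0 to 3 characters.
    print(structured_motif_search("ACGATTACGTT", ["ACG", "TT"], [(0, 3)]))
```

A scan like this rereads the sequence for every component, which is the cost an index-based method such as the one proposed in the paper is meant to avoid.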

References

1. Zhang, Y., Zaki, M.J.: SMOTIF: efficient structured pattern and profile motif search. Algorithms for Molecular Biology, 1–22 (November 2006)
2. McCarthy, E., McDonald, J.: LTR_STRUC: A Novel Search and Identification Program for LTR Retrotransposons. Bioinformatics 19(3), 362–367 (2003)
3. Feschotte, C., Jiang, N., Wessler, S.: Plant transposable elements: where genetics meets genomics. Nature Reviews Genetics 3(5), 329–341 (2002)
4. Jurka, J., Kapitonov, V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J.: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110(1-4), 462–467 (2005)
5. Policriti, A., Vitacolonna, N., Morgante, M., Zuccolo, A.: Structured Motif Search. In: Int’l Conf. on Research in Computational Molecular Biology, pp. 133–139 (2004)
6. Mehldau, G., Myers, G.: A system for Pattern Matching Applications on Biosequences. Computer Applications in the Biosciences 9(3), 299–314 (1993)
7. Myers, E.: Approximate Matching of Network Expressions with Spacers. J. Comput. Biol. 3(1), 33–51 (1996)
8. Navarro, G., Raffinot, M.: Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Application to Protein Searching. J. Comput. Biol. 10(6), 903–923 (2003)
9. Zaki, M.J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal 42(1/2), 1–31 (2001)
10. Zaki, M.J.: Sequence Mining in Categorical Domains: Incorporating Constraints. In: ACM Int’l Conf on Information and Knowledge Management, pp. 422–429 (2000)
11. Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
12. Halachev, M., Shiri, N., Thamildurai, A.: Efficient and scalable indexing techniques for biological sequence data. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS (LNBI), vol. 4414, pp. 464–479. Springer, Heidelberg (2007)
13. FASST web-interface, http://sepehr.cs.concordia.ca/
14. Giegerich, R., Kurtz, S., Stoye, J.: Efficient implementation of lazy suffix trees. Software – Practice and Experience 33(11), 1035–1049 (2003)
15. Tian, Y., Tata, S., Hankins, R.A., Patel, J.: Practical methods for constructing suffix trees. VLDB Journal 14(3), 281–299 (2005)
16. Human Genome Data, ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes
17. SMOTIF1 source code, http://www.cs.rpi.edu/~zaki/software/sMotif/

Author information

Authors and Affiliations

Dept. of Computer Science & Software Engineering, Concordia University, Montreal, Quebec, Canada
Mihail Halachev & Nematollaah Shiri

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

About this paper

Cite this paper

Halachev, M., Shiri, N. (2008). Fast Structured Motif Search in DNA Sequences. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_5

DOI: https://doi.org/10.1007/978-3-540-70600-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)
