Fast Structured Motif Search in DNA Sequences

  • Conference paper
Bioinformatics Research and Development (BIRD 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Abstract

We study the problem of structured motif search in DNA sequences. This is a fundamental task in bioinformatics that contributes to a better understanding of genome characteristics and properties. We propose an efficient algorithm for Exact Match, Overlapping Structured motif search (EMOS), which uses a suffix tree index we proposed earlier and runs on a typical desktop computer. We have conducted numerous experiments to evaluate EMOS and compared its performance with that of the best known solution, SMOTIF1 [1]. While in some cases the search time of EMOS is comparable to that of SMOTIF1, EMOS is on average 5 to 6 times faster.
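
To make the search problem concrete, the following is a minimal brute-force sketch in Python. It is not EMOS and does not use a suffix tree index. It assumes the usual formulation in which a structured motif is a list of exact component strings separated by gaps with minimum and maximum lengths. The function names, the gap representation, and the choice to report every occurrence of a component (including occurrences that overlap each other) are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple


def find_all(text: str, pattern: str) -> List[int]:
    """Return every start position of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return positions


def structured_motif_search(
    text: str,
    components: List[str],
    gaps: List[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Brute-force exact-match structured motif search (illustrative only).

    components: exact strings that must occur in order.
    gaps[i]: (min, max) number of characters allowed between the end of
             components[i] and the start of components[i + 1].
    Returns one tuple of start positions per full match.
    """
    if not components or len(gaps) != len(components) - 1:
        raise ValueError("need n >= 1 components and n - 1 gap ranges")

    # Occurrence lists for each component, computed by plain scanning.
    occurrences = [find_all(text, c) for c in components]
    results: List[Tuple[int, ...]] = []

    def extend(i: int, chain: List[int]) -> None:
        # chain holds the start positions chosen for components[0..i-1].
        if i == len(components):
            results.append(tuple(chain))
            return
        prev_end = chain[-1] + len(components[i - 1])
        lo, hi = gaps[i - 1]
        for pos in occurrences[i]:
            if lo <= pos - prev_end <= hi:
                extend(i + 1, chain + [pos])

    for pos in occurrences[0]:
        extend(1, [pos])
    return results


if __name__ == "__main__":
    dna = "ACGTACGTTGCA"
    # "ACG", then 0 to 3 arbitrary characters, then "TG".
    print(structured_motif_search(dna, ["ACG", "TG"], [(0, 3)]))
```

An index-based method would replace the linear scans in `find_all` with lookups in a precomputed structure; the sketch only shows what counts as a match under the stated assumptions.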

References

  1. Zhang, Y., Zaki, M.J.: SMOTIF: efficient structured pattern and profile motif search. Algorithms for Molecular Biology, 1–22 (November 2006)

  2. McCarthy, E., McDonald, J.: LTR_STRUC: A Novel Search and Identification Program for LTR Retrotransposons. Bioinformatics 19(3), 362–367 (2003)

  3. Feschotte, C., Jiang, N., Wessler, S.: Plant transposable elements: where genetics meets genomics. Nature Reviews Genetics 3(5), 329–341 (2002)

  4. Jurka, J., Kapitonov, V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J.: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110(1-4), 462–467 (2005)

  5. Policriti, A., Vitacolonna, N., Morgante, M., Zuccolo, A.: Structured Motif Search. In: Int’l Conf. on Research in Computational Molecular Biology, pp. 133–139 (2004)

  6. Mehldau, G., Myers, G.: A system for Pattern Matching Applications on Biosequences. Computer Applications in the Biosciences 9(3), 299–314 (1993)

  7. Myers, E.: Approximate Matching of Network Expressions with Spacers. J. Comput. Biol. 3(1), 33–51 (1996)

  8. Navarro, G., Raffinot, M.: Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Application to Protein Searching. J. Comput. Biol. 10(6), 903–923 (2003)

  9. Zaki, M.J.: SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal 42(1/2), 1–31 (2001)

  10. Zaki, M.J.: Sequence Mining in Categorical Domains: Incorporating Constraints. In: ACM Int’l Conf on Information and Knowledge Management, pp. 422–429 (2000)

  11. Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)

  12. Halachev, M., Shiri, N., Thamildurai, A.: Efficient and scalable indexing techniques for biological sequence data. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS (LNBI), vol. 4414, pp. 464–479. Springer, Heidelberg (2007)

  13. FASST web-interface, http://sepehr.cs.concordia.ca/

  14. Giegerich, R., Kurtz, S., Stoye, J.: Efficient implementation of lazy suffix trees. Software – Practice and Experience 33(11), 1035–1049 (2003)

  15. Tian, Y., Tata, S., Hankins, R.A., Patel, J.: Practical methods for constructing suffix trees. VLDB Journal 14(3), 281–299 (2005)

  16. Human Genome Data, ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes

  17. SMOTIF1 source code, http://www.cs.rpi.edu/~zaki/software/sMotif/

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halachev, M., Shiri, N. (2008). Fast Structured Motif Search in DNA Sequences. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_5

  • DOI: https://doi.org/10.1007/978-3-540-70600-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70598-7

  • Online ISBN: 978-3-540-70600-7

  • eBook Packages: Computer Science, Computer Science (R0)
