Part of the book series: Advances in Soft Computing ((AINSC,volume 49))

Summary

We present BIORED, a new, efficient and scalable tool for pattern discovery in proteomic and genomic sequences. It uses a genetic algorithm to find interesting patterns in the form of regular expressions, and a new, efficient pattern matching procedure to count pattern occurrences. We studied the performance, scalability and usefulness of BIORED using several databases of biosequences. The results show that BIORED successfully found previously known patterns, which is an excellent indicator of its potential. BIORED is available for download under the GNU Public License at http://www.dcc.fc.up.pt/biored/. An online demo is available at the same address.
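As a rough illustration of the general idea only, the sketch below evolves fixed-length regular-expression patterns over a DNA alphabet and scores them by counting their occurrences. It is not BIORED's implementation, which this summary does not describe: the function names, the wildcard-only pattern language, the toy fitness function, and all parameter values are assumptions made for the example, and Python's standard `re` module stands in for the paper's own pattern matching procedure.

```python
import random
import re

ALPHABET = "ACGT"   # assumed DNA alphabet; proteomic data would need amino-acid letters
WILDCARD = "."      # the only regex operator used in this toy pattern language


def count_occurrences(pattern, sequences):
    """Count matches of a regex pattern across sequences, overlapping ones included."""
    # Wrapping the pattern in a lookahead makes finditer report overlapping matches.
    regex = re.compile("(?=" + pattern + ")")
    return sum(1 for seq in sequences for _ in regex.finditer(seq))


def fitness(pattern, sequences):
    """Hypothetical score: occurrence count weighted by the number of non-wildcard positions."""
    specific = sum(1 for ch in pattern if ch != WILDCARD)
    return count_occurrences(pattern, sequences) * specific


def mutate(pattern, rng):
    """Replace one randomly chosen position with a random symbol or a wildcard."""
    pos = rng.randrange(len(pattern))
    return pattern[:pos] + rng.choice(ALPHABET + WILDCARD) + pattern[pos + 1:]


def crossover(a, b, rng):
    """One-point crossover of two equal-length patterns (length must be at least 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(sequences, length=6, pop_size=40, generations=60, seed=0):
    """Run a small generational GA and return the best pattern found."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in range(length))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the fitter half unchanged and refill the rest with mutated offspring.
        population.sort(key=lambda p: fitness(p, sequences), reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda p: fitness(p, sequences))


if __name__ == "__main__":
    demo = ["GGTATAAAGC", "CCTATATAGG", "TTTATAAATT"]  # made-up sequences
    best = evolve(demo)
    print(best, count_occurrences(best, demo))
```

Weighting the occurrence count by the number of non-wildcard positions is one simple way to keep the search from collapsing onto all-wildcard patterns, which match everywhere but carry no information. A real tool would more plausibly use a statistical measure of over-representation.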



Editor information

Juan M. Corchado Juan F. De Paz Miguel P. Rocha Florentino Fernández Riverola

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pereira, P., Silva, F., Fonseca, N.A. (2009). BIORED - A Genetic Algorithm for Pattern Detection in Biosequences. In: Corchado, J.M., De Paz, J.F., Rocha, M.P., Fernández Riverola, F. (eds) 2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008). Advances in Soft Computing, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85861-4_19

  • DOI: https://doi.org/10.1007/978-3-540-85861-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85860-7

  • Online ISBN: 978-3-540-85861-4

  • eBook Packages: Engineering, Engineering (R0)
