Protein Motif Discovery with Linear Genetic Programming

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3683)

Abstract

Several studies have explored genetic programming as a way to discover motifs in proteins and other biological data. These studies have been small and have often used domain knowledge to improve the search. In this paper we present a genetic programming algorithm that does not use domain knowledge, with results on 44 different protein families. We demonstrate that our list-based representation, given a fixed amount of processing resources, is able to discover meaningful motifs with good classification performance, sometimes comparable to or even surpassing that of motifs found in a database of manually created motifs. We also investigate the introduction of gaps in our algorithm; this seems to give a small increase in classification accuracy and recall, but at the cost of reduced precision.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seehuus, R. (2005). Protein Motif Discovery with Linear Genetic Programming. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_109

  • DOI: https://doi.org/10.1007/11553939_109

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28896-1

  • Online ISBN: 978-3-540-31990-0

  • eBook Packages: Computer Science, Computer Science (R0)
