GP challenge: evolving energy function for protein structure prediction

Widera, Paweł; Garibaldi, Jonathan M.; Krasnogor, Natalio

doi:10.1007/s10710-009-9087-0

Original Paper
Published: 23 September 2009

Genetic Programming and Evolvable Machines, volume 11, pages 61–88 (2010)

Abstract

One of the key elements in protein structure prediction is the ability to distinguish between good and bad candidate structures. This distinction is made by estimation of the structure energy. The energy function used in the best state-of-the-art automatic predictors competing in the most recent CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiment is defined as a weighted sum of a set of energy terms designed by experts. We hypothesised that combining these terms more freely will improve the prediction quality. To test this hypothesis, we designed a genetic programming algorithm to evolve the protein energy function. We compared the predictive power of the best evolved function and a linear combination of energy terms featuring weights optimised by the Nelder–Mead algorithm. The GP based optimisation outperformed the optimised linear function. We have made the data used in our experiments publicly available in order to encourage others to further investigate this challenging problem by using GP and other methods, and to attempt to improve on the results presented here.
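
The contrast between a weighted sum of energy terms and a freer combination of the same terms can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the decoy values, the nested-tuple tree encoding, the operator set and the use of a naive Kendall rank correlation as a quality score are all hypothetical choices made for the example.

```python
from itertools import combinations

# Hypothetical decoys: three made-up energy term values each, plus a
# distance to the native structure (smaller distance = better decoy).
DECOYS = [
    {"terms": (1.0, 4.0, 0.5), "dist": 1.0},
    {"terms": (2.0, 3.0, 1.5), "dist": 2.0},
    {"terms": (3.0, 1.0, 2.0), "dist": 3.0},
    {"terms": (4.0, 0.5, 4.0), "dist": 4.0},
]


def linear_energy(terms, weights):
    """Weighted sum of energy terms (the fixed functional form)."""
    return sum(w * t for w, t in zip(weights, terms))


# Assumed tree encoding: ("T", i) is energy term i, ("C", v) is a constant,
# and (op, left, right) applies a binary operator to two subtrees.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def tree_energy(tree, terms):
    """Recursively evaluate an expression tree over the energy terms."""
    kind = tree[0]
    if kind == "T":
        return terms[tree[1]]
    if kind == "C":
        return tree[1]
    return OPS[kind](tree_energy(tree[1], terms), tree_energy(tree[2], terms))


def kendall_tau(xs, ys):
    """Naive O(n^2) Kendall rank correlation; tied pairs count as zero."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs


def score(energy_fn):
    """Rank agreement between predicted energy and distance to native."""
    energies = [energy_fn(d["terms"]) for d in DECOYS]
    distances = [d["dist"] for d in DECOYS]
    return kendall_tau(energies, distances)


if __name__ == "__main__":
    weights = (1.0, 0.0, 0.0)  # hypothetical weights
    tree = ("add", ("T", 0), ("mul", ("C", 2.0), ("T", 2)))  # hypothetical tree
    print(score(lambda t: linear_energy(t, weights)))
    print(score(lambda t: tree_energy(tree, t)))
```

In this sketch a weight optimiser would search only over the `weights` tuple, whereas a genetic programming run would search over the shape and content of `tree` itself, which is the extra freedom the hypothesis refers to.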

Notes

A protein domain is an independent part of a protein chain that folds into a distinct structural region. Its average length is around 100 amino acids [45].
From the original set of 56 proteins we excluded 1ogwA (it contains LEF, a non-standard amino acid) and 1cy5A (by omission).

References

C. Anfinsen, Principles that govern the folding of protein chains. Science 181(4096), 223–230 (1973). doi:10.1126/science.181.4096.223
J. Bacardit, M. Stout, N. Krasnogor, J. Hirst, J. Blazewicz, Coordination number prediction using learning classifier systems: performance and interpretability. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO ’06). (ACM Press, 2006), pp. 247–254. doi:10.1145/1143997.1144041
D. Barthel, J.D. Hirst, J. Blazewicz, N. Krasnogor, ProCKSI: a decision support system for protein (structure) comparison, knowledge, similarity and information. BMC Bioinform. 8(1), 416 (2007). doi:10.1186/1471-2105-8-416
J.N.D. Battey, J. Kopp, L. Bordoli, R.J. Read, N.D. Clarke, T. Schwede, Automated server predictions in CASP7. Proteins Struct. Funct. Bioinform. 69(S8), 68–82 (2007). doi:10.1002/prot.21761
H.M. Berman, The protein data bank: a historical perspective. Acta Crystallographica Sect. A 64(1), 88–95 (2008). doi:10.1107/S0108767307035623
P.E. Bourne, Structural bioinformatics, chap. CASP and CAFASP experiments and their findings (Wiley-Liss, New York, 2003), pp. 499–505. doi:10.1002/0471721204.ch24
E. Burke, S. Gustafson, G. Kendall, Diversity in genetic programming: an analysis of measures and correlation with fitness. IEEE Trans. Evol. Comput. 8(1), 47–62 (2004). doi:10.1109/TEVC.2003.819263
E. Burke, S. Gustafson, G. Kendall, N. Krasnogor, Advanced population diversity measures in genetic programming. In 7th International Conference Parallel Problem Solving from Nature, Springer Lecture Notes in Computer Science, vol. 2439, ed. by J.J. Merelo Guervós, P. Adamidis, H.G. Beyer, J.L. Fernández-Villacañas, H.P. Schwefel (PPSN, Springer Berlin/Heidelberg, Granada, Spain, 2002), pp. 341–350. doi:10.1007/3-540-45712-7_33
H. Chen, H.X. Zhou, Prediction of solvent accessibility and sites of deleterious mutations from protein sequence. Nucleic Acids Res. 33(10), 3193–3199 (2005). doi:10.1093/nar/gki633
D. Chivian, CASP7 server ranking for FM category (GDT MM) (2006). http://robetta.bakerlab.org/CASP7_eval/CASP7.FR_A-NF.Best-GDT_MM.html
E.A. Coutsias, C. Seok, K.A. Dill, Using quaternions to calculate RMSD. J. Comput. Chem. 25(15), 1849–1857 (2004). doi:10.1002/jcc.20110
S. Cristobal, A. Zemla, D. Fischer, L. Rychlewski, A. Elofsson, A study of quality measures for protein threading models. BMC Bioinform. 2(1), 5 (2001). doi:10.1186/1471-2105-2-5. http://www.biomedcentral.com/1471-2105/2/5
V. Cutello, G. Narzisi, G. Nicosia, A multi-objective evolutionary approach to the protein structure prediction problem. J. R. Soc. Interface 3(6), 139–151 (2006). doi:10.1098/rsif.2005.0083. Applies MOO for CHARMM27 energy (computed with TINKER)
R. Das, B. Qian, S. Raman, R. Vernon, J. Thompson, P. Bradley, S. Khare, M.D. Tyka, D. Bhat, D. Chivian, D.E. Kim, W.H. Sheffler, L. Malmström, A.M. Wollacott, C. Wang, I. Andre, D. Baker, Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins Struct. Funct. Bioinform. 69(S8), 118–128 (2007). doi:10.1002/prot.21636
R.O. Day, G.B. Lamont, R. Pachter, Protein structure prediction by applying an evolutionary algorithm. In Proceedings of the 17th International Symposium on Parallel and Distributed Processing (IEEE Computer Society, 2003), p. 155.1. doi:10.1109/IPDPS.2003.1213291
K.A. Dill, Dominant forces in protein folding. Biochemistry 29(31), 7133–7155 (1990). doi:10.1021/bi00483a001
D.P. Djurdjevic, M.J. Biggs, Ab initio protein fold prediction using evolutionary algorithms: influence of design and control parameters on performance. J. Comput. Chem. 27(11), 1177–1195 (2006). doi:10.1002/jcc.20440
C. Dwork, R. Kumar, M. Naor, D. Sivakumar, Rank aggregation methods for the Web. In Proceedings of the 10th international conference on World Wide Web (ACM, Hong Kong, 2001), pp. 613–622. doi:10.1145/371920.372165
C. Gagné, M. Parizeau, Genericity in evolutionary computation software tools: principles and case-study. Int. J. Artif. Intell. Tools 15(2), 173–194 (2006). doi:10.1142/S021821300600262X
D.E. Goldberg, K. Deb, A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Francisco, CA, 1990), pp. 69–93
E. Jones, T. Oliphant, P. Peterson, et al., SciPy: open source scientific tools for Python (2001–). http://www.scipy.org/
W. Kabsch, A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Sect. A 34(5), 827–828 (1978). doi:10.1107/S0567739478001680
W.R. Knight, A computer method for calculating Kendall’s tau with ungrouped data. J. Am. Stat. Assoc. 61(314), 436–439 (1966)
A. Kolinski, Protein modeling and structure prediction with a reduced representation. Acta Biochimica Polonica 51(2), 349–371 (2004). http://www.actabp.pl/html/2_2004/349.html
A. Kolinski, J. Skolnick, Assembly of protein structure from sparse experimental data: an efficient Monte Carlo model. Proteins Struct Funct Genet 32(4), 475–494 (1998). doi:10.1002/(SICI)1097-0134(19980901)32:4<475::AID-PROT6>3.0.CO;2-F
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection and Genetics (MIT Press, Cambridge, 1992)
J.R. Koza, Scalable learning in genetic programming using automatic function definition. In Advances in Genetic Programming, Chap. 5, ed. by K.E.J. Kinnear (MIT Press, Cambridge, 1994), pp. 99–117
N. Krasnogor, B. Blackburne, J. Hirst, E. Burke, Multimeme algorithms for protein structure prediction. In Parallel Problem Solving from Nature—PPSN VII, Springer Lecture Notes in Computer Science, vol. 2439, ed. by J.J. Merelo, P. Adamidis, H.G. Beyer (Springer, Berlin, 2002), pp. 769–778. doi:10.1007/3-540-45712-7_74
N. Krasnogor, W. Hart, J. Smith, D. Pelta, Protein structure prediction with evolutionary algorithms. In International Genetic and Evolutionary Computation Conference (GECCO99), ed. by Banzhaf, Daida, Eiben, Garzon, Honovar, Jakiela, Smith (Morgan Kaufmann, San Francisco, CA, 1999), pp. 1569–1601
V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Dokl. 10(8), 707–710 (1966)
A. Liwo, S. Oldziej, C. Czaplewski, U. Kozlowska, H. Scheraga, Parametrization of backbone-electrostatic and multibody contributions to the UNRES force field for protein-structure prediction from ab initio energy surfaces of model systems. J. Phys. Chem. B 108(27), 9421–9438 (2004). doi:10.1021/jp030844f
S. Luke, L. Panait, A survey and comparison of tree generation algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), ed. by L. Spector, E.D. Goodman, A. Wu, W.B. Langdon, H.M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M.H. Garzon, E. Burke (Morgan Kaufmann, San Francisco, CA, 2001), pp. 81–88. http://en.scientificcommons.org/453130
J.A. MacKerell, Empirical force fields for biological macromolecules: overview and issues. J. Comput. Chem. 25(13), 1584–1604 (2004). doi:10.1002/jcc.20082
K.I.M. McKinnon, Convergence of the Nelder–Mead simplex method to a nonstationary point. SIAM J. Optim. 9, 148–158 (1999)
J. Nelder, R. Mead, A simplex method for function minimization. Comput. J. 7, 308–313 (1965)
V.S. Pande, I. Baker, J. Chapman, S.P. Elmer, S. Khaliq, S.M. Larson, Y.M. Rhee, M.R. Shirts, C.D. Snow, E.J. Sorin, B. Zagrovic, Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing. Biopolymers 68(1), 91–109 (2003). doi:10.1002/bip.10219
C.A. Rohl, C.E.M. Strauss, K.M.S. Misura, D. Baker, Protein structure prediction using Rosetta. In Numerical Computer Methods, Part D, Methods in Enzymology, vol. 383, ed. by L. Brand, M.L. Johnson (Academic Press, New York, 2004), pp. 66–93. doi:10.1016/S0076-6879(04)83004-0
R. Santana, P. Larranaga, J. Lozano, Protein folding in simplified models with estimation of distribution algorithms. IEEE Trans. Evol. Comput. 12(4), 418–438 (2008). doi:10.1109/TEVC.2007.906095
K.T. Simons, I. Ruczinski, C. Kooperberg, B.A. Fox, C. Bystroff, D. Baker, Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins Struct Funct Genet 34(1), 82–95 (1999). doi:10.1002/(SICI)1097-0134(19990101)34:1<82::AID-PROT7>3.0.CO;2-A
M. Stout, J. Bacardit, J. Hirst, R. Smith, N. Krasnogor, Prediction of topological contacts in proteins using learning classifier systems. Soft Comput. Fusion Found. Methodol. Appl. 13(3), 245–258 (2009). doi:10.1007/s00500-008-0318-8
M. Stout, J. Bacardit, J.D. Hirst, N. Krasnogor, Prediction of recursive convex hull class assignments for protein residues. Bioinformatics 24(7), 916–923 (2008). doi:10.1093/bioinformatics/btn050
G. Syswerda, A study of reproduction in generational and steady state genetic algorithms. In Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Francisco, CA, 1990), pp. 94–101
R. Unger, Applications of Evolutionary Computation in Chemistry, Structure & Bonding, vol. 110, chap. The Genetic Algorithm Approach to Protein Structure Prediction (Springer, Berlin, 2004), pp. 2697–2699. doi:10.1007/b13936
S. Wallin, J. Farwer, U. Bastolla, Testing similarity measures with continuous and discrete protein models. Proteins Struct. Funct. Genet. 50(1), 144–157 (2003). doi:10.1002/prot.10271
S.J. Wheelan, A. Marchler-Bauer, S.H. Bryant, Domain size distributions can predict domain boundaries. Bioinformatics 16(7), 613–618 (2000). doi:10.1093/bioinformatics/16.7.613
S. Wu, J. Skolnick, Y. Zhang, Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol 5(1), 17 (2007). doi:10.1186/1741-7007-5-17
A. Zemla, LGA: a method for finding 3D similarities in protein structures. Nucl. Acids Res. 31(13), 3370–3374 (2003). doi:10.1093/nar/gkg571
Y. Zhang, CASP7 server ranking for FM category (TM-Score) (2006). http://zhang.bioinformatics.ku.edu/casp7/24.html
Y. Zhang, I.A. Hubner, A.K. Arakaki, E. Shakhnovich, J. Skolnick, On the origin and highly likely completeness of single-domain protein structures. PNAS 103(8), 2605–2610 (2006). doi:10.1073/pnas.0509379103
Y. Zhang, D. Kihara, J. Skolnick, Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding. Proteins Struct. Funct. Genet. 48(2), 192–201 (2002). doi:10.1002/prot.10141
Y. Zhang, A. Kolinski, J. Skolnick, TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys. J. 85(2), 1145–1164 (2003). http://www.biophysj.org/cgi/content/full/85/2/1145
Y. Zhang, J. Skolnick, Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins. Biophys. J. 87(4), 2647–2655 (2004). doi:10.1529/biophysj.104.045385

Acknowledgments

We would like to thank Yang Zhang for making the decoys data available online and for explaining the details of I-TASSER energy terms implementation. This research was supported by the Marie Curie Action MEST-CT-2004-7597 under the Sixth Framework Programme of the European Community and by the UK Engineering and Physical Sciences Research Council under grant GR/T07534/01.

Author information

Authors and Affiliations

School of Computer Science, University of Nottingham, Nottingham, NG8 1BB, UK
Paweł Widera, Jonathan M. Garibaldi & Natalio Krasnogor

Corresponding author

Correspondence to Natalio Krasnogor.

About this article

Cite this article

Widera, P., Garibaldi, J.M. & Krasnogor, N. GP challenge: evolving energy function for protein structure prediction. Genet Program Evolvable Mach 11, 61–88 (2010). https://doi.org/10.1007/s10710-009-9087-0

Received: 05 September 2008
Revised: 11 May 2009
Published: 23 September 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10710-009-9087-0
