MOIRAE: A computational strategy to extract and represent structural information from experimental protein templates

  • Methodologies and Application
  • Soft Computing

Abstract

The prediction and analysis of the three-dimensional (3D) structure of proteins is a key research problem in Structural Bioinformatics. The genome projects of the 1990s produced a large increase in the number of available protein sequences. However, the number of identified 3D protein structures has not followed the same growth trend, and the number of available protein sequences now greatly exceeds the number of known 3D structures. Many computational methodologies, systems and algorithms have been proposed to address the protein structure prediction problem. The problem nevertheless remains challenging because of the complexity and high dimensionality of the protein conformational search space. The most significant progress in the last Critical Assessment of protein Structure Prediction (CASP) was achieved by methods that use database information. Even so, developing better strategies for template identification and representation remains a major challenge. This article describes a computational strategy to acquire and represent structural information from experimentally determined 3D protein structures. A clustering strategy is combined with artificial neural networks to extract structural information from experimental protein structure templates. The strategy focuses on acquiring useful and accurate structural information from 3D protein templates stored in the Protein Data Bank (PDB). The proposed method was tested on twenty protein sequences whose sizes vary from 14 to 70 amino acid residues. Our results show that the proposed method extracts and represents valuable information obtained from the PDB and significantly reduces the 3D protein conformational search space.
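To make the clustering step of the abstract more concrete, the following is a minimal sketch of how backbone torsion angles taken from template fragments could be grouped. It is not the authors' implementation: the choice of k-means, the representation of each residue as a (phi, psi) pair in degrees, the periodic distance, the deterministic seeding, and all function names are assumptions made for illustration only. The neural network stage is not shown.

```python
import math


def angular_diff(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(p, q):
    """Distance between two (phi, psi) pairs, respecting 360-degree periodicity."""
    return math.hypot(angular_diff(p[0], q[0]), angular_diff(p[1], q[1]))


def circular_mean(angles):
    """Mean of angles in degrees, computed on the unit circle."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def kmeans_torsions(points, k, iterations=20):
    """Assumed k-means variant: seeds with the first k points (hypothetical choice)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: torsion_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the circular mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    circular_mean([m[0] for m in members]),
                    circular_mean([m[1] for m in members]),
                )
    return centroids, labels


# Illustrative (phi, psi) values: helix-like and strand-like regions, interleaved.
points = [(-60, -45), (-120, 130), (-65, -40), (-135, 140), (-57, -47), (-125, 135)]
centroids, labels = kmeans_torsions(points, k=2)
```

Under these assumptions each cluster centroid summarises a recurring local conformation, so a search over a handful of centroids per residue replaces a search over the full continuous angle range, which is one way a reduction of the conformational search space could arise.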

Notes

  1. http://www.rcsb.org.

  2. predictioncenter.org.

  3. http://www.pdb.org.

  4. Hubbard SJ, Thornton JM (1993) NACCESS, computer program. Department of Biochemistry and Molecular Biology, University College London.

  5. Bloomsbury Center for Bioinformatics. http://www.bioinf.org.uk/software/swreg.html.

  6. Protein templates: in this work we define a structural template as a subsequence of amino acid residues found in experimentally determined 3D protein structures. We look for all proteins in the PDB that contain a subsequence of amino acid residues identical to a target subsequence (fragment) of amino acid residues.

  7. Template fragments are short sub-sequences of amino acid residues of proteins with known 3D structure.

  8. BLAST. blast.ncbi.nlm.nih.gov.

  9. http://www.pymol.org.

  10. SCOP is a classification of protein structural domains based on similarities of their structures. http://scop.mrc-lmb.cam.ac.uk/scop.

  11. 1AB1 has 46 amino acid residues. However, the fragmentation scheme adopted in our method means that the first two and the last two amino acid residues are lost. For these amino acid residues the torsion angles are fixed at 180.0\(^\circ \).

  12. http://www.bioinf.org.uk/software/profit.

  13. DOE Genomic Science. http://genomics.energy.gov.
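Notes 6, 7 and 11 together describe a fragment-based lookup: the target sequence is cut into short fragments, identical subsequences are searched for in proteins of known structure, and the residues at the two ends that no fragment is centred on receive fixed torsion angles. The sketch below illustrates this under stated assumptions. The window length of five residues is inferred from note 11 (two residues lost at each end) and is not confirmed by the text; the in-memory `library` dictionary stands in for a PDB search; and the function names and template identifiers are hypothetical.

```python
def fragment_windows(sequence, size=5):
    """Yield (centre_index, fragment) for every full window; indices are 0-based.

    Assumption: odd window size, with the fragment describing its central residue.
    """
    half = size // 2
    for centre in range(half, len(sequence) - half):
        yield centre, sequence[centre - half:centre + half + 1]


def find_templates(fragment, library):
    """Return (structure_id, start) for every exact occurrence of the fragment."""
    hits = []
    for structure_id, template_sequence in library.items():
        start = template_sequence.find(fragment)
        while start != -1:
            hits.append((structure_id, start))
            start = template_sequence.find(fragment, start + 1)
    return hits


def uncovered_termini(length, size=5):
    """Indices of residues that are never the centre of a window."""
    half = size // 2
    return list(range(half)) + list(range(length - half, length))


# Hypothetical template library: identifier -> amino acid sequence.
library = {"tmplA": "MMCDEFGMM", "tmplB": "CDEFGCDEFG"}

target = "ACDEFGHIK"
windows = list(fragment_windows(target))
hits = find_templates("CDEFG", library)

# Terminal residues without a central window get fixed angles, as in note 11.
fixed_torsions = {i: (180.0, 180.0) for i in uncovered_termini(len(target))}
```

For a 46-residue target such as the one in note 11, this windowing yields 42 central residues and leaves indices 0, 1, 44 and 45 uncovered, which matches the stated loss of the first two and last two residues.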

Acknowledgments

The authors thank MCT/CNPq, CAPES and FAPERGS (Brazil) for financial support.

Author information

Corresponding author

Correspondence to Márcio Dorn.

Additional information

Communicated by V. Piuri.

About this article

Cite this article

Dorn, M., Buriol, L.S. & Lamb, L.C. MOIRAE: A computational strategy to extract and represent structural information from experimental protein templates. Soft Comput 18, 773–795 (2014). https://doi.org/10.1007/s00500-013-1087-6

  • DOI: https://doi.org/10.1007/s00500-013-1087-6
