A method of predicting the secondary protein structure based on dictionaries

Irena Roterman-Konieczna; Piotr Fabian; Katarzyna Stąpor

doi:10.1515/bams-2015-0019

Published by De Gruyter August 15, 2015

From the journal Bio-Algorithms and Med-Systems

https://doi.org/10.1515/bams-2015-0019

Abstract

The shape of a protein chain may be analyzed at different levels of detail. The most complete description gives the three-dimensional coordinates of all atoms in the chain. In many cases, a description of the local shape, namely the secondary structure, is enough to determine some properties of a protein. Although full three-dimensional (3D) information also determines the secondary structure, finding this precise 3D shape (the tertiary structure) from the amino acid sequence alone is a very complex problem. The secondary structure, however, can be predicted without the full 3D information, and many methods have been developed for this purpose. Most of them rely on the similarity of the analyzed chain to other proteins whose secondary structure is already known. This paper proposes a method based on dictionaries of known structures that predicts the secondary structure from either the primary structure or the so-called structural code. Accuracies of up to 79% have been achieved.

Keywords: protein secondary structure prediction; statistical dictionary; structural code
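
The general idea of a dictionary-based predictor can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment length `k`, the three-state labels (H, E, C), the majority-vote rule, the default label for unseen fragments, and the toy training chains are all assumptions made for illustration. The sketch counts which structure label occurs at the center of each short sequence fragment in chains of known structure, predicts the most frequent label for each fragment of a new chain, and scores the result as the fraction of correctly labeled residues.

```python
from collections import Counter, defaultdict


def build_dictionary(chains, k=3):
    """Map each k-residue fragment to counts of the label at its central position.

    `chains` is an iterable of (sequence, structure) string pairs of equal length.
    """
    table = defaultdict(Counter)
    half = k // 2
    for seq, ss in chains:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]][ss[i + half]] += 1
    return table


def predict(seq, table, k=3, default="C"):
    """Label each residue with the most frequent label stored for its fragment.

    Residues at the chain ends and fragments missing from the dictionary
    keep the (assumed) default label.
    """
    half = k // 2
    labels = [default] * len(seq)
    for i in range(len(seq) - k + 1):
        counts = table.get(seq[i:i + k])
        if counts:
            labels[i + half] = counts.most_common(1)[0][0]
    return "".join(labels)


def accuracy(predicted, observed):
    """Fraction of residues whose predicted label matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real protein.
    training = [("AAAAGGGG", "HHHHCCCC"), ("VVVVGGAA", "EEEECCHH")]
    table = build_dictionary(training)
    guess = predict("AAAAGG", table)
    print(guess, accuracy(guess, "HHHHCC"))
```

A real dictionary would be built from a large set of chains with assigned secondary structure, and the choice of fragment length trades coverage (short fragments are seen more often) against specificity (long fragments are more informative).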

Corresponding author: Piotr Fabian, Institute of Computer Science, Silesian Technical University, Akademicka 16, Gliwice, Poland, E-mail: Piotr.Fabian@polsl.pl

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Received: 2015-6-14

Accepted: 2015-7-22

Published Online: 2015-8-15

Published in Print: 2015-9-1
