Abstract
The prediction of secondary structure is an important topic in bioinformatics, even though the methods have matured and algorithm development is a far less active area than it was a decade ago. Accurate prediction is very useful to biologists in its own right, and it is also an essential component of tertiary structure prediction, which in contrast is far from solved and remains a highly active area of research. In addition, sequence comparison methods have more recently incorporated local structure tracks, and the extra information used by these methods has led to considerable improvements in fold recognition and alignment accuracy. In this paper, a novel method for protein secondary structure prediction is presented. Using the evolutionary information contained in amino acids' physicochemical properties, position-specific scoring matrices generated by PSI-BLAST, and HMMER3 profiles as input to a hybrid back propagation system, secondary structure can be predicted with significantly increased accuracy. Building on the knowledge discovery theory based on inner cognitive mechanism (KDTICM), we have constructed a compound pyramid model (CPM), which is composed of four layers of intelligent interfaces and integrates several methods, including the hybrid back propagation method (HBP), modified knowledge discovery in databases (KDD*) and the hybrid SVM method (HSVM). Experiments on three standard datasets (RS126, CB513 and CASP8) show that CPM produces higher Q3 and SOV scores than those achieved by existing widely used schemes such as PSIPRED, PHD and Predator, as well as by previously developed prediction methods. On the RS126 and CB513 datasets, it achieves Q3 and SOV99 scores that are considerably higher than the best reported scores.
It was also tested on target proteins from the Critical Assessment of protein Structure Prediction experiment (CASP8), where it achieves better overall prediction accuracy than traditional methods, including the popular PSIPRED method. Available: http://www.kdd.ustb.edu.cn/protein_Web.
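For readers unfamiliar with the Q3 measure reported above, the following is a minimal sketch of how it is conventionally computed: the percentage of residues whose three-state label (H for helix, E for strand, C for coil) matches the observed assignment. The function name `q3_score` and the example strings are illustrative assumptions and are not taken from the paper; the segment-based SOV measure is more involved and is not shown.

```python
def q3_score(observed: str, predicted: str) -> float:
    """Return the Q3 accuracy in percent for two three-state strings.

    Each character is expected to be one of H (helix), E (strand) or C (coil).
    This is the standard per-residue definition, not the paper's own code.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical 17-residue example: two positions differ.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_score(observed, predicted):.1f}%")
```

Because Q3 counts residues independently, a prediction can score well while fragmenting helices or strands, which is why segment-based measures such as SOV are usually reported alongside it.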
Acknowledgments
We are grateful for the support of the National Nature Science Foundation (60675030, 60875029), the Beijing Key Discipline Development Program (XK100080537) and the Beijing Key Discipline Development Program-Computer Architecture.
Cite this article
Qu, W., Yang, B., Jiang, W. et al. HYBP_PSSP: a hybrid back propagation method for predicting protein secondary structure. Neural Comput & Applic 21, 337–349 (2012). https://doi.org/10.1007/s00521-011-0739-7
DOI: https://doi.org/10.1007/s00521-011-0739-7