Abstract
The beta sheet, one of the three common forms of regular secondary structure in proteins, plays an important role in protein function. The beta strands in a beta sheet can be classified as outer or inner strands. Because a protein's primary sequence carries the information that determines how its strands are arranged in the beta-sheet topology, we introduce an approach that uses the random forest algorithm to predict whether a beta strand is an outer or an inner strand. We describe each strand with nine features based on hydrophobicity, hydrophilicity, side-chain mass and other properties of the beta strands. Among the five machine learning methods compared, the random forest classifier achieves the best prediction accuracy, 89.45% under 10-fold cross-validation. This result demonstrates that there are significant differences between the outer and the inner beta strands of beta sheets. The findings of this study can be used to arrange beta strands in a beta sheet without any prior structural information, and can also help in better understanding the mechanisms of protein beta-sheet formation.
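The abstract does not list the nine features or the classifier settings, so the following is only a minimal sketch of the general pipeline under stated assumptions. It computes a hypothetical four-value descriptor per strand (length plus mean hydrophobicity, hydrophilicity and side-chain mass) and estimates accuracy with stratified 10-fold cross-validation. The scale values are the commonly tabulated Eisenberg consensus hydrophobicity, Hopp-Woods hydrophilicity and side-chain masses; whether the paper used these exact scales is an assumption. The names `strand_features`, `stratified_folds`, `cross_validated_accuracy` and `nearest_centroid` are hypothetical, and the label encoding (1 = outer, 0 = inner) is assumed. The nearest-centroid classifier is a stand-in so the sketch runs with the standard library alone; it is not a random forest.

```python
import random
from statistics import mean
from typing import Callable, Sequence

Vector = list[float]
# A trained predictor maps a feature vector to a label (assumed: 1 = outer, 0 = inner).
Predictor = Callable[[Vector], int]

# Normalized consensus hydrophobicity (Eisenberg et al.), as commonly tabulated.
HYDROPHOBICITY = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}
# Hopp-Woods hydrophilicity.
HYDROPHILICITY = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}
# Side-chain mass in daltons.
SIDE_CHAIN_MASS = {
    "A": 15, "R": 101, "N": 58, "D": 59, "C": 47,
    "Q": 72, "E": 73, "G": 1, "H": 82, "I": 57,
    "L": 57, "K": 73, "M": 75, "F": 91, "P": 42,
    "S": 31, "T": 45, "W": 130, "Y": 107, "V": 43,
}


def strand_features(strand: str) -> Vector:
    """Hypothetical descriptor: strand length plus the mean hydrophobicity,
    hydrophilicity and side-chain mass over the strand's residues."""
    residues = strand.upper()
    unknown = set(residues).difference(HYDROPHOBICITY)
    if not residues or unknown:
        raise ValueError(f"expected standard one-letter residues, got {strand!r}")
    return [
        float(len(residues)),
        float(mean(HYDROPHOBICITY[r] for r in residues)),
        float(mean(HYDROPHILICITY[r] for r in residues)),
        float(mean(SIDE_CHAIN_MASS[r] for r in residues)),
    ]


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> list[int]:
    """Assign each sample a fold index in 0..k-1 by shuffling each class and
    dealing it out round-robin, so every fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            fold_of[i] = j % k
    return fold_of


def cross_validated_accuracy(
    X: Sequence[Vector],
    y: Sequence[int],
    train: Callable[[list[Vector], list[int]], Predictor],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Pooled k-fold accuracy: train on k-1 folds, predict the held-out fold,
    and return the fraction of all samples that were predicted correctly."""
    folds = stratified_folds(y, k, seed)
    correct = 0
    for f in range(k):
        test_idx = [i for i in range(len(y)) if folds[i] == f]
        if not test_idx:
            continue  # possible when a class has fewer than k samples
        train_idx = [i for i in range(len(y)) if folds[i] != f]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def nearest_centroid(X_train: list[Vector], y_train: list[int]) -> Predictor:
    """Stand-in classifier (NOT a random forest): predict the class whose mean
    feature vector is closest in squared Euclidean distance."""
    centroids = {
        cls: [mean(col) for col in zip(*(x for x, lab in zip(X_train, y_train) if lab == cls))]
        for cls in set(y_train)
    }

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict
```

To reproduce the paper's setting more closely, a random forest (for example scikit-learn's `RandomForestClassifier`, wrapped so that fitting returns its `predict` for a single vector) would be passed as `train` in place of `nearest_centroid`, with features computed from strands extracted from solved structures.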
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tang, L., Zhao, Z., Zhang, L., Zhang, T., Gao, S. (2014). Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_1
DOI: https://doi.org/10.1007/978-3-319-09330-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09329-1
Online ISBN: 978-3-319-09330-7
eBook Packages: Computer Science, Computer Science (R0)