Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm

  • Conference paper
Intelligent Computing in Bioinformatics (ICIC 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8590)

Abstract

The beta sheet, one of the three common forms of regular secondary structure in proteins, plays an important role in protein function. The beta strands in a beta sheet can be classified as outer or inner strands. Because protein primary sequences carry the information that determines how strands are arranged in the beta sheet topology, we introduce an approach that uses the random forest algorithm to predict whether a beta strand occupies an outer or inner position. We describe each strand with nine features based on hydrophobicity, hydrophilicity, side-chain mass, and other properties of the beta strands. Among the five machine learning methods compared, the random forest classifier achieves the best prediction accuracy, 89.45% under 10-fold cross-validation. This result demonstrates that there are significant differences between the outer and inner beta strands in beta sheets. The findings of this study can be used to arrange beta strands in a beta sheet without any prior structural information, and they can also help to better understand the mechanisms of protein beta sheet formation.
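The evaluation protocol named in the abstract, accuracy under 10-fold cross-validation, can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the data are synthetic nine-feature vectors rather than the paper's strand features, the function names are hypothetical, and a nearest-centroid rule stands in for the random forest so that the example stays dependency-free. It shows only how the folds are formed and how held-out accuracy is averaged.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Stand-in classifier (NOT a random forest): one centroid per class."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict(model, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist(model[label]))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, average the accuracies."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_nearest_centroid([X[i] for i in train],
                                     [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)


# Synthetic, well-separated data: 100 "outer" and 100 "inner" samples,
# each with nine made-up features (assumption; not the paper's dataset).
rng = random.Random(42)
X, y = [], []
for label, centre in (("outer", 0.0), ("inner", 3.0)):
    for _ in range(100):
        X.append([rng.gauss(centre, 1.0) for _ in range(9)])
        y.append(label)

accuracy = cross_val_accuracy(X, y, k=10)
print(f"10-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

In practice the stand-in classifier would be replaced by a random forest from an established library, and the fold assignment would typically be stratified so that each fold preserves the outer/inner class ratio.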

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, L., Zhao, Z., Zhang, L., Zhang, T., Gao, S. (2014). Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_1

  • DOI: https://doi.org/10.1007/978-3-319-09330-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09329-1

  • Online ISBN: 978-3-319-09330-7

  • eBook Packages: Computer Science, Computer Science (R0)