Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm

Tang, Li; Zhao, Zheng; Zhang, Lei; Zhang, Tao; Gao, Shan

doi:10.1007/978-3-319-09330-7_1

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8590)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

The beta sheet, one of the three common forms of regular secondary structure in proteins, plays an important role in protein function. The beta strands in a beta sheet can be classified as outer or inner strands. Because protein primary sequences carry the information that determines how strands are arranged in the beta sheet topology, we introduce an approach that uses the random forest algorithm to predict whether a beta strand is outer or inner. We describe each strand with nine features based on the hydrophobicity, the hydrophilicity, the side-chain mass and other properties of the beta strands. Among five machine learning methods, the random forest classifiers reach the best prediction accuracy, 89.45% with 10-fold cross-validation. This result demonstrates that there are significant differences between the outer beta strands and the inner ones in beta sheets. The findings of this study can be used to arrange beta strands in a beta sheet without any prior structure information. They can also help to better understand the mechanisms of protein beta sheet formation.
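The evaluation described in the abstract (sequence-derived strand features scored by 10-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the nine features or the scales used, so the Kyte-Doolittle hydropathy scale, the three features in `strand_features`, and the `fit`/`predict` hooks (where a random forest or any other classifier would be plugged in) are all assumptions made for the example.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale. Assumed choice: the abstract does not
# say which hydrophobicity scale the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def strand_features(strand):
    """Return (mean hydropathy, hydrophobic fraction, length) for one strand.

    Hypothetical feature set for illustration only; it is not the
    nine-feature description used in the chapter.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in strand]
    hydrophobic_fraction = sum(v > 0 for v in values) / len(values)
    return (mean(values), hydrophobic_fraction, len(strand))


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_accuracy(samples, labels, fit, predict, k=10):
    """Fraction of samples predicted correctly across k held-out folds.

    `fit(train_samples, train_labels)` returns a model and
    `predict(model, sample)` returns a label; both are caller-supplied.
    """
    correct = 0
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test)
    return correct / len(samples)
```

In practice the samples would be shuffled (and ideally stratified by class) before splitting, and `fit`/`predict` would wrap a random forest implementation; contiguous folds are used here only to keep the sketch short.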

Author information

Authors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, P.R. China
Li Tang & Zheng Zhao
Information Science and Technology Department, Tianjin University of Finance and Economics, Tianjin, P.R. China
Li Tang
Key Lab. of Bioactive Materials, Ministry of Education and The College of Life Sciences, Nankai University, Tianjin, P.R. China
Lei Zhang, Tao Zhang & Shan Gao

Editor information

Editors and Affiliations

School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
School of Computer Science and Engineering, Inha University, Incheon, South Korea
Kyungsook Han
Department of Biotechnology, Indian Institute of Technology Madras, 600 036, Chennai, Tamilnadu, India
Michael Gromiha

Cite this paper

Tang, L., Zhao, Z., Zhang, L., Zhang, T., Gao, S. (2014). Predicting the Outer/Inner BetaStrands in Protein Beta Sheets Based on the Random Forest Algorithm. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_1

DOI: https://doi.org/10.1007/978-3-319-09330-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09329-1
Online ISBN: 978-3-319-09330-7
eBook Packages: Computer Science, Computer Science (R0)
