Abstract
Residue contact maps offer a 2-d, reduced representation of 3-d protein structures and constitute a structural constraint and scaffold in structural modeling. Precise residue contact maps are not only helpful as an intermediate step towards generating effective 3-d protein models, but also useful in their own right for identifying binding sites and hence providing insights into a protein’s functions. Indeed, many computational methods have been developed to predict residue contacts using a variety of features based on sequence, physicochemical properties, and co-evolutionary information. In this work, we set out to explore the use of structural information for predicting inter-helical residue contacts in transmembrane proteins. Specifically, we extract structural information from a neighborhood around a residue pair of interest and train a classifier to determine whether the residue pair is a contact point or not. To make the task practical, we avoid using the 3-d coordinates directly; instead, we extract features such as relative distances and angles. Further, we exclude any structural information about the residue pair of interest from the input feature set in both training and testing of the classifier. We compare our method, on a benchmark data set, to a state-of-the-art method that uses non-structural information. The results from experiments on held-out datasets show that our method achieves above 90% precision for the top L/2 and L inter-helical contacts, significantly outperforming the state-of-the-art method, and may serve as an upper bound on the performance attainable when using non-structural information. Further, we evaluate the robustness of our method by injecting Gaussian noise into PDB coordinates and hence into our derived features. We find that our model’s performance is robust to high noise levels.
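The evaluation described above (precision over the top-ranked contacts, and robustness to Gaussian noise added to coordinates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the NumPy-based layout, and the use of plain pairwise distances as a stand-in for the derived features are all assumptions made for illustration; the abstract does not define L or the noise levels, so both are left as caller-supplied values.

```python
import numpy as np


def top_k_precision(scores, labels, k):
    """Fraction of true contacts among the k highest-scoring residue pairs.

    `scores` are classifier confidences per candidate pair and `labels`
    mark true contacts; k would be set to L/2 or L by the caller
    (hypothetical helper, not from the paper).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    k = min(int(k), scores.size)
    if k == 0:
        return 0.0
    # Sort descending by score; stable sort keeps input order on ties.
    top = np.argsort(-scores, kind="stable")[:k]
    return float(labels[top].mean())


def add_gaussian_noise(coords, sigma, seed=None):
    """Return a copy of the (n, 3) coordinates with zero-mean Gaussian noise.

    `sigma` is the standard deviation in the same units as `coords`
    (an assumed parameterisation of the noise level).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    return coords + rng.normal(0.0, sigma, size=coords.shape)


def pairwise_distances(coords):
    """Euclidean distance between every pair of points; a simple example
    of a coordinate-derived feature that noise would propagate into."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```

In such a setup, one would perturb the coordinates with `add_gaussian_noise`, recompute the derived features, re-score the candidate pairs, and compare `top_k_precision` at k = L/2 and k = L against the noise-free values.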
Acknowledgment
Support from the University of Delaware CBCB Bioinformatics Core Facility and use of the BIOMIX compute cluster were made possible through funding from Delaware INBRE (NIH NIGMS P20 GM103446), the State of Delaware, and the Delaware Biotechnology Institute. The authors would also like to thank the National Science Foundation (NSF-MCB1820103), which partly supported this research.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sawhney, A., Li, J., Liao, L. (2023). Inter-helical Residue Contact Prediction in α-Helical Transmembrane Proteins Using Structural Features. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science, vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_25
DOI: https://doi.org/10.1007/978-3-031-34960-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34959-1
Online ISBN: 978-3-031-34960-7
eBook Packages: Computer Science, Computer Science (R0)