ABSTRACT
Our understanding of a protein's function is closely tied to our understanding of its three-dimensional (3D) structure and of how that structure is computationally predicted. Evaluating the quality of a predicted 3D structural model is therefore crucial for protein structure prediction. In recent years, many studies have applied deep learning architectures to protein structure prediction and combined large sets of protein features to evaluate the quality of predicted models. Recent work has shown that inter-residue distances and alignment-based coevolutionary information significantly improve the accuracy of protein structure prediction. This work uses structural constraints derived from multiple sequence alignments, together with a deep graph convolutional neural network, for estimation of model accuracy (EMA). The method models a protein structure as a connected graph in which each node encodes a residue's structural information and each edge represents the structural relationship between a pair of residues. We incorporate a new feature embedding block into deep graph learning that uses convolution and self-attention to leverage sequence alignment information for highly accurate protein quality estimation. We benchmark our method against other state-of-the-art quality assessment approaches on the CASP13 and CASP14 datasets. The results indicate the effectiveness of alignment-based features and attention-based graph learning for EMA and show that our method improves on previous work.
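As a rough illustration of the graph representation described above, the following minimal sketch builds a residue graph from 3D coordinates, with one node per residue and one distance-weighted edge per residue pair. The function name `build_residue_graph`, the use of C-alpha coordinates, and the 8 angstrom cutoff are illustrative assumptions and are not taken from the paper, which may define nodes and edges differently.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int]


def build_residue_graph(ca_coords: List[Coord], cutoff: float = 8.0) -> Dict[Edge, float]:
    """Return a map (i, j) -> distance for residue pairs within `cutoff`.

    Assumption: each residue is represented by a single C-alpha coordinate,
    and the cutoff (in angstroms) is a hypothetical choice for this sketch.
    """
    edges: Dict[Edge, float] = {}
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            # Euclidean distance between the two residue positions
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                edges[(i, j)] = d
    return edges


# Toy example: three residues close together and one far away
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = build_residue_graph(coords)
```

In this toy input, only the first three residues are connected to each other; the fourth lies beyond the cutoff from all of them. A graph neural network would then attach per-residue features to the nodes and pass messages along these edges.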
Index Terms
- Deep graph learning to estimate protein model quality using structural constraints from multiple sequence alignments