DOI: 10.1145/3535508.3545558
research-article

Deep graph learning to estimate protein model quality using structural constraints from multiple sequence alignments

Published: 07 August 2022

ABSTRACT

Our understanding of a protein's function is closely tied to our understanding of its three-dimensional (3D) structure and of how that structure is computationally predicted. Evaluating the quality of a predicted 3D structural model is therefore crucial for protein structure prediction. In recent years, many studies have applied deep learning architectures to protein structure prediction, combined with large sets of protein features, to evaluate the quality of predicted models. Recent work has shown that inter-residue distances and alignment-based coevolutionary information significantly improve the accuracy of protein structure prediction. This work uses structural constraints derived from multiple sequence alignments, together with a deep graph convolutional neural network, for the estimation of protein model accuracy (EMA). The method models a protein structure as a connected graph in which each node encodes a residue's structural information and each edge represents the structural relationship between a pair of residues in the structure. We incorporate a new feature embedding block into deep graph learning that uses convolution and self-attention to leverage sequence alignment information for highly accurate protein quality estimation. We benchmark our method against other state-of-the-art quality assessment approaches on the CASP13 and CASP14 datasets. The results indicate the effectiveness of alignment-based features and attention-based graph learning for EMA and show that our method improves on previous work.
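
The graph representation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual node and edge features, so everything here is an assumption chosen for illustration: the names `ResidueGraph`, `one_hot`, and `build_residue_graph` are hypothetical, nodes carry a one-hot amino acid encoding, and each residue pair gets an edge holding the C-alpha distance and the sequence separation.

```python
import math
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class ResidueGraph:
    """Fully connected residue graph: one node per residue, one edge per pair."""
    node_features: list  # node_features[i] is the feature vector of residue i
    edge_features: dict  # edge_features[(i, j)], i < j, is the pair's feature vector


def one_hot(residue):
    """Assumed node feature: one-hot encoding of the amino acid type."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def build_residue_graph(sequence, ca_coords):
    """Build a graph from a sequence and per-residue C-alpha coordinates."""
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have the same length")
    nodes = [one_hot(res) for res in sequence]
    edges = {}
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            # Assumed edge features: Euclidean distance and sequence separation.
            distance = math.dist(ca_coords[i], ca_coords[j])
            edges[(i, j)] = [distance, float(j - i)]
    return ResidueGraph(nodes, edges)


# Toy three-residue model with made-up coordinates (not real protein data).
graph = build_residue_graph(
    "ACD", [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
)
print(len(graph.node_features), len(graph.edge_features))
```

A graph neural network would then update each node's features from its neighbors and the connecting edge features. That step, and the alignment-derived features the paper relies on, are outside the scope of this sketch.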

        • Published in

          BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
          August 2022
          549 pages
          ISBN: 9781450393867
          DOI: 10.1145/3535508

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate: 254 of 885 submissions, 29%