research-article

Deep graph learning to estimate protein model quality using structural constraints from multiple sequence alignments

Authors:
Mahdi Rahbar, Saint Louis University
Rahul Kumar Chauhan, Saint Louis University
Pankil Nimeshbhai Shah, Saint Louis University
Renzhi Cao, Pacific Lutheran University
Dong Si, University of Washington
Jie Hou, Saint Louis University

BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, August 2022, Article No.: 21, Pages 1–10, https://doi.org/10.1145/3535508.3545558

Published: 07 August 2022

ABSTRACT

Our understanding of a protein's function depends heavily on knowledge of its three-dimensional (3D) structure and on how that structure is computationally predicted. Evaluating the quality of a predicted 3D structural model is therefore a crucial step in protein structure prediction. In recent years, many studies have applied deep learning architectures to protein structure prediction and have combined large sets of protein features to evaluate the quality of predicted models. Recent work has shown that inter-residue distances and alignment-based coevolutionary information significantly improve accuracy in protein structure prediction tasks. This work uses structural constraints derived from multiple sequence alignments, together with a deep graph convolutional neural network, for the estimation of protein model accuracy (EMA). The method models a protein structure as a connected graph in which each node encodes a residue's structural information and each edge represents the structural relationship between a pair of residues in the structure. We incorporate a new feature embedding block into deep graph learning that uses convolution and self-attention to exploit sequence alignment information for highly accurate protein quality estimation. We benchmark our method against other state-of-the-art quality assessment approaches on the CASP13 and CASP14 datasets. The results indicate the effectiveness of alignment-based features and attention-based graph learning for EMA problems and show that our method improves on previous work.
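The graph representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_residue_graph`, the use of Cα coordinates, and the choice of pairwise Euclidean distance as the only edge feature are assumptions made for illustration, since the abstract does not specify the actual node and edge features.

```python
import numpy as np

def build_residue_graph(ca_coords, node_features):
    """Build a fully connected residue graph (illustrative sketch only).

    ca_coords:     (n, 3) array of one 3D coordinate per residue (assumed C-alpha).
    node_features: (n, d) array with one feature row per residue.
    Returns (node_features, edge_index, edge_dist).
    """
    ca_coords = np.asarray(ca_coords, dtype=float)
    node_features = np.asarray(node_features, dtype=float)
    n = ca_coords.shape[0]
    if node_features.shape[0] != n:
        raise ValueError("one feature row per residue is required")

    # Pairwise Euclidean distances between residues, shape (n, n).
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Every ordered pair (i, j) with i != j becomes a directed edge.
    src, dst = np.nonzero(~np.eye(n, dtype=bool))
    edge_index = np.stack([src, dst])   # shape (2, n * (n - 1))
    edge_dist = dist[src, dst]          # one distance per edge
    return node_features, edge_index, edge_dist

# Toy input: three residues on a line, 3.8 angstroms apart (hypothetical data).
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
feats = np.eye(3)  # placeholder one-hot node features
nodes, edge_index, edge_dist = build_residue_graph(coords, feats)
print(edge_index.shape, edge_dist)
```

In a graph neural network, `nodes` would carry the per-residue inputs and `edge_dist` the per-pair inputs; a real pipeline would likely use richer features for both than this sketch does.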

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
    2. Computational biology
2. Computing methodologies
  1. Machine learning

Published in

BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
August 2022
549 pages
ISBN:9781450393867
DOI:10.1145/3535508

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attention network
graph convolutional neural network
multiple sequence alignment
protein structure quality assessment
Acceptance Rates
Overall Acceptance Rate: 254 of 885 submissions, 29%