AmoebaContact and GDFold as a pipeline for rapid de novo protein structure prediction

Mao, Wenzhi; Ding, Wenze; Xing, Yaoguang; Gong, Haipeng

doi:10.1038/s42256-019-0130-4

Article
Published: 23 December 2019

Nature Machine Intelligence volume 2, pages 25–33 (2020)

A preprint version of the article is available at arXiv.

Abstract

Predicting the structures of proteins from amino acid sequences is of great importance. Recently, the accuracy of de novo protein structure prediction has been substantially improved when assisted by information about the contact between residues, which is also predictable from the sequence. Here, we present a novel pipeline for rapid protein structure prediction, which consists of a residue contact predictor, AmoebaContact, and a contact-assisted folder, GDFold. Unlike mainstream contact predictors that utilize simple, regularized neural networks, AmoebaContact adopts a set of network architectures that are optimized for contact prediction through automatic searching, and it predicts contacts at a series of cutoffs. Unlike conventional contact-assisted folders that only use top-scored contact pairs, GDFold considers all residue pairs from the prediction results of AmoebaContact in a differentiable loss function and optimizes atom coordinates using the gradient descent algorithm. The combination of AmoebaContact and GDFold allows quick modelling of the protein structure with acceptable model quality.
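The idea of folding by gradient descent on a differentiable, contact-based loss can be illustrated with a toy example. The sketch below is not GDFold itself. It assumes a single 8 Å cutoff rather than a series of cutoffs, one pseudo-atom per residue, a squared-hinge penalty and a hand-derived gradient in plain NumPy. The function names (`pairwise`, `loss_and_grad`), the synthetic target chain and the probability values 0.9 and 0.1 are hypothetical choices made only for illustration.

```python
import numpy as np

def pairwise(x):
    """Return pairwise difference vectors and distances for coordinates x of shape (n, 3)."""
    diff = x[:, None, :] - x[None, :, :]
    # A small epsilon keeps the square root differentiable at zero distance.
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-9)
    return diff, dist

def loss_and_grad(x, prob, cutoff=8.0):
    """Toy contact loss over ALL residue pairs and its analytic gradient.

    prob[i, j] is an assumed predicted probability that residues i and j lie
    within `cutoff`. Likely contacts are penalised for being too far apart,
    unlikely contacts for being too close (illustrative squared-hinge form).
    """
    diff, dist = pairwise(x)
    off_diag = 1.0 - np.eye(len(x))               # ignore self-pairs
    too_far = np.maximum(dist - cutoff, 0.0)
    too_close = np.maximum(cutoff - dist, 0.0)
    per_pair = prob * too_far ** 2 + (1.0 - prob) * too_close ** 2
    loss = 0.5 * np.sum(off_diag * per_pair)      # 0.5: each pair appears twice
    dloss_ddist = 2.0 * off_diag * (prob * too_far - (1.0 - prob) * too_close)
    grad = ((dloss_ddist / dist)[:, :, None] * diff).sum(axis=1)
    return loss, grad

# Synthetic data: a random-walk chain stands in for a native structure, and
# its contact map is softened into pseudo-probabilities (assumed values).
rng = np.random.default_rng(0)
n = 20
target = np.cumsum(rng.normal(scale=2.2, size=(n, 3)), axis=0)
_, target_dist = pairwise(target)
prob = np.where(target_dist < 8.0, 0.9, 0.1)

# Plain gradient descent on the coordinates, starting from random positions.
x = rng.normal(scale=10.0, size=(n, 3))
initial_loss, _ = loss_and_grad(x, prob)
for _ in range(500):
    _, grad = loss_and_grad(x, prob)
    x = x - 0.005 * grad
final_loss, _ = loss_and_grad(x, prob)
print(f"loss: {initial_loss:.1f} -> {final_loss:.1f}")
```

Because every residue pair contributes to the loss, weighted by its predicted probability, no hard selection of top-scored contacts is needed in this sketch. A practical folder would also need chain-connectivity and steric terms, which are omitted here.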

**Fig. 1: General flow chart of the AmoebaContact and GDFold pipeline.**

**Fig. 2: Model evolution in the AmoebaNet architecture searching process.**

**Fig. 3: The performance of augmented models.**

**Fig. 4: Detailed comparison of AmoebaContact and GDFold against RaptorX-Contact.**

Data availability

The data required for running this pipeline for all proteins in the test sets are available on Code Ocean (https://doi.org/10.24433/CO.4945300.v1)⁴⁹.

Code availability

All source code and models of AmoebaContact and GDFold are openly available on Code Ocean (https://doi.org/10.24433/CO.4945300.v1)⁴⁹. The code for extracting protein features, for training the modified AmoebaNet pipeline and for optimizing the chosen network models is available on GitHub (https://github.com/THU-gonglab/AmoebaContact). An online server for AmoebaContact and GDFold is available at http://structpred.life.tsinghua.edu.cn/amoebacontact.html.

References

Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E 87, 012707 (2013).
Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2011).
Seemayer, S., Gruber, M. & Söding, J. CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations. Bioinformatics 30, 3128–3130 (2014).
Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl Acad. Sci. USA 106, 67–72 (2009).
Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Xu, J. Distance-based protein folding powered by deep learning. Proc. Natl Acad. Sci. USA 116, 16856–16865 (2019).
Kandathil, S. M., Greener, J. G. & Jones, D. T. Prediction of interresidue contacts with DeepMetaPSICOV in CASP13. Proteins 87, 1092–1099 (2019).
Li, Y., Zhang, C., Bell, E. W., Yu, D. J. & Zhang, Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 87, 1082–1091 (2019).
Fariselli, P., Olmea, O., Valencia, A. & Casadio, R. Prediction of contact maps with neural networks and correlated mutations. Protein Eng. 14, 835–843 (2001).
Andreani, J. & Söding, J. bbcontacts: prediction of β-strand pairing from direct coupling patterns. Bioinformatics 31, 1729–1737 (2015).
Mao, W., Wang, T., Zhang, W. & Gong, H. Identification of residue pairing in interacting β-strands from a predicted residue contact map. BMC Bioinformatics 19, 146 (2018).
Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learn. Res. 20, 1–21 (2019).
Zhong, Z., Yan, J., Wu, W., Shao, J. & Liu, C.-L. Practical block-wise neural network architecture generation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2423–2432 (2018).
Zoph, B., Vasudevan, V., Shlens, J. & Le, Q. V. Learning transferable architectures for scalable image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 8697–8710 (2018).
Chrabaszcz, P., Loshchilov, I. & Hutter, F. A downsampled variant of ImageNet as an alternative to the CIFAR datasets. Preprint at https://arxiv.org/abs/1707.08819 (2017).
Domhan, T., Springenberg, J. T. & Hutter, F. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence 3460–3468 (2015).
Klein, A., Falkner, S., Bartels, S., Hennig, P. & Hutter, F. Fast Bayesian optimization of machine learning hyperparameters on large datasets. Preprint at https://arxiv.org/abs/1605.07079 (2016).
Real, E., Aggarwal, A., Huang, Y. & Le, Q. V. Regularized evolution for image classifier architecture search. Preprint at https://arxiv.org/abs/1802.01548 (2018).
Zela, A., Klein, A., Falkner, S. & Hutter, F. Towards automated deep learning: efficient joint neural architecture and hyperparameter search. Preprint at https://arxiv.org/pdf/1807.06906.pdf (2018).
Bergstra, J., Yamins, D. & Cox, D. D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Proceedings of the 30th International Conference on Machine Learning 28, 115–123 (2013).
Mendoza, H., Klein, A., Feurer, M., Springenberg, J. T. & Hutter, F. Towards automatically-tuned neural networks. Proceedings of the Workshop on Automatic Machine Learning 64, 58–65 (2016).
Elsken, T., Metzen, J. H. & Hutter, F. Efficient multi-objective neural architecture search via Lamarckian evolution. Preprint at https://arxiv.org/abs/1804.09081 (2018).
Real, E. et al. Large-scale evolution of image classifiers. Proceedings of the 34th International Conference on Machine Learning 70, 2902–2911 (2017).
Baker, B., Gupta, O., Naik, N. & Raskar, R. Designing neural network architectures using reinforcement learning. Preprint at https://arxiv.org/abs/1611.02167 (2016).
Zoph, B. & Le, Q. V. Neural architecture search with reinforcement learning. Preprint at https://arxiv.org/abs/1611.01578 (2016).
Liu, H., Simonyan, K. & Yang, Y. Darts: differentiable architecture search. Preprint at https://arxiv.org/abs/1806.09055 (2018).
Zheng, W. et al. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 87, 1149–1164 (2019).
Adhikari, B., Bhattacharya, D., Cao, R. & Cheng, J. CONFOLD: residue–residue contact‐guided ab initio protein folding. Proteins 83, 1436–1449 (2015).
Adhikari, B. & Cheng, J. CONFOLD2: improved contact-driven ab initio protein structure modeling. BMC Bioinformatics 19, 22 (2018).
Senior, A. W. et al. Protein structure prediction using multiple deep neural networks in CASP13. Proteins 87, 1141–1148 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision 630–645 (2016).
Schaarschmidt, J., Monastyrskyy, B., Kryshtafovych, A. & Bonvin, A. Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins 86(Suppl. 1), 51–66 (2018).
Sillitoe, I. et al. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 43, D376–D381 (2014).
Deming, W. E. Statistical Adjustment of Data (Wiley, 1943).
Xiang, Z. & Honig, B. Jackal: A Protein Structure Modeling Package. (Columbia University and Howard Hughes Medical Institute: 2002). http://honig.c2b2.columbia.edu/jackal.
Xiang, Z. & Honig, B. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 311, 421–430 (2001).
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T. & Tramontano, A. Critical assessment of methods of protein structure prediction: progress and new directions in round XI. Proteins 84, 4–14 (2016).
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T. & Tramontano, A. Critical assessment of methods of protein structure prediction (CASP)—round XII. Proteins 86, 7–15 (2018).
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)—round XIII. Proteins 87, 1011–1020 (2019).
Dawson, N. L. et al. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res. 45, D289–D295 (2016).
Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM–HMM alignment. Nat. Methods 9, 173 (2012).
Apweiler, R. et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004).
Gloor, G. B., Martin, L. C., Wahl, L. M. & Dunn, S. D. Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. Biochemistry 44, 7156–7165 (2005).
Wang, S., Peng, J., Ma, J. & Xu, J. Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 6, 18962 (2016).
Wang, S., Weng, S., Ma, J. & Tang, Q. DeepCNF-D: predicting protein order/disorder regions by weighted deep convolutional neural fields. Int. J. Mol. Sci. 16, 17315–17330 (2015).
Heffernan, R., Yang, Y., Paliwal, K. & Zhou, Y. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33, 2842–2849 (2017).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. Preprint at https://arxiv.org/abs/1502.03167 (2015).
Ulyanov, D., Vedaldi, A. & Lempitsky, V. Instance normalization: the missing ingredient for fast stylization. Preprint at https://arxiv.org/abs/1607.08022 (2016).
Mao, W., Ding, W., Xing, Y. & Gong, H. AmoebaContact and GDFold as a New Pipeline for Rapid De Novo Protein Structure Prediction (Code Ocean, 2019); https://doi.org/10.24433/CO.4945300.v1

Acknowledgements

This work has been supported by the National Natural Science Foundation of China (31670723, 91746119, 81861138009 and 31621092) and by the Beijing Advanced Innovation Center for Structural Biology.

Author information

Authors and Affiliations

MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
Wenzhi Mao, Wenze Ding, Yaoguang Xing & Haipeng Gong
Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
Wenzhi Mao, Wenze Ding, Yaoguang Xing & Haipeng Gong

Contributions

W.M. contributed to the methodology, experimental design, software, formal analysis and writing of the original draft. W.D. contributed to the web server and was involved in data analysis. Y.X. contributed to model optimization and was involved in data analysis. H.G. contributed to the experimental design and was responsible for supervision, writing (review and revision) as well as funding acquisition. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Haipeng Gong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs. 1–17, Tables 1–12, methods and references.

About this article

Mao, W., Ding, W., Xing, Y. et al. AmoebaContact and GDFold as a pipeline for rapid de novo protein structure prediction. Nat Mach Intell 2, 25–33 (2020). https://doi.org/10.1038/s42256-019-0130-4

Received: 06 June 2019
Accepted: 22 November 2019
Published: 23 December 2019
Issue Date: January 2020
DOI: https://doi.org/10.1038/s42256-019-0130-4
