research-article

Transfer learning for predicting virus-host protein interactions for novel virus sequences

Authors:
Jack Lanchantin (University of Virginia),
Tom Weingarten (Google),
Arshdeep Sekhon (University of Virginia),
Clint Miller (University of Virginia),
Yanjun Qi (University of Virginia)

BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, August 2021, Article No.: 36, Pages 1–10, https://doi.org/10.1145/3459930.3469527

Published: 01 August 2021

ABSTRACT

Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large-scale experiments are noisy, and small-scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments and to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change protein interactions with the host proteins.

We propose DeepVHPPI, a novel deep learning framework that combines a self-attention-based transformer architecture with a transfer learning training strategy to predict interactions between human proteins and virus proteins that have novel sequence patterns. We show that our approach significantly outperforms state-of-the-art methods in predicting virus-human protein interactions for SARS-CoV-2, H1N1, and Ebola. In addition, we demonstrate how our framework can be used to predict and interpret the interactions of mutated SARS-CoV-2 Spike protein sequences.

Availability: We make all of our data and code available on GitHub at https://github.com/QData/DeepVHPPI.

Published in
BCB '21: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2021
603 pages
ISBN: 9781450384506
DOI: 10.1145/3459930
General Chairs:
Hongmei Jiang (Northwestern University),
Xiuzhen Huang (Arkansas State University),
Jiajie Zhang (The University of Texas Health Science Center at Houston)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 254 of 885 submissions, 29%