ABSTRACT
Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods for finding protein interactions are inadequate: large-scale experiments are noisy, and small-scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well positioned to aid and augment biological experiments and to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change a virus's protein interactions with host proteins.
We propose DeepVHPPI, a novel deep learning framework that combines a self-attention-based transformer architecture with a transfer learning training strategy to predict interactions between human proteins and virus proteins that have novel sequence patterns. We show that our approach significantly outperforms state-of-the-art methods in predicting virus-human protein interactions for SARS-CoV-2, H1N1, and Ebola. In addition, we demonstrate how our framework can be used to predict and interpret the interactions of mutated SARS-CoV-2 Spike protein sequences.
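The data flow described above can be illustrated with a minimal, untrained sketch. It covers residue embeddings, a self-attention layer, one pooled vector per protein, and a pairwise interaction score. It also shows how such a scorer can be reused for in-silico mutation scanning: substitute each residue of the virus sequence and record the change in predicted score. This is not DeepVHPPI's architecture or code. `ToyPairClassifier`, the 8-dimensional random weights, the single attention head, mean pooling, and the element-wise-product logistic head are all hypothetical choices made for illustration. In a transfer-learning setup the encoder weights would instead be initialised from a model pretrained on unlabeled protein sequences and then fine-tuned on labeled interactions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 8  # hypothetical embedding size


class ToyPairClassifier:
    """Hypothetical, untrained pair scorer. Weights are random placeholders;
    a transfer-learning model would load pretrained encoder weights instead."""

    def __init__(self, dim=DIM, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
                    for _ in range(rows)]

        self.dim = dim
        self.embed = {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
        self.wq, self.wk, self.wv = mat(dim, dim), mat(dim, dim), mat(dim, dim)
        self.w_out = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.bias = 0.0

    @staticmethod
    def _project(x, w):
        # Row vector x (length d) times matrix w (d x d).
        return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

    def encode(self, seq):
        """Single-head scaled dot-product self-attention + residual, then mean pooling."""
        x = [self.embed[aa] for aa in seq]
        queries = [self._project(e, self.wq) for e in x]
        keys = [self._project(e, self.wk) for e in x]
        values = [self._project(e, self.wv) for e in x]
        scale = math.sqrt(self.dim)
        out = []
        for i in range(len(x)):
            scores = [sum(a * b for a, b in zip(queries[i], keys[j])) / scale
                      for j in range(len(x))]
            m = max(scores)  # subtract max for a numerically stable softmax
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            attended = [sum(weights[j] * values[j][d] for j in range(len(x))) / total
                        for d in range(self.dim)]
            out.append([a + b for a, b in zip(x[i], attended)])  # residual connection
        return [sum(row[d] for row in out) / len(out) for d in range(self.dim)]

    def score(self, virus_seq, human_seq):
        """Interaction probability from the element-wise product of the two protein vectors."""
        hv, hh = self.encode(virus_seq), self.encode(human_seq)
        logit = self.bias + sum(w * a * b for w, a, b in zip(self.w_out, hv, hh))
        if logit >= 0:
            return 1.0 / (1.0 + math.exp(-logit))
        z = math.exp(logit)
        return z / (1.0 + z)

    def mutation_scan(self, virus_seq, human_seq):
        """Score change for every single-residue substitution in the virus sequence."""
        base = self.score(virus_seq, human_seq)
        effects = []
        for pos, orig in enumerate(virus_seq):
            for aa in AMINO_ACIDS:
                if aa != orig:
                    mutant = virus_seq[:pos] + aa + virus_seq[pos + 1:]
                    effects.append((pos, orig, aa, self.score(mutant, human_seq) - base))
        return effects


if __name__ == "__main__":
    model = ToyPairClassifier()
    # Arbitrary placeholder sequences, not real virus or human proteins.
    print(model.score("MKTLLVAGA", "GSHMWQRE"))
    effects = model.mutation_scan("MKTLLVAGA", "GSHMWQRE")
    print(sorted(effects, key=lambda e: abs(e[3]), reverse=True)[:3])
```

Because the weights are random, the printed scores carry no biological meaning. The sketch only shows how a pretrained-then-fine-tuned pair scorer could rank substitutions by their predicted effect on an interaction.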
Availability: We make all of our data and code available on GitHub https://github.com/QData/DeepVHPPI.
Index Terms
- Transfer learning for predicting virus-host protein interactions for novel virus sequences