ABSTRACT
Accurately identifying peptide toxicity is a crucial step in computer-aided screening of peptide-based drugs; doing so can accelerate novel drug discovery and reduce resource consumption. Deep learning has recently shown promising performance in bioinformatics. However, one challenge in developing a deep learning-based model for peptide toxicity prediction is how to represent peptides effectively. In this study, we propose ToxinMI, an end-to-end deep learning model that predicts peptide toxicity by learning features directly from the sequence alone. Specifically, ToxinMI captures the sequential and evolutionary features of a peptide simultaneously and applies the mutual information principle to learn a discriminative representation from them, discarding noisy information while retaining as much task-relevant information as possible. Experimental results demonstrate that ToxinMI achieves better predictive performance than state-of-the-art baselines.
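The abstract does not say which mutual-information estimator ToxinMI uses. One standard way to "discard noisy information while retaining task-relevant information" is the deep variational information bottleneck (VIB) of Alemi et al. (2016). It minimizes a classification loss, which maximizes a lower bound on I(Z; Y), plus a β-weighted KL term, which upper-bounds I(Z; X). The dependency-free Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. Everything in it is hypothetical: the one-hot sequential view, the placeholder 20-column evolutionary profile (standing in for a PSSM-like matrix), fusion by concatenation plus mean pooling, the random linear encoder, and all function names and hyperparameters such as `beta`.

```python
import math
import random

# The 20 standard amino acids; index order defines the one-hot columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide):
    """Sequential view: one 20-dim one-hot row per residue.

    Raises KeyError for non-standard residue letters.
    """
    rows = []
    for aa in peptide:
        row = [0.0] * len(AMINO_ACIDS)
        row[_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def fuse(seq_view, evo_view):
    """Hypothetical fusion: concatenate per-residue views, then mean-pool."""
    if len(seq_view) != len(evo_view):
        raise ValueError("views must cover the same residues")
    fused = [s + e for s, e in zip(seq_view, evo_view)]
    n, dim = len(fused), len(fused[0])
    return [sum(row[j] for row in fused) / n for j in range(dim)]


def linear(x, weights, bias):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)): the VIB compression term,
    an upper bound on I(Z; X) when averaged over the data."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def vib_loss(mu, logvar, w, b, y, beta=1e-3, rng=random):
    """Single-sample VIB objective: binary cross-entropy + beta * KL."""
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)  # P(toxic | z)
    eps = 1e-12  # guards against log(0)
    ce = -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return ce + beta * kl_to_standard_normal(mu, logvar)


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian weights standing in for a trained encoder."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    peptide = "GIGKFLHSAK"  # arbitrary example sequence
    seq_view = one_hot(peptide)
    # Placeholder evolutionary view; a real pipeline would use a PSSM-like profile.
    evo_view = [[1.0 / 20] * 20 for _ in peptide]
    x = fuse(seq_view, evo_view)  # 40-dim pooled feature vector
    d_z = 8
    mu = linear(x, random_matrix(d_z, len(x), rng), [0.0] * d_z)
    logvar = linear(x, random_matrix(d_z, len(x), rng), [0.0] * d_z)
    w_cls = [rng.gauss(0.0, 0.1) for _ in range(d_z)]
    print(vib_loss(mu, logvar, w_cls, 0.0, y=1, beta=1e-3, rng=rng))
```

Raising `beta` compresses the representation more aggressively at the cost of fitting the labels. In a trained model the encoder would be a learned network and the loss would be averaged over mini-batches. This single-sample sketch omits both.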
ToxinMI: improving peptide toxicity prediction by fusing multimodal information based on mutual information