Abstract
Accumulating evidence indicates that drugs not only interact with proteins but also regulate a wide variety of biomarkers such as miRNAs. Uncovering potential drug-miRNA associations is therefore important for disease prevention, diagnosis, and treatment, as well as for drug development. In this paper, we formulate the problem as a link prediction task on a bipartite graph and construct a computational model to infer unknown drug-miRNA associations. Specifically, drug SMILES (Simplified Molecular Input Line Entry System) strings and miRNA sequences are treated as a kind of biological language and encoded as distributed representations. Experimentally verified associations are treated as positive samples, and an equal number of unlabeled associations are randomly selected as negative samples. Finally, a Random Forest classifier performs the prediction. In the experiments, the proposed method achieves an AUROC of 91.16 and an AUPR of 89.21 under 5-fold cross-validation. These results suggest the potential of combining deep learning with large-scale biological data. We hope this work can serve as a practical guide and a source of inspiration for researchers in the field.
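As a rough illustration of the data preparation described in the abstract, the sketch below splits a sequence into overlapping k-character "words" and draws a balanced set of negative samples from the unlabeled drug-miRNA pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how sequences are tokenized, so the k-mer scheme and the value of k are assumptions, and the function names and toy identifiers are hypothetical. The embedding and Random Forest steps are left out because they would need external libraries.

```python
import itertools
import random


def kmer_words(sequence, k=3):
    """Split a SMILES string or miRNA sequence into overlapping k-character words.

    Assumption: k-mer tokenization is one common way to treat a sequence as
    'language'; the abstract does not specify the scheme actually used.
    """
    if len(sequence) < k:
        return [sequence]
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sample_negatives(drugs, mirnas, positives, seed=0):
    """Randomly pick as many unlabeled drug-miRNA pairs as there are positives."""
    positive_set = set(positives)
    # Every drug-miRNA pair without a verified association counts as unlabeled.
    unlabeled = [pair for pair in itertools.product(drugs, mirnas)
                 if pair not in positive_set]
    if len(unlabeled) < len(positive_set):
        raise ValueError("not enough unlabeled pairs to balance the positives")
    # Sampling without replacement keeps the negative pairs distinct.
    return random.Random(seed).sample(unlabeled, len(positive_set))


# Toy data with hypothetical identifiers (not taken from the paper's dataset).
drugs = ["drug_A", "drug_B", "drug_C"]
mirnas = ["miR_1", "miR_2", "miR_3"]
positives = [("drug_A", "miR_1"), ("drug_B", "miR_2"), ("drug_C", "miR_3")]

negatives = sample_negatives(drugs, mirnas, positives)

# Balanced labeled set: 1 for verified associations, 0 for sampled negatives.
labeled = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]

print(kmer_words("UGAGGUAG", k=3))
print(labeled)
```

In a full pipeline, the word lists would feed an embedding model to produce one feature vector per drug and per miRNA, and the concatenated vectors of each labeled pair would be passed to the classifier and evaluated under 5-fold cross-validation.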
Z. Guo and Z. You contributed equally to this work.
Acknowledgements
The authors would like to thank all the editors and anonymous reviewers for their constructive advice.
Funding
This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61722212, 61902342).
Contributions
Z-H. G. and Z-H. Y. conceived the algorithm, prepared the datasets, and performed the analyses. H-C. Y., Y-B. W. and Z-H. C. wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
The authors declare that they have no competing interests.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, ZH., You, ZH., Li, LP., Chen, ZH., Yi, HC., Wang, YB. (2020). Inferring Drug-miRNA Associations by Integrating Drug SMILES and MiRNA Sequence Information. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2020. Lecture Notes in Computer Science, vol 12464. Springer, Cham. https://doi.org/10.1007/978-3-030-60802-6_25
DOI: https://doi.org/10.1007/978-3-030-60802-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60801-9
Online ISBN: 978-3-030-60802-6
eBook Packages: Computer Science, Computer Science (R0)