ABSTRACT
There is increasing evidence that long non-coding RNAs (lncRNAs) play an important role in many biological processes. Detecting associations between lncRNAs and human diseases with computational models can aid the identification of biomarkers and the discovery of drugs for the diagnosis, treatment, and prognosis of human diseases. In this study, we propose PrLDA (Predicting LncRNA-Disease Association based on eXtreme Gradient Boosting), a method that predicts potential lncRNA-disease associations using eXtreme Gradient Boosting (XGBoost). First, we compute disease semantic similarity and lncRNA sequence similarity. Next, we extract feature vectors by concatenating these similarities horizontally. Finally, after dimensionality reduction, the feature matrix is fed into XGBoost, which outputs a score for the association between each lncRNA and a given disease. Computational results indicate that our method predicts lncRNA-disease associations with higher accuracy than previous methods. Furthermore, a case study on breast cancer shows that our method can effectively predict candidate lncRNAs, with 8 of the top 10 predictions (80%) confirmed by experiments. Therefore, PrLDA is a useful computational method for lncRNA-disease association prediction.
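The similarity and feature-construction steps described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not PrLDA's implementation. It assumes the following:

- lncRNA sequence similarity is a Levenshtein edit distance normalized as 1 - d / max(length).
- The disease semantic similarity matrix is already computed and passed in, for example from a Disease Ontology tool.
- The feature vector for an (lncRNA, disease) pair is that lncRNA's similarity row concatenated with that disease's similarity row.

The helper names and toy data are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_similarity(a: str, b: str) -> float:
    """ASSUMED normalization: 1 - distance / longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(seqs: Sequence[str]) -> List[List[float]]:
    """Pairwise lncRNA sequence similarity matrix (symmetric, ones on the diagonal)."""
    return [[sequence_similarity(x, y) for y in seqs] for x in seqs]


def pair_features(lnc_sim: Sequence[Sequence[float]],
                  dis_sim: Sequence[Sequence[float]],
                  i: int, j: int) -> List[float]:
    """ASSUMED pair encoding: lncRNA i's similarity row followed by disease j's row."""
    return list(lnc_sim[i]) + list(dis_sim[j])


if __name__ == "__main__":
    seqs = ["ACGU", "ACGA", "UUUU"]     # toy sequences, not real lncRNAs
    dis_sim = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical disease semantic similarity
    lnc_sim = similarity_matrix(seqs)
    print(pair_features(lnc_sim, dis_sim, 0, 1))  # [1.0, 0.75, 0.25, 0.3, 1.0]
```

The resulting vectors would then be stacked into a feature matrix and reduced in dimension, for example with principal component analysis such as scikit-learn's `PCA`. The reduced matrix would then be used to train a gradient-boosted classifier such as `xgboost.XGBClassifier`. The abstract does not give those settings, so they are left out of this sketch.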
Index Terms
- Predicting lncRNA-disease Association based on Extreme Gradient Boosting