DOI: 10.1145/3386052.3386056
research-article

Predicting lncRNA-disease Association based on Extreme Gradient Boosting

Published: 18 May 2020

ABSTRACT

There is increasing evidence that long non-coding RNAs (lncRNAs) play an important role in many significant biological processes. Detecting associations between lncRNAs and human diseases with computational models helps identify biomarkers and discover drugs for the diagnosis, treatment, and prognosis of human diseases. In this study, we propose PrLDA (Predicting LncRNA-Disease Association based on extreme gradient boosting), a method for predicting potential lncRNA-disease associations with eXtreme Gradient Boosting (XGBoost). First, we compute the semantic similarity of diseases and the sequence similarity of lncRNAs. Then, we extract feature vectors by concatenating these similarities horizontally. Finally, the feature matrix after dimension reduction is used as the input to XGBoost, which outputs a score for the association between an lncRNA and a specific disease. Computational results indicate that our method predicts lncRNA-disease associations with higher accuracy than previous methods. Furthermore, a case study shows that our method can effectively predict candidate lncRNAs for breast cancer: 80% of the top 10 predictions are confirmed by experiments. Therefore, PrLDA is a useful computational method for lncRNA-disease association prediction.
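The feature-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how sequence similarity is measured, so the code below assumes a normalized edit-distance similarity; the function names, the toy sequences, and the toy disease similarity matrix are all hypothetical and are not taken from the paper. The dimension reduction and XGBoost stages are left out and would consume the concatenated vectors produced here.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sequence_similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(items: Sequence[str],
                      sim: Callable[[str, str], float]) -> List[List[float]]:
    """Pairwise similarity of every item against every other item."""
    return [[sim(x, y) for y in items] for x in items]


def pair_features(lnc_idx: int, dis_idx: int,
                  lnc_sim: List[List[float]],
                  dis_sim: List[List[float]]) -> List[float]:
    """Horizontally concatenate one lncRNA row with one disease row."""
    return lnc_sim[lnc_idx] + dis_sim[dis_idx]


# Toy data (hypothetical): three short sequences and two diseases.
toy_sequences = ["ACGT", "ACGA", "TTTT"]
toy_lnc_sim = similarity_matrix(toy_sequences, sequence_similarity)
toy_dis_sim = [[1.0, 0.4],
               [0.4, 1.0]]  # stand-in for disease semantic similarity

# Feature vector for the pair (lncRNA 0, disease 1).
features = pair_features(0, 1, toy_lnc_sim, toy_dis_sim)
```

Each (lncRNA, disease) pair yields one such vector, and stacking the vectors for all labelled pairs gives the feature matrix that would then be reduced in dimension and passed to a gradient-boosted classifier.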


      • Published in
        ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics
        January 2020
        160 pages
        ISBN: 9781450376761
        DOI: 10.1145/3386052

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited
