DOI: 10.1145/3386052.3386056
research-article

Predicting lncRNA-disease Association based on Extreme Gradient Boosting

Published: 18 May 2020

Abstract

There is increasing evidence that long non-coding RNAs (lncRNAs) play an important role in many significant biological processes. Detecting associations between lncRNAs and human diseases with computational models helps identify biomarkers and discover drugs for the diagnosis, treatment, and prognosis of human diseases. In this study, we propose PrLDA (Predicting LncRNA-Disease Association based on extreme gradient boosting), a method for predicting potential lncRNA-disease associations with eXtreme Gradient Boosting (XGBoost). First, we compute the semantic similarity of diseases and the sequence similarity of lncRNAs. Then, we extract feature vectors by concatenating these similarities horizontally. Finally, the feature matrix, after dimension reduction, is used as the input to XGBoost, which yields a score for the association between each lncRNA and a specific disease. Computational results indicate that our method predicts lncRNA-disease associations with higher accuracy than previous methods. Furthermore, a case study shows that our method can effectively predict candidate lncRNAs for breast cancer, with 80% of the top 10 predictions confirmed by experiments. Therefore, PrLDA is a useful computational method for lncRNA-disease association prediction.
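
The first two steps of the pipeline described in the abstract (similarity computation and horizontal concatenation) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how sequence similarity is measured, so a normalized edit distance is assumed here. The toy sequences, the disease similarity values, and all function names are hypothetical. Dimension reduction and XGBoost training would follow on the resulting feature matrix and are omitted.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / longer length, a value in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def similarity_matrix(seqs):
    """Pairwise sequence similarity for a list of sequences."""
    return [[sequence_similarity(s, t) for t in seqs] for s in seqs]


def pair_features(lnc_sim, dis_sim):
    """Feature vector for each (lncRNA i, disease j) pair:
    row i of the lncRNA similarity matrix followed by
    row j of the disease similarity matrix."""
    return {(i, j): lnc_sim[i] + dis_sim[j]
            for i, j in product(range(len(lnc_sim)), range(len(dis_sim)))}


# Hypothetical toy inputs: three short lncRNA sequences and a
# precomputed 2 x 2 disease semantic similarity matrix.
lnc_seqs = ["ACGUAC", "ACGUAA", "GGGCCC"]
dis_sim = [[1.0, 0.4],
           [0.4, 1.0]]

lnc_sim = similarity_matrix(lnc_seqs)
features = pair_features(lnc_sim, dis_sim)
```

Each vector has one entry per lncRNA plus one entry per disease, so its length grows with the size of both sets, which is why a dimension-reduction step before the classifier is natural.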

Cited By

  • (2024) Recent Advances in Machine Learning Methods for LncRNA-Cancer Associations Prediction. Current Chinese Science, 4:3 (181-201). DOI: 10.2174/0122102981299289240324072639. Online publication date: Jun-2024
  • (2022) Graph Convolutional Auto-Encoders for Predicting Novel lncRNA-Disease Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19:4 (2264-2271). DOI: 10.1109/TCBB.2021.3070910. Online publication date: 1-Jul-2022

Published In

ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics
January 2020
160 pages
ISBN:9781450376761
DOI:10.1145/3386052

In-Cooperation

  • National University of Singapore
  • RIED, Tokai University, Japan

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Extreme gradient boosting
  2. Feature matrix
  3. Prediction model
  4. Principal component analysis
  5. lncRNA-disease associations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICBBB '20
