DeepLNC, a long non-coding RNA prediction tool using deep neural network

  • Original Article
  • Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

The significant role of long non-coding RNAs (lncRNAs) in various cellular functions, such as gene imprinting, immune response, embryonic pluripotency, tumorigenesis, and genetic regulation, has been widely studied and reported in recent years. Several experimental and computational methods for genome-wide search and screening of ncRNAs have been proposed; they rely on sequence features such as length, occurrence, and base composition, and each has limitations. The proposed classifier, a deep neural network (DNN), is a fast and accurate alternative to existing classifiers for identifying lncRNAs. The information content stored in k-mer patterns is the sole feature used by the DNN classifier, which was trained on manually annotated datasets from the LNCipedia and RefSeq databases and obtained an accuracy of 98.07 %, a sensitivity of 98.98 %, and a specificity of 97.19 % on the test dataset. Computing the k-mer information content with the Shannon entropy function improved classifier accuracy. The framework was also tested on a known human genome dataset, where it identified known lncRNAs with 99 % accuracy. The algorithm has been implemented as a web prediction tool, available at http://bioserver.iiita.ac.in/deeplnc.
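
To illustrate the idea of a k-mer information-content feature, the following is a minimal Python sketch. It is not the DeepLNC implementation, because the abstract does not give the exact formula. It assumes that each possible k-mer contributes its Shannon entropy term, p * log2(1/p), where p is the relative frequency of that k-mer in the sequence. The function name `kmer_entropy_features`, the default `k=2`, and the handling of U and ambiguous bases are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_entropy_features(seq, k=2, alphabet="ACGT"):
    """Return one Shannon entropy term per possible k-mer.

    Assumption (not from the paper): the feature for a k-mer is
    p * log2(1 / p), with p the k-mer's relative frequency in seq.
    K-mers that do not occur get 0.0.
    """
    # Treat RNA and DNA input alike (hypothetical preprocessing choice).
    seq = seq.upper().replace("U", "T")

    # Slide a window of width k over the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]

    # Skip windows that contain ambiguous bases such as N.
    valid = [w for w in windows if all(base in alphabet for base in w)]
    counts = Counter(valid)
    total = len(valid)

    features = []
    # Fixed k-mer order so every sequence yields a comparable vector.
    for kmer in ("".join(p) for p in product(alphabet, repeat=k)):
        p = counts[kmer] / total if total else 0.0
        features.append(p * math.log2(1.0 / p) if p > 0 else 0.0)
    return features


if __name__ == "__main__":
    vector = kmer_entropy_features("AUGGCUAGCUAGGCUA", k=2)
    print(len(vector))  # one entry per possible 2-mer
```

A vector built this way has 4^k entries regardless of sequence length, which is what makes it usable as a fixed-size input to a classifier such as a DNN. Summing the entries gives the Shannon entropy of the sequence's k-mer distribution.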

Acknowledgments

We are thankful to the Department of Bioinformatics, Indian Institute of Information Technology-Allahabad, India, for providing the computational facilities used to perform the study.

Author information

Corresponding author

Correspondence to Pritish Kumar Varadwaj.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 405718 kb)

Supplementary material 2 (XLSX 327573 kb)

About this article

Cite this article

Tripathi, R., Patel, S., Kumari, V. et al. DeepLNC, a long non-coding RNA prediction tool using deep neural network. Netw Model Anal Health Inform Bioinforma 5, 21 (2016). https://doi.org/10.1007/s13721-016-0129-2

  • DOI: https://doi.org/10.1007/s13721-016-0129-2
