Prediction of LncRNA by Using Muitiple Feature Information Fusion and Feature Selection Technique

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10955)

Abstract

Recent genomic studies suggest that long non-coding RNAs (lncRNAs) play an important role in the regulation of plant growth. It is therefore important to identify more plant lncRNAs and predict their functions. This paper presents an improved maximum correlation minimum redundancy method for lncRNA recognition. Sequence features, secondary structural features, and functional features are extracted, including a pseudo-nucleotide feature based on the physical and chemical properties of the dinucleotides in the RNA. The maximum correlation minimum redundancy method is then used to integrate several feature selection methods: Pearson correlation coefficient, information gain, the Relief algorithm, and random forest. A classification model is built on the selected feature subset with a support vector machine (SVM). Experimental results on an Arabidopsis sequence dataset show that the pseudo-nucleotide feature captures information that distinguishes different RNA sequences, and that the classification model built with the proposed method identifies plant lncRNAs more accurately than the other methods compared.
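
As a rough illustration of the maximum correlation minimum redundancy idea described in the abstract, the sketch below greedily picks features that correlate strongly with the class label while correlating weakly with features already chosen. It is not the authors' improved method: it assumes absolute Pearson correlation for both relevance and redundancy, and it does not integrate information gain, Relief, or random forest rankings. The function names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (norm_x * norm_y)


def mrmr_select(columns, labels, k):
    """Greedily select k feature indices (assumed scoring: relevance minus redundancy).

    relevance  = |corr(feature, labels)|
    redundancy = mean |corr(feature, already selected feature)|
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            abs(pearson(columns[j], columns[s])) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): 8 transcripts, label 1 = lncRNA, 0 = coding.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 1, 0, 0, 0],  # f0: moderately informative
    [2, 2, 2, 0, 2, 0, 0, 0],  # f1: rescaled copy of f0, fully redundant with it
    [1, 1, 0, 1, 0, 1, 0, 0],  # f2: equally informative, uncorrelated with f0
]
selected = mrmr_select(columns, labels, 2)
print(selected)
```

With this toy data the selection keeps f2 and only one of the two redundant features f0 and f1. In a pipeline like the one the abstract outlines, the chosen columns would then be passed to an SVM classifier.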

Acknowledgement

The current study was supported by the National Natural Science Foundation of China (Nos. 61472061 and 31471880), and the Graduate Educational Reform Fund of Dalian University of Technology (Jg2017015).

Author information

Corresponding author

Correspondence to Yushi Luan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Meng, J., Jiang, D., Chang, Z., Luan, Y. (2018). Prediction of LncRNA by Using Muitiple Feature Information Fusion and Feature Selection Technique. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_39

  • DOI: https://doi.org/10.1007/978-3-319-95933-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95932-0

  • Online ISBN: 978-3-319-95933-7

  • eBook Packages: Computer Science, Computer Science (R0)
