Elsevier

Computational Biology and Chemistry

Volume 71, December 2017, Pages 230-235

Research Article
Predicting microRNA biological functions based on genes discriminant analysis

https://doi.org/10.1016/j.compbiolchem.2017.09.008

Abstract

Although thousands of microRNAs (miRNAs) have been identified in recent experimental efforts, exploring their specific biological functions through molecular biology experiments remains a challenge. Because members of the same family share the same or similar biological functions, classifying new miRNAs into their corresponding families helps their further functional analysis. In this study, we first built a vector space by characterizing features of miRNA sequences and structures according to their miRBase family organization. We then assigned miRNAs to their specific families by developing a novel genes discriminant analysis (GDA) approach. In each of the new families produced by GDA, the nucleotide sequences of all members showed a high degree of similarity. Using 10-fold cross-validation, we obtained accuracy rates of 68.68%, 80.74%, and 83.65% for the original miRNA families with no fewer than two, three, and four members, respectively. These encouraging results suggest that the proposed GDA can not only support identifying the families of new miRNAs but also contribute to predicting their biological functions.

Introduction

MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs (ncRNAs) 20–25 nucleotides in length. They bind to the 3′ untranslated regions (UTRs) of their target messenger RNAs (mRNAs) and suppress expression at the post-transcriptional level through sequence-specific base pairing. Numerous studies have shown that miRNAs play critical roles in important signalling pathways, biological processes, and pathophysiologies (Bartel, 2004, Ambros, 2001, Ambros, 2004, Meister and Tuschl, 2004, Backes et al., 2017). To date, biological experiments for identifying the functions of miRNAs have advanced through direct cloning (Liu et al., 2009), forward and reverse genetics (Gurumurthy et al., 2016), and other techniques. However, these methods are costly when applied to miRNAs at large scale. A striking feature of miRNAs is that their loci are usually clustered in the genome (Subramanian et al., 2005). More in-depth studies, including miRNA co-expression and primary transcript identification, suggest that the majority of miRNA clusters are transcribed as a single unit and share common biological functions (Baskerville and Bartel, 2005, Rhee et al., 2013). Meanwhile, miRNA families can be viewed as a special case of gene families or gene sets (Li et al., 2015), and they are highly conserved in nucleotide sequences and secondary structures among closely related species over evolutionary time (Cuperus et al., 2011, Kaczkowski et al., 2008, Kamanu et al., 2013, Hertel and Stadler, 2015). Therefore, given an existing miRNA family system (the miRBase (Griffiths-Jones et al., 2007) family organization), we propose a bioinformatics method that accurately and efficiently identifies the families of miRNAs in order to predict their biological functions. We review the related literature in the following paragraphs.

By clustering miRNAs, Griffiths-Jones et al. (Griffiths-Jones et al., 2003) constructed an RNA family database, Rfam, which is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Ding et al. (Ding et al., 2011) employed n-grams to extract features from precursor miRNA sequences and then trained a multiclass SVM classifier to classify new miRNAs. Zou et al. (Zou et al., 2014) proposed miRClassify, a machine learning-based web server built on a novel hierarchical model of random forests. This model predicts miRNAs from their primary miRNA sequences and assigns each one to a specific miRNA family in a cascade manner.
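To make the n-gram idea concrete, the following is a minimal sketch of turning an RNA sequence into a fixed-length vector of n-gram frequencies. It is not the miRFam implementation: the function name `ngram_features`, the four-letter alphabet, the frequency normalization, and the example sequence are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product


def ngram_features(seq, n=2, alphabet="ACGU"):
    """Return relative frequencies of every possible n-gram over `alphabet`.

    Hypothetical helper for illustration; DNA-style 'T' is mapped to 'U'.
    """
    seq = seq.upper().replace("T", "U")
    windows = max(len(seq) - n + 1, 0)
    counts = Counter(seq[i:i + n] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short input
    # One feature per n-gram, in a fixed lexicographic order.
    return [counts["".join(p)] / total for p in product(alphabet, repeat=n)]


# Illustrative 22-nt RNA string (an assumption, not data from the paper).
features = ngram_features("UGAGGUAGUAGGUUGUAUAGUU", n=2)
```

With a four-letter alphabet the vector has 4^n entries, so the dimension grows quickly with n. This is one reason feature vectors built this way can become large.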

Most traditional approaches to miRNA family research (Griffiths-Jones et al., 2003, Ding et al., 2011, Zou et al., 2014) start by constructing feature vectors from primary or precursor miRNAs and then use different algorithms to establish the miRNA family system. However, given the enormous number of miRNAs across all species, these methods require large amounts of computing resources and time. For instance, although miRFam performed well at classifying genes, its feature vectors have 340 dimensions, which makes it time-consuming to apply to large-scale miRNA validation sets.

This paper proposes a novel and effective classification approach, Genes Discriminant Analysis (GDA), to assign new miRNAs to their corresponding families. The basic idea of GDA is similar to Mahalanobis distance discriminant analysis in multivariate statistical analysis (Pemajayantha et al., 2003). Because it builds on an existing miRNA family system, the GDA algorithm has no complicated or expensive computing components, which keeps its time complexity low and lets it run quickly. Furthermore, we took into account the highly conserved sequences and structures of miRNAs and constructed a 116-dimensional feature vector to improve model performance. Overall, the proposed model contributes to accurately identifying miRNA families and annotating the biological functions of new miRNAs.
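For readers unfamiliar with Mahalanobis distance discriminant analysis, the sketch below shows the classical technique that GDA is said to resemble: a sample is assigned to the family whose mean vector is nearest in Mahalanobis distance. It is not the GDA algorithm itself. The pooled covariance, the small ridge term, the function names, and the two-dimensional toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_families(X, labels, ridge=1e-6):
    """Estimate per-family mean vectors and the inverse pooled covariance.

    Assumption: one covariance matrix is shared by all families, with a
    small ridge added so it stays invertible when families are tiny.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    families = np.unique(labels)
    means = {f: X[labels == f].mean(axis=0) for f in families}
    # Center each sample on its own family mean, then pool the scatter.
    centered = np.vstack([X[labels == f] - means[f] for f in families])
    dof = max(len(X) - len(families), 1)
    cov = centered.T @ centered / dof + ridge * np.eye(X.shape[1])
    return means, np.linalg.inv(cov)


def classify(x, means, cov_inv):
    """Return the family whose mean has the smallest Mahalanobis distance."""
    x = np.asarray(x, dtype=float)

    def squared_distance(mu):
        diff = x - mu
        return float(diff @ cov_inv @ diff)

    return min(means, key=lambda f: squared_distance(means[f]))


# Toy 2-D feature vectors for two hypothetical families.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
     [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
labels = ["fam_a"] * 3 + ["fam_b"] * 3
means, cov_inv = fit_families(X, labels)
predicted = classify([0.1, 0.0], means, cov_inv)
```

Fitting only needs one mean per family and a single matrix inverse, and classifying a new sample costs one quadratic form per family. This is consistent with the low computational cost described above.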

Methods

MiRNA genes produce three major RNA products: primary miRNAs (pri-miRNAs), precursor miRNAs (pre-miRNAs), and mature miRNAs. The pri-miRNA, pre-miRNA, and mature miRNA each contain identical seed sequences, and all have the potential to interact with target mRNAs (Kim, 2005). In addition, miRNA families are highly conserved in nucleotide sequences and secondary structures (Kaczkowski et al., 2008, Lee et al., 2007). Taken together, we initially built a vector space by characterizing features of

Misjudgment rate

Misclassification is inevitable for any classifier. In this paper, the misjudgment rate E was used to measure the effectiveness of the discriminant functions in GDA. We first constructed the corresponding discriminant functions for each family from miRBase and then used the miRNAs of all known families as learning samples. Through back-substitution training, the samples and predicted results were output as a discriminant matrix N,

$$
N = \begin{bmatrix}
N_{11} & \cdots & \cdots & N_{1k} \\
\vdots & N_{ii} & N_{ij} & \vdots \\
\vdots & N_{ji} & \ddots & \vdots \\
N_{k1} & \cdots & \cdots & N_{kk}
\end{bmatrix}
\qquad (k = 1, 2, \ldots, 579)
$$
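The excerpt does not give the formula for E, so the following is only a sketch under two assumptions: that entry N_ij counts the learning samples from family i that were assigned to family j, and that the misjudgment rate is the fraction of samples falling off the diagonal. The function names and the toy labels are hypothetical.

```python
import numpy as np


def discriminant_matrix(true_labels, predicted_labels, families):
    """Build a square count matrix: rows are true families, columns predicted."""
    index = {family: i for i, family in enumerate(families)}
    N = np.zeros((len(families), len(families)), dtype=int)
    for true_family, predicted_family in zip(true_labels, predicted_labels):
        N[index[true_family], index[predicted_family]] += 1
    return N


def misjudgment_rate(N):
    """Fraction of samples off the diagonal (assumed definition of E)."""
    N = np.asarray(N)
    total = N.sum()
    if total == 0:
        raise ValueError("discriminant matrix contains no samples")
    return float(1.0 - np.trace(N) / total)


# Toy back-substitution result for three hypothetical families.
true_labels = ["a", "a", "b", "b", "c"]
predicted_labels = ["a", "b", "b", "b", "a"]
N = discriminant_matrix(true_labels, predicted_labels, ["a", "b", "c"])
E = misjudgment_rate(N)
```

In this toy case three of the five samples land on the diagonal, so E is 2/5. Because back-substitution evaluates the classifier on its own training samples, the resulting rate tends to be optimistic compared with a cross-validated estimate.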

Discussion and conclusion

Increasing evidence indicates that miRNAs are involved in many physiological processes, such as ontogeny, cell differentiation and proliferation, and apoptosis (Zhu et al., 2016, Fernando et al., 2012). They are also closely related to the development and progression of human diseases such as tumors, cardiovascular disease, and autoimmune disease (Mendell and Olson, 2012, Alaimo et al., 2014, Kim, 2015). With the development of experimental and computational miRNAs detecting methods, many

Acknowledgments

The authors are grateful to Prof. Sam Griffiths-Jones for his useful information about miRNA family construction in miRBase, to Vice Prof. Jie Gao for her critical suggestion on experimental design, and to Prof. Henry Han at Fordham University for polishing this paper. This research was supported by the National Natural Science Foundation of China (Grant No. 11271163), Major Research Plan of National Natural Science of China (Grant No. 91730301), and the Foundation of the Innovation Project of

References (41)

  • S. Baskerville et al. Microarray profiling of microRNAs reveals frequent co-expression with neighboring miRNAs and host genes. RNA (2005)
  • Y. Bengio et al. No unbiased estimator of the variance of k-fold cross-validation. J. Mach. Learn. Res. (2004)
  • T. Bose et al. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J. Biosci. (2015)
  • J.T. Cuperus et al. Evolution and functional diversification of MIRNA genes. Plant Cell (2011)
  • J. Ding et al. miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM. BMC Bioinf. (2011)
  • T.R. Fernando et al. MicroRNAs in B cell development and malignancy. J. Hematol. Oncol. (2012)
  • S. Griffiths-Jones et al. Rfam: an RNA family database. Nucleic Acids Res. (2003)
  • S. Griffiths-Jones et al. miRBase: tools for microRNA genomics. Nucleic Acids Res. (2007)
  • C.B. Gurumurthy et al. CRISPR: a versatile tool for both forward and reverse genetics research. Hum. Genet. (2016)
  • Y. He et al. Expression and effect on migration and proliferation of miR-585-5p in gastric cancer cell lines. J. Modern Oncol. (2017)