Identifying microRNAs involved in cancer pathway using support vector machines
Introduction
MicroRNAs (miRNAs or miRs) are small, non-coding, single-stranded RNAs (about 22 nucleotides in length) involved in several regulatory pathways in the cell cycle. They bind to the untranslated regions (UTRs) of mRNA, particularly the 3'UTR, and play an important role in the post-transcriptional regulation of gene expression (Bartel, 2004, Filipowicz et al., 2008). Recent studies suggest that these non-coding RNAs can also bind to 5'UTRs (Ragan et al., 2009) and coding regions (Hausser et al., 2013) of mRNA, but little is known about the mechanism of this binding and its regulation. Binding of a miR to a specific target in a UTR with complete complementarity either leads to degradation of the mRNA itself or induces translational repression (Esquela-Kerscher and Slack, 2006). In tissues associated with various tumors, the expression pattern of miRs has been observed to be altered considerably (Cummins et al., 2006, Zhang et al., 2006). Additionally, gene mapping reveals that most human miRs are located at chromosomal positions that are susceptible to rearrangements (Calin and Croce, 2007). These observations indicate that miRs in humans play a major role in the cancer pathway.
Previous studies have investigated the involvement of different types of base pairing in miR–mRNA interactions, and target prediction algorithms have been formulated on these premises. These algorithms predominantly considered Watson-Crick base pairing between the miR and its respective mRNA, especially at the 2nd to the 8th nucleotide positions of the miR (the seed region), as the potential target sites. However, later studies found that animal miRs, unlike plant miRs, do not bind to mRNA with perfect complementarity; rather, their binding leaves several imperfections such as loops, mismatches or bulges, and often involves GU (non-Watson-Crick) base pairing as well (Axtell et al., 2011, Didiano and Hobert, 2008). Beyond these determinants, AU richness around the seed regions and the folding of the mRNA play a vital role in target binding (Grimson et al., 2007, Robins et al., 2005). All these factors need to be considered together, not in isolation, to hypothesize miR:mRNA interactions.
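The pairing rules described above can be illustrated with a minimal sketch. It is not the feature extraction used in this study: the function name `seed_pairing`, the labels, and the toy sequences are assumptions made for illustration, and the duplex is assumed to be antiparallel and ungapped (no loops or bulges), with the last base of the site sitting opposite miR position 1.

```python
# Allowed pairings between a miRNA base and an mRNA base.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # GU (non-Watson-Crick) pairs


def seed_pairing(mirna, site, start=2, end=8):
    """Label the pairing at miRNA positions start..end (1-based, 5'->3').

    `site` is the mRNA segment written 5'->3'; its last base is assumed to sit
    opposite miRNA position 1, so the duplex is antiparallel and ungapped.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    if len(mirna) < end or len(site) < end:
        raise ValueError("miRNA and site must both cover the seed region")
    labels = []
    for pos in range(start, end + 1):
        # miRNA position p pairs with the p-th base from the 3' end of the site.
        pair = (mirna[pos - 1], site[-pos])
        if pair in WATSON_CRICK:
            labels.append("WC")
        elif pair in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


# Toy sequences, chosen only to exercise the three outcomes.
mir = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_pairing(mir, "CUACCUCA"))  # fully complementary seed
print(seed_pairing(mir, "CUACCAUA"))  # one GU pair and one mismatch
```

Counting the labels per site gives simple seed-level descriptors; AU richness and mRNA folding, which the text also names, would need separate calculations.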
Some of the computational methods used in the functional annotation of miRs involved in cancer rely mainly on the expression profiles of various cancer cell types and on statistical analysis for further classification (Jayaswal et al., 2011). These methods utilize the expression profile, but they fail to consider that a single miR can bind to several mRNA target sites and regulate the cell differently through each. Our feature selection therefore aimed to account for this redundancy. Other attempts to classify miRs into oncogenes and tumor suppressor genes (TSGs) were based on functional and evolutionary features (Wang et al., 2010) such as conservation, expression levels and chromosome distribution.
The present study involved a search for, and analysis of, features involved in miR:mRNA interactions associated with cancer. These features encompassed the sequence, hybridization and thermodynamic properties of validated miR:mRNA interactions only. Based on the curated and prioritized features, we developed a two-step machine-learning classifier model, comprising miRSEQ and miRINT, which identifies whether a miR is associated with cancer and also classifies the type of its association, i.e., with either an oncogene or a tumor suppressor. Prioritizing the features and diversifying the models according to the number of seed regions drastically improved the performance of the classifier compared with generalized features and holistic hybridization. The incorporation of seed-based classification in the determination of features is a novel aspect of our algorithm. The final classifier performed well on experimentally validated datasets, with cross-validation rates (cv-rate) ranging from 87% to 92%.
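The two-step idea can be sketched as a simple cascade. This is only an illustration under stated assumptions: the order of the stages (a cancer-association check followed by an oncogene versus tumor suppressor decision), the function `classify_mir`, and the threshold-based stand-in models are all hypothetical and are not the trained miRSEQ or miRINT classifiers.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def classify_mir(
    stage1_features: Features,
    stage2_features: Features,
    is_cancer_associated: Callable[[Features], bool],
    targets_oncogene: Callable[[Features], bool],
) -> str:
    """Run a two-step cascade: the second model is consulted only when the
    first one flags the miR as cancer-associated."""
    if not is_cancer_associated(stage1_features):
        return "not cancer-associated"
    if targets_oncogene(stage2_features):
        return "oncogene-associated"
    return "tumor-suppressor-associated"


# Hypothetical stand-ins for the two trained models (simple thresholds).
def toy_stage1(features: Features) -> bool:
    return sum(features) > 1.0


def toy_stage2(features: Features) -> bool:
    return features[0] > 0.5


print(classify_mir([0.2, 0.1], [0.9], toy_stage1, toy_stage2))
print(classify_mir([0.8, 0.7], [0.9], toy_stage1, toy_stage2))
print(classify_mir([0.8, 0.7], [0.1], toy_stage1, toy_stage2))
```

Passing the models in as callables keeps the cascade independent of how each stage is trained, so either stage could be swapped for a fitted SVM without changing the control flow.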
Dataset preparation
To generate a classifier, the first step is to construct a dataset of microRNAs that have been experimentally validated to be associated with cancer. To begin with, a list of genes involved in cancer was downloaded from the Catalogue of Somatic Mutations in Cancer (COSMIC) (Higgins et al., 2007). A total of 488 genes were thus listed, which could be further segregated into oncogenes and tumor suppressors by cross-referring with the tumor associated gene database
Results
Dataset preparation was carried out individually for the classifiers miRSEQ and miRINT (Fig. 1). Consequently, a total of 263 miRs were used in the miRSEQ training. The class imbalance problem in the dataset was overcome with the SMOTE method (k-nearest-neighbour algorithm with no replacement), which generated a sufficient number of negative instances for the training set. As in most SVM classification problems related to miRNAs, our dataset was too complex to be linearly separable. RBF was
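The two ingredients named in this section, SMOTE oversampling and the RBF kernel, can be sketched in plain Python. This is a simplified illustration, not the pipeline used in the study: the functions `smote` and `rbf_kernel`, the values of `k` and `gamma`, and the toy points are assumptions, and this sketch draws base samples with replacement, so it does not reproduce the "no replacement" detail mentioned above.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def smote(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a sample and
    one of its k nearest neighbours from the same (under-represented) class."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Toy feature vectors for the under-represented class (illustrative only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
augmented = minority + smote(minority, n_new=6)
print(len(augmented))
print(rbf_kernel((0.0, 0.0), (1.0, 1.0)))
```

Because every synthetic point lies on a segment between two real samples, the oversampled class stays inside the region already covered by the data. The RBF kernel then lets an SVM draw a non-linear boundary, which is the usual choice when the classes are not linearly separable.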
Discussion
Identifying miR involvement in cancer remains a major challenge for researchers striving to understand the basis of the disease and to develop new therapies against particular cancer types. miRNAs regulate the molecular pathways in cancer by either upregulating or downregulating various oncogenes and tumor suppressors, and sometimes by acting as oncogenes themselves. The functional annotation of miRNAs in cancer is still a painstaking process, though cancer therapies using miRNAs have been picking up
Acknowledgements
The authors wish to thank Dr. Ranjit Prasad Bahadur, Indian Institute of Technology – Kharagpur, India for his initial assistance in machine learning approaches. Ram K. was supported by a scholarship from Council of Scientific Research and Industrial Research, Govt. of India.
References (40)
MicroRNAs: genomics, biogenesis, mechanism, and function. Cell (2004)
MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell (2007)
Prediction of mammalian microRNA targets. Cell (2003)
Human microRNA-155 on chromosome 21 differentially interacts with its polymorphic target in the AGTR1 3′ untranslated region: a mechanism for functional single-nucleotide polymorphisms related to phenotypes. Am. J. Hum. Genet. (2007)
Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. (2011)
TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics (2009)
microPred: effective classification of pre-miRNAs for human miRNA gene prediction. Bioinformatics (2009)
Efficient resampling methods for training support vector machines in imbalanced datasets
Chromosomal rearrangements and microRNAs: a new cancer link with clinical implications. J. Clin. Invest. (2007)
LIBSVM. ACM Trans. Intell. Syst. Technol. (2011)
SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.
In silico identification of oncogenic potential of fyn-related kinase in hepatocellular carcinoma. Bioinformatics
The colorectal microRNAome. PNAS
Oncomirs – microRNAs with a role in cancer. Nat. Rev. Cancer
Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat. Rev. Genet.
miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res.
Analysis of CDS-located miRNA target sites suggests that they can effectively inhibit translation. Genome Res.
High mobility group A2 is a target for miRNA-98 in head and neck squamous cell carcinoma. Mol. Cancer
CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res.
Cited by (7)
miRNA-based Therapeutic Strategies
2023, RNA-based Mechanisms in Cancer
MicRooN - An Ensemble-based Classifier for Identifying miRNAs Associated with Cancer
2021, 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation, ICAECA 2021
An ensemble based model for the adsorptive removal of amoxicillin by microwave-biochar of waste cotton seeds
2020, AIP Conference Proceedings
MicroRNAs as therapeutic agents: The future of the battle against cancer
2018, Current Topics in Medicinal Chemistry
K-mean clustering of miRNAs associated with cancer
2017, 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies, ICICICT 2017
Classification of colorectal cancer using clustering and feature selection approaches
2017, Advances in Intelligent Systems and Computing