Identifying microRNAs involved in cancer pathway using support vector machines

doi:10.1016/j.compbiolchem.2015.01.007

Computational Biology and Chemistry

Volume 55, April 2015, Pages 31-36

https://doi.org/10.1016/j.compbiolchem.2015.01.007

Highlights

• Construction of a two-step SVM classifier for identifying miRNAs associated with cancer.
• Features are extracted from sequence, thermodynamics and miRNA–mRNA hybridization interactions based on experimentally validated data.
• For miRSEQ, positions 1, 6, 10 and 19 and the GG and CC repeats in the miRNA sequence form the optimal feature subset.
• For miRINT, the optimal features vary significantly with the number of seeds formed by the hybrid.
• The final classifier performed well, with cv-rates ranging from 87% to 92%.

Abstract

Since Ambros’ discovery of small non-protein coding RNAs in the early 1990s, the past two decades have seen an upsurge in the number of reports of predicted microRNAs (miRs), which have been implicated in various functions. The correlation of miRs with cancer has spurred the use of this class of non-coding RNAs in various cancer therapies, although most of them are at trial stages. However, experimentally establishing that a miR is associated with cancer is still an elaborate, time-consuming process. To aid this process, we undertook an in-silico study to identify global signatures in experimentally validated microRNAs associated with cancer. Subsequently, a support vector machine-based two-step binary classifier system was trained on the features extracted in that study. A total of 60 distinguishing features were selected and ranked to form the feature set for classification: 26 of these were extracted from the miR sequence itself, and the remainder from the thermodynamics of folding and the hybridized miRNA–mRNA structure. The two-step classifier model, comprising miRSEQ and miRINT, performed reasonably well, with Matthews correlation coefficient (MCC) values ranging from 0.72 to 0.82 (availability: https://sites.google.com/site/sumitslab/tools).
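For readers unfamiliar with the MCC quoted above, the following is a minimal sketch of how it is computed from the four confusion-matrix counts of a binary classifier. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any row or column total is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four counts, which makes it informative when the two classes differ in size.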

Introduction

miRNAs (miRs) are small, non-coding, single-stranded RNAs (about 22 nucleotides in length) involved in several regulatory pathways in the cell cycle. They bind to the untranslated regions (UTRs) of mRNA, particularly the 3'UTR, and play an important role in the post-transcriptional regulation of gene expression (Bartel, 2004, Filipowicz et al., 2008). Recent studies suggest that these non-coding RNAs can also bind to 5'UTRs (Ragan et al., 2009) and coding regions (Hausser et al., 2013) of mRNA, but little is known about the mechanism of binding and its regulation. Binding of a miR to a specific target in a UTR with complete complementarity either leads to degradation of the mRNA itself or induces translational repression (Esquela-Kerscher and Slack, 2006). In tissues associated with various tumors, the expression pattern of miRs has been observed to be altered considerably (Cummins et al., 2006, Zhang et al., 2006). Additionally, gene mapping reveals that most human miRs are located in chromosomal positions that are susceptible to rearrangements (Calin and Croce, 2007). Hence, it can be asserted that miRs in humans play a major role in the cancer pathway.

Several previous studies have investigated the involvement of different types of base pairing in miR–mRNA interactions, and target prediction algorithms have been formulated based on these principles. These algorithms predominantly considered Watson–Crick base pairing between the miR and its respective mRNA, especially at the 2nd to 8th nucleotide positions of the miR, to define potential target sites. However, later studies found that animal miRs, unlike plant miRs, do not bind to mRNA with perfect complementarity; rather, their binding leaves several imperfections such as loops, mismatches or bulges, and often involves GU (non-Watson–Crick) base pairing as well (Axtell et al., 2011, Didiano and Hobert, 2008). In addition to these determinants, AU richness around the seed regions and the folding of the mRNA play a vital role in target binding (Grimson et al., 2007, Robins et al., 2005). All these factors need to be considered together, not in isolation, when hypothesizing miR:mRNA interactions.
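To make the distinction between Watson–Crick pairs, GU wobble pairs and mismatches in the seed concrete, here is a minimal sketch that labels each of miR positions 2 to 8 against an aligned target site. It assumes an ungapped alignment with the target written 3'→5' so that index i of the target faces index i of the miR; the function name and the toy sequences are hypothetical, and real hybrids with loops or bulges need a proper alignment or folding tool.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_seed_pairs(mirna: str, site_3to5: str) -> list:
    """Label the pairing at miR positions 2-8 (1-based) against a target
    site written 3'->5' and aligned without gaps (an assumption)."""
    labels = []
    for pos in range(2, 9):  # 1-based miR positions 2..8
        pair = (mirna[pos - 1], site_3to5[pos - 1])
        if pair in WATSON_CRICK:
            labels.append("WC")
        elif pair in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


# Toy sequences for illustration only.
mirna = "UGAGGUAGUAGG"
print(classify_seed_pairs(mirna, "ACUCCAUC"))  # fully complementary seed
print(classify_seed_pairs(mirna, "ACUCUAGC"))  # one GU wobble, one mismatch
```

Counts of each label over the seed are one simple way to turn a hybrid into numeric features, although the features actually used in the study may be defined differently.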

Some of the computational methods used in the functional annotation of miRs involved in cancer rely mainly on the expression profiles of various cancer cell types and on statistical analysis for further classification (Jayaswal et al., 2011). These methods utilize the expression profile, but they fail to consider that a single miR can bind to several mRNA target sites and regulate the cell differently. Our aim in feature selection was, therefore, to account for this redundancy. Other attempts to classify miRs into oncogenes and tumor suppressor genes (TSGs) were based on functional and evolutionary features such as conservation, expression levels and chromosome distribution (Wang et al., 2010).

The present study involved a search for, and analysis of, features involved in miR:mRNA interactions associated with cancer. These features covered the sequence, hybridization and thermodynamics of validated miR:mRNA interactions only. Based on the curated and prioritized features, we developed a two-step support vector machine-based classifier model, comprising miRSEQ and miRINT, which identifies whether a miR is associated with cancer and also classifies the type of its association, i.e., with either an oncogene or a tumor suppressor. Prioritization of the features and diversification of the models according to the number of seed regions drastically improved the performance of the classifier, as compared to generalized features and holistic hybridization. The incorporation of seed-based classification in the determination of features is a novel aspect of our algorithm. The final classifier performed well on experimentally validated datasets, with good prediction accuracy (cross-validation rate (cv-rate) ranging from 87% to 92%).
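The Highlights name positions 1, 6, 10 and 19 and the GG and CC repeats as the optimal miRSEQ feature subset. The sketch below shows one plausible way to turn those into a numeric vector; the one-hot encoding, the overlapping repeat count and all names are assumptions made for illustration, since the encoding used in the study is not given here.

```python
NUCLEOTIDES = "ACGU"
POSITIONS = (1, 6, 10, 19)  # 1-based positions named in the Highlights


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps (GGG holds two GG)."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def mirseq_features(mirna: str) -> list:
    """Assumed encoding: one-hot base at each selected position, then GG and CC counts."""
    seq = mirna.upper().replace("T", "U")
    features = []
    for pos in POSITIONS:
        # Sequences shorter than the position contribute an all-zero block.
        base = seq[pos - 1] if pos <= len(seq) else None
        features.extend(int(base == n) for n in NUCLEOTIDES)
    features.append(count_overlapping(seq, "GG"))
    features.append(count_overlapping(seq, "CC"))
    return features


# Illustrative 22-nt sequence; yields a vector of length 18.
print(mirseq_features("UGAGGUAGUAGGUUGUAUAGUU"))
```

A vector like this could be passed to any SVM implementation; LIBSVM appears in the reference list, but which library and feature scaling were actually used is not stated in this excerpt.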

Section snippets

Dataset preparation

To generate a classifier, the first step is to construct a dataset of microRNAs that have been experimentally validated as associated with cancer. To begin with, a list of genes involved in cancer was downloaded from the catalog of somatic mutations (COSMIC) (Higgins et al., 2007). A total of 488 genes were thus listed, which could be further segregated into oncogenes and tumor suppressors by cross-referencing with the tumor associated gene database

Results

Dataset preparation was carried out separately for the classifiers miRSEQ and miRINT (Fig. 1). Consequently, a total of 263 miRs were used in the miRSEQ training. The class imbalance problem in the dataset was overcome using the SMOTE method (k-nearest algorithm with no replacement), which generated a sufficient number of negative instances for the training set. As in most SVM classification problems related to miRNAs, our dataset was too complex to be linearly separable. RBF was
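The core idea of SMOTE (Chawla et al., 2002) is to create synthetic minority-class points by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea under simplifying assumptions: base samples are drawn with replacement, and the function name, parameters and toy data are hypothetical. It is not the configuration used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, n_new=6, k=2):
    print(point)
```

With k=2 on this toy data, every synthetic point falls on an edge of the square, because each corner's two nearest neighbours are its adjacent corners. In the study the minority class was the negative set, so the synthetic points would be added to the negative instances before SVM training.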

Discussion

Identifying miR involvement in cancer is a major obstacle for researchers striving to understand the basis of the disease and to generate new therapies against particular cancer types. miRNAs regulate the molecular pathways in cancer by either upregulating or downregulating various oncogenes and tumor suppressors, and sometimes by acting as oncogenes themselves. The functional annotation of miRNAs in cancer is still a painstaking process, though cancer therapies using miRNAs have been picking up

Acknowledgements

The authors wish to thank Dr. Ranjit Prasad Bahadur, Indian Institute of Technology – Kharagpur, India, for his initial assistance in machine learning approaches. Ram K. was supported by a scholarship from the Council of Scientific and Industrial Research, Govt. of India.

References (40)

D.P. Bartel. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell (2004)
A. Grimson et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell (2007)
B.P. Lewis et al. Prediction of mammalian microRNA targets. Cell (2003)
P. Sethupathy et al. Human microRNA-155 on chromosome 21 differentially interacts with its polymorphic target in the AGTR1 3′ untranslated region: a mechanism for functional single-nucleotide polymorphisms related to phenotypes. Am. J. Hum. Genet. (2007)
M.J. Axtell et al. Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. (2011)
S. Bandyopadhyay et al. TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics (2009)
R. Batuwita et al. microPred: effective classification of pre-miRNAs for human miRNA gene prediction. Bioinformatics (2009)
R. Batuwita et al. Efficient resampling methods for training support vector machines in imbalanced datasets
G.A. Calin et al. Chromosomal rearrangements and microRNAs: a new cancer link with clinical implications. J. Clin. Invest. (2007)
C.C. Chang et al. LIBSVM. ACM Trans. Intell. Syst. Technol. (2011)
N.V. Chawla et al. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. (2002)
J.S. Chen et al. In silico identification of oncogenic potential of fyn-related kinase in hepatocellular carcinoma. Bioinformatics (2013)
J.M. Cummins et al. The colorectal microRNAome. PNAS (2006)
D. Didiano, O. Hobert. Molecular architecture of a miRNA-regulated 3′ UTR... (2008)
A. Esquela-Kerscher et al. Oncomirs – microRNAs with a role in cancer. Nat. Rev. Cancer (2006)
W. Filipowicz et al. Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat. Rev. Genet. (2008)
S. Griffiths-Jones et al. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. (2006)
J. Hausser et al. Analysis of CDS-located miRNA target sites suggests that they can effectively inhibit translation. Genome Res. (2013)
C. Hebert et al. High mobility group A2 is a target for miRNA-98 in head and neck squamous cell carcinoma. Mol. Cancer (2007)
M.E. Higgins et al. CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res. (2007)

