
An ensemble SVM model for the accurate prediction of non-canonical MicroRNA targets

Authors:

Asish Ghoshal,

Ananth Grama,

Saurabh Bagchi,

Somali Chaterji

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics

Pages 403 - 412

https://doi.org/10.1145/2808719.2808761

Published: 09 September 2015


Abstract

Background: MicroRNAs are small non-coding endogenous RNAs responsible for the post-transcriptional regulation of genes. Because large numbers of human genes are targeted by microRNAs, understanding the precise mechanism of microRNA action and accurately mapping microRNA targets is of paramount importance; doing so will uncover the role of microRNAs in development, differentiation, and disease pathogenesis. However, current state-of-the-art computational methods for microRNA target prediction suffer from false-positive rates too high for them to be useful in practice.

Results: In this paper, we develop a suite of models for microRNA target prediction, under the banner Avishkar, that outperform state-of-the-art protocols. Specifically, our final model achieves an average true positive rate of more than 75% at a false positive rate of 20% for non-canonical microRNA target sites in humans. This is an improvement of over 150% in the true positive rate for non-canonical sites over the best competing protocol. We achieve this performance by representing the thermodynamic and sequence profiles of microRNA-mRNA interaction as curves, devising a novel seed-enrichment metric that models seed matches as well as all possible non-canonical matches, and learning an ensemble of microRNA family-specific non-linear SVM classifiers. We also provide an easy-to-use system, built on top of Apache Spark, for large-scale interactive analysis and prediction of microRNA targets. All operations in our system (candidate set generation, feature generation and transformation, training, prediction, and computation of performance metrics) are fully distributed and scalable.
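The headline result is expressed as a true positive rate (TPR) at a fixed false positive rate (FPR). As a minimal sketch of how such an operating point can be read off a set of classifier scores, the Python function below sweeps a decision threshold from the highest score downward and returns the largest TPR reached while the FPR stays at or below a cap such as 0.2. The function name `tpr_at_fpr` and the toy data are hypothetical and not taken from the Avishkar code base; the sketch assumes binary labels (1 = true target site, 0 = non-target) and does not reproduce however the paper averages this quantity across models.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float = 0.2) -> float:
    """Best true positive rate reachable by thresholding `scores` while
    keeping the false positive rate at or below `max_fpr`.

    labels: 1 for a true target site, 0 for a non-target candidate.
    (Hypothetical helper for illustration only.)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Lowering the threshold from the top score admits candidates in
    # descending score order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # All candidates tied at this score cross the threshold together.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR can only grow from here on, so stop the sweep.
        best_tpr = tp / positives
    return best_tpr


if __name__ == "__main__":
    # Toy scores and labels (hypothetical, not data from the paper).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]
    print(tpr_at_fpr(scores, labels, max_fpr=0.2))  # 0.8
```

With the toy data, one false positive out of five negatives is tolerated (FPR = 0.2), so four of the five positives are recovered before the second false positive would push the FPR past the cap. Tied scores are admitted together because no threshold can separate them; otherwise the function could report an operating point that no real threshold achieves.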

Availability: All source code and sample data are available at https://bitbucket.org/cellsandmachines/avishkar. We also provide scalable Apache Spark implementations of kernel SVM, which can be used to solve large-scale non-linear binary classification problems, at https://bitbucket.org/cellsandmachines/kernelsvmspark.
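For readers less familiar with kernel SVMs, the sketch below shows how a trained kernel SVM classifies a point in dual form, f(x) = Σ_i α_i y_i K(s_i, x) + b, here with a radial basis function (RBF) kernel. It is a generic illustration, not the authors' Spark implementation: the RBF kernel choice, the function names, and the hand-picked coefficients are all assumptions for demonstration, and no training step is shown. The coefficients solve XOR, a problem no linear classifier can separate, which is what makes the kernel form non-linear.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float) -> float:
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_function(x: Vector, support_vectors: Sequence[Vector],
                      dual_coefs: Sequence[float], bias: float,
                      gamma: float) -> float:
    """Dual-form SVM score; dual_coefs[i] holds alpha_i * y_i."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


def predict(x: Vector, support_vectors: Sequence[Vector],
            dual_coefs: Sequence[float], bias: float, gamma: float) -> int:
    """Return +1 or -1 according to the sign of the decision function."""
    score = decision_function(x, support_vectors, dual_coefs, bias, gamma)
    return 1 if score >= 0 else -1


# Hypothetical, hand-picked model (not trained) that separates XOR:
# (0,0) and (1,1) are class +1; (0,1) and (1,0) are class -1.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
DUAL_COEFS = [1.0, 1.0, -1.0, -1.0]
BIAS = 0.0
GAMMA = 5.0

if __name__ == "__main__":
    for point in SUPPORT_VECTORS + [(0.9, 0.1)]:
        print(point, predict(point, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA))
```

Each prediction costs one kernel evaluation per support vector, which is one reason large-scale kernel SVM prediction benefits from distributing the work.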

