Research article, DOI: 10.1145/2808719.2808761

An ensemble SVM model for the accurate prediction of non-canonical MicroRNA targets

Published: 09 September 2015

Abstract

Background: MicroRNAs are small, non-coding, endogenous RNAs that are responsible for post-transcriptional regulation of genes. Because a large number of human genes are targeted by microRNAs, understanding the precise mechanism of microRNA action and accurately mapping microRNA targets is of paramount importance; doing so will uncover the role of microRNAs in development, differentiation, and disease pathogenesis. However, current state-of-the-art computational methods for microRNA target prediction suffer from false-positive rates that are too high for the methods to be useful in practice.
Results: In this paper, we develop a suite of models for microRNA target prediction, under the banner Avishkar, that outperform state-of-the-art protocols. Specifically, our final model achieves an average true positive rate of more than 75% at a false positive rate of 20% for non-canonical microRNA target sites in humans. This is an improvement of over 150% in the true positive rate for non-canonical sites over the best competing protocol. We achieve this performance by representing the thermodynamic and sequence profiles of microRNA-mRNA interactions as curves, introducing a novel seed-enrichment metric that models seed matches as well as all possible non-canonical matches, and learning an ensemble of microRNA family-specific non-linear SVM classifiers. We provide an easy-to-use system, built on top of Apache Spark, for large-scale interactive analysis and prediction of microRNA targets. All operations in our system (candidate set generation, feature generation and transformation, training, prediction, and computation of performance metrics) are fully distributed and scalable.
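The headline figures are a true positive rate measured at a fixed false positive rate, and a relative improvement over a baseline. As a minimal sketch of how such figures can be computed from classifier scores, the following Python uses only the standard library. The function names, the toy scores and labels, and the 0.30 baseline value are assumptions made for illustration; they are not taken from the paper or from the Avishkar code base.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Highest true positive rate reachable while the false positive
    rate stays at or below max_fpr. Labels are 1 (target) or 0 (non-target)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank candidates from the highest score to the lowest, then lower the
    # decision threshold one distinct score value at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Candidates with tied scores fall on the same side of any threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # lowering the threshold further only adds false positives
        best_tpr = tp / positives
    return best_tpr


def relative_improvement(new_value, baseline_value):
    """Relative gain of new_value over baseline_value (1.5 means 150%)."""
    return (new_value - baseline_value) / baseline_value


if __name__ == "__main__":
    # Toy data, purely illustrative.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
    print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20))
    # Hypothetical baseline TPR of 0.30 compared with a TPR of 0.75.
    print(relative_improvement(0.75, 0.30))
```

Under this definition, an improvement of 150% means the new true positive rate is 2.5 times the baseline rate, so the improvement is relative rather than a difference in percentage points.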
Availability: All source code and sample data are available at https://bitbucket.org/cellsandmachines/avishkar. We also provide scalable implementations of kernel SVM on Apache Spark, which can be used to solve large-scale non-linear binary classification problems, at https://bitbucket.org/cellsandmachines/kernelsvmspark.


Published In

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN:9781450338530
DOI:10.1145/2808719

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. MicroRNA
  2. apache spark
  3. distributed machine learning
  4. kernel SVM
  5. large-scale
  6. mRNA
  7. non-canonical matches
  8. target prediction

Qualifiers

  • Research-article

Conference

BCB '15

Acceptance Rates

BCB '15 Paper Acceptance Rate 48 of 141 submissions, 34%;
Overall Acceptance Rate 254 of 885 submissions, 29%
