DOI: 10.1145/2975167.2985684
research-article

Exploration of regression models for cancer noncoding mutation recurrence

Published: 02 October 2016

Abstract

Cancer initiation and progression are caused by "driver" mutations that are vastly outnumbered by the "passenger" mutations that accumulate due to cancer-associated genome instability. With genome-wide detection of somatic mutations now becoming commonplace for moderate-sized cancer studies, improvements in methods for discriminating driver from passenger mutations would significantly advance the field of cancer biology. In large-cohort studies, recurrence of mutations within a regulatory element can be used to identify probable driver mutations, but for small-cohort studies of new cancer types or subtypes, the recurrence approach by itself has limited statistical power. In such cases, bioinformatic approaches work well for functional assessment of somatic mutations within protein-coding genes, but how to functionally assess noncoding somatic mutations, using large-scale datasets of measurements and information about the local genomic context of the mutation, is a fundamental open problem in bioinformatics. Based on recent reports of specific noncoding mutations that drive cancer progression, we proposed and investigated a recurrence-based regression approach for quantifying the cancer-promoting potential of the local genomic and chromatin context of a somatic mutation. We integrated 29 genomic correlates (from sequence conservation, sequence GC content, distance to the nearest gene, and ENCODE project genome location datasets) within seven different regression models in three model classes: generalized linear models (GLMs), ensemble decision tree models, and neural network models. We trained and tested the models using a combined dataset of 4.5 million noncoding somatic mutations from 20 different types of cancer. We then characterized the models' accuracies and obtained relative importance scores for the features.
We found that the Poisson regression model performs the best among the regression models and that a deep neural network structure is promising for predicting noncoding mutation recurrence.
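
The abstract singles out Poisson regression as the best-performing regression model. As a rough illustration of that model class only (not the authors' implementation, features, or data), the sketch below fits a Poisson GLM with a log link to synthetic recurrence counts by plain gradient ascent. The two stand-in features, the "true" weights, the sample size, the learning rate, and all function names are hypothetical choices made for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def predict_rate(weights, features):
    """Log link: expected count = exp(intercept + w . x)."""
    return math.exp(weights[0] + sum(w * x for w, x in zip(weights[1:], features)))


def mean_log_likelihood(weights, rows, counts):
    """Average Poisson log-likelihood, omitting the log(y!) term (constant in the weights)."""
    total = 0.0
    for features, y in zip(rows, counts):
        mu = predict_rate(weights, features)
        total += y * math.log(mu) - mu
    return total / len(rows)


def fit_poisson(rows, counts, learning_rate=0.1, n_iter=1000):
    """Maximize the mean log-likelihood by gradient ascent, starting from all-zero weights."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for features, y in zip(rows, counts):
            residual = y - predict_rate(weights, features)  # observed minus expected count
            grad[0] += residual
            for j, x in enumerate(features):
                grad[j + 1] += residual * x
        weights = [w + learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Synthetic data: two hypothetical features scaled to [0, 1] per genomic site,
# and a recurrence count drawn from a Poisson model with assumed weights.
rng = random.Random(0)
TRUE_WEIGHTS = [-0.5, 1.0, -0.8]  # intercept, feature 1, feature 2 (assumed values)
rows = [[rng.random(), rng.random()] for _ in range(300)]
counts = [sample_poisson(predict_rate(TRUE_WEIGHTS, r), rng) for r in rows]

fitted = fit_poisson(rows, counts)
print("fitted weights:", [round(w, 2) for w in fitted])
```

In this sketch the magnitude of each fitted weight (on comparably scaled features) gives a crude sense of feature influence; the relative importance scores mentioned in the abstract may have been computed differently.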


Cited By

  • (2020) Gene mutation detection for breast cancer disease: A review. IOP Conference Series: Materials Science and Engineering, 830 (032051). DOI: 10.1088/1757-899X/830/3/032051. Online publication date: 19-May-2020


    Published In

    BCB '16: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
    October 2016
    675 pages
    ISBN: 9781450342254
    DOI: 10.1145/2975167

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. bioinformatics
    2. cancer
    3. mutation
    4. noncoding
    5. regression

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Animal Cancer Foundation
    • PhRMA Foundation

    Conference

    BCB '16

    Acceptance Rates

    Overall Acceptance Rate 254 of 885 submissions, 29%
