DOI: 10.1145/2382936.2382961
research-article

Algorithms for detecting complementary SNPs within a region of interest that are associated with diseases

Published: 07 October 2012

Abstract

Genome-wide association studies (GWAS) comprehensively compare common genetic variants in affected and control populations to identify variants that are potentially associated with diseases. In recent years, GWAS have successfully identified susceptibility genes for many diseases. However, the limitations of GWAS in uncovering the cellular mechanisms of complex diseases have become increasingly pronounced. In particular, GWAS analyze disease associations at the level of single variants (e.g., single nucleotide polymorphisms, SNPs), whereas the functional links between these variants and the disease manifest at the level of genes, their products, and their interactions. Since many genes are associated with multiple SNPs (within their coding and regulatory regions, i.e., their regions of interest), it is not straightforward to characterize the association of individual genes with diseases based on SNP-level genotype data. Many existing studies of the functional implications of GWAS assess disease-gene association by simply taking the most statistically significant SNP in the gene's region of interest. Recently, alternative approaches have been proposed that integrate the genotypes of all SNPs within the region of interest. In this study, we take an algorithmic approach to the problem and identify the optimal subset of SNPs that provides the maximum disease association score within each region of interest. The proposed algorithms represent the "genotype" of a gene as a combination of a subset of SNPs within its region of interest and search for the subset that maximizes the test statistic comparing this representative genotype in case and control samples. To handle the multiple testing problem, we compute the statistical significance of these scores using permutation tests and a background population that takes into account the number of variants lying in the region of interest (gene). We apply the proposed algorithms to a GWAS dataset for Type 2 Diabetes (T2D).
To assess performance, we use a manually curated set of genes known to be associated with T2D and compare the algorithms using ROC curves. Our experimental results show that the proposed algorithms identify disease genes missed by other methods and achieve better sensitivity at a given false positive rate.
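The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea, under several assumptions that are not taken from the paper: a sample counts as a "carrier" if it has a minor allele at any SNP in the chosen subset, the test statistic is a Pearson chi-square on the resulting 2x2 case/control table, and the subset is found by greedy forward selection (a heuristic, not necessarily the optimal search the authors describe). The function names `score`, `greedy_subset`, and `permutation_p_value` and the toy data are hypothetical. Rerunning the full search on every label permutation is one way to account for the number of SNPs in the region.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def score(genotypes, labels, subset):
    """Assumed combination rule: carrier if any selected SNP is non-zero."""
    a = b = c = d = 0
    for row, is_case in zip(genotypes, labels):
        carrier = any(row[j] > 0 for j in subset)
        if is_case and carrier:
            a += 1
        elif is_case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return chi2_2x2(a, b, c, d)


def greedy_subset(genotypes, labels):
    """Forward selection: add the SNP that most improves the score, stop when none does."""
    n_snps = len(genotypes[0])
    chosen, best = [], 0.0
    improved = True
    while improved:
        improved = False
        pick = None
        for j in range(n_snps):
            if j in chosen:
                continue
            s = score(genotypes, labels, chosen + [j])
            if s > best:
                best, pick, improved = s, j, True
        if improved:
            chosen.append(pick)
    return chosen, best


def permutation_p_value(genotypes, labels, n_perm=200, seed=0):
    """Empirical p-value: repeat the whole search on shuffled case/control labels."""
    rng = random.Random(seed)
    _, observed = greedy_subset(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, null_best = greedy_subset(genotypes, shuffled)
        if null_best >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy region with 3 SNPs: SNP 0 and SNP 1 mark different cases (complementary),
# SNP 2 is noise. Rows are samples; values are minor-allele counts.
genotypes = (
    [[1, 0, 0]] * 3 + [[0, 1, 0]] * 3 + [[0, 0, 1], [0, 0, 0]]      # 8 cases
    + [[0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0],
       [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]                  # 8 controls
)
labels = [True] * 8 + [False] * 8

subset, stat = greedy_subset(genotypes, labels)
p_value = permutation_p_value(genotypes, labels)
print(subset, stat, p_value)
```

In this toy region neither SNP 0 nor SNP 1 alone separates cases from controls well, but their combination does, which is the kind of complementary signal a single-best-SNP summary would miss.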


Cited By

  • (2014) Prioritization of genomic locus pairs for testing epistasis. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 240-248. DOI: 10.1145/2649387.2649449. Online publication date: 20-Sep-2014


        Published In

        BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
        October 2012
        725 pages
        ISBN:9781450316705
        DOI:10.1145/2382936

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. GWAS
        2. case-control studies
        3. summary statistics

        Qualifiers

        • Research-article

        Conference

        BCB '12

        Acceptance Rates

        BCB '12 Paper Acceptance Rate 33 of 159 submissions, 21%;
        Overall Acceptance Rate 254 of 885 submissions, 29%
