Algorithms for detecting complementary SNPs within a region of interest that are associated with diseases

Authors:

Sinan Erten,

Marzieh Ayati,

Yu Liu,

Mark R. Chance,

Mehmet Koyutürk

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pages 194 - 201

https://doi.org/10.1145/2382936.2382961

Published: 07 October 2012

Abstract

Genome-wide association studies (GWAS) comprehensively compare common genetic variants in affected and control populations to identify variants that are potentially associated with diseases. In recent years, GWAS have successfully identified susceptibility genes for many diseases. However, the limitations of GWAS in uncovering the cellular mechanisms of complex diseases have become increasingly pronounced. In particular, GWAS analyze disease associations at the level of single variants (e.g., single nucleotide polymorphisms, SNPs), whereas the functional links between these variants and the disease manifest at the level of genes, their products, and their interactions. Since many genes are associated with multiple SNPs (within their coding and regulatory regions, i.e., regions of interest), it is not straightforward to characterize the association of individual genes with diseases based on SNP-level genotype data. Many existing studies of the functional implications of GWAS assess disease-gene association by simply taking the most statistically significant SNP in the gene's region of interest. Recently, some alternative approaches have been proposed to integrate the genotypes of all SNPs within the region of interest. In this study, we take an algorithmic approach to the problem and identify the optimal subset of SNPs that provides the maximum disease association score within each region of interest. The proposed algorithms represent the "genotype" of a gene as a combination of a subset of SNPs within its region of interest and search for the subset that maximizes the test statistic comparing this representative genotype in case and control samples. To handle the multiple testing problem, we compute the statistical significance of these scores using permutation tests and a background population that takes into account the number of variants lying in the region of interest (gene). We apply the proposed algorithms to a GWAS dataset for Type 2 Diabetes (T2D). To assess their performance, we use a manually curated set of genes known to be associated with T2D and compare the algorithms using ROC curves. Our experimental results show that the proposed algorithms are able to identify disease genes missed by other methods, with higher sensitivity at a given false positive rate.
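The subset search and permutation test described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how SNP genotypes are combined, which test statistic is used, or how the search is carried out. The sketch below assumes (hypothetically) that a sample counts as a "carrier" if any SNP in the subset has at least one minor allele, scores a subset with a Pearson chi-square statistic on the resulting 2x2 table, and uses greedy forward selection rather than an exact search. All function names and the toy data are illustrative.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the table carries no signal
    return n * (a * d - b * c) ** 2 / denom


def subset_score(genotypes, labels, subset):
    """Score a SNP subset (assumed rule: carrier = any minor allele in subset)."""
    a = b = c = d = 0
    for row, is_case in zip(genotypes, labels):
        carrier = any(row[j] > 0 for j in subset)
        if is_case and carrier:
            a += 1
        elif is_case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return chi2_2x2(a, b, c, d)


def greedy_subset(genotypes, labels):
    """Greedy forward selection: add the best SNP while the score improves."""
    n_snps = len(genotypes[0])
    chosen, best = [], 0.0
    while len(chosen) < n_snps:
        score, j = max(
            (subset_score(genotypes, labels, chosen + [j]), j)
            for j in range(n_snps)
            if j not in chosen
        )
        if score <= best:
            break
        chosen.append(j)
        best = score
    return chosen, best


def permutation_p_value(genotypes, labels, n_perm=200, seed=0):
    """Empirical p-value: rerun the whole search on label-shuffled data."""
    rng = random.Random(seed)
    _, observed = greedy_subset(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, null_score = greedy_subset(genotypes, shuffled)
        if null_score >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy region with 3 SNPs (minor-allele counts); rows are samples.
# SNPs 0 and 1 are complementary in the cases; SNP 2 is noise.
toy_genotypes = [
    [1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 1, 0],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0],  # controls
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_subset, toy_score = greedy_subset(toy_genotypes, toy_labels)
toy_p = permutation_p_value(toy_genotypes, toy_labels)
```

Because the search is repeated on every permuted label set, the null distribution reflects the same freedom to pick the best subset among the region's SNPs, which is one simple way to account for regions that contain different numbers of variants.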
