Abstract
Selecting an informative subset of SNPs, generally referred to as tag SNPs, to genotype and analyze is considered an essential step toward effective disease association studies. However, while the selected tag SNPs may characterize the allele information of a target genomic region, they are not necessarily the ones directly associated with disease or with functional impairment. To address this limitation, we present a first integrative SNP selection system that identifies SNPs that are informative and, at the same time, carry a deleterious functional effect, which in turn means that they are likely to be directly associated with disease. We formulate the problem of selecting functionally informative tag SNPs as a multi-objective optimization problem and present a heuristic algorithm for addressing it. We also present the system we developed for assessing the functional significance of SNPs. To evaluate our system, we compare it to other state-of-the-art SNP selection systems, which perform both information-based tag SNP selection and function-based SNP selection, but do so in two separate, consecutive steps. Using 14 datasets based on disease-related genes curated by the OMIM database, we show that our system consistently improves upon current systems.
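To make the idea of optimizing both objectives at once more concrete, the following is a minimal sketch, not the algorithm or the scoring system described in this paper. It assumes a simple weighted-sum scalarization solved greedily. The function name `select_functional_tag_snps`, the `covers` map (which SNPs a candidate can stand in for), the `func_score` values in [0, 1], and the weight `alpha` are all hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Set


def select_functional_tag_snps(
    covers: Dict[str, Set[str]],
    func_score: Dict[str, float],
    k: int,
    alpha: float = 0.5,
) -> List[str]:
    """Greedily pick up to k SNPs by a weighted sum of two objectives.

    covers[s]     -- hypothetical: SNPs that s can stand in for (including s)
    func_score[s] -- hypothetical: functional significance of s, in [0, 1]
    alpha         -- hypothetical weight; 1.0 = informativeness only,
                     0.0 = functional significance only
    """
    all_snps = set(covers)
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(min(k, len(all_snps))):
        best, best_gain = None, -1.0
        # Iterate in sorted order so ties are broken deterministically.
        for s in sorted(all_snps - set(chosen)):
            # Fraction of all SNPs newly covered if s is added.
            info_gain = len(covers[s] - covered) / len(all_snps)
            gain = alpha * info_gain + (1 - alpha) * func_score[s]
            if gain > best_gain:
                best, best_gain = s, gain
        chosen.append(best)
        covered |= covers[best]
    return chosen


# Toy data, invented for illustration only.
toy_covers = {
    "rs1": {"rs1", "rs2", "rs3"},
    "rs2": {"rs1", "rs2"},
    "rs3": {"rs1", "rs3"},
    "rs4": {"rs4"},
}
toy_func = {"rs1": 0.1, "rs2": 0.9, "rs3": 0.2, "rs4": 0.3}

balanced = select_functional_tag_snps(toy_covers, toy_func, k=2, alpha=0.5)
info_only = select_functional_tag_snps(toy_covers, toy_func, k=2, alpha=1.0)
```

With the toy data, the balanced setting prefers `rs2`, which covers fewer SNPs than `rs1` but has a much higher functional score, whereas the informativeness-only setting starts with `rs1`. A two-step pipeline that tags first and filters by function afterwards cannot make this trade-off, which is the motivation the abstract gives for an integrated selection.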
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, P.H., Shatkay, H. (2007). Two Birds, One Stone: Selecting Functionally Informative Tag SNPs for Disease Association Studies. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_7
DOI: https://doi.org/10.1007/978-3-540-74126-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science (R0)