The Clark Phase-able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information it generates. In the next few years it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of the paper is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent (IBD), in contrast to short shared tracts, which may be merely identical by state (IBS). Here we estimate for populations, using the US as a model, what sample size of genotyped individuals would be necessary to have sufficiently long shared haplotype tracts that are IBD at a statistically significant level. These tracts can then be used as input for a Clark-like phasing method to obtain a complete phasing solution of the sample. We estimate that for a population like the US with about 1% of the people genotyped (approximately 2 million), tracts about 200 SNPs long are shared IBD between pairs of individuals with high probability, which ensures the success of Clark-method phasing. We show on simulated data that the algorithm obtains an almost perfect solution if the number of individuals genotyped on SNP arrays is large enough, and that the accuracy of the algorithm grows with the number of individuals genotyped.
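To make the phrase "Clark-like phasing method" concrete, here is a minimal sketch of the classical inference rule from Clark (1990), applied to a single window of SNPs. It is not the long-range algorithm of this paper. The genotype encoding (0, 1, or 2 copies of one allele per SNP) and the names `complement` and `clark_phase` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple

Haplotype = Tuple[int, ...]  # one allele (0 or 1) per SNP
Genotype = Tuple[int, ...]   # assumed encoding: copies of allele 1 per SNP (0, 1, 2)


def complement(genotype: Genotype, hap: Haplotype) -> Optional[Haplotype]:
    """Return the haplotype that pairs with `hap` to give `genotype`, if any."""
    other = tuple(g - h for g, h in zip(genotype, hap))
    return other if all(a in (0, 1) for a in other) else None


def clark_phase(genotypes: List[Genotype]) -> Dict[int, Tuple[Haplotype, Haplotype]]:
    """Greedy Clark-style phasing; returns the individuals it could resolve."""
    known: Set[Haplotype] = set()
    phased: Dict[int, Tuple[Haplotype, Haplotype]] = {}

    # Seed step: a genotype with at most one heterozygous site is unambiguous.
    for i, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(1 if x == 2 else 0 for x in g)
            h2 = tuple(x - a for x, a in zip(g, h1))
            phased[i] = (h1, h2)
            known.update((h1, h2))

    # Inference rule: explain an ambiguous genotype with a known haplotype
    # plus its complement, then add the complement to the known pool.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in phased:
                continue
            for h in sorted(known):  # sorted copy, so adding to `known` is safe
                other = complement(g, h)
                if other is not None:
                    phased[i] = (h, other)
                    known.add(other)
                    progress = True
                    break
    return phased
```

Individuals that no known haplotype can explain stay unresolved in this sketch, and the result can depend on the order in which haplotypes are tried. This is one reason the abstract's emphasis on long IBD tracts matters: a long shared tract supplies a known haplotype that is likely to be the true one.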

We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event occurs when, by the laws of Mendelian inheritance, an individual should be heterozygous but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but they play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing and LOH inference algorithms was inspired by analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium, and we present in this paper results similar to those obtained from the MS data.
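The LOH definition above can be illustrated with a small sketch for a parent-child trio. This is an assumption-laden illustration, not the detection algorithm of the paper: it assumes genotypes encoded as 0, 1, or 2 copies of one allele, and the function names are hypothetical. When the parents are opposite homozygotes, Mendelian inheritance requires a heterozygous child, so a homozygous call at such a site is consistent with an inherited deletion (or with a genotyping error).

```python
from typing import List, Sequence, Tuple


def loh_evidence_sites(father: Sequence[int], mother: Sequence[int],
                       child: Sequence[int]) -> List[int]:
    """SNP indices where the child should be heterozygous but is not."""
    return [
        i for i, (f, m, c) in enumerate(zip(father, mother, child))
        if {f, m} == {0, 2} and c != 1
    ]


def candidate_loh_regions(father: Sequence[int], mother: Sequence[int],
                          child: Sequence[int]) -> List[Tuple[int, int]]:
    """Maximal homozygous runs in the child that contain an evidence site.

    Returns inclusive (start, end) SNP index pairs. These are candidates only:
    a run may extend beyond the true deletion boundaries.
    """
    evidence = set(loh_evidence_sites(father, mother, child))
    regions: List[Tuple[int, int]] = []
    i, n = 0, len(child)
    while i < n:
        if child[i] == 1:          # a heterozygous call rules out a deletion here
            i += 1
            continue
        j = i
        while j < n and child[j] != 1:
            j += 1                 # extend the homozygous run [i, j)
        if any(k in evidence for k in range(i, j)):
            regions.append((i, j - 1))
        i = j
    return regions
```

For unrelated individuals there are no parental genotypes to compare against, so this trio-based check does not apply directly; other evidence would be needed.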

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halldórsson, B.V., Aguiar, D., Tarpine, R., Istrail, S. (2010). The Clark Phase-able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_11

  • DOI: https://doi.org/10.1007/978-3-642-12683-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
