
HAPLOFREQ – Estimating Haplotype Frequencies Efficiently

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Abstract

A commonly used tool in disease association studies is the search for discrepancies between the haplotype distributions of the case and control populations. To find such discrepancies, the haplotype frequencies in each population are estimated from the genotypes.

We present a new method, HAPLOFREQ, for estimating haplotype frequencies over a short genomic region, given genotypes or haplotypes with missing data or sequencing errors. Our approach uses a maximum likelihood model based on a simple random generative model, which assumes that the genotypes are independently sampled from the population. We first show that if phased haplotypes are given, possibly with missing data, the haplotype frequencies in the population can be estimated by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum of the likelihood function is NP-hard. For this case we define an alternative likelihood function, which can be thought of as a relaxed likelihood function. We show that the maximum of the relaxed likelihood can be found in polynomial time, and that its optimal solution asymptotically approaches the haplotype frequencies in the population.
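To make the phased case with missing data concrete, here is a minimal Python sketch. It is not the HAPLOFREQ algorithm: it uses a plain EM-style fixed-point update instead of the paper's convex-optimization methods, and it enumerates all 2^k haplotypes over k sites, so it only suits tiny examples. It assumes biallelic sites coded "0"/"1", a "?" marker for a missing site, and missingness that is independent of the underlying haplotype, so that each observation contributes the total frequency of the haplotypes compatible with it. The function names are hypothetical.

```python
from itertools import product
import math


def compatible(observed, haplotype):
    """True if a fully specified haplotype matches an observation ('?' = missing site)."""
    return all(o == "?" or o == h for o, h in zip(observed, haplotype))


def log_likelihood(observations, freqs):
    """Sum over observations of log(total frequency of compatible haplotypes)."""
    total = 0.0
    for obs in observations:
        mass = sum(p for h, p in freqs.items() if compatible(obs, h))
        total += math.log(mass)
    return total


def estimate_frequencies(observations, n_sites, iterations=200):
    """EM-style update for haplotype frequencies (illustrative, not the paper's method)."""
    haplotypes = ["".join(bits) for bits in product("01", repeat=n_sites)]
    # Start from the uniform distribution so every haplotype has positive mass.
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(iterations):
        counts = dict.fromkeys(haplotypes, 0.0)
        for obs in observations:
            matches = [h for h in haplotypes if compatible(obs, h)]
            mass = sum(freqs[h] for h in matches)
            # Split one unit of count among compatible haplotypes,
            # in proportion to their current frequencies.
            for h in matches:
                counts[h] += freqs[h] / mass
        freqs = {h: c / len(observations) for h, c in counts.items()}
    return freqs


# Five phased haplotypes over two sites, two of them with a missing site.
observations = ["00", "00", "11", "0?", "?1"]
freqs = estimate_frequencies(observations, n_sites=2)
for hap, p in sorted(freqs.items()):
    print(hap, round(p, 4))
```

Under these assumptions the log-likelihood is concave in the frequency vector, which is why a global optimum is attainable in this setting. On this toy input the estimate settles at roughly 0.6 for "00" and 0.4 for "11", with the other two haplotypes driven to zero.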

In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the respective likelihood functions. We compared our algorithm with the widely used program PHASE and found that our estimates are at least 10% more accurate and that our algorithm is about ten times faster.

Our techniques involve new algorithms in convex optimization, which may be of independent interest. In particular, they may be helpful in other maximum likelihood problems arising from survey sampling.




© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halperin, E., Hazan, E. (2005). HAPLOFREQ – Estimating Haplotype Frequencies Efficiently. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_42


  • DOI: https://doi.org/10.1007/11415770_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer Science, Computer Science (R0)
