Improved Recombination Lower Bounds for Haplotype Data

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Abstract

Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this problem is to estimate the minimum number of recombination events in any history of the sample. Myers and Griffiths [1] proposed two measures, R_h and R_s, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:

  • We show that computing the lower bound R_h is NP-hard, and we adapt the greedy algorithm for the set cover problem [2] to obtain a polynomial-time algorithm for computing a diversity-based bound R_g. This algorithm is several orders of magnitude faster than the Recmin program [1], and the bound R_g almost always matches the bound R_h.

  • We also show that computing the lower bound R_s is NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing R_s for a dataset with n haplotypes and m SNPs. We propose a new bound R_I, which extends the history-based bound R_s using the notion of intermediate haplotypes. This bound detects more recombination events than both the R_h and R_s bounds on many real datasets.

  • We extend our algorithms for computing R_g and R_s to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [3] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.

  • We apply our lower bounds to a real dataset [4] and demonstrate that these can provide a good indication for the presence and the location of recombination hotspots.
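
The first result above builds on the greedy algorithm for set cover [2]. As a rough illustration of that generic technique only, and not of the authors' adaptation for R_g (which the abstract does not describe), the following Python sketch repeatedly picks the subset that covers the most still-uncovered elements. The function name greedy_set_cover and the toy input are hypothetical, chosen for this example.

```python
from typing import Hashable, Iterable


def greedy_set_cover(universe: Iterable[Hashable],
                     subsets: dict[str, set]) -> list[str]:
    """Return the names of subsets chosen greedily to cover `universe`.

    Generic textbook greedy set cover; hypothetical helper, not the
    paper's R_g algorithm.
    """
    uncovered = set(universe)
    chosen: list[str] = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            # No subset makes progress, so a full cover is impossible.
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    # Toy instance (made up for illustration): cover the elements 1..5.
    toy = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
    print(greedy_set_cover(range(1, 6), toy))
```

On the toy instance the sketch first selects A (three new elements) and then C (the remaining two). The greedy choice runs in polynomial time but is not guaranteed to find a minimum cover.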


References

  1. Myers, S., Griffiths, R.: Bounds on the Minimum Number of Recombination Events in a Sample History. Genetics 163, 375–394 (2003)
  2. Johnson, D.: Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences 9, 256–278 (1974)
  3. Nickerson, D., et al.: DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genetics 19, 233–240 (1998)
  4. Jeffreys, A.J., Kauppi, L., Neumann, R.: Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics 29, 217–222 (2001)
  5. Gabriel, S.B., et al.: The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002)
  6. Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., Lander, E.S.: High-resolution haplotype structure in the human genome. Nature Genetics 29, 229–232 (2001)
  7. Jeffreys, A., Ritchie, A., Neumann, R.: High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot. Hum. Mol. Genet. 9, 725–733 (2000)
  8. Griffiths, R.C., Marjoram, P.: Ancestral inference from samples of DNA sequences with recombination. Journal of Computational Biology 3, 479–502 (1996)
  9. Fearnhead, P., Donnelly, P.: Estimating recombination rates from population genetic data. Genetics 159, 1299–1318 (2001)
  10. Hudson, R.R.: Two-locus sampling distributions and their applications. Genetics 159, 1805–1817 (2001)
  11. Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003)
  12. The International HapMap Consortium: The International HapMap Project. Nature 426, 789–796 (2003)
  13. McVean, G., et al.: The fine-scale structure of recombination rate variation in the human genome. Science 304, 581–584 (2004)
  14. Crawford, D., et al.: Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics 36, 700–706 (2004)
  15. Hein, J.: Reconstructing Evolution of sequences subject to recombination using parsimony. Math. Biosci. 98, 185–200 (1990)
  16. Hein, J.: A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination. J. Mol. Evol. 20, 402–411 (1993)
  17. Song, Y., Hein, J.: Parsimonious Reconstruction of Sequence Evolution and Haplotype Blocks: Finding the Minimum Number of Recombination Events. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 287–302. Springer, Heidelberg (2003)
  18. Wang, L., Zhang, K., Zhang, L.: Perfect phylogenetic networks with recombination. Journal of Computational Biology 8, 69–78 (2001)
  19. Gusfield, D., Eddhu, S., Langley, C.: Efficient reconstruction of phylogenetic networks with constrained recombination. In: Proc. of IEEE CSB Conference, pp. 363–374 (2003)
  20. Templeton, A., et al.: Recombinational and mutational hotspots within the human lipoprotein lipase gene. American Journal of Human Genetics 66, 69–83 (2000)
  21. Fearnhead, P., et al.: Application of coalescent methods to reveal fine-scale rate variation and recombination hotspots. Genetics 167, 2067–2081 (2004)
  22. Kreitman, M.: Nucleotide Polymorphism at the Alcohol Dehydrogenase Locus of Drosophila melanogaster. Nature 304, 412–417 (1983)
  23. SeattleSNPs. NHLBI Program for Genomic Applications, UW-FHCRC, Seattle, WA (2004), http://pga.gs.washington.edu
  24. Hudson, R.R., Kaplan, N.L.: Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985)
  25. Song, Y., Hein, J.: On the minimum number of recombination events in the evolutionary history of DNA sequences. Journal of Mathematical Biology 48, 160–186 (2004)
  26. Bafna, V., Bansal, V.: The number of recombination events in a sample history: Conflict graph and lower bounds. IEEE Trans. on Comp. Biology and Bioinformatics 1, 78–90 (2004)
  27. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-completeness. W.H. Freeman and Company, New York (1979)
  28. Eskin, E., Halperin, E.: Haplotype reconstruction from genotype data using imperfect phylogeny. Bioinformatics 20, 1842–1849 (2003)
  29. Kimmel, G., Shamir, R.: The incomplete perfect phylogeny haplotype problem. In: Second RECOMB Satellite Workshop on Computational Methods for SNPs and Haplotypes (2004)
  30. Clark, A., et al.: Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. American Journal of Human Genetics 63, 595–612 (1998)
  31. Goldstein, D.B.: Islands of linkage disequilibrium. Nature Genetics 29, 109–111 (2001)
  32. Stephens, M., Smith, N.J., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989 (2001)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bafna, V., Bansal, V. (2005). Improved Recombination Lower Bounds for Haplotype Data. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science (LNBI), vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_43

  • DOI: https://doi.org/10.1007/11415770_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer Science, Computer Science (R0)
