DOI: 10.1145/565196.565218
Article

Haplotyping as perfect phylogeny: conceptual framework and efficient solutions

Published: 18 April 2002

ABSTRACT

The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex, genetically influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7].

In this paper we explore the algorithmic implications of the key (and now realistic) "no-recombination in long blocks" observation for the problem of inferring haplotypes in populations. We observe that the no-recombination assumption is very powerful. This assumption, along with the standard population-genetic assumption of infinite sites [23, 14], imposes severe combinatorial constraints on the permitted solutions to the haplotype inference problem, leading to an efficient deterministic algorithm to deduce all features of the permitted haplotype solution(s) that can be known with certainty. The technical key is to view haplotype data as disguised information about paths in an unknown tree, and the haplotype deduction problem as a problem of reconstructing the tree from that path information. This formulation allows us to exploit deep theorems and algorithms from graph and matroid theory to efficiently find one permitted solution to the haplotype problem. It also gives a simple test to determine whether that solution is unique; if it is not, we can implicitly represent the set of all permitted solutions so that each can be efficiently created.
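To make the combinatorial constraint concrete, the following Python sketch enumerates haplotype resolutions by brute force and keeps those consistent with a perfect phylogeny. It is an illustration only and not the graph-realization algorithm the abstract describes: it takes exponential time, and it uses the classical four-gamete condition for the case where the ancestral sequence is unknown. The 0/1/2 genotype encoding (2 marks a heterozygous site) and all function names are assumptions introduced here.

```python
from itertools import combinations, product

def explains(h1, h2, genotype):
    """True if haplotypes h1 and h2 (0/1 tuples) combine into the genotype.
    Assumed encoding: 0 or 1 = homozygous for that state, 2 = heterozygous."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True

def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: binary haplotypes fit an unrooted perfect phylogeny
    exactly when no pair of sites shows all of 00, 01, 10 and 11."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            return False
    return True

def resolutions(genotype):
    """Yield each unordered haplotype pair that explains one genotype."""
    het = [k for k, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        yield h, h
        return
    # Fix the first heterozygous site to 0 in h1 to skip mirrored pairs.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for k, b in zip(het, (0,) + bits):
            h1[k], h2[k] = b, 1 - b
        yield tuple(h1), tuple(h2)

def pph_solutions(genotypes):
    """Brute force: every joint resolution whose haplotypes pass the test."""
    solutions = []
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haplotypes = [h for pair in choice for h in pair]
        if admits_perfect_phylogeny(haplotypes):
            solutions.append(choice)
    return solutions

# Three genotypes over two sites; only one joint resolution survives.
print(pph_solutions([(2, 2), (0, 2), (2, 0)]))
```

In this small example the genotype (2, 2) alone has two possible resolutions, but the other two genotypes force the haplotypes 00, 01 and 10 to be present, which rules out the resolution containing 11. This is the kind of certainty the no-recombination and infinite-sites assumptions provide, here found by exhaustive search rather than by the efficient method the paper develops.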

References

  1. R. E. Bixby and D. K. Wagner. An almost linear-time algorithm for graph realization. Mathematics of Operations Research, 13:99–123, 1988.
  2. A. Clark, K. Weiss, D. Nickerson, et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Human Genetics, 63:595–612, 1998.
  3. A. Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol., 7:111–122, 1990.
  4. W. H. Cunningham and J. Edmonds. A combinatorial decomposition theory. Can. J. Math., 32:734–765, 1980.
  5. M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. Fine-structure haplotype map of 5q31: implications for gene-based studies and genomic LD mapping. Abstract of talk presented at the American Associate of Human Genetics National meeting, October 14, 2001.
  6. M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29:229–232, 2001.
  7. L. Friss, R. Hudson, A. Bartoszewicz, J. Wall, T. Donfalk, and A. Di Rienzo. Gene conversion and differential population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. of Human Genetics, 69:831–843, 2001.
  8. M. Fullerton, A. Clark, C. Sing, et al. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. of Human Genetics, pages 881–900, 2000.
  9. D. Gusfield. Efficient algorithms for inferring evolutionary history. Networks, 21:19–28, 1991.
  10. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  11. D. Gusfield. Inference of haplotypes from samples of diploid populations: complexity and algorithms. Journal of Computational Biology, 8(3), 2001.
  12. L. Helmuth. Genome research: Map of the human genome 3.0. Science, 293(5530):583–585, 2001.
  13. J. E. Hopcroft and R. E. Tarjan. Dividing a graph into triconnected components. SIAM J. on Computing, 2:135–157, 1973.
  14. R. Hudson. Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology, 7:1–44, 1990.
  15. L. Jin, P. Underhill, V. Doctor, R. Davis, P. Shen, L. Luca Cavalli-Sforza, and P. Oefner. Distribution of haplotypes from a chromosome 21 region distinguishes multiple prehistoric human migrations. Proc. of the Nat. Academy of Science, 96:3796–3800, 1999.
  16. S. Kannan and T. Warnow. Inferring evolutionary history from DNA sequences. SIAM J. on Computing, 23:713–737, 1994.
  17. M. K. Kuhner and J. Felsenstein. Sampling among haplotype resolutions in a coalescent-based genealogy sampler. Genetic Epidemiology, 19:S15–S21, 2000.
  18. F. McMorris. On the compatibility of binary qualitative taxonomic characters. Bull. Math. Biology, 39:133–138, 1977.
  19. S. Orzack, D. Gusfield, and V. Stanton. Experimental and theoretical inferal of haplotypes. In preparation.
  20. I. Pe'er, R. Shamir, and R. Sharan. Incomplete directed perfect phylogeny. In D. Sankoff, editor, Eleventh Annual Symposium on Combinatorial Pattern Matching (CPM'00), pages 143–153, 2000.
  21. J. C. Stephens et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science, 293:489–493, 2001.
  22. M. Stephens, N. Smith, and P. Donnelly. A new statistical method for haplotype reconstruction from population data. Am. J. Human Genetics, 68:978–989, 2001.
  23. S. Tavare. Calibrating the clock: Using stochastic processes to measure the rate of evolution. In E. Lander and M. Waterman, editors, Calculating the Secrets of Life. National Academy Press, 1995.
  24. W. T. Tutte. An algorithm for determining whether a given binary matroid is graphic. Proc. of Amer. Math. Soc., 11:905–917, 1960.
  25. L. Wang, K. Zhang, and L. Zhang. Perfect phylogenetic networks with recombination. J. of Comp. Biology, 8:69–78, 2001.
  26. W. T. Whitney. Congruent graphs and the connectivity of graphs. American Math. J., 54:150–168, 1932.
  27. W. T. Whitney. 2-isomorphic graphs. American Math. J., 55:245–254, 1933.

Published in

RECOMB '02: Proceedings of the sixth annual international conference on Computational biology
April 2002, 341 pages
ISBN: 1581134983
DOI: 10.1145/565196

Copyright © 2002 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

RECOMB '02 paper acceptance rate: 35 of 118 submissions, 30%. Overall acceptance rate: 148 of 538 submissions, 28%.
