ABSTRACT
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex, genetically influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7].

In this paper we explore the algorithmic implications of the key (and now realistic) "no recombination in long blocks" observation for the problem of inferring haplotypes in populations. We observe that the no-recombination assumption is very powerful. This assumption, along with the standard population-genetic assumption of infinite sites [23, 14], imposes severe combinatorial constraints on the permitted solutions to the haplotype inference problem, leading to an efficient deterministic algorithm to deduce all features of the permitted haplotype solution(s) that can be known with certainty. The technical key is to view haplotype data as disguised information about paths in an unknown tree, and the haplotype deduction problem as a problem of reconstructing the tree from that path information. This formulation allows us to exploit deep theorems and algorithms from graph and matroid theory to efficiently find one permitted solution to the haplotype problem. It also gives a simple test to determine whether that solution is unique; if it is not, we can implicitly represent the set of all permitted solutions so that each can be efficiently created.
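To make the constraint described in the abstract concrete, the following is a minimal brute-force sketch, not the paper's graph-realization algorithm. It assumes a common encoding that the abstract does not spell out: each genotype is a tuple over {0, 1, 2}, where 0 and 1 are homozygous sites and 2 is a heterozygous site. It uses the four-gamete test, a standard characterization of binary data that fits a perfect phylogeny when the root is unknown. The function names are hypothetical, and the enumeration is exponential in the number of heterozygous sites, so it is only meant for tiny examples.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) is consistent with genotype g."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:          # heterozygous site needs two different alleles
                return False
        elif not (a == b == s):  # homozygous site needs both alleles equal to s
            return False
    return True


def phasings(g):
    """Yield every unordered haplotype pair that explains genotype g."""
    het = [i for i, s in enumerate(g) if s == 2]
    if not het:
        yield tuple(g), tuple(g)
        return
    # Fix the first heterozygous site to 0 in h1 to skip mirrored duplicates.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, (0,) + bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def four_gamete_ok(haps):
    """Return True if no pair of sites shows all four of 00, 01, 10, 11."""
    haps = list(haps)
    if not haps:
        return True
    for i, j in combinations(range(len(haps[0])), 2):
        if len({(h[i], h[j]) for h in haps}) == 4:
            return False
    return True


def pph_solutions(genotypes):
    """Enumerate all phasings whose haplotypes pass the four-gamete test."""
    solutions = []
    for choice in product(*(list(phasings(g)) for g in genotypes)):
        haps = {h for pair in choice for h in pair}
        if four_gamete_ok(haps):
            solutions.append(choice)
    return solutions


if __name__ == "__main__":
    # The genotype (2, 2) alone is ambiguous, but the two homozygous
    # individuals rule out the phasing (0,0)/(1,1).
    for solution in pph_solutions([(2, 2), (0, 1), (1, 0)]):
        print(solution)
```

In this toy input the first genotype has two possible phasings in isolation, yet only one survives once the other individuals are taken into account. This illustrates, on a very small scale, how the no-recombination assumption can determine features of the solution with certainty.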
- R. E. Bixby and D. K. Wagner. An almost linear-time algorithm for graph realization. Mathematics of Operations Research, 13:99--123, 1988.
- A. Clark, K. Weiss, D. Nickerson, et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Human Genetics, 63:595--612, 1998.
- A. Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol., 7:111--122, 1990.
- W. H. Cunningham and J. Edmonds. A combinatorial decomposition theory. Can. J. Math., 32:734--765, 1980.
- M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. Fine-structure haplotype map of 5q31: implications for gene-based studies and genomic LD mapping. Abstract of talk presented at the American Society of Human Genetics national meeting, October 14, 2001.
- M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29:229--232, 2001.
- L. Friss, R. Hudson, A. Bartoszewicz, J. Wall, T. Donfalk, and A. Di Rienzo. Gene conversion and differential population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. of Human Genetics, 69:831--843, 2001.
- M. Fullerton, A. Clark, C. Sing, et al. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. of Human Genetics, pages 881--900, 2000.
- D. Gusfield. Efficient algorithms for inferring evolutionary history. Networks, 21:19--28, 1991.
- D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
- D. Gusfield. Inference of haplotypes from samples of diploid populations: complexity and algorithms. Journal of Computational Biology, 8(3), 2001.
- L. Helmuth. Genome research: Map of the human genome 3.0. Science, 293(5530):583--585, 2001.
- J. E. Hopcroft and R. E. Tarjan. Dividing a graph into triconnected components. SIAM J. on Computing, 2:135--157, 1973.
- R. Hudson. Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology, 7:1--44, 1990.
- L. Jin, P. Underhill, V. Doctor, R. Davis, P. Shen, L. Luca Cavalli-Sforza, and P. Oefner. Distribution of haplotypes from a chromosome 21 region distinguishes multiple prehistoric human migrations. Proc. of the Nat. Academy of Science, 96:3796--3800, 1999.
- S. Kannan and T. Warnow. Inferring evolutionary history from DNA sequences. SIAM J. on Computing, 23:713--737, 1994.
- M. K. Kuhner and J. Felsenstein. Sampling among haplotype resolutions in a coalescent-based genealogy sampler. Genetic Epidemiology, 19:S15--S21, 2000.
- F. McMorris. On the compatibility of binary qualitative taxonomic characters. Bull. Math. Biology, 39:133--138, 1977.
- S. Orzack, D. Gusfield, and V. Stanton. Experimental and theoretical inferal of haplotypes. In preparation.
- I. Pe'er, R. Shamir, and R. Sharan. Incomplete directed perfect phylogeny. In D. Sankoff, editor, Eleventh Annual Symposium on Combinatorial Pattern Matching (CPM'00), pages 143--153, 2000.
- J. C. Stephens et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science, 293:489--493, 2001.
- M. Stephens, N. Smith, and P. Donnelly. A new statistical method for haplotype reconstruction from population data. Am. J. Human Genetics, 68:978--989, 2001.
- S. Tavare. Calibrating the clock: Using stochastic processes to measure the rate of evolution. In E. Lander and M. Waterman, editors, Calculating the Secrets of Life. National Academy Press, 1995.
- W. T. Tutte. An algorithm for determining whether a given binary matroid is graphic. Proc. of Amer. Math. Soc., 11:905--917, 1960.
- L. Wang, K. Zhang, and L. Zhang. Perfect phylogenetic networks with recombination. J. of Comp. Biology, 8:69--78, 2001.
- W. T. Whitney. Congruent graphs and the connectivity of graphs. American Math. J., 54:150--168, 1932.
- W. T. Whitney. 2-isomorphic graphs. American Math. J., 55:245--254, 1933.
Index Terms
- Haplotyping as perfect phylogeny: conceptual framework and efficient solutions