Haplotyping as perfect phylogeny: conceptual framework and efficient solutions

Author:
Dan Gusfield

University of California, Davis, CA


RECOMB '02: Proceedings of the sixth annual international conference on Computational biology, April 2002, Pages 166–175
https://doi.org/10.1145/565196.565218

Published: 18 April 2002

ABSTRACT

The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex, genetically influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7].

In this paper we explore the algorithmic implications of the key (and now realistic) "no recombination in long blocks" observation for the problem of inferring haplotypes in populations. We observe that the no-recombination assumption is very powerful. This assumption, along with the standard population-genetic assumption of infinite sites [23, 14], imposes severe combinatorial constraints on the permitted solutions to the haplotype inference problem, leading to an efficient deterministic algorithm to deduce all features of the permitted haplotype solution(s) that can be known with certainty. The technical key is to view haplotype data as disguised information about paths in an unknown tree, and the haplotype deduction problem as a problem of reconstructing the tree from that path information. This formulation allows us to exploit deep theorems and algorithms from graph and matroid theory to efficiently find one permitted solution to the haplotype problem. It also gives a simple test to determine whether that solution is unique; if it is not, we can implicitly represent the set of all permitted solutions so that each can be efficiently created.
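To make the combinatorial constraint concrete, here is a minimal Python sketch of the haplotype inference problem under a perfect-phylogeny requirement. It is not the graph-realization algorithm the abstract refers to; it is an exponential brute-force search meant only for tiny inputs. Several things are assumptions of this sketch rather than details from the paper: the genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites), the function names, and the use of the pairwise four-gamete test as the compatibility check, which is the standard condition for binary sites when the ancestral sequence is not fixed. The paper's own formulation may differ, for example by fixing the root.

```python
from itertools import combinations, product


def four_gamete_ok(haplotypes):
    """Return True if no pair of sites shows all four gametes 00, 01, 10, 11."""
    rows = list(haplotypes)
    if not rows:
        return True
    n_sites = len(rows[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in rows}
        if len(gametes) == 4:
            return False
    return True


def resolutions(genotype):
    """Yield every unordered haplotype pair that explains one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    """
    het = [k for k, g in enumerate(genotype) if g == 2]
    # Pin the first heterozygous site to (0, 1) so each unordered pair
    # is produced exactly once.
    free = het[1:]
    for bits in product((0, 1), repeat=len(free)):
        h1 = list(genotype)
        h2 = list(genotype)
        if het:
            h1[het[0]], h2[het[0]] = 0, 1
        for site, b in zip(free, bits):
            h1[site], h2[site] = b, 1 - b
        yield tuple(h1), tuple(h2)


def pph_solutions(genotypes):
    """Brute-force every resolution whose haplotypes pass the four-gamete test."""
    found = []
    per_genotype = [list(resolutions(g)) for g in genotypes]
    for choice in product(*per_genotype):
        haps = {h for pair in choice for h in pair}
        if four_gamete_ok(haps):
            found.append(choice)
    return found


# Two sites, three individuals: the homozygous individuals force the
# resolution of the doubly heterozygous one.
unique = pph_solutions([(2, 2), (0, 1), (1, 0)])
# Without the third individual, two resolutions remain permitted.
ambiguous = pph_solutions([(2, 2), (0, 1)])
```

In this toy input, `unique` holds a single solution and `ambiguous` holds two, which mirrors the abstract's distinction between a unique permitted solution and a set of permitted solutions. The search visits every combination of resolutions, so its cost grows exponentially with the number of heterozygous sites.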

References

[1] R. E. Bixby and D. K. Wagner. An almost linear-time algorithm for graph realization. Mathematics of Operations Research, 13:99–123, 1988.
[2] A. Clark, K. Weiss, and D. Nickerson et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Human Genetics, 63:595–612, 1998.
[3] Andrew Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol, 7:111–122, 1990.
[4] W. H. Cunningham and J. Edmonds. A combinatorial decomposition theory. Can. J. Math., 32:734–765, 1980.
[5] M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. Fine-structure haplotype map of 5q31: implications for gene-based studies and genomic LD mapping. Abstract of talk presented at the American Associate of Human Genetics National meeting, October 14, 2001.
[6] M. Daly, J. Rioux, S. Schaffner, T. Hudson, and E. Lander. High-resolution haplotype structure in the human genome. Nature Genetics, 29:229–232, 2001.
[7] L. Friss, R. Hudson, A. Bartoszewicz, J. Wall, T. Donfalk, and A. Di Rienzo. Gene conversion and differential population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. of Human Genetics, 69:831–843, 2001.
[8] M. Fullerton, A. Clark, Charles Sing, et al. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. of Human Genetics, pages 881–900, 2000.
[9] D. Gusfield. Efficient algorithms for inferring evolutionary history. Networks, 21:19–28, 1991.
[10] D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
[11] D. Gusfield. Inference of haplotypes from samples of diploid populations: complexity and algorithms. Journal of Computational Biology, 8(3), 2001.
[12] L. Helmuth. Genome research: Map of the human genome 3.0. Science, 293(5530):583–585, 2001.
[13] J. E. Hopcroft and R. E. Tarjan. Dividing a graph into triconnected components. SIAM J. on Computing, 2:135–157, 1973.
[14] R. Hudson. Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology, 7:1–44, 1990.
[15] L. Jin, P. Underhill, V. Doctor, R. Davis, P. Shen, L. Luca Cavalli-Sforza, and P. Oefner. Distribution of haplotypes from a chromosome 21 region distinguishes multiple prehistoric human migrations. Proc. of the Nat. Academy of Science, 96:3796–3800, 1999.
[16] S. Kannan and T. Warnow. Inferring evolutionary history from DNA sequences. SIAM J. on Computing, 23:713–737, 1994.
[17] M. K. Kuhner and J. Felsenstein. Sampling among haplotype resolutions in a coalescent-based genealogy sampler. Genetic Epidemiology, 19:S15–S21, 2000.
[18] F. McMorris. On the compatibility of binary qualitative taxonomic characters. Bull. Math. Biology, 39:133–138, 1977.
[19] S. Orzack, D. Gusfield, and V. Stanton. Experimental and theoretical inferal of haplotypes. In preparation.
[20] I. Pe'er, R. Shamir, and R. Sharan. Incomplete directed perfect phylogeny. In D. Sankoff, editor, Eleventh Annual Symposium on Combinatorial Pattern Matching (CPM'00), pages 143–153, 2000.
[21] J. C. Stephens et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science, 293:489–493, 2001.
[22] M. Stephens, N. Smith, and P. Donnelly. A new statistical method for haplotype reconstruction from population data. Am. J. Human Genetics, 68:978–989, 2001.
[23] S. Tavare. Calibrating the clock: Using stochastic processes to measure the rate of evolution. In E. Lander and M. Waterman, editors, Calculating the Secrets of Life. National Academy Press, 1995.
[24] W. T. Tutte. An algorithm for determining whether a given binary matroid is graphic. Proc. of Amer. Math. Soc, 11:905–917, 1960.
[25] L. Wang, K. Zhang, and L. Zhang. Perfect phylogenetic networks with recombination. J. of Comp. Biology, 8:69–78, 2001.
[26] W. T. Whitney. Congruent graphs and the connectivity of graphs. American Math. J., 54:150–168, 1932.
[27] W. T. Whitney. 2-isomorphic graphs. American Math. J., 55:245–254, 1933.

Index Terms

1. Applied computing
  1. Life and medical sciences
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Trees

Published in
RECOMB '02: Proceedings of the sixth annual international conference on Computational biology
April 2002
341 pages
ISBN:1581134983
DOI:10.1145/565196
Editors: Gene Myers (Celera, USA), Sridhar Hannenhalli (Celera, USA), David Sankoff (University of Montréal, Canada), Sorin Istrail (Celera, USA), Pavel Pevzner (University of California at San Diego, USA), Michael Waterman (University of California, USA)
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph realization
graphic matroid recognition
haplotype inference
perfect phylogeny
Qualifiers
- Article
Conference

Acceptance Rates
RECOMB '02 paper acceptance rate: 35 of 118 submissions (30%). Overall acceptance rate: 148 of 538 submissions (28%).