DOI:10.1145/1854776.1854798
research-article

SplittingHeirs: inferring haplotypes by optimizing resultant dense graphs

Published: 02 August 2010

Abstract

Phasing genotype data to identify the haplotype pairs that compose each genotype is a widely studied problem because of its value for understanding genetic contributions to disease, for population genetics research, and for other significant endeavors. The accuracy of phasing is crucial, as identifying haplotypes is frequently the first step of expensive and vitally important studies. We present a combinatorial approach to this problem that we call SplittingHeirs. The approach is biologically motivated, as it is based on three widely accepted principles: a population tends to contain relatively few unique haplotypes, haplotypes tend to form clusters of similar sequences, and some haplotypes are relatively common. We tested SplittingHeirs, along with several popular existing phasing methods including PHASE, HAP, EM, and Pure Parsimony, on seven haplotype datasets for which the true phase is known. In all cases, our method achieves the highest accuracy obtained by any of these methods. Furthermore, SplittingHeirs is robust: on the two datasets with high recombination rates, it achieved higher accuracy than any of the other approaches. The success of SplittingHeirs validates the assumptions made by the dense graph model and highlights the benefits of finding globally optimal solutions.
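The phasing problem in the abstract can be made concrete with a small sketch. Assuming the common 0/1/2 genotype coding (0 or 1 for the two homozygous states, 2 for heterozygous), a genotype with k ≥ 1 heterozygous sites is explained by 2^(k-1) unordered haplotype pairs. The code below enumerates those pairs and then applies a brute-force version of the Pure Parsimony objective, one of the baselines named above, which picks the smallest haplotype set that resolves every genotype. It is not the SplittingHeirs dense-graph optimization, whose details are not given here; the function names `resolutions` and `pure_parsimony` and the toy genotypes are hypothetical, and the exhaustive search is only practical for tiny inputs.

```python
from itertools import combinations, product

# Assumed coding per SNP site: 0 or 1 = homozygous for that allele,
# 2 = heterozygous. Haplotypes are tuples of 0/1 alleles.


def resolutions(genotype):
    """Return every unordered haplotype pair that explains `genotype`."""
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, b in zip(het_sites, bits):
            h1[site] = b       # one haplotype takes allele b ...
            h2[site] = 1 - b   # ... the other takes the complementary allele
        # Sorting the pair makes (h1, h2) and (h2, h1) the same resolution.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def pure_parsimony(genotypes):
    """Brute-force Pure Parsimony: smallest haplotype set resolving all genotypes.

    Exponential in the number of candidate haplotypes; toy inputs only.
    """
    options = [resolutions(g) for g in genotypes]
    candidates = sorted({h for opts in options for pair in opts for h in pair})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # Each genotype needs one resolution built only from chosen haplotypes.
            if all(any(h1 in chosen and h2 in chosen for h1, h2 in opts)
                   for opts in options):
                return chosen
    return set(candidates)  # reached only when `genotypes` is empty


genotypes = [(2, 2, 0), (0, 2, 0), (2, 1, 0)]
print(len(resolutions((2, 2, 0))))        # 2 resolutions for 2 heterozygous sites
print(sorted(pure_parsimony(genotypes)))  # [(0, 0, 0), (0, 1, 0), (1, 1, 0)]
```

On these toy genotypes, three distinct haplotypes suffice to explain the whole sample, which mirrors the first principle listed in the abstract: a population tends to contain relatively few unique haplotypes.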


Published In

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN:9781450304382
DOI:10.1145/1854776
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

BCB'10

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Jan 2025
