DOI: 10.1145/1854776.1854802
research-article

ReFHap: a reliable and fast algorithm for single individual haplotyping

Published: 02 August 2010

Abstract

Full human genomic sequences have been published in the last two years for a growing number of individuals. Most of them are a mixed consensus of the two real haplotypes because it is still very expensive to separate the information coming from the two copies of a chromosome. However, recent improvements and new experimental approaches promise to solve these issues and provide enough information to reconstruct the sequences of the two copies of each chromosome through bioinformatics methods such as single individual haplotyping. Full haploid sequences provide a complete understanding of the structure of the human genome, allowing accurate predictions of translation in protein coding regions and increasing the power of association studies.
In this paper we present a novel problem formulation for single individual haplotyping. We start by assigning a score to each pair of fragments based on their common allele calls, and we then use these scores to formulate the problem as finding the cut of the fragments that maximizes an objective function, similar to the well-known max-cut problem. Our algorithm first finds the best cut using a heuristic algorithm for max-cut and then builds haplotypes consistent with that cut. We compared both the accuracy and the running time of ReFHap with other heuristic methods on both simulated and real data and found that ReFHap runs significantly faster than previous methods without loss of accuracy.
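
The pipeline described above (score fragment pairs, find a good cut, build haplotypes consistent with the cut) can be illustrated with a small sketch. The abstract does not give the actual score function or the max-cut heuristic used by ReFHap, so everything below is an assumption made for illustration: the score (mismatches minus matches over shared SNPs), the single-flip local search, the majority vote, and the names `pair_score`, `greedy_cut`, and `build_haplotypes` are all hypothetical and should not be read as the authors' implementation.

```python
from itertools import combinations
from typing import Dict, List, Optional

# A fragment maps a SNP index to an allele call (0 or 1).
# SNPs not covered by the fragment are simply absent.
Fragment = Dict[int, int]


def pair_score(f1: Fragment, f2: Fragment) -> int:
    """Hypothetical score: mismatches minus matches over shared SNPs.

    Positive values suggest the fragments come from different haplotypes.
    """
    shared = f1.keys() & f2.keys()
    mismatches = sum(1 for s in shared if f1[s] != f2[s])
    return 2 * mismatches - len(shared)


def greedy_cut(fragments: List[Fragment]) -> List[int]:
    """Assumed heuristic: flip single fragments while the cut weight grows."""
    n = len(fragments)
    weights = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = pair_score(fragments[i], fragments[j])

    side = [0] * n  # side[i] is 0 or 1: which part of the cut fragment i is in
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Moving v adds its same-side edges to the cut
            # and removes its cross-side edges from it.
            gain = sum(
                weights[v][u] if side[u] == side[v] else -weights[v][u]
                for u in range(n)
                if u != v
            )
            if gain > 0:  # strict improvement guarantees termination
                side[v] ^= 1
                improved = True
    return side


def build_haplotypes(
    fragments: List[Fragment], side: List[int], num_snps: int
) -> List[Optional[int]]:
    """Majority vote per SNP for the haplotype of side 0 (ties resolve to 0)."""
    hap: List[Optional[int]] = []
    for s in range(num_snps):
        votes, covered = 0, False
        for frag, sd in zip(fragments, side):
            if s in frag:
                covered = True
                allele = frag[s] ^ sd  # side-1 fragments vote for the complement
                votes += 1 if allele == 1 else -1
        hap.append(None if not covered else (1 if votes > 0 else 0))
    return hap


# Toy input: five error-free fragments sampled from two complementary haplotypes.
toy_fragments = [
    {0: 0, 1: 1, 2: 1},
    {1: 0, 2: 0, 3: 1},
    {2: 1, 3: 0, 4: 1},
    {0: 1, 1: 0},
    {3: 1, 4: 0},
]
toy_side = greedy_cut(toy_fragments)
toy_hap = build_haplotypes(toy_fragments, toy_side, num_snps=5)
toy_complement = [None if a is None else 1 - a for a in toy_hap]
print(toy_side, toy_hap, toy_complement)
```

In this sketch the second haplotype is taken as the complement of the first, which assumes every SNP is heterozygous. A local search like this one only reaches a local optimum of the cut weight, so on noisy data the result can depend on the starting partition and on the order in which fragments are visited.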


Published In

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN:9781450304382
DOI:10.1145/1854776
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algorithms
  2. efficiency
  3. fragments
  4. haplotyping
  5. heuristic
  6. maximum cut
  7. variants

Conference

BCB'10
Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Cited By

  • (2023) Recent Advances in Assembly of Complex Plant Genomes. Genomics, Proteomics & Bioinformatics 21:3 (427-439). DOI: 10.1016/j.gpb.2023.04.004. Online publication date: 25-Apr-2023
  • (2022) Deep learning for assembly of haplotypes and viral quasispecies from short and long sequencing reads. Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (1-10). DOI: 10.1145/3535508.3545524. Online publication date: 7-Aug-2022
  • (2022) Interrogating the Human Diplome: Computational Methods, Emerging Applications, and Challenges. Haplotyping (1-30). DOI: 10.1007/978-1-0716-2819-5_1. Online publication date: 7-Nov-2022
  • (2021) SpecHap: a diploid phasing algorithm based on spectral graph theory. Nucleic Acids Research. DOI: 10.1093/nar/gkab709. Online publication date: 17-Aug-2021
  • (2020) ComHapDet: a spatial community detection algorithm for haplotype assembly. BMC Genomics 21:S9. DOI: 10.1186/s12864-020-06935-x. Online publication date: 9-Sep-2020
  • (2020) DCHap: A divide-and-conquer haplotype phasing algorithm for third-generation sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1-1). DOI: 10.1109/TCBB.2020.3005673. Online publication date: 2020
  • (2020) An Extension of Heuristic Algorithm for Reconstructing Multiple Haplotypes with Minimum Error Correction. 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE) (1-6). DOI: 10.1109/ETCCE51779.2020.9350874. Online publication date: 21-Dec-2020
  • (2020) Unzipping haplotypes in diploid and polyploid genomes. Computational and Structural Biotechnology Journal 18 (66-72). DOI: 10.1016/j.csbj.2019.11.011. Online publication date: 2020
  • (2019) GenHap: a novel computational method based on genetic algorithms for haplotype assembly. BMC Bioinformatics 20:S4. DOI: 10.1186/s12859-019-2691-y. Online publication date: 18-Apr-2019
  • (2019) OUP accepted manuscript. Bioinformatics. DOI: 10.1093/bioinformatics/btz329. Online publication date: 2019
