
A Fast and Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and Low Coverage

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)

Abstract

Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine, including diagnostics and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in the literature, and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. Therefore fast heuristics have been proposed. In this paper we describe a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have a high rate of reading errors (up to 20%) and low coverage (as low as 3). We test our method on real data from the HapMap Project.
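To make the terms in the abstract concrete, the following is a minimal sketch of the kind of input the SIH problem works with: gapped SNP fragments over a binary alphabet, scored against a candidate pair of haplotypes. It is not the heuristic proposed in the paper, which the abstract does not describe. The scoring shown is the minimum error correction (MEC) objective, a standard formulation in the haplotyping literature; the function names, the `-` gap symbol, the toy fragments, and the simple average-coverage measure are all assumptions made for illustration, and the paper may define coverage differently.

```python
from typing import List

GAP = "-"  # assumed symbol for a SNP position the fragment does not cover


def distance(fragment: str, haplotype: str) -> int:
    """Count covered positions where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_score(fragments: List[str], h1: str, h2: str) -> int:
    """Minimum error correction: assign each fragment to the closer haplotype
    and sum the number of alleles that would have to be flipped."""
    return sum(min(distance(f, h1), distance(f, h2)) for f in fragments)


def average_coverage(fragments: List[str]) -> float:
    """Average number of fragments covering each SNP position
    (a simplified notion of coverage, assumed here for illustration)."""
    n_snps = len(fragments[0])
    covered = sum(1 for f in fragments for c in f if c != GAP)
    return covered / n_snps


# Toy data: five gapped fragments over five SNP sites (hypothetical values).
fragments = ["01-0-", "-1100", "10-1-", "1--11", "0111-"]
h1, h2 = "01100", "10011"  # candidate haplotype pair (complementary here)

score = mec_score(fragments, h1, h2)
cov = average_coverage(fragments)
print(score, cov)
```

In this toy instance only the last fragment conflicts with its closer haplotype, at a single position, which is what a reading error looks like in this model. Higher error rates and lower coverage leave fewer consistent reads per site, which is the regime the abstract targets.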




Editor information

Raffaele Giancarlo, Sridhar Hannenhalli


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Genovese, L.M., Geraci, F., Pellegrini, M. (2007). A Fast and Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and Low Coverage. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_6


  • DOI: https://doi.org/10.1007/978-3-540-74126-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74125-1

  • Online ISBN: 978-3-540-74126-8

  • eBook Packages: Computer Science, Computer Science (R0)
