Abstract
Current DNA sequencing technologies do not read an entire chromosome from end to end but instead produce sets of short reads, i.e. fragments of the genome. Haplotype assembly is the problem of assigning each read to the correct chromosome within a group of homologous chromosomes, with the aid of a reference sequence. In this paper, we extend an existing exact algorithm for haplotype assembly of diploid species (Patterson et al., 2014) to the reference-free, polyploid case. A reference-free method does not use a reference genome of the species, so no known linear order is available for the reads or for the resulting variant positions. The result is therefore an unordered variant composition. This setting can also be applied to the study of relative abundances of related bacterial strains.
References
Aguiar, D., Istrail, S.: Haplotype assembly in polyploid genomes and identical by descent shared tracts. Bioinformatics 29(13), i352–i360 (2013). http://bioinformatics.oxfordjournals.org/content/29/13/i352.abstract
Astrovskaya, I., et al.: Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinform. 12(Suppl. 6), S1 (2011)
Bayzid, S., et al.: HMEC: a heuristic algorithm for individual haplotyping with minimum error correction. ISRN Bioinform. 2013, 10 (2013)
Berger, E., et al.: HapTree: a novel Bayesian framework for single individual polyplotyping using NGS data. PLOS Comput. Biol. 10(3), e1003502 (2014)
Chen, Z., Deng, F., Wang, L.: Exact algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics 29(16), 1938–1945 (2013)
Cilibrasi, R., van Iersel, L., Kelk, S., Tromp, J.: On the complexity of several haplotyping problems. In: Casadio, R., Myers, G. (eds.) WABI 2005. LNCS (LNBI), vol. 3692, pp. 128–139. Springer, Heidelberg (2005). http://dx.doi.org/10.1007/11557067_11
Correll, D.S.: The Potato and Its Wild Relatives. Texas Research Foundation, Renner (1962)
Das, S., Vikalo, H.: SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming. BMC Genomics 16(1), 260 (2015). http://dx.doi.org/10.1186/s12864-015-1408-5
Deng, F., Cui, W., Wang, L.: A highly accurate heuristic algorithm for the haplotype assembly problem. BMC Genomics 14(Suppl. 2), S2 (2013)
He, D., et al.: Optimal algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics 26(12), i183–i190 (2010). http://bioinformatics.oxfordjournals.org/content/26/12/i183.abstract
Junttila, E.: Patterns in permuted binary matrices. Ph.D. thesis, University of Helsinki (2011)
Kuleshov, V.: Probabilistic single-individual haplotyping. Bioinformatics 30(17), i379–i385 (2014). http://bioinformatics.oxfordjournals.org/content/30/17/i379.abstract
Lin, S., et al.: Haplotype inference in random population samples. Am. J. Hum. Genet. 71(5), 1129–1137 (2002)
Lippert, R., et al.: Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief. Bioinform. 3(1), 23–31 (2002). http://bib.oxfordjournals.org/content/3/1/23.abstract
Mäkinen, V., et al.: Interval scheduling maximizing minimum coverage. CoRR abs/1508.07820 (2015). http://arxiv.org/abs/1508.07820
Neigenfind, J., et al.: Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT. BMC Genomics 9, 356 (2008)
Patterson, M., et al.: WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22(6), 498–509 (2015)
Rautiainen, M.: Identification of variant compositions in related strains without reference. Master’s thesis, University of Helsinki (2016)
Stephens, J.C., et al.: Haplotype variation and linkage disequilibrium in 313 human genes. Science 293(5529), 489–493 (2001). http://www.sciencemag.org/content/293/5529/489.abstract
Su, S.Y., et al.: Inference of haplotypic phase and missing genotypes in polyploid organisms and variably copy number genomic regions. BMC Bioinform. 9, 513 (2008)
Tewhey, R., et al.: The importance of phase information for human genetics. Nat. Rev. Genet. 12, 215–223 (2011)
Uricaru, R., et al.: Reference-free detection of isolated SNPs. Nucleic Acids Res. 43(2), e11 (2014)
Acknowledgements
This work was supported in part by the Academy of Finland (grants 267591 to L.S. and 284598 (CoECGR)).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Rautiainen, M., Salmela, L., Mäkinen, V. (2016). Identification of Variant Compositions in Related Strains Without Reference. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M.A. (eds) Algorithms for Computational Biology. AlCoB 2016. Lecture Notes in Computer Science, vol 9702. Springer, Cham. https://doi.org/10.1007/978-3-319-38827-4_13
DOI: https://doi.org/10.1007/978-3-319-38827-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38826-7
Online ISBN: 978-3-319-38827-4
eBook Packages: Computer Science, Computer Science (R0)