An Efficient Approach to Merging Paired-End Reads and Incorporation of Uncertainties

Flouri, Tomáš; Zhang, Jiajie; Czech, Lucas; Kobert, Kassian; Stamatakis, Alexandros

doi:10.1007/978-3-319-59826-0_13

Chapter
First Online: 19 September 2017

Abstract

Next-Generation Sequencing (NGS) technologies have reshaped the landscape of life sciences. The massive amount of data generated by NGS is rapidly transforming biological research from traditional wet-lab work into a data-intensive analytical discipline (Koboldt et al., Cell 155(1):27–38, 2013). The Illumina “sequencing by synthesis” technique (Mardis, Annu Rev Genomics Hum Genet 9:387–402, 2008) is one of the most popular and widely used NGS technologies.

References

Koboldt, D.C., Steinberg, K.M., Larson, D.E., Wilson, R.K., Mardis, E.R.: The next-generation sequencing revolution and its impact on genomics. Cell 155(1), 27–38 (2013)
Mardis, E.R.: Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9, 387–402 (2008)
Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics (Oxford, England) 30(5), 614–620 (2014)
Masella, A.P., Bartram, A.K., Truszkowski, J.M., Brown, D.G., Neufeld, J.D.: PANDAseq: paired-end assembler for illumina sequences. BMC Bioinf. 13(1), 31 (2012)
Magoč, T., Salzberg, S.L.: FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (Oxford, England) 27(21), 2957–2963 (2011)
Rognes, T., Flouri, T., Nichols, B., Quince, C., Mahé, F.: VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016)
Paszkiewicz, K., Studholme, D.J.: De novo assembly of short sequence reads. Brief. Bioinform. 11(5), 457–472 (2010). [Online] Available: http://bib.oxfordjournals.org/content/11/5/457.abstract
Nakamura, K., Oshima, T., Morimoto, T., Ikeda, S., Yoshikawa, H., Shiwa, Y., Ishikawa, S., Linak, M.C., Hirai, A., Takahashi, H., Altaf-Ul-Amin, M., Ogasawara, N., Kanaya, S.: Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 39(13), e90 (2011)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162(3), 705–708 (1982)
Smith, T., Waterman, M.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Dokl. Akad. Nauk SSSR 163(4), 845–848 (1965)
Hamming, R.: Error detecting and error correcting codes. Bell Syst. Tech. J. 29(2), 147–160 (1950)
Rognes, T., Seeberg, E.: Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics 16(8), 699–706 (2000)
Altschul, S., Gish, W.: Local alignment statistics. Methods Enzymol. 266, 460–480 (1996)
Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Quail, M.A., Smith, M., Coupland, P., Otto, T.D., Harris, S.R., Connor, T.R., Bertoni, A., Swerdlow, H.P., Gu, Y.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13(1), 341 (2012)
Ewing, B., Green, P.: Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8(3), 186–194 (1998)
Edgar, R.C., Flyvbjerg, H.: Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31(21), 3476 (2015)

Acknowledgements

T.F. is supported by DFG project STA/860-4. L.C., K.K. and J.Z. are funded by a HITS scholarship.

Author information

Authors and Affiliations

Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Tomáš Flouri, Jiajie Zhang, Lucas Czech, Kassian Kobert & Alexandros Stamatakis
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Alexandros Stamatakis

Corresponding author

Correspondence to Tomáš Flouri.

Editor information

Editors and Affiliations

LaTICE, Tunis, Tunisia
Mourad Elloumi

About this chapter

Cite this chapter

Flouri, T., Zhang, J., Czech, L., Kobert, K., Stamatakis, A. (2017). An Efficient Approach to Merging Paired-End Reads and Incorporation of Uncertainties. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_13

DOI: https://doi.org/10.1007/978-3-319-59826-0_13
Published: 19 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59824-6
Online ISBN: 978-3-319-59826-0
eBook Packages: Computer Science, Computer Science (R0)
