Improvements in DNA Reads Correction

Długosz, Maciej; Deorowicz, Sebastian; Kokot, Marek

doi:10.1007/978-3-319-67792-7_12

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 659)

Included in the following conference series:

International Conference on Man–Machine Interactions

Abstract

We introduce an improved version of RECKONER, an error corrector for Illumina whole-genome sequencing data. By modifying its workflow, we reduce the computation time by up to a factor of 10. We also propose a new method for determining the k-mer length, the key parameter of the k-spectrum-based family of correctors. The correction algorithms are evaluated on large data sets, namely human and maize genomes, for both Illumina HiSeq and MiSeq instruments.
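
To illustrate why the k-mer length matters for k-spectrum-based correctors, the following minimal Python sketch counts k-mers in a toy read set and flags rarely seen k-mers as likely erroneous. It is not RECKONER's algorithm: the function names, the toy reads, and the threshold of 2 are assumptions made only for this illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, threshold):
    """Return start positions of k-mers in `read` seen fewer than `threshold` times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < threshold]


# Toy input (hypothetical): five error-free copies of one read
# plus one copy with a substitution (G -> T) at index 4.
reads = ["ACGTGCATGA"] * 5 + ["ACGTTCATGA"]
k = 4
counts = count_kmers(reads, k)

# Every k-mer overlapping the substituted base occurs only once,
# so it falls below the threshold and is flagged.
suspect = weak_positions(reads[-1], counts, k, threshold=2)
print(suspect)  # [1, 2, 3, 4]
```

In this toy case a single substitution affects k consecutive k-mers. In general, a k that is too small tends to make erroneous k-mers coincide with genuine ones, while a k that is too large lowers the counts of genuine k-mers, which is why the choice of k is treated as the key parameter.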

Acknowledgements

This work was supported by the Polish National Science Center under decision DEC-2015/17/B/ST6/01890.

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Maciej Długosz, Sebastian Deorowicz & Marek Kokot

Corresponding author

Correspondence to Maciej Długosz.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Aleksandra Gruca
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Tadeusz Czachórski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Katarzyna Harezlak
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Agnieszka Piotrowska

Appendix

To generate reads, we placed a real source profile in the directory profile_dir and ran ART with the following commands:

art_profiler_illumina <profile_name> <profile_dir>/ fastq

art_illumina -sam -1 <profile_name> -l <read_length> -i <genome> -c <read_number> -o <output> -rs 0 -na

We ran the correctors with the following commands, using the parameters specified in Table 2:

reckoner -kmerlength <k> -prefix . <input_file>

bless -read <input_file> -kmerlength <k> -prefix tmp -gzip

bfc -s <genome_size> -t 64 <input_file> > <output_file>

musket -k <k> <kmers> -p 64 -o <output_file> <input_file>
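
The corrector invocations above can be assembled programmatically as argument lists, which makes the token boundaries explicit. The sketch below is only an illustration: the function name, the default thread count, and all concrete values (k, k-mer count, genome size, file names) are hypothetical placeholders, and the flags are copied from the commands listed above rather than checked against each tool's documentation.

```python
import shlex


def corrector_commands(k, kmers, genome_size, input_file, output_file, threads=64):
    """Build argument lists for the four corrector invocations.

    Flags mirror the commands listed in the appendix; values are placeholders.
    """
    return {
        "reckoner": ["reckoner", "-kmerlength", str(k), "-prefix", ".", input_file],
        "bless": ["bless", "-read", input_file, "-kmerlength", str(k),
                  "-prefix", "tmp", "-gzip"],
        # In the listed command, bfc output is captured with a shell redirect,
        # so the output file is not part of the argument list.
        "bfc": ["bfc", "-s", str(genome_size), "-t", str(threads), input_file],
        "musket": ["musket", "-k", str(k), str(kmers), "-p", str(threads),
                   "-o", output_file, input_file],
    }


# Hypothetical values, chosen only to show the resulting command lines.
cmds = corrector_commands(k=20, kmers=1000000, genome_size="3g",
                          input_file="reads.fastq", output_file="corrected.fastq")
for name, argv in cmds.items():
    print(shlex.join(argv))
```

Nothing is executed here. To actually run a command, each list could be passed to subprocess.run, with the bfc output directed to a file opened for writing.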

The sizes of the correctors' input data are presented in Table 3.

We ran the assembly with Minia (version 2.0.3) using the following command:

minia -in <input_file> -abundance-min 2 -max-memory 204800 -out <output_con_file>

Table 2. Corrector versions and parameters

Table 3. Data sizes

About this paper

Cite this paper

Długosz, M., Deorowicz, S., Kokot, M. (2018). Improvements in DNA Reads Correction. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_12

DOI: https://doi.org/10.1007/978-3-319-67792-7_12
Published: 20 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67791-0
Online ISBN: 978-3-319-67792-7
eBook Packages: Engineering, Engineering (R0)
