Abstract
We introduce an improved version of RECKONER, an error corrector for Illumina whole-genome sequencing data. By modifying its workflow, we reduce the computation time by up to 10 times. We also propose a new method for determining the k-mer length, the key parameter of the k-spectrum-based family of correctors. The correction algorithms are evaluated on large data sets, i.e., human and maize genomes, for both Illumina HiSeq and MiSeq instruments.
Acknowledgements
The work was supported by the Polish National Science Center upon decision DEC-2015/17/B/ST6/01890.
Appendix
To generate reads, we placed a real source profile in the directory profile_dir and ran the ART read simulator with the following commands:
art_profiler_illumina <profile_name> <profile_dir>/ fastq
art_illumina -sam -1 <profile_name> -l <read_length> -i <genome> -c <read_number> -o <output> -rs 0 -na
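As an illustration, the two ART calls can be wrapped in a small dry-run script. This is only a sketch under stated assumptions, not the authors' script: every value assigned below (profile name, read length, genome file, read count, output prefix) is a hypothetical placeholder, and the script prints the commands unless RUN=1 is set in the environment.

```shell
#!/bin/sh
# Dry-run sketch of the read-generation step.
# All values are hypothetical placeholders, not the settings used in the paper.
profile_name="hiseq_profile"   # assumed name of the profile built from real reads
profile_dir="profile_dir"      # directory holding the real FASTQ file(s)
read_length=100                # assumed read length
genome="genome.fa"             # assumed reference FASTA
read_number=1000000            # assumed value passed to -c
output="simulated_reads"       # assumed output prefix

# Print the two ART commands in the same form as given in the text.
art_commands() {
    echo "art_profiler_illumina $profile_name $profile_dir/ fastq"
    echo "art_illumina -sam -1 $profile_name -l $read_length -i $genome -c $read_number -o $output -rs 0 -na"
}

# Execute only when RUN=1; otherwise just show what would be run.
if [ "${RUN:-0}" = "1" ]; then
    art_commands | sh -e
else
    art_commands
fi
```

Printing first makes it easy to check the substituted values before a long simulation is started.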
We ran the correctors with the following commands, using the parameters specified in Table 2:
reckoner -kmerlength <k> -prefix . <input_file>
bless -read <input_file> -kmerlength <k> -prefix tmp -gzip
bfc -s <genome_size> -t 64 <input_file> > <output_file>
musket -k <k> <kmers> -p 64 -o <output_file> <input_file>
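To show how the placeholders fit together, the sketch below fills the four command templates from one set of shell variables and prints the resulting lines. The values are hypothetical and are not the settings from Table 2. The helper name corrector_command is introduced here purely for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the corrector command lines for one input file.
# All values are hypothetical placeholders, not the parameters from Table 2.
k=20                           # assumed k-mer length
input_file="reads.fastq"       # assumed input reads
genome_size="3g"               # assumed approximate genome size for bfc -s
kmers=1000000000               # assumed estimated number of k-mers for musket
output_file="corrected.fastq"  # assumed output file

# Hypothetical helper: echo the command line for the named corrector.
corrector_command() {
    case "$1" in
        reckoner) echo "reckoner -kmerlength $k -prefix . $input_file" ;;
        bless)    echo "bless -read $input_file -kmerlength $k -prefix tmp -gzip" ;;
        bfc)      echo "bfc -s $genome_size -t 64 $input_file > $output_file" ;;
        musket)   echo "musket -k $k $kmers -p 64 -o $output_file $input_file" ;;
        *)        echo "unknown corrector: $1" >&2; return 1 ;;
    esac
}

for tool in reckoner bless bfc musket; do
    corrector_command "$tool"
done
```

Keeping k in a single variable ensures that the k-mer-based tools (RECKONER, BLESS, Musket) receive the same value within a given run.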
The sizes of the correctors' input data are presented in Table 3.
We ran the assembly with Minia (version 2.0.3) using the following command:
minia -in <input_file> -abundance-min 2 -max-memory 204800 -out <output_con_file>
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Długosz, M., Deorowicz, S., Kokot, M. (2018). Improvements in DNA Reads Correction. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_12
DOI: https://doi.org/10.1007/978-3-319-67792-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67791-0
Online ISBN: 978-3-319-67792-7
eBook Packages: Engineering, Engineering (R0)