Improvements in DNA Reads Correction

  • Conference paper
Man-Machine Interactions 5 (ICMMI 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 659)

Abstract

We introduce an improved version of RECKONER, an error corrector for Illumina whole genome sequencing data. By modifying its workflow, we reduce the computation time by as much as a factor of 10. We also propose a new method for determining the k-mer length, the key parameter of the k-spectrum-based family of correctors. The correction algorithms are evaluated on large data sets, namely human and maize genomes, for both Illumina HiSeq and MiSeq instruments.



Acknowledgements

The work was supported by the Polish National Science Center upon decision DEC-2015/17/B/ST6/01890.

Author information

Corresponding author

Correspondence to Maciej Długosz.

Appendix

To generate reads, we placed a real source read profile in a directory profile_dir and ran ART with the following commands:

art_profiler_illumina <profile_name> <profile_dir>/ fastq

art_illumina -sam -1 <profile_name> -l <read_length> -i <genome> -c <read_number> -o <output> -rs 0 -na
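
The two read-generation steps above can be sketched as a dry run that only prints the command lines with the placeholders filled in. This is a minimal sketch, not the authors' script: every value assigned below (profile name, read length, file names, read count) is a hypothetical placeholder, and only the flags already shown in the appendix are used.

```shell
#!/bin/sh
# Dry-run sketch of the two ART steps from the appendix: prints, never executes.
# All values below are hypothetical placeholders, not settings from the paper.
profile_name=my_profile
profile_dir=profile_dir
read_length=100
genome=genome.fasta
read_number=1000000
output=simulated_reads

art_commands() {
    # Step 1: build a read profile from the FASTQ files found in $profile_dir.
    printf 'art_profiler_illumina %s %s/ fastq\n' "$profile_name" "$profile_dir"
    # Step 2: simulate reads with that profile (flags copied from the appendix).
    printf 'art_illumina -sam -1 %s -l %s -i %s -c %s -o %s -rs 0 -na\n' \
        "$profile_name" "$read_length" "$genome" "$read_number" "$output"
}

art_commands
```

Printing the commands first makes it easy to check the substituted values before launching a long simulation.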

We ran the correctors with the following commands, using the parameters specified in Table 2:

reckoner -kmerlength <k> -prefix . <input_file>

bless -read <input_file> -kmerlength <k> -prefix tmp -gzip

bfc -s <genome_size> -t 64 <input_file> > <output_file>

musket -k <k> <kmers> -p 64 -o <output_file> <input_file>
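
To compare the correctors on the same input, the four command lines above can be generated from one set of parameters. The following is a hedged sketch that only prints the commands; the function name corrector_commands and the argument values in the example call are assumptions made for illustration and are not taken from Table 2.

```shell
#!/bin/sh
# Dry-run sketch: prints the four corrector command lines from the appendix
# for one set of parameters. Nothing is executed.
# corrector_commands is a hypothetical helper; the values in the call at the
# bottom are placeholders, not the settings reported in Table 2.
corrector_commands() {
    input_file=$1    # FASTQ file with the reads to correct
    k=$2             # k-mer length
    genome_size=$3   # value passed to bfc -s
    kmers=$4         # second value passed to musket -k
    output_file=$5   # file receiving the corrected reads (bfc, musket)

    printf 'reckoner -kmerlength %s -prefix . %s\n' "$k" "$input_file"
    printf 'bless -read %s -kmerlength %s -prefix tmp -gzip\n' "$input_file" "$k"
    printf 'bfc -s %s -t 64 %s > %s\n' "$genome_size" "$input_file" "$output_file"
    printf 'musket -k %s %s -p 64 -o %s %s\n' "$k" "$kmers" "$output_file" "$input_file"
}

corrector_commands reads.fastq 20 3g 1000000 corrected.fastq
```

Keeping the parameters in one place guards against running different tools with mismatched k or input files by accident.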

The sizes of the correctors' input data are presented in Table 3.

We ran the assembly with Minia (version 2.0.3) using the following command:

minia -in <input_file> -abundance-min 2 -max-memory 204800 -out <output_con_file>

Table 2. Corrector versions and parameters
Table 3. Data sizes

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Długosz, M., Deorowicz, S., Kokot, M. (2018). Improvements in DNA Reads Correction. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_12

  • DOI: https://doi.org/10.1007/978-3-319-67792-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67791-0

  • Online ISBN: 978-3-319-67792-7

  • eBook Packages: Engineering, Engineering (R0)
