An Efficient Hybrid Approach to Correcting Errors in Short Reads

  • Conference paper
Modeling Decision for Artificial Intelligence (MDAI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6820)

Abstract

High-throughput sequencing technologies produce a large number of short reads that may contain errors. These sequencing errors constitute one of the major problems in analyzing such data. Many algorithms and software tools have been proposed to correct errors in short reads, but their computational complexity limits their performance. In this paper, we propose a novel and efficient hybrid approach that combines an alignment-free method with multiple alignments. We construct suffix arrays on all short reads to search for the correct overlapping regions. For each correct overlapping region, we form multiple alignments of the substrings that follow it to identify and correct the erroneous bases. Our approach can correct all types of errors in short reads produced by different sequencing platforms. Experiments show that our approach provides significantly higher accuracy than previous approaches and runs at a comparable or even faster speed.
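
The general idea of locating shared regions with a suffix array and then voting on the bases that follow them can be sketched as below. This is a simplified illustration, not the authors' implementation: the function names (`suffix_array`, `correct_reads`) and the parameters `k`, `min_support`, and `max_error_count` are assumptions made for this example, the suffix array is built naively by sorting, the "multiple alignment" is reduced to a vote on the single base following each shared k-mer, and only substitution errors are handled.

```python
from collections import Counter
from itertools import groupby


def suffix_array(reads):
    """Return (read_index, offset) pairs sorted by the suffix each one starts.

    Naive construction by sorting; adequate for a small illustration only.
    """
    suffixes = [(i, j) for i, read in enumerate(reads) for j in range(len(read))]
    suffixes.sort(key=lambda p: reads[p[0]][p[1]:])
    return suffixes


def correct_reads(reads, k=4, min_support=3, max_error_count=1):
    """Single-pass substitution correction (hypothetical parameters).

    Suffixes that share the same k-base prefix are adjacent in the suffix
    array. For each such group, the base that follows the shared prefix is
    put to a vote, and rare bases are replaced by a well-supported consensus.
    """
    sa = suffix_array(reads)
    corrected = [list(read) for read in reads]

    def seed(p):
        i, j = p
        return reads[i][j:j + k]

    for prefix, group in groupby(sa, key=seed):
        if len(prefix) < k:
            continue  # suffix too short to contain a full seed
        # Positions of the base that follows the seed, where one exists.
        column = [(i, j + k) for i, j in group if j + k < len(reads[i])]
        votes = Counter(reads[i][pos] for i, pos in column)
        if not votes:
            continue
        consensus, support = votes.most_common(1)[0]
        if support < min_support:
            continue  # not enough evidence to trust the consensus
        for i, pos in column:
            base = reads[i][pos]
            if base != consensus and votes[base] <= max_error_count:
                corrected[i][pos] = consensus
    return ["".join(read) for read in corrected]


# Four error-free overlapping reads plus one read with a substitution
# (A -> T at index 6 of the last read).
reads = [
    "CGATTACAGG",
    "GATTACAGGC",
    "ATTACAGGCT",
    "TTACAGGCTA",
    "GATTACTGGC",
]
fixed = correct_reads(reads)
```

In this toy input the seed `TTAC` is followed by `A` in four reads and by `T` in one, so the last read is changed to `GATTACAGGC` and the other reads are left as they were. A realistic corrector would also need to handle insertions and deletions, the reverse strand, and repeats, none of which this sketch attempts.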

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, Z., Yin, J., Li, Y., Xiong, W., Zhan, Y. (2011). An Efficient Hybrid Approach to Correcting Errors in Short Reads. In: Torra, V., Narakawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science, vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_19

  • DOI: https://doi.org/10.1007/978-3-642-22589-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22588-8

  • Online ISBN: 978-3-642-22589-5

  • eBook Packages: Computer Science, Computer Science (R0)
