An Efficient Hybrid Approach to Correcting Errors in Short Reads

Zhao, Zhiheng; Yin, Jianping; Li, Yong; Xiong, Wei; Zhan, Yubin

doi:10.1007/978-3-642-22589-5_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6820)

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence


Abstract

High-throughput sequencing technologies produce large numbers of short reads that may contain errors, and these sequencing errors are one of the major problems in analyzing such data. Many algorithms and software tools have been proposed to correct errors in short reads; however, their computational complexity limits their performance. In this paper, we propose a novel and efficient hybrid approach that combines an alignment-free method with multiple alignments. We construct suffix arrays on all short reads to search for correct overlapping regions. For each correct overlapping region, we form multiple alignments of the substrings following that region to identify and correct the erroneous bases. Our approach can correct all types of errors in short reads produced by different sequencing platforms. Experiments show that our approach provides significantly higher accuracy and is comparable to or faster than previous approaches.
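The abstract describes the pipeline only at a high level, so the following Python sketch illustrates its two named ideas under simplifying assumptions that are ours, not the authors'. Reads are concatenated with a hypothetical `$` separator and indexed by a naively built suffix array; a k-mer seed that occurs in at least `min_support` reads is treated as a "correct" overlapping region; and the multiple alignment of the following substrings is replaced by a gap-free, column-wise majority vote, so this sketch handles substitutions only, not insertions or deletions. All function names and the parameters `k`, `window`, `min_support` and `min_fraction` are hypothetical.

```python
from collections import Counter

SEP = "$"  # hypothetical read separator, assumed never to occur inside a read


def build_index(reads):
    """Concatenate reads and build a (naive) suffix array over the result."""
    text = SEP.join(reads) + SEP
    owner = []  # owner[p] = (read index, offset in read) for every text position p
    for r, read in enumerate(reads):
        owner.extend((r, off) for off in range(len(read)))
        owner.append((r, -1))  # position of the separator after read r
    # Naive construction by sorting whole suffixes: acceptable for toy input only.
    sa = sorted(range(len(text)), key=lambda p: text[p:])
    return text, sa, owner


def find_occurrences(text, sa, pattern):
    """Return text positions where `pattern` occurs, via binary search on the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # leftmost suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sa[start:lo]


def correct_reads(reads, k=4, window=3, min_support=3, min_fraction=0.75):
    """Correct substitutions by voting on the bases that follow shared k-mer seeds."""
    text, sa, owner = build_index(reads)
    corrected = [list(read) for read in reads]
    for r, read in enumerate(reads):
        for off in range(len(read) - k):
            hits = find_occurrences(text, sa, read[off:off + k])
            if len(hits) < min_support:
                continue  # seed too rare to be trusted as an error-free overlap
            # Substring (up to `window` bases) following the seed in every read containing it.
            followers = []
            for p in hits:
                rr, oo = owner[p]
                followers.append(reads[rr][oo + k:oo + k + window])
            # Gap-free column-by-column majority vote, standing in for a multiple alignment.
            for col in range(window):
                pos = off + k + col
                if pos >= len(read):
                    break
                votes = Counter(f[col] for f in followers if len(f) > col)
                total = sum(votes.values())
                if total < min_support:
                    continue  # too few reads cover this column
                base, count = votes.most_common(1)[0]
                if count / total >= min_fraction:
                    corrected[r][pos] = base  # overwrite with the consensus base
    return ["".join(chars) for chars in corrected]


if __name__ == "__main__":
    reads = ["ACGTTGCA", "CGTTGCAT", "GTTGCATG", "TTGCATGA", "CGTTGGAT"]
    print(correct_reads(reads))
```

On these toy reads, the seed `GTTG` occurs in four reads; three of them are followed by `C` and only the last read by `G`, so the 3-to-1 vote changes that read from `CGTTGGAT` to `CGTTGCAT` while the other reads are left unchanged. A practical implementation would use a linear-time suffix array construction and gapped alignment, since the abstract claims to correct all error types, including insertions and deletions.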



Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, 410073, Changsha, China
Zhiheng Zhao, Jianping Yin, Yong Li, Wei Xiong & Yubin Zhan


Editor information

Editors and Affiliations

Artificial Intelligence Research Institute (IIIA) Spanish National Research Council (CSIC), IIIA-CSIC, Campus Universitat Autonoma de Barcelona, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra
Toho Gakuen, 3-1-10, Naka, Kunitachi, 184-0004, Tokyo, Japan
Yasuo Narakawa
School of Computer, National University of Defense Technology, 410073, Changsha, China
Jianping Yin
Department of Network Engineering, National University of Defense Technology, Yanwachi Street 137, 410073, Changsha, Hunan, China
Jun Long


About this paper

Cite this paper

Zhao, Z., Yin, J., Li, Y., Xiong, W., Zhan, Y. (2011). An Efficient Hybrid Approach to Correcting Errors in Short Reads. In: Torra, V., Narakawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science, vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_19


DOI: https://doi.org/10.1007/978-3-642-22589-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22588-8
Online ISBN: 978-3-642-22589-5
eBook Packages: Computer Science, Computer Science (R0)
