An interleaved hardware-accelerated k-mer parser | IEEE Conference Publication | IEEE Xplore

An interleaved hardware-accelerated k-mer parser


Abstract:

Advances in next-generation sequencing (NGS) have not only increased the overall throughput of genomic content (e.g. Illumina NovaSeq, up to 6,000 GB), but also brought technology miniaturization (e.g. Oxford Nanopore MinION), enabling real-time, mobile experiments. Single Instruction/Multiple Data (SIMD) hardware acceleration is increasingly used to improve the performance of NGS data processing tools, while generic template programming libraries make it easier to adapt to fast changes in sequencing and computing platforms. Here we present a novel k-mer parser written in ISO C++ that exploits an interleaved, non-sequential, hardware-accelerated SIMD implementation within a generic programming framework called libseq. We benchmarked our k-mer parser on different NGS experimental datasets, comparing it with two other popular k-mer counting tools (DSK and KMC3). On an Intel machine with AVX2 (Quad-Core Intel Core i5 CPU, 32 GB RAM), using simulated in-memory reads, DSK and KMC3 were on average 3.6x and 1.03x slower than our parser across k values in the range 35-63. On real sequencing experiments, DSK and KMC3 were on average 8.3x and 28.8x slower than ours in file/read parsing and k-mer building. Since our tool uses generic programming, other methods that rely on k-mers (e.g. de Bruijn graphs) can directly benefit from its SIMD acceleration. Our k-mer parser and libseq 2.0 are released under the BSD license and available at https://zenodo.org/record/7015294.
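
For readers unfamiliar with k-mer parsing, the following is a minimal scalar sketch of the basic task: sliding a window of length k over a read and packing each k-mer into an integer at 2 bits per base. It is not the interleaved SIMD method described in the abstract and it does not use the libseq API. The names `encode_base` and `parse_kmers` are hypothetical, chosen for illustration only. To keep each k-mer in a single 64-bit word, the sketch assumes k is at most 32, which is smaller than the 35-63 range benchmarked above. It also assumes that k-mers overlapping an ambiguous base such as N are skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Map a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3).
// Returns -1 for any other character (e.g. the ambiguous base N).
constexpr int encode_base(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Hypothetical helper (not part of libseq): returns the 2-bit packed
// forward k-mers of `read`, in order of occurrence. Assumes 1 <= k <= 32.
std::vector<std::uint64_t> parse_kmers(std::string_view read, unsigned k) {
    assert(k >= 1 && k <= 32);
    // Mask keeping only the lowest 2*k bits; k == 32 uses the full word.
    const std::uint64_t mask =
        (k == 32) ? ~std::uint64_t{0} : ((std::uint64_t{1} << (2 * k)) - 1);

    std::vector<std::uint64_t> out;
    std::uint64_t kmer = 0;
    unsigned valid = 0;  // consecutive valid bases seen since the last reset

    for (char c : read) {
        const int code = encode_base(c);
        if (code < 0) {  // ambiguous base: restart the window
            valid = 0;
            kmer = 0;
            continue;
        }
        // Rolling update: shift in the new base and drop the oldest one.
        kmer = ((kmer << 2) | static_cast<std::uint64_t>(code)) & mask;
        if (++valid >= k) out.push_back(kmer);
    }
    return out;
}
```

With this encoding, `parse_kmers("ACGT", 2)` returns the values 1, 6 and 11, corresponding to AC, CG and GT. The loop carries a dependency from one base to the next, which is the kind of sequential bottleneck that SIMD approaches aim to remove.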
Date of Conference: 06-08 December 2022
Date Added to IEEE Xplore: 02 January 2023
Conference Location: Las Vegas, NV, USA
