SeedHit: A GPU Friendly Pre-Align Filtering Algorithm


Abstract:

The amount of genetic data generated by Next Generation Sequencing (NGS) technologies grows faster than Moore's law. This necessitates the development of efficient NGS data processing and analysis algorithms. A filter placed before the computationally costly analysis step can significantly reduce the run time of NGS data analysis. As GPUs are orders of magnitude more powerful than CPUs, this paper proposes a GPU-friendly pre-align filtering algorithm named SeedHit for the fast processing of NGS data. Inspired by BLAST, SeedHit counts seed hits between two sequences to determine their similarity. In SeedHit, each nucleotide in a gene sequence is represented in binary format. By packing the data and generating a lookup table that fits into the L1 cache, SeedHit is GPU-friendly and achieves high throughput. Using three 16S rRNA datasets from Greengenes as input, SeedHit rejects 84%–89% of dissimilar sequence pairs on average when the similarity threshold is 0.9–0.99. The throughput of SeedHit reached 1 Tbase/s (tera-bases per second) on an NVIDIA GeForce RTX 3080 Ti. Compared with two other GPU-based filtering algorithms, GateKeeper and SneakySnake, SeedHit has the highest rejection rate and throughput. After SeedHit was incorporated into our in-house clustering algorithm nGIA, the modified nGIA achieved a 1.6–2.1 times speedup over the original version.
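
The seed-counting idea summarized in the abstract can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the authors' GPU implementation: the seed length `k`, the `min_hits` threshold, the one-bit-per-seed table layout, the handling of ambiguous bases, and all function names are hypothetical choices made for the example.

```python
# Illustrative sketch only: k, min_hits, and every name here are assumptions,
# not values or identifiers taken from the SeedHit paper.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per nucleotide


def seeds(seq, k):
    """Yield each k-mer of seq packed into an integer, 2 bits per base."""
    mask = (1 << (2 * k)) - 1
    packed = 0
    valid = 0  # consecutive unambiguous bases currently in the window
    for ch in seq.upper():
        code = CODE.get(ch)
        if code is None:  # ambiguous base (e.g. N): restart the window
            packed, valid = 0, 0
            continue
        packed = ((packed << 2) | code) & mask
        valid += 1
        if valid >= k:
            yield packed


def build_table(seq, k):
    """Build a bit table with one bit per possible k-mer (4**k bits)."""
    table = bytearray((4 ** k + 7) // 8)
    for s in seeds(seq, k):
        table[s >> 3] |= 1 << (s & 7)
    return table


def count_hits(table, seq, k):
    """Count the seeds of seq whose bit is set in the table."""
    return sum((table[s >> 3] >> (s & 7)) & 1 for s in seeds(seq, k))


def passes_filter(a, b, k=8, min_hits=1):
    """Keep the pair for alignment only if enough seeds of b occur in a."""
    return count_hits(build_table(a, k), b, k) >= min_hits
```

With the assumed `k = 8`, the table holds 4^8 = 65,536 bits, or 8 KiB, which shows how a seed lookup table can be kept small enough for an on-chip cache. A pair that fails `passes_filter` would be rejected before the costly alignment step.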
Page(s): 1794 - 1802
Date of Publication: 21 June 2024

PubMed ID: 38905083
