Abstract
The BLAST server at the National Center for Biotechnology Information in the USA receives tens of thousands of queries per day on average. However, every query is served in the same way, even though query lengths vary significantly. In fact, a large proportion of protein sequences are shorter than 500 residues. Moreover, hit detection consumes most of the execution time of BLAST, and it is built around a lookup table. For these reasons, we propose a lightweight BLASTP for servicing not-too-long queries, together with a hybrid query-index table designed for it. Each table entry consists of four bytes and can store up to three query positions, so a sequence word usually requires only one memory fetch to retrieve its hit information. Furthermore, additional dummy entries are embedded into the table and interleaved with the original entries. Both the dummy entries and the entries without any hits can be used to buffer spilled query positions. These features result in a much smaller lookup table with a higher utilization rate and a lower cache miss ratio. Experimental results show that the lightweight BLASTP outperforms CUDA-BLASTP, with speedups ranging from 1.82 to 3.37 measured on the first two critical phases.
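As a rough illustration of how three query positions might fit in a four-byte entry, the sketch below packs them into a 32-bit word. The bit layout is an assumption made for this example and is not taken from the paper: three 10-bit position slots (enough for positions below 1024, which covers queries shorter than 500) plus a 2-bit hit count. The names `pack_entry`, `hit_count`, and `unpack_entry` are hypothetical, and the buffering of spilled positions in empty or dummy entries is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (illustrative only, not the paper's actual format):
// bits 0..29 hold three 10-bit query positions, bits 30..31 hold the hit count.
constexpr unsigned kPosBits = 10;
constexpr std::uint32_t kPosMask = (1u << kPosBits) - 1u;  // 0x3FF
constexpr unsigned kCountShift = 3 * kPosBits;             // 30
constexpr std::size_t kMaxInline = 3;

// Pack up to three query positions (each below 1024) into one 4-byte entry.
inline std::uint32_t pack_entry(const std::vector<std::uint32_t>& positions) {
    assert(positions.size() <= kMaxInline);
    std::uint32_t entry =
        static_cast<std::uint32_t>(positions.size()) << kCountShift;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        assert(positions[i] <= kPosMask);
        entry |= positions[i] << (i * kPosBits);
    }
    return entry;
}

// Number of query positions stored inline in the entry (0 to 3).
inline unsigned hit_count(std::uint32_t entry) {
    return entry >> kCountShift;
}

// Recover the stored query positions from a single 32-bit fetch.
inline std::vector<std::uint32_t> unpack_entry(std::uint32_t entry) {
    std::vector<std::uint32_t> out;
    const unsigned n = hit_count(entry);
    for (unsigned i = 0; i < n; ++i) {
        out.push_back((entry >> (i * kPosBits)) & kPosMask);
    }
    return out;
}
```

Under this assumed layout, a word that hits the query at three or fewer positions is resolved with one 32-bit read, which matches the single-fetch behavior the abstract describes. A word with more hits would need the spill mechanism, which this sketch leaves out.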
Acknowledgements
This work was supported in part by a grant from the Ministry of Science and Technology, Taiwan, under Contract No. MOST106-2221-E-018-010.
Cite this article
Huang, LT., Wei, KC., Wu, CC. et al. A lightweight BLASTP and its implementation on CUDA GPUs. J Supercomput 77, 322–342 (2021). https://doi.org/10.1007/s11227-020-03267-1