A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

Cluster Computing

Abstract

Bioinformatics is an interdisciplinary field that applies current techniques from information technology, mathematics, and statistics to the study of large biological data sets. It involves several computational techniques, such as sequence and structural alignment, data mining, macromolecular geometry, protein structure prediction, and gene finding. Protein structure and sequence analysis are vital to understanding cellular processes, which in turn contributes to the development of drugs for metabolic pathways. Protein sequence alignment is concerned with identifying the similarities and relationships among different protein structures. In this paper, we target two well-known protein sequence alignment algorithms: the Needleman–Wunsch and Smith–Waterman algorithms. Both are computationally expensive, which hinders their applicability to large data sets. We therefore propose a hybrid parallel approach that combines the capabilities of multi-core CPUs with the power of contemporary GPUs and significantly speeds up the execution of the target algorithms. The validity of our approach is tested on real protein sequences, and its scalability is verified on randomly generated sequences with predefined similarity levels. The results show that the proposed hybrid approach is up to 242 times faster than the sequential approach.
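For readers unfamiliar with the two target algorithms, the following is a minimal sequential sketch in Python of their score recurrences, plus a loop over all sequence pairs. It is not the authors' implementation: the function names, the simple match/mismatch scoring, and the linear gap penalty are assumptions chosen for illustration, and the paper's actual scoring scheme and CPU/GPU kernels are not visible in this preview.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (assumed linear gap penalty)."""
    # Row 0: aligning an empty prefix of `a` against j characters of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: i characters of `a` against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (assumed linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a local alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def all_pairs(seqs, score):
    """Score every unordered pair of sequences (multiple pairwise alignment)."""
    return {(i, j): score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
```

Each cell costs constant work, so one pair of sequences of lengths m and n takes O(mn) time, and n sequences produce n(n-1)/2 pairs. The pairs are independent of one another, which is presumably what makes it possible to divide the workload between CPU cores and a GPU.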

Acknowledgements

The authors would like to thank the Deanship of Research at Jordan University of Science and Technology for funding this work (Grant Number 20170396).

Author information

Corresponding author

Correspondence to Luay Alawneh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alawneh, L., Shehab, M.A., Al-Ayyoub, M. et al. A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach. Cluster Comput 23, 2677–2688 (2020). https://doi.org/10.1007/s10586-019-03035-8

  • DOI: https://doi.org/10.1007/s10586-019-03035-8
