Accelerating Search of Protein Sequence Databases using CUDA-Enabled GPU

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9049)

Abstract

Searching databases of protein sequences for proteins that match patterns represented as profile HMMs is a widely performed bioinformatics task. The standard tool for the task is HMMER version 3 from Sean Eddy. HMMER3 achieved significant performance improvements over version 2 by introducing a heuristic filter called the Multiple Segment Viterbi (MSV) algorithm and by using the native SIMD instruction sets of modern CPUs. Our objective was to improve performance further by using a general-purpose graphics processing unit (GPU) and the CUDA software environment from Nvidia.

An execution profile of HMMER3 identifies the MSV filter as a code hotspot that consumes over 75% of the total execution time. We applied a number of well-known optimization strategies for coding GPUs in order to implement a CUDA version of the MSV filter.
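To make the hotspot concrete, the following is a minimal scalar sketch of a multiple-segment ungapped Viterbi recurrence of the kind the MSV filter is built on. It is not HMMER3's code and not the CUDA implementation described here. The function name `msv_score`, the score table layout, and the transition parameters (`t_loop`, `t_move`, `t_bm`, `t_ec`, `t_ej`) are all hypothetical and chosen for illustration only.

```python
import math

NEG_INF = -math.inf

def msv_score(match_scores, seq, t_loop, t_move, t_bm, t_ec, t_ej):
    """Simplified multiple-segment ungapped Viterbi score (illustrative only).

    match_scores[k][x] is an assumed log-odds score for residue x at
    model position k; the t_* arguments are assumed transition scores.
    """
    m = len(match_scores)
    prev = [NEG_INF] * (m + 1)      # previous row; prev[0] is a sentinel
    n_state = 0.0                   # N: residues before the first segment
    b_state = n_state + t_move      # B: ready to begin a segment
    j_state = NEG_INF               # J: residues between segments
    c_state = NEG_INF               # C: residues after the last segment
    for x in seq:
        cur = [NEG_INF] * (m + 1)
        e_state = NEG_INF
        for k in range(1, m + 1):
            # Either extend the diagonal (no gaps) or start a new segment.
            cur[k] = match_scores[k - 1][x] + max(prev[k - 1], b_state + t_bm)
            e_state = max(e_state, cur[k])
        # Update the special states once per sequence residue.
        j_state = max(j_state + t_loop, e_state + t_ej)
        c_state = max(c_state + t_loop, e_state + t_ec)
        n_state = n_state + t_loop
        b_state = max(n_state + t_move, j_state + t_move)
        prev = cur
    return c_state + t_move

# Toy two-position model with all transition scores set to zero.
toy_model = [{"A": 2.0, "C": -1.0}, {"A": -1.0, "C": 3.0}]
print(msv_score(toy_model, "AC", 0.0, 0.0, 0.0, 0.0, 0.0))
```

In this sketch every `cur[k]` depends only on the previous row and on `b_state`, so the inner loop over model positions has no dependency between its iterations. That independence is the property that makes this kind of filter a natural candidate for SIMD lanes or GPU threads.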

The results show that our implementation achieved a 1.8x speedup on average over the single-threaded HMMER3 CPU SSE2 implementation. The experiments used a modern Kepler-architecture GPU from Nvidia with 768 cores running at 811 MHz and an Intel Core i7-3960X 3.3 GHz CPU overclocked to 4.6 GHz.

For HMMER2, GPU implementations obtained speed-ups of an order of magnitude. Such gains seem out of reach for HMMER3.
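One consideration that may help frame this limit is Amdahl's law, which bounds the overall speedup when only part of a program is accelerated. The sketch below is an illustration and not a calculation from the paper: it assumes the accelerated fraction is exactly 0.75 (the abstract only says the MSV filter takes over 75% of execution time), and the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Illustrative values only: p = 0.75 is an assumed accelerated fraction.
for s in (2.0, 10.0, 100.0):
    print(f"filter speedup {s:>5.0f}x -> overall {amdahl_speedup(0.75, s):.2f}x")
```

Under that assumption the overall speedup approaches 1 / (1 - 0.75) = 4 no matter how fast the filter itself becomes, so an order-of-magnitude gain would require accelerating a larger share of the pipeline.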

Author information

Corresponding author

Correspondence to Greg Butler.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cheng, L., Butler, G. (2015). Accelerating Search of Protein Sequence Databases using CUDA-Enabled GPU. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_17

  • DOI: https://doi.org/10.1007/978-3-319-18120-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18119-6

  • Online ISBN: 978-3-319-18120-2

  • eBook Packages: Computer Science, Computer Science (R0)
