
Multi-GPU Approach for Large-Scale Multiple Sequence Alignment

  • Conference paper
Computational Science and Its Applications – ICCSA 2021 (ICCSA 2021)

Abstract

Multiple sequence alignment is an important tool for representing similarities among biological sequences, and it yields relevant information such as evolutionary history. Because of its importance, several methods have been proposed for the problem. However, its inherent complexity means that only non-exact solutions are practical, and even those are feasible only for short sequences or for few sequences. Hence, the rapid growth of sequence databases leads to prohibitive runtimes for large-scale sequence datasets. In this work we describe a Multi-GPU approach for the three stages of the Progressive Alignment method, which allows a large number of lengthy sequences to be aligned in reasonable time. We compare our results with two popular aligners, ClustalW-MPI and Clustal Ω, and with the CUDA NW module of the Rodinia Suite. With 8 GPUs, our proposal achieved speedups ranging from 28.5 to 282.6 over ClustalW-MPI with 32 CPUs on NCBI and synthetic datasets. Compared to Clustal Ω with 32 CPUs on NCBI and synthetic datasets, the speedups were between 3.3 and 32. Compared to CUDA NW_Rodinia, the speedups range from 155 to 830 across all scenarios.
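As a rough illustration of the workload the abstract describes, the first stage of a progressive alignment is commonly an all-against-all pairwise comparison, and Needleman-Wunsch (the "NW" in the Rodinia module) is the classic global alignment algorithm for scoring one pair. The sketch below is a plain CPU version in Python, not the authors' GPU implementation. The function names `nw_score` and `pairwise_scores` and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen for illustration only.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap cost).

    Only two rows of the dynamic-programming table are kept, so memory
    is proportional to len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair of sequences; keys are index pairs (i, j), i < j."""
    return {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


if __name__ == "__main__":
    print(pairwise_scores(["GATTACA", "GATTAC", "GCATGCU"]))
```

Each pair is scored independently of every other pair, and the number of pairs grows quadratically with the number of sequences, which is why this stage is a natural candidate for distribution across several devices. How the paper actually partitions the work among GPUs is not stated in the abstract.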



Acknowledgments

We thank the High Performance Computing Center (NPAD/UFRN) and CTEI/UFMS for providing computational resources, and the São Paulo Research Foundation (FAPESP) for financial support through grants #2018/18560-6 and #2018/21934-5.

Author information

Corresponding author

Correspondence to Eloi Araujo.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

de O. Siqueira, R.A., Stefanes, M.A., Rozante, L.C.S., Martins-Jr, D.C., de Souza, J.E.S., Araujo, E. (2021). Multi-GPU Approach for Large-Scale Multiple Sequence Alignment. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12949. Springer, Cham. https://doi.org/10.1007/978-3-030-86653-2_41


  • DOI: https://doi.org/10.1007/978-3-030-86653-2_41


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86652-5

  • Online ISBN: 978-3-030-86653-2

  • eBook Packages: Computer Science, Computer Science (R0)
