Abstract
Multiple sequence alignment is an important tool for representing similarities among biological sequences, and it yields relevant information such as evolutionary history. Due to its importance, several methods have been proposed for the problem. However, the inherent complexity of the problem allows only non-exact solutions, and even these are practical only for short sequences or small numbers of sequences. Hence, the rapid growth of sequence databases leads to prohibitive runtimes on large-scale sequence datasets. In this work we describe a Multi-GPU approach to the three stages of the Progressive Alignment method, which makes it possible to align a large number of lengthy sequences in reasonable time. We compare our results with two popular aligners, ClustalW-MPI and Clustal\(\varOmega \), and with the CUDA NW module of the Rodinia suite. With 8 GPUs, our proposal achieved speedups ranging from 28.5 to 282.6 over ClustalW-MPI with 32 CPUs on NCBI and synthetic datasets. Compared with Clustal\(\varOmega \) with 32 CPUs on NCBI and synthetic datasets, the speedups were between 3.3 and 32. Compared with CUDA NW_Rodinia, the speedups range from 155 to 830 across all scenarios.
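For readers unfamiliar with the pairwise step that progressive aligners typically start from, the following is a minimal CPU sketch in Python of Needleman-Wunsch global alignment scoring applied to every pair of input sequences. It is not the authors' multi-GPU implementation. The function names `nw_score` and `pairwise_scores`, the linear gap model, and the scoring values (match 1, mismatch -1, gap -2) are all assumptions chosen for illustration.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming table are kept, so memory
    is O(len(b)) while time stays O(len(a) * len(b)).
    Scoring values are illustrative assumptions, not the paper's.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of `b`.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Symmetric matrix of global alignment scores for all sequence pairs."""
    n = len(seqs)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        scores[i][i] = nw_score(seqs[i], seqs[i])
    for i, j in combinations(range(n), 2):
        scores[i][j] = scores[j][i] = nw_score(seqs[i], seqs[j])
    return scores
```

The number of pairs grows quadratically with the number of sequences, and each pair costs time proportional to the product of the two sequence lengths, which is why this stage is a natural candidate for GPU parallelization on large datasets.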
Acknowledgments
We thank the High Performance Computing Center (NPAD/UFRN) and CTEI/UFMS for providing computational resources, and the São Paulo Research Foundation (FAPESP) for financial support through grants #2018/18560-6 and #2018/21934-5.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de O. Siqueira, R.A., Stefanes, M.A., Rozante, L.C.S., Martins-Jr, D.C., de Souza, J.E.S., Araujo, E. (2021). Multi-GPU Approach for Large-Scale Multiple Sequence Alignment. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12949. Springer, Cham. https://doi.org/10.1007/978-3-030-86653-2_41
DOI: https://doi.org/10.1007/978-3-030-86653-2_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86652-5
Online ISBN: 978-3-030-86653-2
eBook Packages: Computer Science, Computer Science (R0)