Abstract
Multiple Sequence Alignment (MSA) is an important problem in Bioinformatics that aims to align more than two sequences in order to highlight regions of similarity. The problem is known to be NP-hard, so heuristic methods are used to solve it. DIALIGN-TX is an iterative heuristic method for MSA that generates alignments by concatenating ungapped regions of high similarity. The first phase of MSA algorithms is usually parallelized by distributing many independent tasks among the nodes. Although heterogeneous multicore clusters are now very common, very few task allocation policies have been proposed for this type of architecture. This paper proposes an MPI/OpenMP master/slave parallel strategy, with several allocation policies, to run DIALIGN-TX in heterogeneous multicore clusters. We show that an appropriate choice of the master node has a great impact on overall system performance. In addition, results obtained with real sequence sets on a heterogeneous multicore cluster composed of 4 nodes (30 cores) show that the execution time can be drastically reduced when an appropriate allocation policy is used.
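
To make the idea of distributing independent first-phase tasks concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the independent tasks are the pairwise comparisons between sequences and that a master hands out fixed-size chunks of tasks to whichever node becomes idle first (a simple self-scheduling policy). The function names, the `speeds` values, and the `chunk` parameter are hypothetical and chosen only for illustration; the policies actually evaluated in the paper are not reproduced here.

```python
from itertools import combinations
import heapq


def pairwise_tasks(n_sequences):
    """Return all unordered sequence pairs (i, j) with i < j.

    Assumption: each pair is one independent first-phase task.
    """
    return list(combinations(range(n_sequences), 2))


def self_schedule(n_tasks, speeds, chunk=1):
    """Simulate a master that gives `chunk` tasks to the first idle node.

    speeds[k] is a hypothetical throughput (tasks per time unit) of node k.
    Returns (makespan, tasks_done_per_node).
    """
    counts = [0] * len(speeds)
    # Priority queue of (time at which the node becomes idle, node index).
    idle = [(0.0, k) for k in range(len(speeds))]
    heapq.heapify(idle)
    remaining = n_tasks
    makespan = 0.0
    while remaining > 0:
        now, k = heapq.heappop(idle)
        take = min(chunk, remaining)
        remaining -= take
        counts[k] += take
        finish = now + take / speeds[k]
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, k))
    return makespan, counts


if __name__ == "__main__":
    tasks = pairwise_tasks(20)  # 20 sequences give 190 pairs
    # Four nodes with made-up relative speeds (not the paper's cluster).
    makespan, counts = self_schedule(len(tasks), [4.0, 2.0, 2.0, 1.0], chunk=5)
    print(len(tasks), counts, makespan)
```

In this toy model, faster nodes return to the master more often and therefore receive more tasks, which is the basic reason a dynamic policy can balance load on heterogeneous hardware better than an even static split. Smaller chunks balance better but imply more master/worker messages.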







Cite this article
de Araujo Macedo, E., Magalhaes Alves de Melo, A.C., Pfitscher, G.H. et al. Multiple biological sequence alignment in heterogeneous multicore clusters with user-selectable task allocation policies. J Supercomput 63, 740–756 (2013). https://doi.org/10.1007/s11227-012-0768-8
DOI: https://doi.org/10.1007/s11227-012-0768-8