Abstract
Multiple sequence alignment (MSA) is a well-known problem in bioinformatics that consists of finding an alignment of three or more biological sequences. In this paper, we propose a parallel iterative algorithm for the global alignment of multiple biological sequences. In this algorithm, a number of processes work independently and concurrently, each searching for the best MSA of a set of sequences. The algorithm uses a Longest Common Subsequence (LCS) technique to generate an initial MSA. An iterative process then improves the MSA by applying a number of operators implemented to produce more accurate alignments. Simulations were run using sequences from the UniProtKB protein database. A preliminary performance analysis and a comparison with several common MSA methods show promising results. The implementation was developed on a cluster platform using the standard Message Passing Interface (MPI) library.
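The abstract does not say how the LCS technique is applied to build the initial MSA. As a minimal illustration of the underlying idea only, the following sketch computes one longest common subsequence of two sequences with the textbook dynamic-programming recurrence. The function name `lcs` and the pairwise setting are assumptions made for this example, not the authors' implementation.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP).

    Illustrative sketch only: the paper's own LCS-based seeding of the
    MSA is not described in the abstract and may differ.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Residues in the common subsequence can serve as anchor columns that two sequences share, with gaps inserted between anchors. How the authors extend this to three or more sequences is not stated here. The table costs O(mn) time and memory for sequences of lengths m and n.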
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andalon-Garcia, I.R., Chavoya, A., Meda-Campaña, M.E. (2012). A Parallel Algorithm for Multiple Biological Sequence Alignment. In: Lones, M.A., Smith, S.L., Teichmann, S., Naef, F., Walker, J.A., Trefzer, M.A. (eds) Information Processing in Cells and Tissues. IPCAT 2012. Lecture Notes in Computer Science, vol 7223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28792-3_31
DOI: https://doi.org/10.1007/978-3-642-28792-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28791-6
Online ISBN: 978-3-642-28792-3
eBook Packages: Computer Science, Computer Science (R0)