Abstract
Multiple sequence alignment (MSA) is one of the most important tasks in biological sequence analysis. Although this paper focuses primarily on protein alignments, most of the discussion and methodology also apply to DNA alignments. We present a novel hybrid clonal selection algorithm, referred to here as the aligner. It searches a population of candidate alignments for a set of high-scoring alignments by optimizing the classical weighted sum-of-pairs (SP) objective function. Benchmarks from the BAliBASE library (v.1.0 and v.2.0) are used to validate the algorithm. On the BAliBASE v.1.0 benchmarks, the experimental results show that the proposed algorithm is superior to PRRP, ClustalX, SAGA, DIALIGN, PIMA, MULTIALIGN, and PILEUP8. On the BAliBASE v.2.0 benchmarks, the algorithm shows encouraging results in terms of SP score when compared with established, leading methods, namely ClustalW, T-Coffee, MUSCLE, PRALINE, ProbCons, and SPEM.
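To make the objective concrete: the weighted SP score sums, over every pair of rows i < j, a weight w_ij times the score of the pairwise alignment that the two rows induce. The sketch below is a minimal illustration, not the authors' implementation. The toy match/mismatch/gap values are assumptions standing in for a real substitution matrix and the gap-cost scheme actually used in the paper, and the helper names `pair_score` and `weighted_sp` are hypothetical. Columns in which both rows have a gap contribute zero, which is equivalent to scoring each induced pairwise alignment under a linear gap penalty.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence, Tuple

GAP = "-"


def pair_score(a: str, b: str, match: float = 1.0,
               mismatch: float = 0.0, gap: float = -1.0) -> float:
    """Score one column for a pair of rows (hypothetical toy scheme).

    Gap-gap pairs score 0, so they are effectively removed, as in the
    induced pairwise alignment; a residue against a gap pays a linear penalty.
    """
    if a == GAP and b == GAP:
        return 0.0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def weighted_sp(alignment: Sequence[str],
                weights: Optional[Dict[Tuple[int, int], float]] = None,
                score: Callable[[str, str], float] = pair_score) -> float:
    """Weighted sum-of-pairs score: sum over i < j of w_ij * S(row_i, row_j).

    Missing weights default to 1.0, which gives the unweighted SP score.
    """
    rows = list(alignment)
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        w = 1.0 if weights is None else weights.get((i, j), 1.0)
        # Column-by-column score of the pairwise alignment induced by rows i, j
        total += w * sum(score(x, y) for x, y in zip(rows[i], rows[j]))
    return total


if __name__ == "__main__":
    msa = ["AC-GT", "ACTGT", "A--GT"]
    print(weighted_sp(msa))  # unweighted SP
    print(weighted_sp(msa, {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 2.0}))
```

In a population-based search such as a clonal selection algorithm, a function of this kind would act as the fitness to maximize for each candidate alignment; the pair weights are meant to down-weight over-represented, closely related sequences.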
References
Eidhammer, I., Jonassen, I., Taylor, W.R.: Protein Bioinformatics. Wiley, Chichester (2004)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis. Cambridge University Press, Cambridge (2004)
Thompson, J.D., Plewniak, F., Ripp, R., Thierry, J.C., Poch, O.: Towards a Reliable Objective Function for Multiple Sequence Alignments. J. Mol. Biol. 301, 937–951 (2001)
Altschul, S.F., Lipman, D.J.: Trees, stars, and multiple biological sequence alignment. SIAM Journal on Applied Mathematics 49, 197–209 (1989)
Altschul, S.F., Carroll, R.J., Lipman, D.J.: Weights for data related by a tree. Journal of Molecular Biology 207, 647–653 (1989)
Bonizzoni, P., Della Vedova, G.: The Complexity of Multiple Sequence Alignment with SP-score that is a Metric. Theoretical Computer Science 259(1), 63–79 (2001)
Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal of Computational Biology 1, 337–348 (1994)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1979)
Gupta, S.K., Kececioglu, J.D., Schäffer, A.A.: Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. Journal of Computational Biology 2, 459–472 (1995)
Corpet, F.: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research 16, 10881–10890 (1988)
Wisconsin Package v.8; Genetics Computer Group, Madison, WI, http://www.gcg.com
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G.: The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25, 4876–4882 (1997)
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680 (1994)
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate Multiple Sequence Alignment. Journal of Molecular Biology 302, 205–217 (2000)
Zhou, H., Zhou, Y.: SPEM: Improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621 (2005)
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., Batzoglou, S.: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 15, 330–340 (2005)
Smith, R.F., Smith, T.F.: Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Engineering 5, 35–41 (1992)
Carrillo, H., Lipman, D.J.: The Multiple Sequence Alignment Problem in Biology. SIAM Journal on Applied Mathematics 48, 1073–1082 (1988)
Stoye, J., Moulton, V., Dress, A.W.: DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment. Bioinformatics 13(6), 625–626 (1997)
Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics 14, 290–294 (1998)
Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218 (1999)
Gotoh, O.: Further improvement in methods of group-to-group sequence alignment with generalized profile operations. Bioinformatics 10(4), 379–387 (1994)
Eddy, S.R.: Multiple alignment using hidden Markov models. In: 3rd International Conference on Intelligent Systems for Molecular Biology, vol. 3, pp. 114–120 (1995)
Edgar, R.C.: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792–1797 (2004)
Notredame, C., Higgins, D.G.: SAGA: sequence alignment by genetic algorithm. Nucleic Acids Research 24, 1515–1539 (1996)
Notredame, C.: COFFEE: an objective function for multiple sequence alignments. Bioinformatics 14, 407–422 (1998)
Simossis, V.A., Heringa, J.: PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Research 33, 289–294 (2005)
Shyu, C., Sheneman, L., Foster, J.A.: Multiple Sequence Alignment with Evolutionary Computation. Genetic Programming and Evolvable Machines 5, 121–144 (2004)
Nguyen, H.D., Yoshihara, I., Yamamori, K., Yasunaga, M.: Aligning Multiple Protein Sequences by Parallel Hybrid Genetic Algorithm. Genome Informatics 13, 123–132 (2002)
Cutello, V., Nicosia, G.: An Immunological Approach to Combinatorial Optimization Problems. In: Garijo, F.J., Riquelme, J.-C., Toro, M. (eds.) IBERAMIA 2002. LNCS, vol. 2527, pp. 361–370. Springer, Heidelberg (2002)
Nicosia, G.: Immune Algorithms for Optimization and Protein Structure Prediction. Ph.D. Dissertation, Department of Mathematics and Computer Science, University of Catania, Italy (2004)
Cutello, V., Narzisi, G., Nicosia, G., Pavone, M.: Clonal selection algorithms: A comparative case study using effective mutation potentials. In: Jacob, C., Pilat, M.L., Bentley, P.J., Timmis, J.I. (eds.) ICARIS 2005. LNCS, vol. 3627, pp. 13–28. Springer, Heidelberg (2005)
Cutello, V., Nicosia, G., Pavone, M., Timmis, J.: An Immune Algorithm for Protein Structure Prediction on Lattice Models. IEEE Transactions on Evolutionary Computation (to appear)
Cutello, V., Nicosia, G., Pavone, M.: Exploring the capability of immune algorithms: A characterization of hypermutation operators. In: Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (eds.) ICARIS 2004. LNCS, vol. 3239, pp. 263–276. Springer, Heidelberg (2004)
Taylor, W.R.: A flexible method to align a large number of sequences. J. Mol. Evol. 28, 161–169 (1988)
Thompson, J.D., Plewniak, F., Poch, O.: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87–88 (1999)
Bahr, A., Thompson, J.D., Thierry, J.C., Poch, O.: BAliBASE (Benchmark Alignment dataBASE): Enhancements for Repeats, Transmembrane Sequences and Circular Permutations. Nucleic Acids Research 29(1), 323–326 (2001)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cutello, V., Lee, D., Nicosia, G., Pavone, M., Prizzi, I. (2006). Aligning Multiple Protein Sequences by Hybrid Clonal Selection Algorithm with Insert-Remove-Gaps and BlockShuffling Operators. In: Bersini, H., Carneiro, J. (eds) Artificial Immune Systems. ICARIS 2006. Lecture Notes in Computer Science, vol 4163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823940_25
DOI: https://doi.org/10.1007/11823940_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37749-8
Online ISBN: 978-3-540-37751-1
eBook Packages: Computer Science, Computer Science (R0)