Abstract
The multiple sequence alignment (MSA) problem is relevant to several areas of bioinformatics: identifying sequence families, detecting structural homologies among protein/DNA sequences, determining the functions of protein/DNA sequences, and predicting patients' diseases by comparing their DNA in disease discovery, among others. MSA is an NP-hard problem. In this paper, two new methods for solving the MSA problem are introduced, both based on a cultural algorithm, namely the method of musical composition (MMC). The performance of the first and second versions was evaluated and analyzed on 26 and 12 different benchmark alignments, respectively. Test instances were taken from BAliBASE 3.0. Alignment accuracies are computed using the QSCORE program, a quality-scoring program that compares two multiple sequence alignments. Numerical results on the tackled instances indicate that the performance of the proposed versions of the MMC is promising. In particular, the experimental results show that the second version found the best alignment reported in the specialized literature in 25% of the tested instances. In addition, for 50% of the tested instances, the second version achieved the second-best alignment. Finally, the significance of the numerical results was analyzed with the Wilcoxon rank-sum test, which indicated that the second proposed version is statistically similar to some state-of-the-art techniques for the MSA problem.
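To make the idea of scoring one alignment against another concrete, the following is a minimal sketch of a pairwise agreement score: the fraction of residue pairs aligned in a reference alignment that are also aligned in a test alignment. It is an illustrative assumption about how such a comparison can be computed, not the QSCORE program itself; the function names `aligned_pairs` and `q_score` and the toy sequences are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index),
    where the residue index counts non-gap characters from the left.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def q_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(test, gap)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


# Toy example (hypothetical sequences, not taken from BAliBASE).
reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(q_score(test, reference))  # 3 of the 4 reference pairs are recovered
```

A score of 1.0 means the test alignment reproduces every aligned residue pair of the reference; lower values indicate pairs the test alignment placed in different columns.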
Cite this article
Mora-Gutiérrez, R.A., Lárraga-Ramírez, M.E., Rincón-García, E.A. et al. Adaptation of the method of musical composition for solving the multiple sequence alignment problem. Computing 97, 813–842 (2015). https://doi.org/10.1007/s00607-014-0436-3
DOI: https://doi.org/10.1007/s00607-014-0436-3