Abstract
Multiple sequence alignment (MSA) is a basic step in many bioinformatics analyses, and it is also an NP-hard problem. To improve speed and accuracy and to meet the demands of large-scale sequence alignment, a wide variety of MSA methods and software tools have been developed. In this article, we systematically review the widely used methods and report their practical results on the BAliBASE 3.0 benchmark reference sets. We conclude that computational complexity is still the bottleneck of MSA. We also consider the future development of MSA methods, with respect to the application of a wider range of technologies and the prospects for parallelizing MSA.
Acknowledgement
This work was supported by the Shenzhen Municipal Science and Technology Innovation Council (Grant Nos. CXZZ20140904154910774, JCYJ20140417172417174, JCYJ20140904154645958, and JCYJ20130329151843309) and by a China Postdoctoral Science Foundation funded project (Grant No. 2014M560264).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, XD., Liu, JX., Xu, Y., Zhang, J. (2015). A Survey of Multiple Sequence Alignment Techniques. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_52
DOI: https://doi.org/10.1007/978-3-319-22180-9_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22179-3
Online ISBN: 978-3-319-22180-9
eBook Packages: Computer Science (R0)