A Survey of Multiple Sequence Alignment Techniques

  • Conference paper
Intelligent Computing Theories and Methodologies (ICIC 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9225)

Abstract

Multiple sequence alignment (MSA) is a basic step in many bioinformatics analyses and is also an NP-hard problem. To improve speed and accuracy and to meet the demands of large-scale sequence alignment, a wide variety of MSA methods and software tools have been developed. In this article, we systematically review the widely used methods and present their practical results on the BAliBASE 3.0 benchmark reference sets. We conclude that computational complexity remains the bottleneck of MSA. We also consider the future development of MSA methods, in particular the application of a broader range of technologies and the prospects for parallelizing MSA.
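
To make the complexity bottleneck concrete, the following is a minimal sketch, not taken from the paper. It assumes the textbook exact approach, in which pairwise dynamic programming is generalized to N sequences of length L: the alignment lattice then has (L + 1)^N cells, and each cell looks back at up to 2^N - 1 predecessors. The function names are hypothetical and chosen only for illustration.

```python
def dp_cells(num_sequences: int, length: int) -> int:
    """Cells in the exact dynamic-programming lattice for
    `num_sequences` sequences of `length` residues each (assumed equal lengths)."""
    return (length + 1) ** num_sequences


def dp_predecessors(num_sequences: int) -> int:
    """Predecessor cells examined per lattice cell: one for every
    non-empty subset of sequences that advances by one residue."""
    return 2 ** num_sequences - 1


if __name__ == "__main__":
    # Illustrative sizes only; sequence length fixed at 100 residues.
    for n in (2, 3, 5, 10):
        print(f"N={n:2d}  cells={dp_cells(n, 100):.3e}  "
              f"predecessors per cell={dp_predecessors(n)}")
```

Even at a modest length of 100 residues, the lattice grows by a factor of about 100 with every added sequence, which is one way to see why practical tools tend to rely on heuristics rather than exact optimization.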

Acknowledgement

This work was supported by Shenzhen Municipal Science and Technology Innovation Council (Grant No. CXZZ20140904154910774, Grant No. JCYJ20140417172417174, Grant No. JCYJ20140904154645958, Grant No. JCYJ20130329151843309) and China Postdoctoral Science Foundation funded project (Grant No. 2014M560264).

Corresponding author

Correspondence to Xiao-Dan Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, XD., Liu, JX., Xu, Y., Zhang, J. (2015). A Survey of Multiple Sequence Alignment Techniques. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_52

  • DOI: https://doi.org/10.1007/978-3-319-22180-9_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22179-3

  • Online ISBN: 978-3-319-22180-9

  • eBook Packages: Computer Science, Computer Science (R0)
