Abstract
Multiple sequence alignment (MSA) is a fundamental step in almost all aspects of biological sequence analysis, and the reliability and accuracy of those analyses depend on the quality of the MSA. Incorporating anchor points into a multiple sequence alignment has proved to be a good way to increase its quality. In this paper, we apply the Shared Near Neighbors method to construct anchor points as partial alignment columns that will be aligned in the final output. These anchor points can be used as a guide for the DIALIGN-TX method to overcome its limitation and increase the accuracy of the final MSA. The results show a 4-8% improvement in CS score over DIALIGN-TX on the six reference sets of the BAliBASE 3.0 benchmark. In addition, the approach achieved the highest overall mean Q-score and CS score among the compared MSA methods on the IRMBASE 2.0 benchmark.
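The Shared Near Neighbors idea named above can be illustrated with a minimal sketch of Jarvis-Patrick style clustering. This is not the authors' anchor-point pipeline: the function names, the precomputed distance matrix, and the parameters `k` and `min_shared` are assumptions chosen for illustration, and this variant excludes each item from its own neighbour list. Two items are merged when each appears in the other's k-nearest-neighbour list and they share at least `min_shared` neighbours.

```python
from itertools import combinations


def k_nearest(dist, k):
    """For each item, return the set of its k nearest neighbours (itself excluded)."""
    n = len(dist)
    neighbours = []
    for i in range(n):
        # Sort the other items by distance; ties are broken by index for determinism.
        order = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        neighbours.append(set(order[:k]))
    return neighbours


def shared_near_neighbour_clusters(dist, k, min_shared):
    """Group items that are mutual near neighbours and share >= min_shared neighbours.

    dist is a symmetric matrix of pairwise distances (hypothetical input; in an
    alignment setting it could come from any residue or segment dissimilarity).
    """
    n = len(dist)
    nn = k_nearest(dist, k)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        mutual = i in nn[j] and j in nn[i]
        if mutual and len(nn[i] & nn[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy usage: six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(shared_near_neighbour_clusters(dist, k=2, min_shared=1))
# prints [[0, 1, 2], [3, 4, 5]]
```

Because membership depends on shared neighbour lists rather than on raw distance thresholds, the grouping adapts to local density, which is the property that makes this family of methods attractive for picking out consistently similar positions.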
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boraik, A.N., Abdullah, R., Venkat, I. (2013). Detect Anchor Points by Using Shared Near Neighbors for Multiple Sequence Alignment. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_15
DOI: https://doi.org/10.1007/978-3-642-40567-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science, Computer Science (R0)